Proximal Policy Optimization in StarCraft

Access: Use of this item is restricted to the UNT Community
Deep reinforcement learning is an area of research that has blossomed tremendously in recent years and has shown remarkable potential in computer games. Real-time strategy games have been an important field of game artificial intelligence for several years. This paper introduces an algorithm used to train agents to fight against computer bots. Games are excellent tools for testing deep reinforcement learning algorithms because they offer valuable insight into how well an algorithm can perform in an isolated environment without real-life consequences; in addition, real-time strategy games are a very complex genre that challenges artificial intelligence agents in both short-term and long-term planning. In this paper, we introduce some history of deep learning and reinforcement learning, then combine them with StarCraft. PPO is an algorithm that retains some of the benefits of trust region policy optimization (TRPO) but is much simpler to implement, more general across environments, and has better sample complexity. The StarCraft environment, the Brood War Application Programming Interface (BWAPI), is open source and available for testing. The results show that PPO works well in BWAPI and can train units to defeat their opponents. The algorithm presented in the thesis is …
Date: May 2019
Creator: Liu, Yuefan
System: The UNT Digital Library
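The abstract above credits PPO's simplicity relative to TRPO; the core of that simplicity is the clipped surrogate objective, which bounds the policy update without a trust-region constraint. The sketch below is a minimal, generic illustration of that objective (not the thesis's implementation); the function name and epsilon default are assumptions.

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective for a single sample:
    min(r * A, clip(r, 1 - eps, 1 + eps) * A), where r is the
    probability ratio pi_new(a|s) / pi_old(a|s) and A the advantage.
    Clipping removes the incentive to move the ratio far from 1."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)
```

For example, with a positive advantage the objective stops rewarding ratios above 1 + eps, and with a negative advantage it stops rewarding ratios below 1 - eps, so a single minibatch cannot push the policy arbitrarily far.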

Quantile Regression Deep Q-Networks for Multi-Agent System Control

Access: Use of this item is restricted to the UNT Community
Training autonomous agents that are capable of performing their assigned job without fail is the ultimate goal of deep reinforcement learning. This thesis introduces a dueling Quantile Regression Deep Q-network, in which the network learns the state value quantile function and the advantage quantile function separately. With this network architecture the agent is able to learn to control simulated robots in the Gazebo simulator. Carefully crafted reward functions and state spaces must be designed for the agent to learn in complex non-stationary environments. When trained for only 100,000 timesteps, the agent is able to reach asymptotic performance in environments with moving and stationary obstacles using only the data from the inertial measurement unit, LIDAR, and positional information. Through the use of transfer learning, the agents are also capable of formation control and flocking patterns. The performance of agents with frozen networks is improved through advice giving in Deep Q-networks by use of normalized Q-values and majority voting.
Date: May 2019
Creator: Howe, Dustin
System: The UNT Digital Library
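The abstract above describes learning state-value and advantage quantile functions separately and then combining them. A standard way to combine dueling streams is to subtract the mean advantage across actions as a baseline; the sketch below shows that aggregation step for quantile outputs (a generic illustration under that assumption, not the thesis's architecture).

```python
import numpy as np

def dueling_quantiles(value_q, adv_q):
    """Combine dueling streams of a quantile-regression Q-network.

    value_q: shape (N,) quantile estimates of the state value V(s)
    adv_q:   shape (A, N) quantile estimates of the advantage A(s, a)

    Returns shape (A, N): per-action return quantiles, using the
    mean-advantage baseline so the decomposition is identifiable."""
    return value_q[None, :] + adv_q - adv_q.mean(axis=0, keepdims=True)

def q_values(quantiles):
    """Scalar Q(s, a) is the mean over the N quantile estimates."""
    return quantiles.mean(axis=1)
```

Action selection then uses the scalar Q-values, while the full set of quantiles is what the quantile-regression loss is trained against.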

Deep Learning Approach for Sensing Cognitive Radio Channel Status

Access: Use of this item is restricted to the UNT Community
Cognitive Radio (CR) technology creates the opportunity for unlicensed users to make use of the spectral band provided they do not interfere with any licensed user. It is a prominent tool with spectrum-sensing functionality that identifies idle channels and makes them available to unlicensed users. Thus, CR technology provides consumers access to a very large spectrum, quality spectral utilization, and energy efficiency due to spectral load balancing. However, the full potential of CR technology can be realized only with CRs equipped with accurate mechanisms to predict/sense spectral holes and vacant spectral bands without any prior knowledge about the characteristics of traffic in a real-time environment. The multi-layer perceptron (MLP), a popular neural network typically trained with the back-propagation (BP) learning algorithm, is an effective tool for classifying spectral bands into "busy" or "idle" states without any a priori knowledge about the user system features. In this dissertation, we propose the use of an evolutionary algorithm, the Bacterial Foraging Optimization Algorithm (BFOA), for training the MLP neural network. We compare the performance of the proposed system with the traditional algorithm and with the hybrid GA-PSO method. With the results of a simulation experiment that this …
Date: December 2019
Creator: Gottapu, Srinivasa Kiran
System: The UNT Digital Library
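The abstract above proposes replacing back-propagation with a population-based optimizer (BFOA) for MLP training; in that setting the network's weights are typically encoded as one flat parameter vector that the optimizer evolves. The sketch below shows such a flat-vector MLP forward pass producing a "busy" probability for a channel. This is a minimal illustration under that assumption; the layer sizes, function name, and activation choices are hypothetical, not taken from the dissertation.

```python
import numpy as np

def mlp_forward(x, params, n_in, n_hidden):
    """One-hidden-layer MLP with a sigmoid output, with all weights
    packed into a single flat vector `params` -- the representation a
    population-based optimizer such as BFOA would evolve in place of
    back-propagation. Returns P(channel == "busy")."""
    i = 0
    W1 = params[i:i + n_in * n_hidden].reshape(n_in, n_hidden)
    i += n_in * n_hidden
    b1 = params[i:i + n_hidden]
    i += n_hidden
    W2 = params[i:i + n_hidden]
    i += n_hidden
    b2 = params[i]
    h = np.tanh(x @ W1 + b1)          # hidden layer
    logit = h @ W2 + b2               # output logit
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid probability
```

An evolutionary trainer would score each candidate `params` vector by its classification accuracy on labeled busy/idle samples and evolve the population toward better-scoring vectors, with no gradients required.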