Proximal Policy Optimization in StarCraft

Access: Use of this item is restricted to the UNT Community
Deep reinforcement learning is an area of research that has grown rapidly in recent years and has shown remarkable potential in computer games. Real-time strategy games have been an important field of game artificial intelligence for several years. This paper introduces an algorithm used to train agents to fight against computer bots. Games are excellent tools for testing deep reinforcement learning algorithms because they give valuable insight into how well an algorithm performs in isolated environments without real-life consequences, and real-time strategy games in particular are a very complex genre that challenges artificial intelligence agents in both short-term and long-term planning. In this paper, we first review some of the history of deep learning and reinforcement learning, and then apply them to StarCraft. Proximal policy optimization (PPO) is an algorithm that has some of the benefits of trust region policy optimization (TRPO) but is much simpler to implement, more general across environments, and has better sample complexity. The StarCraft environment, the Brood War Application Programming Interface (BWAPI), is open source and available for testing. The results show that PPO works well in BWAPI and can train units to defeat their opponents. The algorithm presented in the thesis is …
Date: May 2019
Creator: Liu, Yuefan
System: The UNT Digital Library