Finally, we have open-sourced ELF, an extensive, lightweight, and flexible framework for game research.
Link to repository: https://github.com/facebookresearch/ELF
Arxiv: https://arxiv.org/abs/1707.01067
Facebook engineering blogpost: https://code.facebook.com/posts/132985767285406/introducing-elf-an-extensive-lightweight-and-flexible-platform-for-game-research/
Game replay: https://youtu.be/YgZyWobkqfw
We have been working on this framework for about half a year. The goal of ELF is to "build an affordable framework for game and reinforcement learning research". ELF gives an end-to-end solution, from game simulators to training paradigms. It reduces resource requirements (CPU, GPU, and memory) while improving code readability through better engineering design. Furthermore, it provides a miniature yet fast real-time strategy (RTS) engine on which future features can be built.
In terms of contributions, I (Yuandong Tian) led the framework design and wrote the code for the ELF framework and the RTS engine; Qucheng Gong built two extensions (Capture the Flag and Tower Defense) and the web-based visualization; Yuxin Wu plugged the Atari emulator into the framework and ran the speed tests; Wendy (Wenling) Shang added Leaky ReLU and Batch Normalization to improve the performance of the trained AI. Finally, Larry Zitnick gave a lot of important suggestions.
The design of ELF changed multiple times before converging to the current version, which I think is reasonable. The main idea is to use C++ multi-threading to run games concurrently, so that in each iteration the Python interface always receives a batch of game states of a predefined size, in randomized order. This batch can be fed directly to the reinforcement learning algorithm for the forward/backward pass. Compared to existing frameworks that wrap a single game instance in one Python interface, this design does not require a custom-made Python framework for inter-process communication, which makes the code much cleaner, more readable, and faster.
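A minimal sketch of what the training loop looks like on the Python side under this design (names such as GameContext, next_batch, and send_replies are illustrative placeholders, not the actual ELF API):

    # Hypothetical wrapper around the C++ game context. In ELF, the C++ side runs
    # many game instances on its own threads and groups ready states into batches.
    context = GameContext(num_games=1024, batch_size=128)  # illustrative names
    context.start()

    for it in range(num_iterations):
        # Blocks until batch_size game instances (in randomized order) have states ready.
        batch = context.next_batch()        # dict of batched tensors, e.g. batch["s"]

        # One forward pass on the whole batch; no per-game Python loop is needed.
        policy, value = model(batch["s"])
        actions = policy.multinomial(1)     # sample one action per game

        # Send one action back per game; the C++ side resumes those instances.
        context.send_replies(actions)

The key point is that batching happens on the C++ side, so Python only ever sees full batches.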
In ELF, we use PyTorch as the training backend. A Python dictionary serves as the interface between models and algorithms. Any algorithm or model reads the entries it needs via predefined keys and returns the key-entry pairs it produces. This decouples training algorithms from models and enhances ELF's flexibility and readability. The design has another benefit: different models can be used for different game instances depending on their current game states. This unifies many paradigms that require changing the topology between game instances and models, such as Monte Carlo Tree Search (MCTS) and self-play. People who have tried DarkForest may have been annoyed by having to run two separate programs, one for MCTS and the other for policy/value evaluation on the GPU. In ELF, one program is sufficient.
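Roughly, the contract between a model and an algorithm looks like the following sketch (the key names "s", "a", "r", "pi", "V" and the helper policy_gradient_loss are illustrative assumptions, not the exact keys used in ELF):

    # An "actor" consumes the state entry and produces action/policy entries.
    def actor(batch, model):
        pi, value = model(batch["s"])      # forward pass on batched states
        return {"a": pi.multinomial(1),    # sampled actions to send back to the games
                "pi": pi,                  # behavior policy, kept for off-policy correction
                "V": value}                # value estimates

    # A "trainer" consumes state/action/reward entries and updates the model.
    def trainer(batch, model, optimizer):
        pi, value = model(batch["s"])
        loss = policy_gradient_loss(pi, value, batch["a"], batch["r"], batch["pi"])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

As long as an algorithm declares which keys it reads and writes, it can be paired with any model that provides them.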
Furthermore, ELF is not limited to game environments. Any environment or simulator with a C/C++ interface, e.g., a physics engine or a discrete/continuous control system, can be incorporated into ELF. The framework automatically handles the synchronization and returns batched states. In this sense, ELF is very general.
Under ELF, we have implemented a miniature RTS engine and three concrete environments (MiniRTS, Capture the Flag, and Tower Defense). MiniRTS is a miniature RTS game that captures the key dynamics of the genre: gathering resources, building troops and buildings, defending against and attacking the enemy, fog-of-war (regions outside the player's sight are hidden), etc. Units in MiniRTS move continuously on the map, with collision checking and path planning.
Since the RTS engine is built from scratch, it is customized to facilitate deep learning and reinforcement learning research. MiniRTS, built in two weeks, is not as complicated as commercial games built by large teams over many months. However, it runs fast, uses minimal resources, and is easy to extend. For example, MiniRTS runs at 40K FPS per core on a MacBook, and evaluating 10k games takes only 1.5 minutes. Finally, we also provide an interactive web-based visualization tool that can be used to analyze game replays and to serve as an interface for humans to play against the AI. In comparison, using commercial games as a research platform would require far more resources while offering limited customizability.
On the three concrete games, we train an actor-critic model with an off-policy extension. When training on MiniRTS, we only use the reward that comes from the final outcome of the game; no reward shaping is used (e.g., no auxiliary rewards when a tank is built or minerals are gathered). The action space of RTS games is generally exponentially large, so we discretize it into 9 global, or strategic, actions (e.g., build workers/troops, all attack, all defend, etc.) so that existing methods can be used. The model we trained on MiniRTS beats our rule-based AI 70% of the time.
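With the action space reduced to 9 discrete choices, the policy head becomes an ordinary 9-way categorical output on top of the state features. A minimal sketch (the hidden sizes and layer structure are illustrative assumptions, not the exact model from the paper):

    import torch.nn as nn
    import torch.nn.functional as F

    NUM_STRATEGIC_ACTIONS = 9  # the discretized global actions

    class ActorCritic(nn.Module):
        """Minimal actor-critic head over the 9 strategic actions (illustrative sizes)."""
        def __init__(self, feature_dim=256):
            super(ActorCritic, self).__init__()
            self.fc = nn.Linear(feature_dim, 128)
            self.policy = nn.Linear(128, NUM_STRATEGIC_ACTIONS)  # categorical policy
            self.value = nn.Linear(128, 1)                       # state-value estimate

        def forward(self, features):
            h = F.relu(self.fc(features))
            return F.softmax(self.policy(h), dim=1), self.value(h)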
For people who are interested in game and reinforcement learning research, as a shameless advertisement, I would strongly recommend this framework.
Enjoy!