RL agent to play μRTS with Stable-Baselines3

Last update: Nov 11, 2022

Overview

Gym-μRTS with Stable-Baselines3/PyTorch

This repo contains an attempt to reproduce Gridnet PPO with invalid action masking algorithm to play μRTS using Stable-Baselines3 library. Apart from reproducibility, this might open access to a diverse set of well tested algorithms, and toolings for training, evaluations, and more.

Original paper: Gym-μRTS: Toward Affordable Deep Reinforcement Learning Research in Real-time Strategy Games.

Original code: gym-microrts-paper.

Install

Prerequisites:

Python 3.7+
Java 8.0+
FFmpeg (for video capturing)

git clone https://github.com/kachayev/gym-microrts-paper-sb3
cd gym-microrts-paper-sb3
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt

Note that I use newer version of gym-microrts compared to the one that was originally used for the paper.

Training

To traing an agent:

$ python ppo_gridnet_diverse_encode_decode_sb3.py

If everything is setup correctly, you'll see typicall SB3 verbose logging:

Using cpu device
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 2e+03    |
|    ep_rew_mean     | 0.0      |
| time/              |          |
|    fps             | 179      |
|    iterations      | 1        |
|    time_elapsed    | 11       |
|    total_timesteps | 2048     |
---------------------------------
------------------------------------------
| rollout/                |              |
|    ep_len_mean          | 1.72e+03     |
|    ep_rew_mean          | -5.0         |
| time/                   |              |
|    fps                  | 55           |
|    iterations           | 2            |
|    time_elapsed         | 74           |
|    total_timesteps      | 4096         |
| train/                  |              |
|    approx_kl            | 0.0056759235 |
|    clip_fraction        | 0.0861       |
|    clip_range           | 0.2          |
|    entropy_loss         | -5.65        |
|    explained_variance   | 0.412        |
|    learning_rate        | 0.0003       |
|    loss                 | -0.024       |
|    n_updates            | 10           |
|    policy_gradient_loss | -0.00451     |
|    value_loss           | 0.00413      |
------------------------------------------

As soon as correctness of the implementation is verified, I will provide details on how to use RL Baselines3 Zoo for training and evaluations.

Implementational Caveats

A few notes / pain points regarding the implementation of the alrogithms, and the process of integrating it with stable-baselines3:

Gym does not ship a space for "array of multidiscrete" use case (let's be honest, it's not very common). But it gives an option for defining your space when necessary. A new space, when defined, is not easy to integrate into SB3. In a few different places SB3 raises NotImplementedError facing unknown space (example 1, example 2).
Seems like switching to fully rolled out MutliDiscrete space definition has a significant performance penalty. Still investigating if this can be improved.
Invalid masking is implemented by passing masks into observations from the wrapper (the observation space is replaced with gym.spaces.Dict to hold both observations and masks). By doing it this way, masks are now available for policy, and fit rollout buffer layout. Masking is implemented by setting logits into -inf (or to a rather small number).

Look for xxx(hack) comments in the code for more details.

RL agent to play μRTS with Stable-Baselines3

Related tags

Overview

Gym-μRTS with Stable-Baselines3/PyTorch

Install

Training

Implementational Caveats

Owner

Oleksii Kachaiev

This project contains an implemented version of Face Detection using OpenCV and Mediapipe. This is a code snippet and can be used in projects.

Luminaire is a python package that provides ML driven solutions for monitoring time series data.

Notebooks for my "Deep Learning with TensorFlow 2 and Keras" course

Incorporating Transformer and LSTM to Kalman Filter with EM algorithm

Linear image-to-image translation

Re-implementation of 'Grokking: Generalization beyond overfitting on small algorithmic datasets'

joint detection and semantic segmentation, based on ultralytics/yolov5,

A memory-efficient implementation of DenseNets

PyTorch reimplementation of the Smooth ReLU activation function proposed in the paper "Real World Large Scale Recommendation Systems Reproducibility and Smooth Activations" [arXiv 2022].

Morphable Detector for Object Detection on Demand

A Simple Framwork for CV Pre-training Model (SOCO, VirTex, BEiT)

This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

this is a lite easy to use virtual keyboard project for anyone to use

TensorFlow implementation of Adaptive Information Transfer Multi-task (AITM) framework. Code for the paper submitted to KDD21: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning for Customer Acquisition.

Emulation and Feedback Fuzzing of Firmware with Memory Sanitization

Cweqgen - The CW Equation Generator

TransGAN: Two Transformers Can Make One Strong GAN

FCN (Fully Convolutional Network) is deep fully convolutional neural network architecture for semantic pixel-wise segmentation

Multi-objective constrained optimization for energy applications via tree ensembles

Introduction to CPM