Off-policy continuous control in PyTorch, with RDPG, RTD3 & RSAC

Overview

offpcc_logo

arXiv technical report soon available.

we are updating the readme to be as comprehensive as possible

Please ask any questions in Issues, thanks.

Introduction

This PyTorch repo implements off-policy RL algorithms for continuous control, including:

  • Standard algorithms: DDPG, TD3, SAC
  • Image-based algorithm: ConvolutionalSAC
  • Recurrent algorithms: RecurrentDPG, RecurrentTD3, RecurrentSAC, RecurrentSACSharing (see report)

where recurrent algorithms are generally not available in other repos.

Structure of codebase

Here, we talk about the organization of this code. In particular, we will talk about

  • Folder: where are certain files located?
  • Classes: how are classes designed to interact with each other?
  • Training/evaluation loop: how environment interaction, learning and evaluation alternate?

A basic understanding of these will make other details easy to understand from code itself.

Folders

  • file
    • containing plots reproducing stable-baselines3; you don’t need to touch this
  • offpcc (the good stuff; you will be using this)
    • algorithms (where DDPG, TD3 and SAC are implemented)
    • algorithms_recurrent (where RDPG, RTD3 and RSAC are implemented)
    • basics (abstract classes, stuff shared by algorithms or algorithms_recurrent, code for training)
    • basics_sb3 (you don’t need to touch this)
    • configs (gin configs)
    • domains (all custom domains are stored within and registered properly)
  • pics_for_readme
    • random pics; you don’t need to touch this
  • temp
    • potentially outdated stuff; you don’t need to touch this

Relationships between classes

There are three core classes in this repo:

  • Any environment written using OpenAI’s API would have:
    • reset method outputs the current state
    • step method takes in an action, outputs (reward, next state, done, info)
  • OffPolicyRLAlgorithm and RecurrentOffPolicyRLAlgorithm are the base class for all algorithms listed in introduction. You should think about them as neural network (e.g., actors, critics, CNNs, RNNs) wrappers that are augmented with methods to help these networks interact with other stuff:
    • act method takes in state from env, outputs action back to env
    • update_networks method takes in batch from buffer
  • The replay buffers ReplayBuffer and RecurrentReplayBuffer are built to interact with the environment and the algorithm classes
    • push method takes in a transition from env
    • sample method outputs a batch for algorithm’s update_networks method

Their relationships are best illustrated by a diagram:

offpcc_steps

Structure of training/evaluation loop

In this repo, we follow the training/evaluation loop style in spinning-up (this is essentially the script: basics/run_fns and the function train). It follows this basic structure, with details added for tracking stats and etc:

state = env.reset()
for t range(total_steps):  # e.g., 1 million
    # environment interaction
    if t >= update_after:
        action = algorithm.act(state)
    else:
        action = env.action_space.sample()
    next_state, reward, done, info = env.step(action)
   	# learning
    if t >= update_after and (t + 1) % update_every == 0:
        for j in range(update_every):
            batch = buffer.sample()
            algorithm.update_networks(batch)
    # evaluation
    if (t + 1) % num_steps_per_epoch == 0:
        ep_len, ep_ret = test_for_one_episode(test_env, algorithm)

Dependencies

Dependencies are available in requirements.txt; although there might be one or two missing dependencies that you need to install yourself.

Train an agent

Setup (wandb & GPU)

Add this to your bashrc or bash_profile and source it.

You should replace “account_name” with whatever wandb account that you want to use.

export OFFPCC_WANDB_ENTITY="account_name"

From the command line:

cd offpcc
CUDA_VISIBLE_DEVICES=3 OFFPCC_WANDB_PROJECT=project123 python launch.py --env <env-name> --algo <algo-name> --config <config-path> --run_id <id>

For DDPG, TD3, SAC

On pendulum-v0:

python launch.py --env pendulum-v0 --algo sac --config configs/test/template_short.gin --run_id 1

Commands and plots for benchmarking on Pybullet domains are in a Issue called “Performance check against SB3”.

For RecurrentDDPG, RecurrentTD3, RecurrentSAC

On pendulum-p-v0:

python launch.py --env pendulum-p-v0 --algo rsac --config configs/test/template_recurrent_100k.gin --run_id 1

Reproducing paper results

To reproduce paper results, simply run the commands in the previous section with the appropriate env name (listed below) and config files (their file names are highly readable). Mapping between env names used in code and env names used in paper:

  • pendulum-v0: pendulum
  • pendulum-p-v0: pendulum-p
  • pendulum-va-v0: pendulum-v
  • dmc-cartpole-balance-v0: cartpole-balance
  • dmc-cartpole-balance-p-v0: cartpole-balance-p
  • dmc-cartpole-balance-va-v0: cartpole-balance-v
  • dmc-cartpole-swingup-v0: cartpole-swingup
  • dmc-cartpole-swingup-p-v0: cartpole-swingup-p
  • dmc-cartpole-swingup-va-v0: cartpole-swingup-v
  • reacher-pomdp-v0: reacher-pomdp
  • water-maze-simple-pomdp-v0: watermaze
  • bumps-normal-test-v0: push-r-bump

Render learned policy

Create a folder in the same directory as offpcc, called results. In there, create a folder with the name of the environment, e.g., pendulum-p-v0. Within that env folder, create a folder with the name of the algorithm, e.g., rsac. You can get an idea of the algorithms available from the algo_name2class diectionary defined in offpcc/launch.py. Within that algorithm folder, create a folder with the run_id, e.g., 1. Simply put the saved actor (also actor summarizer for recurrent algorithms) into that inner most foler - they can be downloaded from the wandb website after your run finishes. Finally, go back into offpcc, and call

python launch.py --env pendulum-v0 --algo sac --config configs/test/template_short.gin --run_id 1 --render

For bumps-normal-test-v0, you need to modify the test_for_one_episode function within offpcc/basics/run_fns.py because, for Pybullet environments, the env.step must only appear once before the env.reset() call.

Owner
Zhihan
Zhihan
Python implementation of NARS (Non-Axiomatic-Reasoning-System)

Python implementation of NARS (Non-Axiomatic-Reasoning-System)

Bowen XU 11 Dec 20, 2022
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

220 Dec 31, 2022
Fully Convolutional Networks for Semantic Segmentation by Jonathan Long*, Evan Shelhamer*, and Trevor Darrell. CVPR 2015 and PAMI 2016.

Fully Convolutional Networks for Semantic Segmentation This is the reference implementation of the models and code for the fully convolutional network

Evan Shelhamer 3.2k Jan 08, 2023
Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations

Imitating Deep Learning Dynamics via Locally Elastic Stochastic Differential Equations This repo contains official code for the NeurIPS 2021 paper Imi

Jiayao Zhang 2 Oct 18, 2021
HarDNeXt: Official HarDNeXt repository

HarDNeXt-Pytorch HarDNeXt: A Stage Receptive Field and Connectivity Aware Convolution Neural Network HarDNeXt-MSEG for Medical Image Segmentation in 0

5 May 26, 2022
MohammadReza Sharifi 27 Dec 13, 2022
To prepare an image processing model to classify the type of disaster based on the image dataset

Disaster Classificiation using CNNs bunnysaini/Disaster-Classificiation Goal To prepare an image processing model to classify the type of disaster bas

Bunny Saini 1 Jan 24, 2022
Code for Motion Representations for Articulated Animation paper

Motion Representations for Articulated Animation This repository contains the source code for the CVPR'2021 paper Motion Representations for Articulat

Snap Research 851 Jan 09, 2023
A Simple Long-Tailed Rocognition Baseline via Vision-Language Model

BALLAD This is the official code repository for A Simple Long-Tailed Rocognition Baseline via Vision-Language Model. Requirements Python3 Pytorch(1.7.

Teli Ma 4 Jan 20, 2022
[TIP 2020] Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion Code for Multi-Temporal Scene Classification and Scene Ch

Lixiang Ru 33 Dec 12, 2022
3D ResNet Video Classification accelerated by TensorRT

Activity Recognition TensorRT Perform video classification using 3D ResNets trained on Kinetics-400 dataset and accelerated with TensorRT P.S Click on

Akash James 39 Nov 21, 2022
Code release for General Greedy De-bias Learning

General Greedy De-bias for Dataset Biases This is an extention of "Greedy Gradient Ensemble for Robust Visual Question Answering" (ICCV 2021, Oral). T

4 Mar 15, 2022
Intel® Neural Compressor is an open-source Python library running on Intel CPUs and GPUs

Intel® Neural Compressor targeting to provide unified APIs for network compression technologies, such as low precision quantization, sparsity, pruning, knowledge distillation, across different deep l

Intel Corporation 846 Jan 04, 2023
This repository contains the code for EMNLP-2021 paper "Word-Level Coreference Resolution"

Word-Level Coreference Resolution This is a repository with the code to reproduce the experiments described in the paper of the same name, which was a

79 Dec 27, 2022
Offical implementation for "Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation".

Trash or Treasure? An Interactive Dual-Stream Strategy for Single Image Reflection Separation (NeurIPS 2021) by Qiming Hu, Xiaojie Guo. Dependencies P

Qiming Hu 31 Dec 20, 2022
This repository is a series of notebooks that show solutions for the projects at Dataquest.io.

Dataquest Project Solutions This repository is a series of notebooks that show solutions for the projects at Dataquest.io. Of course, there are always

Dataquest 1.1k Dec 30, 2022
A tool to visualise the results of AlphaFold2 and inspect the quality of structural predictions

AlphaFold Analyser This program produces high quality visualisations of predicted structures produced by AlphaFold. These visualisations allow the use

Oliver Powell 3 Nov 13, 2022
CHERRY is a python library for predicting the interactions between viral and prokaryotic genomes

CHERRY is a python library for predicting the interactions between viral and prokaryotic genomes. CHERRY is based on a deep learning model, which consists of a graph convolutional encoder and a link

Kenneth Shang 12 Dec 15, 2022
Multi-Template Mouse Brain MRI Atlas (MBMA): both in-vivo and ex-vivo

Multi-template MRI mouse brain atlas (both in vivo and ex vivo) Mouse Brain MRI atlas (both in-vivo and ex-vivo) (repository relocated from the origin

8 Nov 18, 2022
Generative code template for PixelBeasts 10k NFT project.

generator-template Generative code template for combining transparent png attributes into 10,000 unique images. Used for the PixelBeasts 10k NFT proje

Yohei Nakajima 9 Aug 24, 2022