On the model-based stochastic value gradient for continuous reinforcement learning

Last update: Dec 15, 2022

This repository is by Brandon Amos, Samuel Stanton, Denis Yarats, and Andrew Gordon Wilson and contains the PyTorch source code to reproduce the experiments in our L4DC 2021 paper On the model-based stochastic value gradient for continuous reinforcement learning. Videos of our agents are available here.

Setup and dependencies

After cloning this repository and installing PyTorch on your system, you can set up the code with:

python3 setup.py develop

A basic run and analysis

You can start a single local run on the humanoid with:

./train.py env=mbpo_humanoid

This will create an experiment directory in exp/local/<date>/ with models and logging info. Once the run has saved its first model, you can render a video of the agent with some diagnostic information by passing that directory to the following command (replace the date with the one from your own run):

./eval-vis-model.py exp/local/2021.05.07
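If you have several runs, a small helper can pick the most recent experiment directory to pass to eval-vis-model.py. The following is a minimal sketch, not part of this repository: the function name latest_experiment_dir is hypothetical, and it assumes the subdirectories of exp/local are named with zero-padded dates such as 2021.05.07, so that sorting the names as strings also sorts them chronologically.

```python
from pathlib import Path
from typing import Optional


def latest_experiment_dir(root: str = "exp/local") -> Optional[Path]:
    """Return the newest date-named subdirectory of `root`, or None.

    Hypothetical helper: assumes directory names like 2021.05.07
    (zero-padded year.month.day), which sort chronologically as strings.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return None
    # Keep only directories and sort them by path (and therefore by name).
    candidates = sorted(p for p in root_path.iterdir() if p.is_dir())
    return candidates[-1] if candidates else None


if __name__ == "__main__":
    exp_dir = latest_experiment_dir()
    if exp_dir is None:
        print("no experiment directory found under exp/local")
    else:
        # Print the command to run rather than executing it.
        print(f"./eval-vis-model.py {exp_dir}")
```

The helper only prints the command, so you can check the chosen directory before running the visualization yourself.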

Reproducing our main experimental results

The default hyper-parameters in this repo are set to the best ones we found with a hyper-parameter search. The following command reproduces our final results using 10 seeds with the optimal hyper-parameters:

./train.py -m experiment=mbpo_final env=mbpo_cheetah,mbpo_hopper,mbpo_walker2d,mbpo_humanoid,mbpo_ant seed=$(seq -s, 10)

The results from this experiment can be plotted with our notebook nbs/mbpo.ipynb, which can also serve as a starting point for analyzing and developing further methods.
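To get a sense of the size of the sweep command in this section before launching it, the sketch below enumerates the jobs it requests. It assumes that hydra's default sweeper runs one job per combination of the comma-separated values, and that seq -s, 10 expands to the seeds 1 through 10 (as GNU seq does). The expand_sweep function is a hypothetical illustration and does not call hydra.

```python
from itertools import product

# Values taken from the sweep command in this section.
ENVS = ["mbpo_cheetah", "mbpo_hopper", "mbpo_walker2d", "mbpo_humanoid", "mbpo_ant"]
SEEDS = list(range(1, 11))  # assumed expansion of `seq -s, 10`: 1,2,...,10


def expand_sweep(envs, seeds, experiment="mbpo_final"):
    """List one override string per (env, seed) combination.

    Hypothetical helper that mimics a Cartesian-product sweep;
    it only builds strings and launches nothing.
    """
    return [
        f"experiment={experiment} env={env} seed={seed}"
        for env, seed in product(envs, seeds)
    ]


jobs = expand_sweep(ENVS, SEEDS)
print(len(jobs))  # 5 environments x 10 seeds = 50 jobs
print(jobs[0])
```

Under these assumptions the command requests 50 training runs, which is worth knowing when budgeting compute for a local sweep.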

Reproducing our sweeps and ablations

Our main hyper-parameter sweeps are run with hydra's multirun mode (the -m flag) and can be launched with the following command after uncommenting the hydra/sweeper line in config/train.yaml:

./train.py -m experiment=full_poplin_sweep

The results from this experiment can be plotted with our notebook nbs/poplin.ipynb.

Citations

If you find this repository helpful for your publications, please consider citing our paper:

@inproceedings{amos2021svg,
  title={On the model-based stochastic value gradient for continuous reinforcement learning},
  author={Amos, Brandon and Stanton, Samuel and Yarats, Denis and Wilson, Andrew Gordon},
  booktitle={L4DC},
  year={2021}
}

Licensing

This repository is licensed under the CC BY-NC 4.0 License.

Owner

Facebook Research
