The Unsupervised Reinforcement Learning Benchmark (URLB)

Last update: Dec 26, 2022

URLB provides a set of leading algorithms for unsupervised reinforcement learning, in which agents first pre-train without access to extrinsic rewards and are then fine-tuned on downstream tasks.

Requirements

We assume you have access to a GPU that can run CUDA 10.2 and cuDNN 8. The simplest way to install all required dependencies is to create an Anaconda environment by running:

conda env create -f conda_env.yml

After the installation finishes, activate the environment with:

conda activate urlb

Implemented Agents

Agent	Command	Implementation Author(s)	Paper
ICM	`agent=icm`	Denis	paper
ProtoRL	`agent=proto`	Denis	paper
DIAYN	`agent=diayn`	Misha	paper
APT(ICM)	`agent=icm_apt`	Hao, Kimin	paper
APT(Ind)	`agent=ind_apt`	Hao, Kimin	paper
APS	`agent=aps`	Hao, Kimin	paper
SMM	`agent=smm`	Albert	paper
RND	`agent=rnd`	Kevin	paper
Disagreement	`agent=disagreement`	Catherine	paper

Available Domains

We support the following domains.

Domain	Tasks
`walker`	`stand`, `walk`, `run`, `flip`
`quadruped`	`walk`, `run`, `stand`, `jump`
`jaco`	`reach_top_left`, `reach_top_right`, `reach_bottom_left`, `reach_bottom_right`

Domain observation mode

Each domain supports two observation modes: states and pixels.

Mode	Command
states	`obs_type=states`
pixels	`obs_type=pixels`

Instructions

Pre-training

To run pre-training, use the pretrain.py script:

python pretrain.py agent=icm domain=walker

or, if you want to train a skill-based agent, like DIAYN, run:

python pretrain.py agent=diayn domain=walker

This script will produce several agent snapshots after training for 100k, 500k, 1M, and 2M frames. The snapshots will be stored under the following directory:

./pretrained_models/<obs_type>/<domain>/<agent>/

For example:

./pretrained_models/states/walker/icm/

Fine-tuning

Once you have pre-trained your method, you can use the saved snapshots to initialize the DDPG agent and fine-tune it on a downstream task. For example, if you have pre-trained ICM, you can fine-tune it on walker_run by running the following command:

python finetune.py pretrained_agent=icm task=walker_run snapshot_ts=1000000 obs_type=states

This will load a snapshot stored in ./pretrained_models/states/walker/icm/snapshot_1000000.pt, initialize DDPG with it (both the actor and critic), and start training on walker_run using the extrinsic reward of the task.
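The snapshot location described above can be reconstructed from the command-line overrides. The following is a minimal sketch, not code from the repository: the helper names `snapshot_path` and `domain_of` are hypothetical, and it assumes that task names follow the `<domain>_<task>` pattern seen in `walker_run`.

```python
from pathlib import Path


def domain_of(task: str) -> str:
    # Assumption: task names look like "<domain>_<task>", e.g. "walker_run".
    return task.split("_", 1)[0]


def snapshot_path(obs_type: str, task: str, agent: str, snapshot_ts: int,
                  root: str = "./pretrained_models") -> Path:
    # Mirrors the layout <root>/<obs_type>/<domain>/<agent>/snapshot_<ts>.pt
    # described in this README; this helper is illustrative only.
    return (Path(root) / obs_type / domain_of(task) / agent
            / f"snapshot_{snapshot_ts}.pt")


path = snapshot_path("states", "walker_run", "icm", 1000000)
print(path.as_posix())
```

Checking that this path exists before launching finetune.py is a cheap way to catch a mismatched `obs_type` or `snapshot_ts`.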

For methods that use skills, also pass the agent and set the reward_free flag to false:

python finetune.py pretrained_agent=smm task=walker_run snapshot_ts=1000000 obs_type=states agent=smm reward_free=false
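When sweeping over several agents or tasks, it can help to assemble these overrides programmatically. The sketch below only builds the argument list shown in the commands above; `finetune_command` and its `skill_based` parameter are hypothetical names, and it assumes that `agent` should match `pretrained_agent`, as in the SMM example.

```python
import shlex


def finetune_command(pretrained_agent, task, snapshot_ts,
                     obs_type="states", skill_based=False):
    # Base overrides, taken from the fine-tuning command in this README.
    args = [
        "python", "finetune.py",
        f"pretrained_agent={pretrained_agent}",
        f"task={task}",
        f"snapshot_ts={snapshot_ts}",
        f"obs_type={obs_type}",
    ]
    if skill_based:
        # Assumption: the agent override equals the pre-trained agent name.
        args += [f"agent={pretrained_agent}", "reward_free=false"]
    return args


cmd = finetune_command("smm", "walker_run", 1000000, skill_based=True)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.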

Monitoring

Logs are stored in the exp_local folder. To launch TensorBoard, run:

tensorboard --logdir exp_local

Training progress is also printed to the console in the following form:

| train | F: 6000 | S: 3000 | E: 6 | L: 1000 | R: 5.5177 | FPS: 96.7586 | T: 0:00:42

A training entry decodes as:

F  : total number of environment frames
S  : total number of agent steps
E  : total number of episodes
L  : episode length
R  : episode return
FPS: training throughput (frames per second)
T  : total training time
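For quick inspection outside TensorBoard, a console line in this format can be split into its fields. This is a minimal sketch that assumes the pipe-separated `KEY: value` layout shown above; `parse_log_line` is a hypothetical helper, not part of the repository.

```python
def _convert(value):
    # Try int first, then float; keep the raw string otherwise (e.g. "0:00:42").
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_log_line(line):
    # Drop the outer pipes, then split the remaining fields on "|".
    fields = [part.strip() for part in line.strip().strip("|").split("|")]
    mode, entries = fields[0], {}
    for field in fields[1:]:
        # Split on the first colon only, so time values keep their colons.
        key, _, value = field.partition(":")
        entries[key.strip()] = _convert(value.strip())
    return mode, entries


example = ("| train | F: 6000 | S: 3000 | E: 6 | L: 1000 "
           "| R: 5.5177 | FPS: 96.7586 | T: 0:00:42")
mode, entries = parse_log_line(example)
print(mode, entries)
```

The elapsed time `T` is left as a string because its format (`H:MM:SS`) is not a plain number.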
