Conservative Q Learning for Offline Reinforcement Learning in JAX

Last update: Nov 07, 2022

CQL-JAX

This repository implements Conservative Q Learning for Offline Reinforcement Learning in JAX (FLAX). The implementation is built on top of the SAC base of JAX-RL.

Usage

Install dependencies:

pip install -r requirements.txt
pip install "jax[cuda111]<=0.21.1" -f https://storage.googleapis.com/jax-releases/jax_releases.html

Run CQL:

python train_offline.py --env_name=hopper-expert-v0 --min_q_weight=5

Use the following values of min_q_weight on the MuJoCo tasks to reproduce the CQL results reported in the IQL paper:

Domain	medium	medium-replay	medium-expert
walker	10	1	10
hopper	5	5	1
cheetah	90	80	100

For antmaze tasks min_q_weight=10 is found to work best.
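The weights above can be kept in a small lookup so that each run picks up the matching value. This is a minimal sketch and not part of the repository: the names `MUJOCO_MIN_Q_WEIGHT`, `min_q_weight` and `build_command` are hypothetical, and the environment name is passed in by the caller rather than derived from the table. Only `train_offline.py`, `--env_name` and `--min_q_weight` come from the usage line above.

```python
# Hypothetical lookup built from the min_q_weight table above.
MUJOCO_MIN_Q_WEIGHT = {
    "walker": {"medium": 10, "medium-replay": 1, "medium-expert": 10},
    "hopper": {"medium": 5, "medium-replay": 5, "medium-expert": 1},
    "cheetah": {"medium": 90, "medium-replay": 80, "medium-expert": 100},
}

# Single value reported above for all antmaze tasks.
ANTMAZE_MIN_Q_WEIGHT = 10


def min_q_weight(domain, dataset):
    """Return the tabulated weight for a domain/dataset pair."""
    if domain == "antmaze":
        return ANTMAZE_MIN_Q_WEIGHT
    return MUJOCO_MIN_Q_WEIGHT[domain][dataset]


def build_command(env_name, domain, dataset):
    """Build the argv list for one training run with the tabulated weight."""
    weight = min_q_weight(domain, dataset)
    return [
        "python",
        "train_offline.py",
        f"--env_name={env_name}",
        f"--min_q_weight={weight}",
    ]


print(" ".join(build_command("hopper-medium-replay-v2", "hopper", "medium-replay")))
# prints: python train_offline.py --env_name=hopper-medium-replay-v2 --min_q_weight=5
```

The returned list can be passed directly to `subprocess.run` if the runs are launched from a script.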

If you run into out-of-memory errors in JAX, try running with the following environment variables set:

XLA_PYTHON_CLIENT_MEM_FRACTION=0.80 python ...
XLA_FLAGS=--xla_gpu_force_compilation_parallelism=1 python ...
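When runs are launched from a Python script instead of the shell, the same settings can be passed through the child process environment. The sketch below is illustrative only: `xla_env` is a hypothetical helper, and it sets both variables by default even though either may be enough on its own. The variables need to be in the environment before JAX initializes its backend, which is why they are handed to the subprocess instead of being set after import.

```python
import os


def xla_env(mem_fraction=0.80, serialize_compilation=True, base=None):
    """Return a copy of the environment with the memory-related settings above.

    Hypothetical helper; `base` defaults to the current process environment.
    """
    env = dict(os.environ if base is None else base)
    # Cap the fraction of GPU memory that JAX preallocates.
    env["XLA_PYTHON_CLIENT_MEM_FRACTION"] = f"{mem_fraction:.2f}"
    if serialize_compilation:
        flag = "--xla_gpu_force_compilation_parallelism=1"
        existing = env.get("XLA_FLAGS", "")
        # Append the flag without dropping flags that are already set.
        if flag not in existing:
            env["XLA_FLAGS"] = f"{existing} {flag}".strip()
    return env


# Usage (not executed here):
# import subprocess
# subprocess.run(
#     ["python", "train_offline.py", "--env_name=hopper-expert-v0", "--min_q_weight=5"],
#     env=xla_env(),
#     check=True,
# )
```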

Performance & Runtime

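Returns on the hopper and antmaze-umaze tasks are roughly in line with the PyTorch implementation and comparable to IQL; the JAX version scores noticeably lower on the antmaze-medium and antmaze-large tasks: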

Task	CQL(PyTorch)	CQL(JAX)	IQL
hopper-medium-v2	58.5	74.6	66.3
hopper-medium-replay-v2	95.0	92.1	94.7
hopper-medium-expert-v2	105.4	83.2	91.5
antmaze-umaze-v0	74.0	69.5	87.5
antmaze-umaze-diverse-v0	84.0	78.7	62.2
antmaze-medium-play-v0	61.2	14.2	71.2
antmaze-medium-diverse-v0	53.7	10.7	70.2
antmaze-large-play-v0	15.8	0.0	39.6
antmaze-large-diverse-v0	14.9	0.0	47.5
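To see where the two CQL implementations diverge, the per-task difference can be computed directly from the table. This is a small illustrative calculation on the numbers above, not a script from the repository; the names `RETURNS` and `jax_minus_torch` and the 10-point threshold are arbitrary choices made for the example.

```python
# Scores copied from the table above: (CQL PyTorch, CQL JAX, IQL).
RETURNS = {
    "hopper-medium-v2": (58.5, 74.6, 66.3),
    "hopper-medium-replay-v2": (95.0, 92.1, 94.7),
    "hopper-medium-expert-v2": (105.4, 83.2, 91.5),
    "antmaze-umaze-v0": (74.0, 69.5, 87.5),
    "antmaze-umaze-diverse-v0": (84.0, 78.7, 62.2),
    "antmaze-medium-play-v0": (61.2, 14.2, 71.2),
    "antmaze-medium-diverse-v0": (53.7, 10.7, 70.2),
    "antmaze-large-play-v0": (15.8, 0.0, 39.6),
    "antmaze-large-diverse-v0": (14.9, 0.0, 47.5),
}


def jax_minus_torch(returns):
    """Per-task difference CQL(JAX) - CQL(PyTorch), rounded to one decimal."""
    return {
        task: round(jax_score - torch_score, 1)
        for task, (torch_score, jax_score, _iql_score) in returns.items()
    }


gaps = jax_minus_torch(RETURNS)
for task, gap in gaps.items():
    print(f"{task:28s} {gap:+.1f}")

# Tasks where the JAX version trails the PyTorch version by more than 10 points.
lagging = [task for task, gap in gaps.items() if gap < -10]
print(lagging)
```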

Wall-clock time averages about 50 minutes per run, which improves on the 80 minutes reported for CQL in the IQL paper and narrows the gap to IQL's 20 minutes.

Task	CQL(JAX) (min)	IQL (min)
hopper-medium-v2	52	27
hopper-medium-replay-v2	54	30
hopper-medium-expert-v2	57	29
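The averages behind these figures can be checked with a few lines of arithmetic. The sketch below assumes the table values are minutes, as the surrounding text implies, and takes the 80-minute figure for CQL in the IQL paper from the sentence above; the variable names are illustrative.

```python
from statistics import mean

# Runtimes copied from the table above: (CQL JAX, IQL), assumed to be minutes.
RUNTIME_MIN = {
    "hopper-medium-v2": (52, 27),
    "hopper-medium-replay-v2": (54, 30),
    "hopper-medium-expert-v2": (57, 29),
}

# CQL runtime quoted above from the IQL paper, in minutes.
IQL_PAPER_CQL_MIN = 80

cql_mean = mean(cql for cql, _ in RUNTIME_MIN.values())
iql_mean = mean(iql for _, iql in RUNTIME_MIN.values())

print(f"CQL(JAX) mean: {cql_mean:.1f} min")                          # 54.3 min
print(f"IQL mean:      {iql_mean:.1f} min")                          # 28.7 min
print(f"CQL(JAX) / IQL: {cql_mean / iql_mean:.2f}x")                 # 1.90x
print(f"Speedup vs 80 min: {IQL_PAPER_CQL_MIN / cql_mean:.2f}x")     # 1.47x
```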

The JAX implementation runs more than 4 times faster than the original PyTorch implementation.

For more offline RL algorithm implementations, check out the JAX-RL, IQL and rlkit repositories.

Citation

If you use CQL-JAX in your research, please cite the following:

@misc{cqljax,
  author = {Suri, Karush},
  title = {{Conservative Q Learning in JAX.}},
  url = {https://github.com/karush17/cql-jax},
  year = {2021}
}
