Code for NeurIPS 2021 paper: Invariant Causal Imitation Learning for Generalizable Policies

Last update: Dec 01, 2022

Overview

Invariant Causal Imitation Learning for Generalizable Policies

Ioana Bica, Daniel Jarrett, Mihaela van der Schaar

Neural Information Processing Systems (NeurIPS) 2021

Dependencies

The code was implemented in Python 3.6 and requires the following packages:

gym==0.17.2
numpy==1.18.2
pandas==1.0.4
tensorflow==1.15.0
torch==1.6.0
tqdm==4.32.1
scipy==1.1.0
scikit-learn==0.22.2
stable-baselines==2.10.1
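
To confirm that an environment matches the pinned versions above, a small check like the following can help. This is only a sketch and is not part of the repository: `PINS`, `installed_version`, and `check_pins` are hypothetical names, and the fallback to `pkg_resources` assumes setuptools is available on Python 3.6/3.7.

```python
from typing import Callable, Dict, List, Optional

# Pinned versions copied from the dependency list above.
PINS: Dict[str, str] = {
    "gym": "0.17.2",
    "numpy": "1.18.2",
    "pandas": "1.0.4",
    "tensorflow": "1.15.0",
    "torch": "1.6.0",
    "tqdm": "4.32.1",
    "scipy": "1.1.0",
    "scikit-learn": "0.22.2",
    "stable-baselines": "2.10.1",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # Standard library on Python 3.8+.
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Assumed fallback for Python 3.6/3.7 (ships with setuptools).
        import pkg_resources
        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def check_pins(
    pins: Dict[str, str],
    lookup: Callable[[str], Optional[str]] = installed_version,
) -> List[str]:
    """Return one message per package that is missing or has a different version."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed (want {wanted})")
        elif found != wanted:
            problems.append(f"{name}: found {found} (want {wanted})")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned versions match"]:
        print(line)
```

The `lookup` argument exists only so the check can be exercised without touching the real environment.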

Running and evaluating the model:

The control tasks used in the experiments are from OpenAI Gym [1]. Each control task is associated with a true reward function (unknown to the imitation algorithm). In each case, the “expert” demonstrator can be obtained by using a pre-trained and hyperparameter-optimized agent from the RL Baselines Zoo [2] in Stable Baselines [3].

In this implementation, we provide expert demonstrations for two CartPole-v1 environments in 'volume/CartPole-v1'. Note that the code in 'contrib/baselines_zoo' is taken from [2].
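
For intuition on how demonstrations of this kind can be gathered from an expert agent, the sketch below rolls out a policy and records the visited states and chosen actions. It is not the code used to produce the files in 'volume/CartPole-v1'. It assumes the classic Gym interface of gym==0.17.2, where `reset()` returns an observation and `step()` returns `(obs, reward, done, info)`. `collect_trajectory` and `CountdownEnv` are hypothetical names, and `CountdownEnv` is a toy stand-in so the example runs without Gym installed.

```python
from typing import Any, Callable, Dict


def collect_trajectory(
    env: Any,
    policy: Callable[[Any], Any],
    max_steps: int = 500,
) -> Dict[str, list]:
    """Run one episode and return the states, actions, and rewards seen."""
    states, actions, rewards = [], [], []
    obs = env.reset()
    for _ in range(max_steps):
        action = policy(obs)
        next_obs, reward, done, _info = env.step(action)
        # Store the state the action was taken in, not the resulting state.
        states.append(obs)
        actions.append(action)
        rewards.append(reward)
        obs = next_obs
        if done:
            break
    return {"states": states, "actions": actions, "rewards": rewards}


class CountdownEnv:
    """Toy environment: the observation is the step count, each step pays 1.0."""

    def __init__(self, horizon: int = 5):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, 1.0, done, {}


if __name__ == "__main__":
    traj = collect_trajectory(CountdownEnv(horizon=5), policy=lambda obs: 0)
    print(len(traj["states"]), sum(traj["rewards"]))  # prints: 5 5.0
```

With Stable Baselines, a loaded model's `predict` method returns an action together with a recurrent state, so `lambda obs: model.predict(obs, deterministic=True)[0]` could serve as `policy` here.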

To train and evaluate ICIL on CartPole-v1, run the following command with the chosen command line arguments. For reference, the expert performance is 500.

python testing/il.py

Options:
   --env                  # Environment name.
   --num_trajectories     # Number of expert trajectories used for training the imitation learning algorithm.
   --trial                # Trial number.

Outputs:

The average reward over 10 repetitions of running ICIL.
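
If the per-repetition rewards are available, they can be summarized as in the sketch below. This is illustrative only: `summarize_rewards` is a hypothetical helper, and reporting a standard error alongside the mean is an assumption, not something the script is documented to do.

```python
import math
from typing import Sequence, Tuple


def summarize_rewards(rewards: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) over repetitions."""
    n = len(rewards)
    if n == 0:
        raise ValueError("need at least one repetition")
    mean = sum(rewards) / n
    if n == 1:
        # A single repetition gives no spread estimate.
        return mean, 0.0
    # Unbiased sample variance (divides by n - 1).
    variance = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    return mean, math.sqrt(variance / n)


if __name__ == "__main__":
    # Made-up numbers for illustration, not results from the paper.
    mean, stderr = summarize_rewards([1.0, 2.0, 3.0])
    print(f"{mean:.2f} +/- {stderr:.2f}")
```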

Example usage

python testing/il.py --env='CartPole-v1' --num_trajectories=20 --trial=0
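
To run several settings in a row, the command above can be generated programmatically. The sketch below only builds and prints the command lines using the three documented options. `build_command` and `sweep` are hypothetical helpers, and the trajectory counts and trial numbers in the demo are arbitrary illustrative values.

```python
import itertools
import shlex
from typing import Iterable, List


def build_command(env: str, num_trajectories: int, trial: int) -> List[str]:
    """Build the argument list for one run of testing/il.py."""
    return [
        "python",
        "testing/il.py",
        f"--env={env}",
        f"--num_trajectories={num_trajectories}",
        f"--trial={trial}",
    ]


def sweep(
    env: str,
    trajectory_counts: Iterable[int],
    trials: Iterable[int],
) -> List[List[str]]:
    """Return one command per (num_trajectories, trial) combination."""
    return [
        build_command(env, n, t)
        for n, t in itertools.product(trajectory_counts, trials)
    ]


if __name__ == "__main__":
    for cmd in sweep("CartPole-v1", [10, 20], range(3)):
        print(" ".join(shlex.quote(part) for part in cmd))
        # To launch each run, pass `cmd` to subprocess.run(cmd, check=True).
```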

References

[1] Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. OpenAI Gym. OpenAI, 2016.

[2] Antonin Raffin. RL Baselines Zoo. https://github.com/araffin/rl-baselines-zoo, 2018.

[3] Ashley Hill, Antonin Raffin, Maximilian Ernestus, Adam Gleave, Anssi Kanervisto, Rene Traore, Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, and Yuhuai Wu. Stable Baselines. https://github.com/hill-a/stable-baselines, 2018.

Citation

If you use this code, please cite:

@inproceedings{bica2021invariant,
  title={Invariant Causal Imitation Learning for Generalizable Policies},
  author={Bica, Ioana and Jarrett, Daniel and van der Schaar, Mihaela},
  booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
  year={2021}
}
