(NeurIPS '21 Spotlight) IQ-Learn: Inverse Q-Learning for Imitation

Last update: Dec 20, 2022

Overview

Inverse Q-Learning (IQ-Learn)

Official code base for IQ-Learn: Inverse soft-Q Learning for Imitation, NeurIPS '21 Spotlight

IQ-Learn is an easy-to-use algorithm that serves as a drop-in replacement for methods like Behavior Cloning and GAIL, to boost your imitation learning pipelines!
Update: IQ-Learn was recently used to create the best AI agent for playing Minecraft, placing #1 in the NeurIPS MineRL Basalt Challenge using only human demos (Overall Leaderboard Rank #2).

[Project Page]

We introduce Inverse Q-Learning (IQ-Learn), a novel state-of-the-art framework for Imitation Learning (IL) that directly learns soft-Q functions from expert data. IQ-Learn enables non-adversarial imitation learning and works in both offline and online IL settings. It is performant even with very sparse expert data and scales to complex image-based environments, surpassing prior methods by more than 3x. It is very simple to implement, requiring ~15 lines of code on top of existing RL methods.
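To make "directly learns soft-Q functions from expert data" more concrete, here is a minimal sketch of the kind of objective involved. It is not the code from this repository. It assumes a tabular Q-function over discrete states and actions, a soft value V(s) = log Σ_a exp Q(s, a), and a χ²-style concave function φ(x) = x − x²/(4α). The names `soft_value`, `iq_objective`, `expert_transitions`, `initial_states`, `gamma` and `alpha`, and the choice to treat the value after a terminal step as zero, are all illustrative assumptions. The function only evaluates the objective that training would maximize over Q.

```python
import math


def soft_value(q_row):
    """Soft value V(s) = log sum_a exp(Q(s, a)), computed stably."""
    m = max(q_row)
    return m + math.log(sum(math.exp(q - m) for q in q_row))


def iq_objective(q, expert_transitions, initial_states, gamma=0.99, alpha=0.5):
    """Estimate the objective J(Q) for a tabular Q (one row of action values per state).

    expert_transitions: list of (state, action, next_state, done) tuples.
    initial_states: list of start-state indices.
    """

    def phi(x):
        # Assumed chi^2-style concave regularizer; other choices are possible.
        return x - x * x / (4.0 * alpha)

    expert_term = 0.0
    for s, a, s_next, done in expert_transitions:
        # Assumption: the value after a terminal transition is zero.
        next_v = 0.0 if done else soft_value(q[s_next])
        # Reward implied by Q on this expert transition.
        implied_reward = q[s][a] - gamma * next_v
        expert_term += phi(implied_reward)
    expert_term /= len(expert_transitions)

    # Average soft value of the initial states.
    init_term = sum(soft_value(q[s0]) for s0 in initial_states) / len(initial_states)
    return expert_term - (1.0 - gamma) * init_term


if __name__ == "__main__":
    q_table = [[1.0, 0.0], [0.0, 2.0]]
    demos = [(0, 0, 1, False), (1, 1, 1, True)]
    print(iq_objective(q_table, demos, initial_states=[0], gamma=0.9))
```

In a deep RL setting the same two terms would be computed on mini-batches with a neural Q-network and maximized by gradient ascent, which is roughly where the "few lines on top of an existing RL method" claim comes from.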

Inverse Q-Learning is theoretically equivalent to Inverse Reinforcement Learning, i.e. learning rewards from expert data. However, it is much more powerful in practice. It admits very simple non-adversarial training and works in completely offline IL settings (without any access to the environment), greatly exceeding Behavior Cloning.
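The link between a learned soft-Q function and a reward can be written down using the standard relations from maximum-entropy RL. The block below is a sketch under stated assumptions: a discrete action space, unit temperature, a discount factor γ, and transition dynamics P. None of these symbols are defined in this README.

```latex
\begin{aligned}
V(s) &= \log \sum_{a} \exp Q(s, a), \\
\pi(a \mid s) &= \exp\bigl(Q(s, a) - V(s)\bigr), \\
r(s, a) &= Q(s, a) - \gamma \, \mathbb{E}_{s' \sim P(\cdot \mid s, a)}\bigl[V(s')\bigr].
\end{aligned}
```

Read this way, a single Q-function fixes both the policy and the reward, which is why learning Q from expert data can stand in for learning a reward and then solving an RL problem.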

IQ-Learn is the successor to Adversarial Imitation Learning methods like GAIL (coming from the same lab).
It extends the theoretical framework for Inverse RL to non-adversarial and scalable learning, showing guaranteed convergence for the first time.

Citation

@inproceedings{garg2021iqlearn,
title={IQ-Learn: Inverse soft-Q Learning for Imitation},
author={Divyansh Garg and Shuvam Chakraborty and Chris Cundy and Jiaming Song and Stefano Ermon},
booktitle={Thirty-Fifth Conference on Neural Information Processing Systems},
year={2021},
url={https://openreview.net/forum?id=Aeo-xqtb5p}
}

Key Advantages

✅ Drop-in replacement for Behavior Cloning
✅ Non-adversarial online IL (Successor to GAIL & AIRL)
✅ Simple to implement
✅ Performant with very sparse data (single expert demo)
✅ Scales to Complex Image Envs (SOTA on Atari and playing Minecraft)
✅ Recover rewards from envs

Usage

To install and use IQ-Learn, check the instructions provided in the iq_learn folder.

Imitation

Reaching human-level performance on Atari with pure imitation:

Rewards

Recovering environment rewards on GridWorld:

Questions

Please feel free to email us if you have any questions.

Div Garg ([email protected])

Owner

Divyansh Garg
