Active Offline Policy Selection With Python

Last update: Oct 15, 2022

Related tags

Overview

Active Offline Policy Selection

This is supporting example code for NeurIPS 2021 paper Active Offline Policy Selection by Ksenia Konyushkova*, Yutian Chen*, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas.

To simulate the active offline policy selection for a set of policies, one needs to provide a number of files. We provide the files for 76 policies on cartpole_swingup environemnt.

Sampled episodic returns for all policies on a number of evalauation episodes (full-reward-samples-dict.pkl), or a way of sampling a new episode of evaluation upon request for any policy. The file full-reward-samples-dict.pkl contains a dictionary that maps a policy by its string representation to a numpy.ndarray of of shape (5000,) (number of reward samples).
Off-policy evaluation score, such as fitted Q-evaluation (FQE) for all policies (ope_values.pkl). The file ope_values.pkl contains dictionary that maps policy info into OPE estimates. We provide FQE scores for the policies.
Actions that policies take on 1000 randomly sampled states from the offline dataset (actions.pkl). The file actions.pkl contains a dictionary with keys actions and policy_keys. actions is a list of 1000 ( number of states used to compute the kernel) elements of numpy.ndarray type of dimensionality 76x1 (number of policies by the dimensionality of the actions). policy_keys contains a dictionary mapping from string representation of a policy to the index of that policy in actions.

Installation

To set up the virtual environment, run the following commands. From within the active_ops directory:

python3 -m venv active_ops_env
source active_ops_env/bin/activate

pip install --upgrade pip
pip install -r requirements.txt

To run the demo with colab, enable the jupyter_http_over_ws extension:

jupyter serverextension enable --py jupyter_http_over_ws

Finally, start a server:

jupyter notebook \
  --NotebookApp.allow_origin='https://colab.research.google.com' \
  --port=8888 \
  --NotebookApp.port_retries=0

Usage

To run the code refer to Active_ops_experiment.ipynb colab notebook. Execute blocks of code one by one to reproduce the final plot. You can modify various parameters maked by @param to test various baselines in modified settings. This code loads the example of data for cartpole_environment provided in the data folder. Using this data, we reproduce the results of Figure 14 of the paper.

Citing this work

@inproceedings{konyushkovachen2021aops,
    title = "Active Offline Policy Selection",
    author = "Ksenia Konyushkova, Yutian Chen, Tom Le Paine, Caglar Gulcehre, Cosmin Paduraru, Daniel J Mankowitz, Misha Denil, Nando de Freitas",
    booktitle = NeurIPS,
    year = 2021
}

Disclaimer

This is not an official Google product.

The datasets in this work are licensed under the Creative Commons Attribution 4.0 International License. To view a copy of this license, visit [http://creativecommons.org/licenses/by/4.0/] (http://creativecommons.org/licenses/by/4.0/).

Active Offline Policy Selection With Python

Related tags

Overview

Active Offline Policy Selection

Installation

Usage

Citing this work

Disclaimer

Owner

DeepMind

PyTorch-Geometric Implementation of MarkovGNN: Graph Neural Networks on Markov Diffusion

Contrastive Learning Inverts the Data Generating Process

TensorFlow implementation of "Attention is all you need (Transformer)"

Causal Imitative Model for Autonomous Driving

This repository provides the official implementation of 'Learning to ignore: rethinking attention in CNNs' accepted in BMVC 2021.

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Fast, flexible and fun neural networks.

Accurate Phylogenetic Inference with Symmetry-Preserving Neural Networks

Joint Detection and Identification Feature Learning for Person Search

Your interactive network visualizing dashboard

Testing and Estimation of structural breaks in Stata

GUI for a Vocal Remover that uses Deep Neural Networks.

Tools for investing in Python

🤖 A Python library for learning and evaluating knowledge graph embeddings

Improved Fitness Optimization Landscapes for Sequence Design

Here is the diagnostic tool for BMVC 2021 paper Diagnosing Errors in Video Relation Detectors.

simple_pytorch_example project is a toy example of a python script that instantiates and trains a PyTorch neural network on the FashionMNIST dataset

FPSAutomaticAiming——基于YOLOV5的FPS类游戏自动瞄准AI

Single-Stage 6D Object Pose Estimation, CVPR 2020

Benchmarks for semi-supervised domain generalization.