This repo contains the implementation of the algorithm proposed in Off-Belief Learning, ICML 2021.

Last update: Jan 05, 2023

Off-Belief Learning

Introduction

This repo contains the implementation of the algorithm proposed in Off-Belief Learning, ICML 2021.

Environment Setup

We have been using pytorch-1.5.1, cuda-10.1, and cudnn-v7.6.5 in our development environment. Other settings may also work, but we have not tested them extensively under different configurations. We also use conda/miniconda to manage environments.

There are known issues when using this repo with newer versions of PyTorch, such as an illegal move issue.

conda create -n hanabi python=3.7
conda activate hanabi

# install pytorch 1.5.1
# note that newer versions may cause compilation issues
pip install torch==1.5.1+cu101 torchvision==0.6.1+cu101 -f https://download.pytorch.org/whl/torch_stable.html

# install other dependencies
pip install psutil

# install a newer cmake if the current version is < 3.15
conda install -c conda-forge cmake
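
Once the packages above are installed, a small check can confirm that the toolchain matches the versions this setup expects. The following is only a sketch: the helper names (`parse_version`, `cmake_is_new_enough`, `torch_matches`) are hypothetical and not part of this repo. It assumes that `cmake --version` prints a line of the form `cmake version X.Y.Z` and that `torch.__version__` looks like `1.5.1+cu101`.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Return the first dotted version number found in text, e.g. (3, 15, 0)."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(1).split("."))


def cmake_is_new_enough(version_output: str,
                        minimum: Tuple[int, ...] = (3, 15)) -> bool:
    """True if the text printed by `cmake --version` reports at least `minimum`."""
    version = parse_version(version_output)
    return version is not None and version >= minimum


def torch_matches(torch_version: str, expected: str = "1.5.1") -> bool:
    """True if a string such as '1.5.1+cu101' has the expected release number."""
    # Assumption: the part after '+' is only the CUDA build tag.
    return torch_version.split("+")[0] == expected


if __name__ == "__main__":
    cmake = shutil.which("cmake")
    if cmake is None:
        print("cmake: not found on PATH")
    else:
        out = subprocess.run([cmake, "--version"],
                             capture_output=True, text=True).stdout
        print("cmake >= 3.15:", cmake_is_new_enough(out))
    try:
        import torch
        print("torch is 1.5.1:", torch_matches(torch.__version__))
    except ImportError:
        print("torch: not installed in this environment")
```

Run it inside the activated `hanabi` environment. If either check prints `False`, revisit the corresponding install step before compiling.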

To help cmake find the proper libraries (e.g. libtorch), please either add the following lines to your .bashrc, or put them in a separate file and source it before you start working on the project.

# activate the conda environment
conda activate hanabi

# set path
CONDA_PREFIX=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
export CPATH=${CONDA_PREFIX}/include:${CPATH}
export LIBRARY_PATH=${CONDA_PREFIX}/lib:${LIBRARY_PATH}
export LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}

# avoid tensor operation using all cpu cores
export OMP_NUM_THREADS=1
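
To verify that the variables above took effect in the current shell, something like the following Python sketch could be used. The function `check_build_env` is hypothetical and not part of this repo. It only checks that the three search paths contain the conda prefix and that `OMP_NUM_THREADS` is `1`, and it assumes `CONDA_PREFIX` is exported, as it is after `conda activate`.

```python
import os
from typing import List, Mapping


def check_build_env(env: Mapping[str, str]) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    prefix = env.get("CONDA_PREFIX", "")
    if not prefix:
        return ["CONDA_PREFIX is not set; is the conda environment activated?"]

    # Each search path should include the matching directory under the prefix.
    expected = {
        "CPATH": os.path.join(prefix, "include"),
        "LIBRARY_PATH": os.path.join(prefix, "lib"),
        "LD_LIBRARY_PATH": os.path.join(prefix, "lib"),
    }
    problems = []
    for name, path in expected.items():
        # normpath makes forms such as "bin/..//lib" and "lib/" compare equal.
        entries = [os.path.normpath(e) for e in env.get(name, "").split(":") if e]
        if os.path.normpath(path) not in entries:
            problems.append(f"{name} does not contain {path}")

    if env.get("OMP_NUM_THREADS") != "1":
        problems.append("OMP_NUM_THREADS is not set to 1")
    return problems


if __name__ == "__main__":
    issues = check_build_env(os.environ)
    print("environment looks fine" if not issues else "\n".join(issues))
```

An empty result suggests that cmake should be able to find headers and libraries under the conda prefix. It does not guarantee that libtorch itself is installed there.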

Finally, to compile this repo:

# under project root
mkdir build
cd build
cmake ..
make -j10

Code Structure

For an overview of the training infrastructure, please refer to Figure 5 of the Off-Belief Learning paper.

hanabi-learning-environment is a modified version of the original HLE from DeepMind.

Notable modifications include:

The card knowledge part of the observation encoding is changed to v0-belief, i.e. card knowledge normalized by the remaining public card count.
Functions are added to reset the game state with sampled hands.
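
The v0-belief normalization described above can be illustrated with a minimal Python sketch. This is not the encoder code from hanabi-learning-environment, and the function name `v0_belief` and its list-based inputs are assumptions made for illustration. It assumes card knowledge is a 0/1 mask over card identities consistent with the hints received, and that the remaining public count is the number of copies of each identity not yet publicly revealed.

```python
from typing import List, Sequence


def v0_belief(plausible: Sequence[int],
              remaining_count: Sequence[int]) -> List[float]:
    """Weight hint-consistent card identities by remaining copies, then normalize.

    plausible[i] is 1 if identity i is still consistent with the hints for
    this hand slot, else 0. remaining_count[i] is the number of copies of
    identity i that have not been publicly revealed.
    """
    if len(plausible) != len(remaining_count):
        raise ValueError("inputs must have the same length")
    weights = [p * c for p, c in zip(plausible, remaining_count)]
    total = sum(weights)
    if total == 0:
        # Assumption of this sketch: return all zeros when nothing is plausible.
        return [0.0] * len(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Toy example with 3 card identities instead of the 25 in standard Hanabi.
    print(v0_belief([1, 1, 0], [3, 1, 2]))  # prints [0.75, 0.25, 0.0]
```

In the toy example the third identity is ruled out by hints, so the probability mass is split between the first two in proportion to their remaining copies (3 to 1).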

rela (REinforcement Learning Assembly) is a set of tools for efficient batched neural network inference written in C++ with multi-threading.

rlcc implements the core of various algorithms. For example, the logic of fictitious transitions is implemented in r2d2_actor.cc. It also contains implementations of baselines such as other-play, VDN and IQL.

pyhanabi is the main entry point of the repo. It contains implementations of the Q-network, recurrent DQN training, and the belief network and its training, as well as some tools to analyze trained models.
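
As background on the VDN and IQL baselines named above, the sketch below contrasts how their one-step bootstrap targets are commonly formed. It follows the textbook formulation only and is not the code in rlcc. The function names are hypothetical, and details such as target networks or double Q-learning are omitted.

```python
from typing import List, Sequence


def vdn_td_target(reward: float,
                  next_q_per_agent: Sequence[Sequence[float]],
                  gamma: float,
                  terminal: bool = False) -> float:
    """VDN: the joint value is the sum of per-agent values, giving one shared target."""
    if terminal:
        return reward
    joint_next = sum(max(q_values) for q_values in next_q_per_agent)
    return reward + gamma * joint_next


def iql_td_targets(reward: float,
                   next_q_per_agent: Sequence[Sequence[float]],
                   gamma: float,
                   terminal: bool = False) -> List[float]:
    """IQL: each agent bootstraps from its own Q-values using the team reward."""
    if terminal:
        return [reward for _ in next_q_per_agent]
    return [reward + gamma * max(q_values) for q_values in next_q_per_agent]


if __name__ == "__main__":
    # Two agents, two actions each; the numbers are made up for illustration.
    next_q = [[1.0, 2.0], [0.0, 4.0]]
    print(vdn_td_target(1.0, next_q, gamma=0.5))   # prints 4.0
    print(iql_td_targets(1.0, next_q, gamma=0.5))  # prints [2.0, 3.0]
```

The difference is that VDN trains the sum of the agents' chosen-action values against a single target, while IQL trains each agent's value against its own target.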

Run the Code

Please refer to the README in pyhanabi for detailed instructions on how to train a model.

Download Models

To download the trained models used in the paper, go to the models folder and run:

sh download.sh

Due to our agreement with BoardGameArena and Facebook policies, we are unable to release either the "Clone Bot" models trained on the game data or the datasets themselves.

Copyright

This source code is licensed under the license found in the LICENSE file in the root directory of this source tree.

Owner

Facebook Research
