batch-bandits

Implementation of popular bandit algorithms in batch environments.

Source code for our paper "The Impact of Batch Learning in Stochastic Bandits", accepted at the workshop on the Ecological Theory of Reinforcement Learning, NeurIPS 2021.

Overview

The repository lets you run simulations or replay logged datasets in a sequential batch manner: the agent interacts with the environment sequentially, but responses are grouped into batches and observed by the agent only at the end of each batch. Broadly speaking, sequential batch learning is a more general form of learning that covers both the offline and online settings as special cases and combines their advantages.

Framework

Two particularly useful versions of the multi-armed bandit problem are implemented: Stochastic Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB). The key feature of the project is that both versions support the parameter batch_size - the length of the period during which the agent interacts with the environment "blindly". Although the batch setting is a property of the environment, this limitation is treated from the policy perspective: rather than an online agent working with a batch environment, a batch policy is assumed to interact with an online environment.
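
The policy-side view of batching can be sketched as follows. This is a minimal illustration, not the repository's code: the class BatchEpsilonGreedy, the function run, and the Bernoulli environment are hypothetical names introduced here, and batch_size is assumed to count interaction steps. The environment answers every step online, while the agent buffers rewards and refreshes its estimates only once a full batch has been collected.

```python
import random


class BatchEpsilonGreedy:
    """Hypothetical ε-greedy agent that updates only at batch boundaries."""

    def __init__(self, n_arms, batch_size, epsilon=0.1, seed=0):
        self.n_arms = n_arms
        self.batch_size = batch_size
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = [0] * n_arms    # pulls per arm seen by the estimator
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.buffer = []              # feedback not yet used for learning
        self.updates = 0              # number of completed batches

    def act(self):
        # Explore with probability epsilon, otherwise exploit current estimates.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_arms)
        best = max(self.values)
        return self.rng.choice([a for a, v in enumerate(self.values) if v == best])

    def observe(self, arm, reward):
        # Feedback arrives online but is only stored until the batch is full.
        self.buffer.append((arm, reward))
        if len(self.buffer) == self.batch_size:
            self._flush()

    def _flush(self):
        # Incremental mean update over every buffered (arm, reward) pair.
        for arm, reward in self.buffer:
            self.counts[arm] += 1
            self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
        self.buffer.clear()
        self.updates += 1


def run(agent, arm_means, horizon, seed=1):
    """Simulate an online Bernoulli bandit and return the total reward."""
    env_rng = random.Random(seed)
    total = 0.0
    for _ in range(horizon):
        arm = agent.act()
        reward = 1.0 if env_rng.random() < arm_means[arm] else 0.0
        agent.observe(arm, reward)
        total += reward
    return total
```

Under these assumptions, batch_size=1 recovers the fully online setting, while a batch_size equal to the horizon means the agent never learns during the run, which corresponds to the offline extreme.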

The project is built upon the RL-Glue framework, which provides an interface to connect agents, environments, and experiment programs. Note that MAB/rl_glue.py and CMAB/rl_glue.py were adapted to make batch interaction possible.

Implemented algorithms

Version	Algorithm	Comment
MAB	ε-greedy	-
MAB	Thompson Sampling	-
MAB	UCB	-
CMAB	LinTS	see link (and references therein) for more details
CMAB	LinUCB	see article for theoretical description
CMAB	Offline evaluator	policy evaluation technique; see article for theoretical guarantees
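
To illustrate how a logged dataset might be replayed in batches, here is a rough sketch of a replay-style evaluator. It is an assumption that the offline evaluator works along these lines; the function replay_evaluate, the policy interface (act and update), and the toy AlwaysArm policy are hypothetical and not taken from the repository. The sketch also assumes that the logged arms were chosen uniformly at random, which replay-style evaluation typically requires.

```python
def replay_evaluate(policy, logged_events, batch_size):
    """Replay logged (context, arm, reward) events against a policy.

    An event counts only when the policy picks the logged arm. Matched
    events are handed to the policy in groups of batch_size, so the
    policy learns at batch boundaries rather than after every step.
    Returns (average reward over matched events, number of matches).
    """
    matched = 0
    total_reward = 0.0
    pending = []
    for context, logged_arm, reward in logged_events:
        if policy.act(context) != logged_arm:
            continue  # the log says nothing about the arm the policy chose
        matched += 1
        total_reward += reward
        pending.append((context, logged_arm, reward))
        if len(pending) == batch_size:
            policy.update(pending)
            pending = []
    average = total_reward / matched if matched else 0.0
    return average, matched


class AlwaysArm:
    """Toy policy for demonstration: always plays one fixed arm."""

    def __init__(self, arm):
        self.arm = arm
        self.n_updates = 0

    def act(self, context):
        return self.arm

    def update(self, batch):
        # A learning policy would refit its model on the batch here.
        self.n_updates += 1
```

Events left in an incomplete final batch are scored but never passed to update, mirroring an agent that receives feedback only when a batch closes.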

Owner

Danil Provodin
