Independent and minimal implementations of some reinforcement learning algorithms using PyTorch (including PPO, A3C, A2C, ...).

Overview

PyTorch RL Minimal Implementations

There are implementations of some reinforcement learning algorithms, whose characteristics are as follow:

  1. Less packages-based: Only PyTorch and Gym, for building neural networks and testing algorithms' performance respectively, are necessary to install.
  2. Independent implementation: All RL algorithms are implemented in separate files, which facilitates to understand their processes and modify them to adapt to other tasks.
  3. Various expansion configurations: It's convenient to configure various parameters and tools, such as reward normalization, advantage normalization, tensorboard, tqdm and so on.

RL Algorithms List

Name Type Estimator Paper File
Q-Learning Value-based / Off policy TD Watkins et al. Q-Learning. Machine Learning, 1992 q_learning.py
REINFORCE Policy-based On policy MC Sutton et al. Policy Gradient Methods for Reinforcement Learning with Function Approximation. In NeurIPS, 2000. reinforce.py
DQN Value-based / Off policy TD Mnih et al. Human-level control through deep reinforcement learning. Nature, 2015. doing
A2C Actor-Critic / On policy n-step TD Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016. a2c.py
A3C Actor-Critic / On policy n-step TD .Mnih et al. Asynchronous Methods for Deep Reinforcement Learning. In ICML, 2016 a3c.py
ACER Actor-Critic / On policy GAE Wang et al. Sample Efficient Actor-Critic with Experience Replay. In ICLR, 2017. doing
ACKTR Actor-Critic / On policy GAE Wu et al. Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation. In NeurIPS, 2017. doing
PPO Actor-Critic / On policy GAE Schulman et al. Proximal Policy Optimization Algorithms. arXiv, 2017. ppo.py

Quick Start

Requirements

pytorch
gym

tensorboard  # for summary writer
tqdm         # for process bar

Abstract Agent

Components / Parameters

Component Description
policy neural network model
gamma discount factor of cumulative reward
lr learning rate. i.e. lr_actor, lr_critic
lr_decay weight decay to schedule the learning rate
lr_scheduler scheduler for the learning rate
coef_critic_loss coefficient of critic loss
coef_entropy_loss coefficient of entropy loss
writer summary writer to record information
buffer replay buffer to store historical trajectories
use_cuda use GPU
clip_grad gradients clipping
max_grad_norm maximum norm of gradients clipped
norm_advantage advantage normalization
open_tb open summary writer
open_tqdm open process bar

Methods

Methods Description
preprocess_obs() preprocess observation before input into the neural network
select_action() use actor network to select an action based on the policy distribution.
estimate_obs() use critic network to estimate the value of observation
update() update the parameter by calculate losses and gradients
train() set the neural network to train mode
eval() set the neural network to evaluate mode
save() save the model parameters
load() load the model parameters

Update & To-do & Limitations

Update History

  • 2021-12-09 ADD TRICK:norm_critic_loss in PPO
  • 2021-12-09 ADD PARAM: coef_critic_loss, coef_entropy_loss, log_step
  • 2021-12-07 ADD ALGO: A3C
  • 2021-12-05 ADD ALGO: PPO
  • 2021-11-28 ADD ALGO: A2C
  • 2021-11-20 ADD ALGO: Q learning, Reinforce

To-do List

  • ADD ALGO DQN, Double DQN, Dueling DQN, DDPG
  • ADD NN RNN Mode

Current Limitations

  • Unsupport Vectorized environments
  • Unsupport Continuous action space
  • Unsupport RNN-based model
  • Unsupport Imatation learning

Reference & Acknowledgements

Owner
Gemini Light
Gemini Light
Keyword-BERT: Keyword-Attentive Deep Semantic Matching

project discription An implementation of the Keyword-BERT model mentioned in my paper Keyword-Attentive Deep Semantic Matching (Plz cite this github r

1 Nov 14, 2021
Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021)

Understanding and Improving Encoder Layer Fusion in Sequence-to-Sequence Learning (ICLR 2021) Citation Please cite as: @inproceedings{liu2020understan

Sunbow Liu 22 Nov 25, 2022
Pytorch version of SfmLearner from Tinghui Zhou et al.

SfMLearner Pytorch version This codebase implements the system described in the paper: Unsupervised Learning of Depth and Ego-Motion from Video Tinghu

Clément Pinard 909 Dec 22, 2022
A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

SVHNClassifier-PyTorch A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks If

Potter Hsu 182 Jan 03, 2023
Pytorch implementation of the AAAI 2022 paper "Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification"

[AAAI22] Cross-Domain Empirical Risk Minimization for Unbiased Long-tailed Classification We point out the overlooked unbiasedness in long-tailed clas

PatatiPatata 28 Oct 18, 2022
Unofficial implementation of HiFi-GAN+ from the paper "Bandwidth Extension is All You Need" by Su, et al.

HiFi-GAN+ This project is an unoffical implementation of the HiFi-GAN+ model for audio bandwidth extension, from the paper Bandwidth Extension is All

Brent M. Spell 134 Dec 30, 2022
public repo for ESTER dataset and modeling (EMNLP'21)

Project / Paper Introduction This is the project repo for our EMNLP'21 paper: https://arxiv.org/abs/2104.08350 Here, we provide brief descriptions of

PlusLab 19 Oct 27, 2022
Self-Supervised Learning

Self-Supervised Learning Features self_supervised offers features like modular framework support for multi-gpu training using PyTorch Lightning easy t

Robin 1 Dec 14, 2021
NaijaSenti is an open-source sentiment and emotion corpora for four major Nigerian languages

NaijaSenti is an open-source sentiment and emotion corpora for four major Nigerian languages. This project was supported by lacuna-fund initiatives. Jump straight to one of the sections below, or jus

Hausa Natural Language Processing 14 Dec 20, 2022
Neural Articulated Radiance Field

Neural Articulated Radiance Field NARF Neural Articulated Radiance Field Atsuhiro Noguchi, Xiao Sun, Stephen Lin, Tatsuya Harada ICCV 2021 [Paper] [Co

Atsuhiro Noguchi 144 Jan 03, 2023
PyArmadillo: an alternative approach to linear algebra in Python

PyArmadillo is a linear algebra library for the Python language, with an emphasis on ease of use.

Terry Zhuo 58 Oct 11, 2022
Automatically creates genre collections for your Plex media

Plex Auto Genres Plex Auto Genres is a simple script that will add genre collection tags to your media making it much easier to search for genre speci

Shane Israel 63 Dec 31, 2022
Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

QAConv Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting This PyTorch code is proposed in

Shengcai Liao 166 Dec 28, 2022
End-to-End Speech Processing Toolkit

ESPnet: end-to-end speech processing toolkit system/pytorch ver. 1.3.1 1.4.0 1.5.1 1.6.0 1.7.1 1.8.1 1.9.0 ubuntu20/python3.9/pip ubuntu20/python3.8/p

ESPnet 5.9k Jan 04, 2023
Code from Daniel Lemire, A Better Alternative to Piecewise Linear Time Series Segmentation

PiecewiseLinearTimeSeriesApproximation code from Daniel Lemire, A Better Alternative to Piecewise Linear Time Series Segmentation, SIAM Data Mining 20

Daniel Lemire 21 Oct 27, 2022
Repo for the Tutorials of Day1-Day3 of the Nordic Probabilistic AI School 2021 (https://probabilistic.ai/)

ProbAI 2021 - Probabilistic Programming and Variational Inference Tutorial with Pryo Day 1 (June 14) Slides Notebook: students_PPLs_Intro Notebook: so

PGM-Lab 46 Nov 01, 2022
Face Library is an open source package for accurate and real-time face detection and recognition

Face Library Face Library is an open source package for accurate and real-time face detection and recognition. The package is built over OpenCV and us

52 Nov 09, 2022
cisip-FIRe - Fast Image Retrieval

Fast Image Retrieval (FIRe) is an open source image retrieval project release by Center of Image and Signal Processing Lab (CISiP Lab), Universiti Malaya. This project implements most of the major bi

CISiP Lab 39 Nov 25, 2022
A GPT, made only of MLPs, in Jax

MLP GPT - Jax (wip) A GPT, made only of MLPs, in Jax. The specific MLP to be used are gMLPs with the Spatial Gating Units. Working Pytorch implementat

Phil Wang 53 Sep 27, 2022
An Object Oriented Programming (OOP) interface for Ontology Web language (OWL) ontologies.

Enabling a developer to use Ontology Web Language (OWL) along with its reasoning capabilities in an Object Oriented Programming (OOP) paradigm, by pro

TheEngineRoom-UniGe 7 Sep 23, 2022