Deep reinforcement learning library built on top of Neural Network Libraries

Last update: Dec 14, 2022

Overview

Deep Reinforcement Learning Library built on top of Neural Network Libraries

NNablaRL is a deep reinforcement learning library built on top of Neural Network Libraries that is intended to be used for research, development and production.

Installation

Installing NNablaRL is easy!

$ pip install nnabla-rl

NNablaRL requires Python >= 3.7 (Python 3.6 is not supported from v0.12.0) and NNabla >= 1.17.

Enabling GPU acceleration (Optional)

NNablaRL algorithms run on CPU by default. To run the algorithm on GPU, first install nnabla-ext-cuda as follows. (Replace [cuda-version] depending on the CUDA version installed on your machine.)

$ pip install nnabla-ext-cuda[cuda-version]

# Example installation. Supposing CUDA 11.0 is installed on your machine.
$ pip install nnabla-ext-cuda110

After installing nnabla-ext-cuda, set the GPU ID on which to run the algorithm through the algorithm's configuration.

import nnabla_rl.algorithms as A

config = A.DQNConfig(gpu_id=0) # Use gpu 0. If negative, will run on CPU.
dqn = A.DQN(env, config=config)
...

Features

Friendly API

NNablaRL has a friendly Python API that lets you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env)  # 2
dqn.train(env)  # 3

For more details about NNablaRL, see the documentation and examples.

Many builtin algorithms

Most famous/SOTA deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are implemented in NNablaRL. The implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations.

For the list of implemented algorithms see here.

You can also find the reproduction and evaluation results of each algorithm here.
Note that you may not get exactly the same results when running the reproduction code on your computer. The results may change slightly depending on your machine, the nnabla/nnabla-rl package versions, etc.

Seamless switching of online and offline training

In reinforcement learning, there are two main training procedures, online and offline, to train the agent. Online training alternates between data collection and network updates. Conversely, offline training updates the network using only existing data. With NNablaRL, you can switch between these two training procedures seamlessly. For example, as shown below, you can easily train a robot's controller online using a simulated environment and fine-tune it offline with a real robot dataset.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
dqn = A.DQN(simulator)
# train online for 1M iterations
dqn.train_online(simulator, total_iterations=1000000)

real_data = get_real_robot_data() # This is also an example. Assuming that you have real robot data
# fine tune the agent offline for 10k iterations using real data
dqn.train_offline(real_data, total_iterations=10000)
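
To make the difference between the two procedures concrete, here is a minimal, library-free sketch using tabular Q-learning. It does not use NNablaRL; `ChainEnv`, `TabularQ`, `train_online` and `train_offline` are hypothetical names chosen for illustration. The online loop alternates data collection and updates, while the offline loop only replays transitions that already exist.

```python
import random


class ChainEnv:
    """Hypothetical 5-state chain. Action 1 moves right, action 0 moves left.
    Reaching the right end gives reward 1 and ends the episode."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        return self.state, (1.0 if done else 0.0), done


class TabularQ:
    """Hypothetical tabular Q-learning agent (stands in for a neural network)."""

    def __init__(self, n_states, n_actions=2, lr=0.5, gamma=0.9):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.lr, self.gamma = lr, gamma

    def update(self, s, a, r, s_next, done):
        target = r if done else r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.lr * (target - self.q[s][a])

    def greedy(self, s):
        return max(range(len(self.q[s])), key=lambda a: self.q[s][a])


def train_online(agent, env, total_iterations, rng, epsilon=0.5):
    """Online: collect one transition, update, repeat. Returns the collected data."""
    collected = []
    s = env.reset()
    for _ in range(total_iterations):
        a = rng.randrange(2) if rng.random() < epsilon else agent.greedy(s)
        s_next, r, done = env.step(a)
        agent.update(s, a, r, s_next, done)
        collected.append((s, a, r, s_next, done))
        s = env.reset() if done else s_next
    return collected


def train_offline(agent, dataset, total_iterations, rng):
    """Offline: update only from existing transitions, no environment access."""
    for _ in range(total_iterations):
        agent.update(*rng.choice(dataset))


rng = random.Random(0)
env = ChainEnv()
agent = TabularQ(n_states=env.length)
data = train_online(agent, env, total_iterations=5000, rng=rng)
train_offline(agent, data, total_iterations=5000, rng=rng)
greedy_policy = [agent.greedy(s) for s in range(env.length - 1)]
```

Here the offline phase reuses the data gathered online; in the robot example above, the dataset would instead come from the real robot.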

Getting started

Try the interactive demos below to get started.
You can run them directly on Colab from the links in the table below.

Title	Notebook	Target RL task
Simple reinforcement learning training to get started		Pendulum
Learn how to use training algorithms		Pendulum
Learn how to use customized network model for training		Mountain car
Learn how to use different network solver for training		Pendulum
Learn how to use different replay buffer for training		Pendulum
Learn how to use your own environment for training		Customized environment
Atari game training example		Atari games

Documentation

Full documentation is here.

Contribution guide

Any kind of contribution to NNablaRL is welcome! See the contribution guide for details.

License

NNablaRL is provided under the Apache License, Version 2.0.

Comments

Update cem function interface

Updated the interface of the cross entropy method (CEM) functions. The argument pop_size has been renamed to sample_size. In addition, the objective function passed to the CEM function is now called with a variable x of shape (batch_size, sample_size, x_dim), which differs from the previous interface. See the function docs for details.
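
As a rough illustration of the calling convention described above, the sketch below implements a plain cross entropy method with nested Python lists. It is not the NNablaRL function: `cem_maximize`, its parameters other than `sample_size`, and the assumption that the objective returns scores of shape (batch_size, sample_size) are all hypothetical.

```python
import math
import random


def cem_maximize(objective, init_mean, init_std, sample_size=50,
                 num_elites=10, num_iterations=20, rng=None):
    """Hypothetical CEM sketch. init_mean/init_std have shape (batch_size, x_dim)."""
    rng = rng or random.Random(0)
    mean = [list(m) for m in init_mean]
    std = [list(s) for s in init_std]
    for _ in range(num_iterations):
        # x has shape (batch_size, sample_size, x_dim)
        x = [[[rng.gauss(m, s) for m, s in zip(mean_b, std_b)]
              for _ in range(sample_size)]
             for mean_b, std_b in zip(mean, std)]
        # Assumption: the objective returns scores of shape (batch_size, sample_size)
        scores = objective(x)
        for b, (samples, score_b) in enumerate(zip(x, scores)):
            # Keep the highest-scoring samples and refit the Gaussian to them
            order = sorted(range(sample_size), key=lambda i: score_b[i], reverse=True)
            elites = [samples[i] for i in order[:num_elites]]
            for d in range(len(mean[b])):
                values = [e[d] for e in elites]
                mu = sum(values) / num_elites
                var = sum((v - mu) ** 2 for v in values) / num_elites
                mean[b][d] = mu
                std[b][d] = math.sqrt(var) + 1e-6
    return mean


def negative_squared_distance(x):
    """Example objective, maximized at the target point (1, -2)."""
    target = [1.0, -2.0]
    return [[-sum((v - t) ** 2 for v, t in zip(sample, target)) for sample in batch]
            for batch in x]


solution = cem_maximize(negative_squared_distance,
                        init_mean=[[0.0, 0.0], [3.0, 0.0]],
                        init_std=[[3.0, 3.0], [3.0, 3.0]])
```

Each batch entry is optimized independently, so `solution` holds one estimate of the maximizer per batch row.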

opened by sbsekiguchi 1
Add implementation for RNN support and DRQN algorithm
Add RNN model support and DRQN algorithm.

The following trainers support RNN models:

- Q value-based trainers
- Deterministic gradient and Soft policy trainers

Other trainers may support RNN models in the future, but this is not implemented in the initial release.

See this paper for the details of the DRQN algorithm.
opened by ishihara-y 1

Implement SACD

This PR implements the SAC-D algorithm. https://arxiv.org/abs/2206.13901

These changes have been made:

New environments with factored reward functions have been added
- FactoredLunarLanderContinuousV2NNablaRL-v1
- FactoredAntV4NNablaRL-v1
- FactoredHopperV4NNablaRL-v1
- FactoredHalfCheetahV4NNablaRL-v1
- FactoredWalker2dV4NNablaRL-v1
- FactoredHumanoidV4NNablaRL-v1
The SACD algorithm has been added
SoftQDTrainer has been added
_InfluenceMetricsEvaluator has been added
reproduction script has been added (not benchmarked yet)

visualizing influence metrics

import gym

import numpy as np
import matplotlib.pyplot as plt

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")
eval_env = gym.make("FactoredLunarLanderContinuousV2NNablaRL-v1")

evaluation_hook = H.EvaluationHook(
    eval_env,
    EpisodicEvaluator(run_per_evaluation=10),
    timing=5000,
    writer=W.FileWriter(outdir="logdir", file_prefix='evaluation_result'),
)
iteration_num_hook = H.IterationNumHook(timing=100)

config = A.SACDConfig(gpu_id=0, reward_dimension=9)
sacd = A.SACD(env, config=config)
sacd.set_hooks([iteration_num_hook, evaluation_hook])
sacd.train_online(env, total_iterations=100000)

influence_history = []

state = env.reset()
while True:
    action = sacd.compute_eval_action(state)
    influence = sacd.compute_influence_metrics(state, action)
    influence_history.append(influence)
    state, _, done, _ = env.step(action)
    if done:
        break

influence_history = np.array(influence_history)
for i, label in enumerate(["position", "velocity", "angle", "left_leg", "right_leg", "main_engine", "side_engine", "failure", "success"]):
    plt.plot(influence_history[:, i], label=label)
plt.xlabel("step")
plt.ylabel("influence metrics")
plt.legend()
plt.show()


opened by ishihara-y 0

Add gmm and Update gaussian

Added GMM and Gaussian to the numpy models. In addition, updated the Gaussian distribution's API.

The API changes as follows.

Before:

import numpy as np
import nnabla as nn
import nnabla_rl.distributions as D

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
distribution = D.Gaussian(mean, ln_var)
# return nn.Variable
assert isinstance(distribution.sample(), nn.Variable)

Updated:

batch_size = 10
output_dim = 10
input_shape = (batch_size, output_dim)
mean = np.zeros(shape=input_shape)
sigma = np.ones(shape=input_shape) * 5.
ln_var = np.log(sigma) * 2.
# You have to pass the nn.Variable if you want to get nn.Variable as all class method's return.
distribution = D.Gaussian(nn.Variable.from_numpy_array(mean), nn.Variable.from_numpy_array(ln_var))
assert isinstance(distribution.sample(), nn.Variable)

# If you pass np.ndarray, then all class methods return np.ndarray
# Currently, only inputs without a batch dimension are supported (i.e. mean.shape = (dims,), ln_var.shape = (dims, dims)).
distribution = D.Gaussian(mean[0], np.diag(ln_var[0]))  # without batch
assert isinstance(distribution.sample(), np.ndarray)
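
The snippets above build `ln_var` as `np.log(sigma) * 2.`, that is, the log of the variance. The following library-free sketch shows how a standard deviation is recovered from that parameterization and used for sampling from a diagonal Gaussian. It is an illustration only, not the implementation of `D.Gaussian`; `sample_diagonal_gaussian` is a hypothetical helper.

```python
import math
import random


def sample_diagonal_gaussian(mean, ln_var, rng):
    """Hypothetical helper: draw one sample, with sigma = exp(0.5 * ln_var)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mean, ln_var)]


sigma = 5.0
mean = [0.0, 0.0, 0.0]
ln_var = [math.log(sigma) * 2.0] * 3  # log(sigma^2) = 2 * log(sigma)

# Going back from the log-variance gives the original standard deviation
recovered_sigma = [math.exp(0.5 * lv) for lv in ln_var]

rng = random.Random(0)
samples = [sample_diagonal_gaussian(mean, ln_var, rng) for _ in range(20000)]
# The mean is zero, so the root mean square of a coordinate estimates sigma
empirical_std = math.sqrt(sum(s[0] ** 2 for s in samples) / len(samples))
```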

opened by sbsekiguchi 0

Support nnabla-browser

[x] add MonitorWriter
[x] save computational graph as nntxt

example

import gym

import nnabla_rl.algorithms as A
import nnabla_rl.hooks as H
import nnabla_rl.writers as W
from nnabla_rl.utils.evaluator import EpisodicEvaluator

# save training computational graph
training_graph_hook = H.TrainingGraphHook(outdir="test")

# evaluation hook with nnabla's Monitor
eval_env = gym.make("Pendulum-v0")
evaluator = EpisodicEvaluator(run_per_evaluation=10)
evaluation_hook = H.EvaluationHook(
    eval_env,
    evaluator,
    timing=10,
    writer=W.MonitorWriter(outdir="test", file_prefix='evaluation_result'),
)

env = gym.make("Pendulum-v0")
sac = A.SAC(env)
sac.set_hooks([training_graph_hook, evaluation_hook])

sac.train_online(env, total_iterations=100)

opened by ishihara-y 0

Add iLQR and LQR

Implementation of Linear Quadratic Regulator (LQR) and iterative LQR algorithms.
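
For readers unfamiliar with LQR, the sketch below solves a scalar, discrete-time, finite-horizon problem with the backward Riccati recursion. It is independent of the NNablaRL implementation; `lqr_gains` and `rollout` are hypothetical names, and using the stage cost weight Q as the terminal cost is an assumption made for brevity.

```python
def lqr_gains(A, B, Q, R, horizon):
    """Backward Riccati recursion for x' = A*x + B*u with cost Q*x^2 + R*u^2.
    Assumption: the terminal cost weight equals Q."""
    P = Q
    gains = []
    for _ in range(horizon):
        K = (B * P * A) / (R + B * P * B)
        P = Q + A * P * A - A * P * B * K
        gains.append(K)
    gains.reverse()  # gains[t] is the feedback gain applied at time step t
    return gains


def rollout(A, B, gains, x0):
    """Apply the time-varying feedback u = -K*x and record the states."""
    x = x0
    trajectory = [x]
    for K in gains:
        u = -K * x
        x = A * x + B * u
        trajectory.append(x)
    return trajectory


gains = lqr_gains(A=1.0, B=1.0, Q=1.0, R=1.0, horizon=20)
trajectory = rollout(1.0, 1.0, gains, x0=10.0)
```

Iterative LQR applies the same recursion repeatedly to local linear-quadratic approximations of a nonlinear system around the current trajectory.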

Co-authored-by: Yu Ishihara, Shunichi Sekiguchi

opened by ishihara-y 0
Check np_random instance and use correct randint alternative
I am not sure when this change was made, but in some environments, gym.unwrapped.np_random returns a Generator instead of a RandomState.

# In case of RandomState, this line works
gym.unwrapped.np_random.randint(...)
# In case of Generator, randint does not exist and we must use integers as an alternative
gym.unwrapped.np_random.integers(...)

This PR fixes the issue by choosing the correct function for sampling integers.
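
One way to handle both cases is to check which method the random number object provides. The sketch below shows that idea; it is not the code from the PR. `sample_integer` is a hypothetical helper, and the two stub classes only mimic the relevant method of numpy's Generator and RandomState so that the example runs without numpy.

```python
import random


def sample_integer(rng, low, high):
    """Return an int in [low, high) from a Generator-like or RandomState-like object."""
    if hasattr(rng, "integers"):
        # numpy.random.Generator style
        return int(rng.integers(low, high))
    # numpy.random.RandomState style
    return int(rng.randint(low, high))


class FakeGenerator:
    """Stub exposing only integers(), like numpy.random.Generator."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def integers(self, low, high):
        return self._rng.randrange(low, high)


class FakeRandomState:
    """Stub exposing only randint() with an exclusive upper bound, like RandomState."""

    def __init__(self, seed):
        self._rng = random.Random(seed)

    def randint(self, low, high):
        return self._rng.randrange(low, high)


generator = FakeGenerator(0)
random_state = FakeRandomState(0)
from_generator = [sample_integer(generator, 0, 10) for _ in range(100)]
from_random_state = [sample_integer(random_state, 0, 10) for _ in range(100)]
```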
opened by ishihara-y 0
Add icra2018 qtopt

Add QtOpt algorithm proposed by Deirdre Quillen et al. in the paper Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods.

opened by sbsekiguchi 0

Releases(v0.12.0)

v0.12.0(Oct 7, 2022)
special notes

This version does NOT support openai gym v0.26.0 or greater.

We are going to support openai gym v0.26.0 and greater in the next release of nnablaRL. From the next release, nnablaRL will stop officially supporting openai gym versions less than v0.26.0.

Only Python 3.7 or greater is supported

Python 3.6 is no longer supported as of this release.

release-note-bugfix

Fix algos. Properly apply grad clip and weight decay

Correct variable to use during rnn training

Check np_random instance and use correct randint alternative

Fix pendulum-env render

Fix ScreenRenderEnv to support gym 0.25.0

release-note-algorithm

Run PPO on single process when actor num is 1

Add qrsac algorithm

Add REDQ algorithm

Update to support discrete tuple

Add icra2018 qtopt

Add goal_env module

Add PPO tuple state support

Add iLQR and LQR

Add mppi

Add ddp

release-note-distributions

Add gmm and Update gaussian

release-note-utility

Support nnabla-browser

release-note-docs

Fix module path of sac

Improve README with graph visualization feature with nnabla-browser

release-note-build

Extend github build timelimit to 5 minutes

Install the latest nnablaRL by:

pip install nnabla-rl
v0.11.0(Mar 17, 2022)
release-note-bugfix

Fix readme of reproduction

Fix cem test

Fix README samples and add prerequisites for Atari reproduction codes

Fix tutorial-model

Fix add workaround to avoid gym error

release-note-algorithm

Add ATRPO

Add implementation for RNN support and DRQN algorithm, Support RNN models on DQN and DQN inherited algorithms, Follow DRQN author's implementation and update results

Expand RNN support to dist rl algorithms

Add rnn support to actor critic algorithms

Support n-step q learning in ddpg, td3, her, sac and ICML2018SAC

Stop back propagating to target v function

Add MME-SAC algorithm and Sparse/Delayed mujoco environment and Add Disentangled version of MME-SAC

release-note-functions

Add stop gradient function

Add random shooting

Update cem function interface

release-note-distributions

Add Bernoulli distribution

Enable sampling from multidimensional logits

Add one hot softmax

release-note-utility

Support batched states for evaluation

Add convenient episode result env

Add profile function

release-note-docs

Update version in algorithm catalog

Add readthedocs yaml and Fixed yaml file

Add HER and IQN to algorithm catalog

Install the latest nnablaRL by:

pip install nnabla-rl
v0.10.0(Oct 20, 2021)
release-note-bugfix

Fix interactive-demos used in colab and Fix interactive-demos used in colab about gpu id

release-note-algorithm

Add HER

Add Rainbow

Fix algorithm reproduction directory path

Add rank-based prioritized replay

Add Double Dqn

Move algorithms reproduction dir to reproductions/algorithms

Enable injecting explorer to algorithm

Support multi-step Q learning

Add Categorical Double Dqn

Add c51 all atari game results

Support Tuple State and Update compute_v_target_and_advantage to support tuple state

release-note-parametric_functions

Add spatial_softmax function and Add spatial softmax docs

Add noisy net

release-note-functions

Add batch_flatten function

Add triangular_matrix function

release-note-utility

Fix load_snapshot

release-note-docs

Fix docs typo

Fix typo in readme

Display correct version

Fix numpy array typing to np.ndarray

Add function docs

Fix docstring of algorithms

Update NNablaRL to nnablaRL

Fix typo seemless -> seamless

Fix build badge URL

Install the latest nnablaRL by:

pip install nnabla-rl
v0.9.0(Jun 14, 2021)
We are happy to announce the release of nnablaRL, a deep reinforcement learning (RL) library built on top of nnabla. Reinforcement learning is a cutting-edge machine learning technology that achieves superhuman performance in fields such as gaming and robotics. We hope that this new library, nnablaRL, helps both RL experts and non-experts use reinforcement learning algorithms easily within the nnabla ecosystem.

The features of nnablaRL are as follows.

Friendly API

nnablaRL has a friendly Python API that lets you start training with only 3 lines of Python code.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4") # 1
dqn = A.DQN(env) # 2
dqn.train(env) # 3

You can also easily customize the algorithm's hyperparameters. For example, you can change the batch size of the training data as follows.

import nnabla_rl
import nnabla_rl.algorithms as A
from nnabla_rl.utils.reproductions import build_atari_env

env = build_atari_env("BreakoutNoFrameskip-v4")
config = A.DQNConfig(batch_size=100)
dqn = A.DQN(env, config=config)
dqn.train(env)

In addition to algorithm hyperparameters, you can also flexibly change training components such as neural network models and model solvers. For details, see the sample codes and API documents.

Many builtin algorithms

Most famous/SoTA deep reinforcement learning algorithms, such as DQN, SAC, BCQ, and GAIL, are already implemented in nnablaRL. The implemented algorithms are carefully tested and evaluated. You can easily start training your agent using these verified implementations. Please check the sample codes and documentation for detailed usage of each algorithm. You can find the list of implemented algorithms here.

Seamless switching of online and offline training

In reinforcement learning, there are two main training procedures, online and offline, to train the agent. Online training alternates between data collection and network updates. Conversely, offline training updates the network using only existing data. With nnablaRL, you can switch between these two training procedures seamlessly. For example, as shown below, you can easily train a robot's controller online using a simulated environment and fine-tune it offline with a real robot dataset.

import nnabla_rl
import nnabla_rl.algorithms as A

simulator = get_simulator() # This is just an example. Assuming that simulator exists
config = A.DQNConfig() # Default configuration
dqn = A.DQN(simulator, config=config)
dqn.train_online(simulator)

real_data = get_real_data() # This is also an example. Assuming that you have real robot data
dqn.train_offline(real_data)

Getting started

You can find both notebook-style interactive demos and raw Python scripts as sample code to get started. If you are unfamiliar with reinforcement learning, we recommend trying the notebooks as a starting point. You can immediately launch and start training through Google Colaboratory! Check the list of notebooks here.

Development of nnablaRL has just started. We will continue adding new reinforcement learning algorithms and SoTA techniques to nnablaRL. Feedback, feature requests and contributions are welcome! Check the contribution guide for details.

Owner

Sony

Sony Group Corporation
