Multi-objective gym environments for reinforcement learning.

Last update: Jan 03, 2023

Overview

MO-Gym: Multi-Objective Reinforcement Learning Environments

Gym environments for multi-objective reinforcement learning (MORL). The environments follow the standard Gym API, but return vector rewards as numpy arrays.

For details on multi-objective MDPs (MOMDPs) and other MORL definitions, see A practical guide to multi-objective reinforcement learning and planning.

Install

git clone https://github.com/LucasAlegre/mo-gym.git
cd mo-gym
pip install -e .

Usage

import gym
import numpy as np  # needed for the weight vector below
import mo_gym

env = gym.make('minecart-v0') # It follows the original gym's API ...

obs = env.reset()
next_obs, vector_reward, done, info = env.step(your_agent.act(obs))  # but vector_reward is a numpy array!

# Optionally, you can scalarize the reward function with the LinearReward wrapper
env = mo_gym.LinearReward(env, weight=np.array([0.8, 0.2, 0.2]))
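
The LinearReward wrapper above turns the vector reward into a scalar. A minimal sketch of that computation, assuming the wrapper takes a weighted sum (dot product) of the reward vector. The helper name `scalarize` and the sample reward values are hypothetical and not part of mo_gym:

```python
import numpy as np


def scalarize(vector_reward, weight):
    """Weighted sum of a vector reward (hypothetical helper, not a mo_gym function)."""
    vector_reward = np.asarray(vector_reward, dtype=float)
    weight = np.asarray(weight, dtype=float)
    if vector_reward.shape != weight.shape:
        raise ValueError("weight must have one entry per objective")
    return float(np.dot(weight, vector_reward))


# Made-up three-objective reward, e.g. [ore1, ore2, fuel] for minecart-v0.
example_reward = np.array([1.0, 0.0, -0.5])
example_weight = np.array([0.8, 0.2, 0.2])
scalar = scalarize(example_reward, example_weight)
```

The weights do not have to sum to one. They only set the relative importance of each objective in the scalar signal.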

Environments

Env	Obs/Action spaces	Objectives	Description
`deep-sea-treasure-v0`	Discrete / Discrete	`[treasure, time_penalty]`	Agent is a submarine that must collect a treasure while taking into account a time penalty. Treasure values are taken from Yang et al. 2019.
`resource-gathering-v0`	Discrete / Discrete	`[enemy, gold, gem]`	Agent must collect gold or gem. Enemies have a 10% chance of killing the agent. From Barrett & Narayanan 2008.
`four-room-v0`	Discrete / Discrete	`[item1, item2, item3]`	Agent must collect three different types of items in the map and reach the goal.
`mo-mountaincar-v0`	Continuous / Discrete	`[time_penalty, reverse_penalty, forward_penalty]`	Classic Mountain Car env, but with extra penalties for the forward and reverse actions. From Vamplew et al. 2011.
`mo-reacher-v0`	Continuous / Discrete	`[target_1, target_2, target_3, target_4]`	Reacher robot from PyBullet, but there are 4 different target positions.
`minecart-v0`	Continuous or Image / Discrete	`[ore1, ore2, fuel]`	Agent must collect two types of ores and minimize fuel consumption. From Abels et al. 2019.
`mo-supermario-v0`	Image / Discrete	`[x_pos, time, death, coin, enemy]`	Multi-objective version of SuperMarioBrosEnv. Objectives are defined similarly to Yang et al. 2019.

Citing

If you use this repository in your work, please cite:

@misc{mo-gym,
  author = {Lucas N. Alegre},
  title = {MO-Gym: Multi-Objective Reinforcement Learning Environments},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LucasAlegre/mo-gym}},
}

Acknowledgments

The minecart-v0 env is a refactor of https://github.com/axelabels/DynMORL.
The deep-sea-treasure-v0 and mo-supermario-v0 are based on https://github.com/RunzheYang/MORL.
The four-room-v0 is based on https://github.com/mike-gimelfarb/deep-successor-features-for-transfer.

Comments

Adds the breakable bottles environment

Adds the breakable bottles environment which is used in Vamplew et al. 2021 as a toy model for irreversible change in stochastic environments.

I wasn't really planning on creating a pull request, so the commit history is a bit messy...

opened by rk1a 4
A few bug fixes
DST:

The bounds of the rewards were hardcoded for the convex map.

The way to fix the seed is deprecated. From what I saw in the official gym envs, the seed is now fixed just using the reset method. (e.g. https://github.com/openai/gym/blob/master/gym/envs/classic_control/cartpole.py#L198)

setup.py:

Gym 0.25.0 introduces breaking changes. So I fixed the version to 0.24.1.
opened by ffelten 2
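
The seeding change described above (seed passed to reset instead of a separate seed method) can be sketched as follows. This is a toy illustration under stated assumptions: `ToySeededEnv` is a hypothetical class, not the deep-sea-treasure implementation, and it mimics the `reset(seed=...)` convention with a NumPy generator:

```python
import numpy as np


class ToySeededEnv:
    """Hypothetical env that is seeded through reset(), not a separate seed() method."""

    def __init__(self):
        self.np_random = np.random.default_rng()

    def reset(self, seed=None):
        # Re-create the generator only when a seed is given, so later resets
        # without a seed continue the same random stream.
        if seed is not None:
            self.np_random = np.random.default_rng(seed)
        return int(self.np_random.integers(0, 10))


env_a, env_b = ToySeededEnv(), ToySeededEnv()
first_a = env_a.reset(seed=42)
first_b = env_b.reset(seed=42)
```

Two environments reset with the same seed start from the same state, and later unseeded resets keep following the same random stream.
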
Consider using info field for reward vector

Hello,

Thanks for this repository, it will be very useful to the MORL community :-).

I was just wondering if you think it would be a good idea to enforce gym compatibility by returning the reward as a scalar and giving the vectorial reward elsewhere. The idea would be to use a field in the info dictionary as they do in PGMORL. This would allow existing RL algorithms and logging libraries to be used out of the box (e.g. stable-baselines, tensorboard logs, ...).

For example: In a DST env, if you return the treasure reward only in the reward field, you can use the DQN implementation from baselines and have insights on the average reward, as well as the episode length in the tensorboard logs. Of course, you can extract the full vectorial reward from the info dictionary in order to learn with MORL :-).

With kind regards,

Florian

opened by ffelten 2
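
The proposal in this comment can be sketched as a thin wrapper. This is only an illustration: `ToyVectorEnv`, `ScalarWithInfo`, and the `"vector_reward"` info key are hypothetical names, and the old 4-tuple step API from the Usage section is assumed:

```python
import numpy as np


class ToyVectorEnv:
    """Hypothetical two-objective environment using the old 4-tuple step API."""

    def reset(self):
        return 0

    def step(self, action):
        vector_reward = np.array([1.0, -1.0])  # e.g. [treasure, time_penalty]
        return 0, vector_reward, True, {}


class ScalarWithInfo:
    """Return one reward component as the scalar reward; keep the full vector in info."""

    def __init__(self, env, index=0, key="vector_reward"):
        self.env = env
        self.index = index
        self.key = key  # the info key name is an assumption

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, vector_reward, done, info = self.env.step(action)
        info = dict(info)  # avoid mutating the wrapped env's dict
        info[self.key] = vector_reward
        return obs, float(vector_reward[self.index]), done, info


env = ScalarWithInfo(ToyVectorEnv(), index=0)
env.reset()
obs, reward, done, info = env.step(0)
```

A single-objective algorithm would see only `reward`, while a MORL algorithm could read the full vector from `info`.
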
Add MO reward wrappers

I added two wrappers commonly used: normalize and clip.

The idea is to provide the index of the reward component you want to normalize or clip, and leave the other components as they are. Of course, wrappers can be wrapped inside others to normalize all rewards (see tests).

opened by ffelten 1
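
The index-based idea described in this comment can be illustrated without gym. This is a sketch under stated assumptions, not the code from the pull request: `clip_component` is a hypothetical helper that clips one reward component and leaves the others untouched, and nesting calls plays the role of wrapping one wrapper inside another:

```python
import numpy as np


def clip_component(vector_reward, idx, low, high):
    """Return a copy of the reward with only component `idx` clipped to [low, high]."""
    clipped = np.array(vector_reward, dtype=float)  # np.array copies its input
    clipped[idx] = np.clip(clipped[idx], low, high)
    return clipped


reward = np.array([5.0, -3.0, 0.5])
# Compose two calls to clip components 0 and 1; component 2 is left alone.
result = clip_component(clip_component(reward, 0, -1.0, 1.0), 1, -1.0, 1.0)
```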

Fix notebook

There are still issues with the video recorder :(

/usr/local/lib/python3.9/site-packages/gym/wrappers/monitoring/video_recorder.py:59: UserWarning: WARN: Disabling video recorder because environment <TimeLimit<OrderEnforcing<MOMountainCar<mo-mountaincar-v0>>>> was not initialized with any compatible video mode between `rgb_array` and `rgb_array_list`
  logger.warn(

opened by ffelten 0

Add fishwood env

Code was provided by Denis Steckelmacher; I did a bit of refactoring and migrated it to Gym 0.26.

I didn't bother making the render with the images, but I did upload them in case somebody gets motivated, the env is super simple.

opened by ffelten 0
Add wrapper to help logging episode returns

The implementation is mostly a copy-paste of the original gym wrapper. I had to copy it instead of overriding and calling super because the return here is a numpy array, which is mutable, and the original implementation resets it to 0. Hence, if we kept the original, the logged return would always be a vector of zeros (because it has already been reset).

opened by ffelten 0
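
The pitfall described in this comment comes from NumPy aliasing. The following sketch reproduces it in isolation, assuming the accumulated return is reset in place after being handed to the logger. The variable names are illustrative and not taken from the wrapper:

```python
import numpy as np

# Buggy pattern: log the array itself, then reset it in place.
episode_return = np.zeros(2)
episode_return += np.array([1.0, 2.0])
logged_alias = episode_return          # same object, no copy
episode_return *= 0                    # in-place reset also zeroes logged_alias

# Safer pattern: log a copy before resetting.
episode_return = np.zeros(2)
episode_return += np.array([1.0, 2.0])
logged_copy = episode_return.copy()
episode_return *= 0
```

With a scalar float return the first pattern is harmless, because floats are immutable. With an array, the logged value and the accumulator are the same object.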

Releases (0.2.1)

0.2.1 (Dec 9, 2022)

5 new environments: fishwood-v0 (ESR), mo-MountainCarContinuous-v0, water-reservoir-v0, mo-highway-v0 and mo-highway-fast-v0;
Revamped README file;
Linting and automatic imports optimization;
Updated bib file and citation;
Few bugfixes.

0.2.0 (Sep 25, 2022)

Support for new Gym>=0.26 API

0.1.2 (Sep 25, 2022)

0.1.1 (Aug 24, 2022)

Owner

Lucas Alegre

PhD student at Institute of Informatics - UFRGS. Interested in reinforcement learning, machine learning and artificial (neuro-inspired) intelligence.
