[ICML 2020] Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control

Overview

PG-MORL

This repository contains the implementation for the paper Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control (ICML 2020).

In this paper, we propose an evolutionary learning algorithm to compute a dense set of high-quality Pareto solutions for multi-objective continuous robot control problems. We also design seven multi-objective continuous control benchmark problems based on MuJoCo, which are included in this repository together with the code for the baseline algorithms in the paper.
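
For intuition, the core object maintained throughout training is a population of policies whose objective vectors approximate the Pareto front. The minimal sketch below (illustrative only, not code from this repository) shows non-dominated filtering of two-objective performance vectors, i.e., the test used to decide which policies stay on the front:

    import numpy as np

    def pareto_front(objs):
        """Return indices of non-dominated rows in an (N, m) array of
        objective values, assuming every objective is maximized."""
        objs = np.asarray(objs, dtype=np.float64)
        keep = np.ones(len(objs), dtype=bool)
        for i in range(len(objs)):
            # point i is dominated if some other point is >= in all
            # objectives and strictly > in at least one
            dominated = np.all(objs >= objs[i], axis=1) & np.any(objs > objs[i], axis=1)
            if dominated.any():
                keep[i] = False
        return np.nonzero(keep)[0]

    # toy example: (objective 1, objective 2) for four policies
    print(pareto_front([[1.0, 0.2], [0.8, 0.9], [0.5, 0.5], [1.0, 0.1]]))
    # -> [0 1]  (the last two policies are dominated)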

[teaser animation]

Installation

Prerequisites

  • Operating System: tested on Ubuntu 16.04 and Ubuntu 18.04.
  • Python Version: >= 3.7.4.
  • PyTorch Version: >= 1.3.0.
  • MuJoCo: install MuJoCo and mujoco-py version 2.0 by following the instructions in mujoco-py.

Install Dependencies

You can either install the dependencies in a conda virtual env (recommended) or manually.

For conda virtual env installation, simply create a virtual env named pgmorl by:

conda env create -f environment.yml

If you prefer to install all the dependencies yourself, you can open environment.yml in an editor to see which packages need to be installed via pip.
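
For the manual route, the pip-installed packages sit under the pip: entry of the dependency list in environment.yml. A small helper like the one below (not part of the repository; it assumes PyYAML is installed) prints them:

    import yaml  # pip install pyyaml

    with open("environment.yml") as f:
        env = yaml.safe_load(f)

    # conda dependencies are plain strings; the pip packages are nested
    # inside the dependency list as a {"pip": [...]} entry
    for dep in env.get("dependencies", []):
        if isinstance(dep, dict) and "pip" in dep:
            print("\n".join(dep["pip"]))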

Run the Code

The training-related code is in the morl folder. We provide scripts in the scripts folder to run our algorithm and the baseline algorithms on each problem described in the paper, and several visualization scripts in the scripts/plot folder for visualizing the computed Pareto policies and the training process.

Precomputed Pareto Results

While you can run the training code to compute the Pareto policies from scratch by following the training steps below, we also provide precomputed Pareto results for each problem. You can download them separately for each problem from this Google Drive link and visualize them directly by following the visualization instructions. After downloading the precomputed results, unzip them, create a results folder under the project root directory, and put the extracted files inside.

Benchmark Problems

We design seven multi-objective continuous control benchmark problems based on MuJoCo simulation: Walker2d-v2, HalfCheetah-v2, Hopper-v2, Ant-v2, Swimmer-v2, Humanoid-v2, and Hopper-v3. A suffix of -v2 indicates a two-objective problem and -v3 a three-objective problem. The reward (i.e., objective) functions in each problem are designed to have similar scales. The code for all environments can be found in the environments/mujoco folder. To avoid conflicts with the original MuJoCo environment names, we add an MO- prefix to each environment name. For example, the environment name for Walker2d-v2 is MO-Walker2d-v2.
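
As a quick sanity check, the multi-objective environments behave like standard Gym environments except that step() returns a reward vector with one entry per objective. The snippet below is a sketch that assumes importing the environments package registers the MO-* environments with Gym (see environments/mujoco for the actual registration code):

    import gym
    import environments  # assumed to register the MO-* environments with Gym

    env = gym.make('MO-Walker2d-v2')
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    # reward is a vector with one entry per objective
    # (two for MO-Walker2d-v2, three for MO-Hopper-v3)
    print(reward)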

Train

The main entry point of the training code is morl/run.py. We provide a training script in the scripts folder for each problem so you can get started easily. Follow the steps below to run the training for each problem with each algorithm (ours and the baselines).

  • Enter the project folder

    cd PGMORL
    
  • Activate the conda env:

    conda activate pgmorl
    
  • To run our algorithm on Walker2d-v2 for a single run:

    python scripts/walker2d-v2.py --pgmorl --num-seeds 1 --num-processes 1
    

    You can also pass other flags as arguments to run the baseline algorithms (e.g., --ra, --moead, --pfa, --random). Please refer to the Python scripts for more details about the arguments.

  • By default, the results are stored in results/[problem name]/[algorithm name]/[seed idx].
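
If you want to post-process finished runs yourself, a sketch like the one below collects the stored Pareto objective values across seeds into NumPy arrays. The final/objs.txt filename inside each run directory is an assumption here, so check your results folder for the actual layout:

    import glob
    import os
    import numpy as np

    results_root = "results/Walker2d-v2/pgmorl"  # [problem name]/[algorithm name]

    pareto_objs = {}
    for run_dir in sorted(glob.glob(os.path.join(results_root, "*"))):
        seed = os.path.basename(run_dir)
        objs_file = os.path.join(run_dir, "final", "objs.txt")  # assumed filename
        if os.path.isfile(objs_file):
            # each row: the objective values of one Pareto policy
            pareto_objs[seed] = np.loadtxt(objs_file)

    for seed, objs in pareto_objs.items():
        print(seed, objs.shape)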

Visualization

  • We provide a script to visualize the computed/downloaded Pareto results.

    python scripts/plot/ep_obj_visualize_2d.py --env MO-Walker2d-v2 --log-dir ./results/Walker2d-v2/pgmorl/0/
    

    You can replace MO-Walker2d-v2 with your problem name and ./results/Walker2d-v2/pgmorl/0 with the path to your stored results.

    It shows a plot of the computed Pareto policies in the performance space. Double-clicking a point in the plot automatically opens a new window that renders the simulation for the selected policy.

  • We also provide a script to help you visualize the evolution process of the policy population.

    python scripts/plot/training_visualize_2d.py --env MO-Walker2d-v2 --log-dir ./results/Walker2d-v2/pgmorl/0/
    

    It plots the policy population (gray points) in each generation along with other useful information: black points are the policies on the Pareto front, green circles are the policies selected for optimization in the next generation, red points are the predicted offspring, and green points are the actual offspring. You can interact with the plot via the keyboard; for example, pressing left/right steps the policy population through the generations. Refer to the plot scripts for a full description of the supported operations.
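
If you only need a static picture of a two-objective Pareto front rather than the interactive viewer, a minimal matplotlib sketch such as the one below works directly on the stored objective values (again, the final/objs.txt path is an assumption):

    import numpy as np
    import matplotlib.pyplot as plt

    # assumed path; adjust to wherever your run stored its Pareto objectives
    objs = np.loadtxt("results/Walker2d-v2/pgmorl/0/final/objs.txt")

    plt.scatter(objs[:, 0], objs[:, 1], s=12, c="black")
    plt.xlabel("objective 1")
    plt.ylabel("objective 2")
    plt.title("Computed Pareto policies")
    plt.tight_layout()
    plt.savefig("pareto_front.png", dpi=200)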

Reproducibility

We ran all our experiments on Google Cloud Platform VM instances with 96 Intel Skylake vCPUs and 86.4 GB of memory, without GPUs.

Acknowledgement

We use pytorch-a2c-ppo-acktr-gail as the underlying PPO implementation and modify it into our Multi-Objective Policy Gradient algorithm.
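
Conceptually, the modification reuses the single-objective PPO update on a weighted-sum scalarization of the vector reward, with the weight vector for each policy chosen by the outer evolutionary loop. The stripped-down sketch below illustrates only that scalarization step and is not the repository's actual MOPG code:

    import numpy as np

    def scalarize_rewards(reward_vectors, weights):
        """Collapse vector rewards into scalars with a weighted sum so that a
        standard single-objective PPO update can be reused.

        reward_vectors: (T, m) array of per-step objective rewards
        weights:        (m,) non-negative weight vector from the outer loop
        """
        weights = np.asarray(weights, dtype=np.float64)
        return np.asarray(reward_vectors) @ (weights / weights.sum())

    # e.g. a policy pushed toward the first objective's end of the front
    print(scalarize_rewards([[1.0, 0.2], [0.9, 0.3]], weights=[0.8, 0.2]))
    # -> [0.84 0.78]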

Citation

If you find our paper or code useful, please consider citing:

@inproceedings{xu2020prediction,
  title={Prediction-Guided Multi-Objective Reinforcement Learning for Continuous Robot Control},
  author={Xu, Jie and Tian, Yunsheng and Ma, Pingchuan and Rus, Daniela and Sueda, Shinjiro and Matusik, Wojciech},
  booktitle={Proceedings of the 37th International Conference on Machine Learning},
  year={2020}
}