Spatial Action Maps for Mobile Manipulation (RSS 2020)

Overview

spatial-action-maps

Update: Please see our new spatial-intention-maps repository, which extends this work to multi-agent settings. It contains many new improvements to the codebase, and while the focus is on multi-agent, it also supports single-agent training.


This code release accompanies the following paper:

Spatial Action Maps for Mobile Manipulation

Jimmy Wu, Xingyuan Sun, Andy Zeng, Shuran Song, Johnny Lee, Szymon Rusinkiewicz, Thomas Funkhouser

Robotics: Science and Systems (RSS), 2020

Project Page | PDF | arXiv | Video

Abstract: Typical end-to-end formulations for learning robotic navigation involve predicting a small set of steering command actions (e.g., step forward, turn left, turn right, etc.) from images of the current state (e.g., a bird's-eye view of a SLAM reconstruction). Instead, we show that it can be advantageous to learn with dense action representations defined in the same domain as the state. In this work, we present "spatial action maps," in which the set of possible actions is represented by a pixel map (aligned with the input image of the current state), where each pixel represents a local navigational endpoint at the corresponding scene location. Using ConvNets to infer spatial action maps from state images, action predictions are thereby spatially anchored on local visual features in the scene, enabling significantly faster learning of complex behaviors for mobile manipulation tasks with reinforcement learning. In our experiments, we task a robot with pushing objects to a goal location, and find that policies learned with spatial action maps achieve much better performance than traditional alternatives.

Installation

We recommend using a conda environment for this codebase. The following commands will set up a new conda environment with the correct requirements (tested on Ubuntu 18.04.3 LTS):

# Create and activate new conda env
conda create -y -n my-conda-env python=3.7
conda activate my-conda-env

# Install pytorch and numpy
conda install -y pytorch==1.2.0 torchvision==0.4.0 cudatoolkit=10.0 -c pytorch
conda install -y numpy=1.17.3

# Install pip requirements
pip install -r requirements.txt

# Install shortest path module (used in simulation environment)
cd spfa
python setup.py install

Quickstart

We provide four pretrained policies, one for each test environment. Use download-pretrained.sh to download them:

./download-pretrained.sh

You can then use enjoy.py to run a trained policy in the simulation environment.

For example, to load the pretrained policy for SmallEmpty, you can run:

python enjoy.py --config-path logs/20200125T213536-small_empty/config.yml

You can also run enjoy.py without specifying a config path, and it will find all policies in the logs directory and allow you to pick one to run:

python enjoy.py

Training in the Simulation Environment

The config/experiments directory contains template config files for all experiments in the paper. To start a training run, you can give one of the template config files to the train.py script. For example, the following will train a policy on the SmallEmpty environment:

python train.py config/experiments/base/small_empty.yml

The training script will create a log directory and checkpoint directory for the new training run inside logs/ and checkpoints/, respectively. Inside the log directory, it will also create a new config file called config.yml, which stores training run config variables and can be used to resume training or to load a trained policy for evaluation.

Simulation Environment

To explore the simulation environment using our proposed dense action space (spatial action maps), you can use the tools_click_agent.py script, which will allow you to click on the local overhead map to select actions and move around in the environment.

python tools_click_agent.py

Evaluation

Trained policies can be evaluated using the evaluate.py script, which takes in the config path for the training run. For example, to evaluate the SmallEmpty pretrained policy, you can run:

python evaluate.py --config-path logs/20200125T213536-small_empty/config.yml

This will load the trained policy from the specified training run, and run evaluation on it. The results are saved to an .npy file in the eval directory. You can then run jupyter notebook and navigate to eval_summary.ipynb to load the .npy files and generate tables and plots of the results.

Running in the Real Environment

We train policies in simulation and run them directly on the real robot by mirroring the real environment inside the simulation. To do this, we first use ArUco markers to estimate 2D poses of robots and cubes in the real environment, and then use the estimated poses to update the simulation. Note that setting up the real environment, particularly the marker pose estimation, can take a fair amount of time and effort.

Vector SDK Setup

If you previously ran pip install -r requirements.txt following the installation instructions above, the anki_vector library should already be installed. Run the following command to set up each robot you plan to use:

python -m anki_vector.configure

After the setup is complete, you can open the Vector config file located at ~/.anki_vector/sdk_config.ini to verify that all of your robots are present.

You can also run some of the official examples to verify that the setup procedure worked. For further reference, please see the Vector SDK documentation.

Connecting to the Vector

The following command will try to connect to all the robots in your Vector config file and keep them still. It will print out a message for each robot it successfully connects to, and can be used to verify that the Vector SDK can connect to all of your robots.

python vector_keep_still.py

Note: If you get the following error, you will need to make a small fix to the anki_vector library.

AttributeError: module 'anki_vector.connection' has no attribute 'CONTROL_PRIORITY_LEVEL'

Locate the anki_vector/behavior.py file inside your installed conda libraries. The full path should be in the error message. At the bottom of anki_vector/behavior.py, change connection.CONTROL_PRIORITY_LEVEL.RESERVE_CONTROL to connection.ControlPriorityLevel.RESERVE_CONTROL.


Sometimes the IP addresses of your robots will change. To update the Vector config file with the new IP addresses, you can run the following command:

python vector_run_mdns.py

The script uses mDNS to find all Vector robots on the local network, and will automatically update their IP addresses in the Vector config file. It will also print out the hostname, IP address, and MAC address of every robot found. Make sure zeroconf is installed (pip install zeroconf) or mDNS may not work well. Alternatively, you can just open the Vector config file at ~/.anki_vector/sdk_config.ini in a text editor and manually update the IP addresses.

Controlling the Vector

The vector_keyboard_controller.py script is adapted from the remote control example in the official SDK, and can be used to verify that you are able to control the robot using the Vector SDK. Use it as follows:

python vector_keyboard_controller.py --robot-index ROBOT_INDEX

The --robot-index argument specifies the robot you wish to control and refers to the index of the robot in the Vector config file (~/.anki_vector/sdk_config.ini). If no robot index is specified, the script will check all robots in the Vector config file and select the first robot it is able to connect to.

3D Printed Parts

The real environment setup contains some 3D printed parts. We used the Sindoh 3DWOX 1 3D printer to print them, but other printers should work too. We used PLA filament. All 3D model files are in the stl directory:

  • cube.stl: 3D model for the cubes (objects)
  • blade.stl: 3D model for the bulldozer blade attached to the front of the robot
  • board-corner.stl: 3D model for the board corners, which are used for pose estimation with ArUco markers

Running Trained Policies on the Real Robot

First see the aruco directory for instructions on setting up pose estimation with ArUco markers.

Once the setup is completed, make sure the pose estimation server is started before proceeding:

cd aruco
python server.py

The vector_click_agent.py script is analogous to tools_click_agent.py, and allows you to click on the local overhead map to control the real robot. The script is also useful for verifying that all components of the real environment setup are working correctly, including pose estimation and robot control. The simulation environment should mirror the real setup with millimeter-level precision. You can start it using the following command:

python vector_click_agent.py --robot-index ROBOT_INDEX

If the poses in the simulation do not look correct, you can restart the pose estimation server with the --debug flag to enable debug visualizations:

cd aruco
python server.py --debug

Once you have verified that manual control with vector_click_agent.py works, you can then run a trained policy using the vector_enjoy.py script. For example, to load the SmallEmpty pretrained policy, you can run:

python vector_enjoy.py --robot-index ROBOT_INDEX --config-path logs/20200125T213536-small_empty/config.yml

Citation

If you find this work useful for your research, please consider citing:

@inproceedings{wu2020spatial,
  title = {Spatial Action Maps for Mobile Manipulation},
  author = {Wu, Jimmy and Sun, Xingyuan and Zeng, Andy and Song, Shuran and Lee, Johnny and Rusinkiewicz, Szymon and Funkhouser, Thomas},
  booktitle = {Proceedings of Robotics: Science and Systems (RSS)},
  year = {2020}
}
ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

Update: if you're looking for ColBERTv2 code, you can find it alongside a new simpler API, in the branch new_api. ColBERT ColBERT is a fast and accura

Stanford Future Data Systems 637 Jan 08, 2023
An OpenAI Gym environment for Super Mario Bros

gym-super-mario-bros An OpenAI Gym environment for Super Mario Bros. & Super Mario Bros. 2 (Lost Levels) on The Nintendo Entertainment System (NES) us

Andrew Stelmach 1 Jan 05, 2022
DSAC* for Visual Camera Re-Localization (RGB or RGB-D)

DSAC* for Visual Camera Re-Localization (RGB or RGB-D) Introduction Installation Data Structure Supported Datasets 7Scenes 12Scenes Cambridge Landmark

Visual Learning Lab 143 Dec 22, 2022
Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

Single Image Super-Resolution with EDSR, WDSR and SRGAN A Tensorflow 2.x based implementation of Enhanced Deep Residual Networks for Single Image Supe

Martin Krasser 1.3k Jan 06, 2023
Riemannian Convex Potential Maps

Modeling distributions on Riemannian manifolds is a crucial component in understanding non-Euclidean data that arises, e.g., in physics and geology. The budding approaches in this space are limited b

Facebook Research 61 Nov 28, 2022
Code for "Typilus: Neural Type Hints" PLDI 2020

Typilus A deep learning algorithm for predicting types in Python. Please find a preprint here. This repository contains its implementation (src/) and

47 Nov 08, 2022
Prototypical Networks for Few shot Learning in PyTorch

Prototypical Networks for Few shot Learning in PyTorch Simple alternative Implementation of Prototypical Networks for Few Shot Learning (paper, code)

Orobix 835 Jan 08, 2023
OpenMMLab Computer Vision Foundation

English | 简体中文 Introduction MMCV is a foundational library for computer vision research and supports many research projects as below: MMCV: OpenMMLab

OpenMMLab 4.6k Jan 09, 2023
This is a repository for a semantic segmentation inference API using the OpenVINO toolkit

BMW-IntelOpenVINO-Segmentation-Inference-API This is a repository for a semantic segmentation inference API using the OpenVINO toolkit. It's supported

BMW TechOffice MUNICH 34 Nov 24, 2022
Implementation for the EMNLP 2021 paper "Interactive Machine Comprehension with Dynamic Knowledge Graphs".

Interactive Machine Comprehension with Dynamic Knowledge Graphs Implementation for the EMNLP 2021 paper. Dependencies apt-get -y update apt-get instal

Xingdi (Eric) Yuan 19 Aug 23, 2022
Pyramid Pooling Transformer for Scene Understanding

Pyramid Pooling Transformer for Scene Understanding Requirements: torch 1.6+ torchvision 0.7.0 timm==0.3.2 Validated on torch 1.6.0, torchvision 0.7.0

Yu-Huan Wu 119 Dec 29, 2022
NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

NVTabular is a feature engineering and preprocessing library for tabular data designed to quickly and easily manipulate terabyte scale datasets used to train deep learning based recommender systems.

880 Jan 07, 2023
GazeScroller - Using Facial Movements to perform Hands-free Gesture on the system

GazeScroller Using Facial Movements to perform Hands-free Gesture on the system

2 Jan 05, 2022
This project aims at providing a concise, easy-to-use, modifiable reference implementation for semantic segmentation models using PyTorch.

Semantic Segmentation on PyTorch (include FCN, PSPNet, Deeplabv3, Deeplabv3+, DANet, DenseASPP, BiSeNet, EncNet, DUNet, ICNet, ENet, OCNet, CCNet, PSANet, CGNet, ESPNet, LEDNet, DFANet)

2.4k Jan 08, 2023
This repository contains the code for our paper VDA (public in EMNLP2021 main conference)

Virtual Data Augmentation: A Robust and General Framework for Fine-tuning Pre-trained Models This repository contains the code for our paper VDA (publ

RUCAIBox 13 Aug 06, 2022
An open-source online reverse dictionary.

An open-source online reverse dictionary.

THUNLP 6.3k Jan 09, 2023
Pytorch implementation of OCNet series and SegFix.

openseg.pytorch News 2021/09/14 MMSegmentation has supported our ISANet and refer to ISANet for more details. 2021/08/13 We have released the implemen

openseg-group 1.1k Dec 23, 2022
Project code for weakly supervised 3D object detectors using wide-baseline multi-view traffic camera data: WIBAM.

WIBAM (Work in progress) Weakly Supervised Training of Monocular 3D Object Detectors Using Wide Baseline Multi-view Traffic Camera Data 3D object dete

Matthew Howe 10 Aug 24, 2022
Code for the paper "Attention Approximates Sparse Distributed Memory"

Attention Approximates Sparse Distributed Memory - Codebase This is all of the code used to run analyses in the paper "Attention Approximates Sparse D

Trenton Bricken 14 Dec 05, 2022
The official implementation of paper Siamese Transformer Pyramid Networks for Real-Time UAV Tracking, accepted by WACV22

SiamTPN Introduction This is the official implementation of the SiamTPN (WACV2022). The tracker intergrates pyramid feature network and transformer in

Robotics and Intelligent Systems Control @ NYUAD 29 Jan 08, 2023