PyTorch Implementation of Spatially Consistent Representation Learning(SCRL)

Related tags

Deep Learningscrl
Overview

KakaoBrain pytorch pytorch

Spatially Consistent Representation Learning (CVPR'21)

Abstract

SCRL is a self-supervised learning method that allows you to obtain a spatially consistent dense representation, especially useful for localization tasks such as object detection and segmentation. You might be able to improve your own localization task by simply initializing the backbone model with those parameters trained using our method. Please refer to the paper for more details.

Requirements

We have tested the code on the following environments:

  • Python 3.7.7 / Pytorch 1.6.0 / torchvision 0.7.0 / CUDA 10.1 / Ubuntu 18.04
  • Python 3.8.3 / Pytorch 1.7.1 / torchvision 0.8.2 / CUDA 11.1 / Ubuntu 18.04

Run the following command to install dependencies:

pip install -r requirements.txt

Configuration

The config directory contains predefined YAML configuration files for different learning methods and schedules.

There are two alternative ways to change training configuration:

  • Change the value of the field of interest directly in config/*.yaml files.
  • Pass the value as a command argument to main.py with the name that represents the field's hierarchy using the delimiter /.

Note that the values given by the command argument take precedence over the ones stored in the YAML file.

Refer to the Dataset subsection below for an example.

Model

We officially support ResNet-50 and ResNet-101 backbones. Note that all the hyperparameters have been tuned based on ResNet-50.

Dataset

Currently, only ImageNet dataset is supported. You must specify the dataset path in one of the following ways:

# (option 1) set field `dataset.root` in the YAML configuration file.
data:
  root: your_own_path
  ...
# (option 2) pass `--dataset/root` as an argument of the shell command.
$ python main.py --data/root your_own_path ...

How to Run

Overview

The code consists of two main parts: self-supervised pre-training(upstream) and linear evaluation protocol(downstream). After the pre-training is done, the evaluation protocol is automatically run by the default configurations. Note that, for simplicity, this code does not hold out validation set from train set as in BYOL paper. Although this may not be a rigorous implementation, training the linear classifier itself is not in our main interest here and the protocol still does its job well enough as an evaluation metric. Also note that the protocol may not be exactly the same with the details in other literatures in terms of batch size and learning rate schedule.

Resource & Batchsize

We recommend you run the code in a multi-node environment in order to reproduce the results reported in the paper. We used 4 V100 x 8 nodes = 32 GPUs for training. Under this circumstance, our upstream batch size is 8192 and downstream batch size is 4096. When the number of GPUs dwindled to half, we observed performance degradation. Although BYOL-like methods do not use negative examples in an explicit way, they can still suffer performance drops when their batch size is reduced, as illustrated in Figure 3 in BYOL paper.

Single-node Training

If your YAML configuration file is ./confing/scrl_200ep.yaml, run the command as follows:

$ python main.py --config config/scrl_200_ep.yaml

Multi-node Training

To train a single model with 2 nodes, for instance, run the commands below in sequence:

# on the machine #0
$ python main.py --config config/scrl_200_ep.yaml \
                 --dist_url tcp://{machine_0_ip}:{available_port} \
                 --num_machines 2 \
                 --machine_rank 0
# on the machine #1
$ python main.py --config config/scrl_200_ep.yaml \
                 --dist_url tcp://{machine_1_ip}:{available_port} \
                 --num_machines 2 \
                 --machine_rank 1

If IP address and port number are not known in advance, you can first run the command with --dist_url=auto in the master node. Then check the IP address and available port number that are printed on the command line, to which you should refer to launch the other nodes.

Linear Evaluation Only

You can also run the protocol using any given checkpoint on a stand-alone basis. You can evaluate the latest checkpoint anytime after the very first checkpoint has been dumped. Refer to the following command:

$ python main.py --config config/scrl_200ep.yaml --train/enabled=False --load_dir ...

Saving & Loading Checkpoints

Saved Filenames

  • save_dir will be automatically determined(with sequential number suffixes) unless otherwise designated.
  • Model's checkpoints are saved in ./{save_dir}/checkpoint_{epoch}.pth.
  • Symlinks of the last checkpoints are saved in ./{save_dir}/checkpoint_last.pth.

Automatic Loading

  • SCRLTraniner will automatically load checkpoint_last.pth if it exists in save_dir.
  • By default, save_dir is identical to load_dir. However, you can also set load_dir seperately.

Results

Method Epoch Linear Eval (Acc.) COCO BBox (AP) COCO Segm (AP) Checkpoint
Random -- --  /   --   29.77 / 30.95 28.70 --
IN-sup. 90 --  /  74.3 38.52 / 39.00 35.44 --
BYOL 200 72.90 / 73.14 38.35 / 38.86 35.96 download
BYOL 1000 74.47 / 74.51 40.10 / 40.19 37.16 download
SCRL 200 66.78 / 68.27 40.49 / 41.02 37.50 download
SCRL 1000 70.67 / 70.66 40.92 / 41.40 37.92 download
  • Epoch: self-supervised pre-training epochs
  • Linear Eval(linear evaluation): online / offline
  • COCO Bbox(Object Detection): Faster R-CNN w.FPN / Mask R-CNN w.FPN
  • COCO Segm(Instance Segmentation): Mask R-CNN w.FPN

On/offline Linear Evaluation

The online evaluation is done with a linear classifier attached to the top of the backbone, which is trained simultaneously during pre-training under its own objective but does NOT backpropagate the gradients to the main model. This facilitates monitoring of the learning progress while there is no target performance measure for self-supervised learning. The offline evaluation refers to the commonly known standard protocol.

COCO Localization Tasks

As you can see in the table, our method significantly boosts the performance of the localization downstream tasks. Note that the values in the table can be slightly different from the paper because of different random seeds. For the COCO localization tasks, we used Detectron2 repository publicly available, which is not included in this code. We simply initialized the ResNet-50 backbone with pre-trained parameters by our method and finetuned it under the downstream objective. Note that we used synchronized BN for all the configurable layers and retained the default hyperparameters for everything else including the training schedule(x1).

Downloading Pretrained Checkpoints

You can download the backbone checkpoints via those links in the table. To load them in our code and run the linear evaluation, change the filename to checkpoint_last.pth (or make symlink) and pass the parent directory path to either save_dir or load_dir.

Hyperparameters

For the 1000 epoch checkpoint of SCRL, we simply used the hyperparameters adopted in the official BYOL implementation, which is different from the description in the paper, but still matches the reported performance. As described in the Appendix in the paper, there could be potential room for improvement by extensive hyperparameter search.

Citation

If you use this code for your research, please cite our paper.

@inproceedings{roh2021scrl,
  title={Spatilly Consistent Representation Learning},
  author    = {Byungseok Roh and
               Wuhyun Shin and
               Ildoo Kim and
               Sungwoong Kim},
  booktitle = {CVPR},
  publisher = {IEEE},
  year      = {2021}
}

Contact for Issues

License

This project is licensed under the terms of the Apache License 2.0.

Copyright 2021 Kakao Brain Corp. https://www.kakaobrain.com All Rights Reserved.

Owner
Kakao Brain
Kakao Brain Corp.
Kakao Brain
All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

All course materials for the Zero to Mastery Deep Learning with TensorFlow course.

Daniel Bourke 3.4k Jan 07, 2023
Code for the paper Learning the Predictability of the Future

Learning the Predictability of the Future Code from the paper Learning the Predictability of the Future. Website of the project in hyperfuture.cs.colu

Computer Vision Lab at Columbia University 139 Nov 18, 2022
deep learning for image processing including classification and object-detection etc.

深度学习在图像处理中的应用教程 前言 本教程是对本人研究生期间的研究内容进行整理总结,总结的同时也希望能够帮助更多的小伙伴。后期如果有学习到新的知识也会与大家一起分享。 本教程会以视频的方式进行分享,教学流程如下: 1)介绍网络的结构与创新点 2)使用Pytorch进行网络的搭建与训练 3)使用Te

WuZhe 13.6k Jan 04, 2023
Planar Prior Assisted PatchMatch Multi-View Stereo

ACMP [News] The code for ACMH is released!!! [News] The code for ACMM is released!!! About This repository contains the code for the paper Planar Prio

Qingshan Xu 127 Dec 31, 2022
Codes for NeurIPS 2021 paper "On the Equivalence between Neural Network and Support Vector Machine".

On the Equivalence between Neural Network and Support Vector Machine Codes for NeurIPS 2021 paper "On the Equivalence between Neural Network and Suppo

Leslie 8 Oct 25, 2022
[CVPR 2021] Region-aware Adaptive Instance Normalization for Image Harmonization

RainNet — Official Pytorch Implementation Region-aware Adaptive Instance Normalization for Image Harmonization Jun Ling, Han Xue, Li Song*, Rong Xie,

130 Dec 11, 2022
Code for MSc Quantitative Finance Dissertation

MSc Dissertation Code ReadMe Sector Volatility Prediction Performance Using GARCH Models and Artificial Neural Networks Curtis Nybo MSc Quantitative F

2 Dec 01, 2022
A baseline code for VSPW

A baseline code for VSPW Preparation Download VSPW dataset The VSPW dataset with extracted frames and masks is available here.

28 Aug 22, 2022
A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

A rough implementation of the paper "A Steering Algorithm for Redirected Walking Using Reinforcement Learning"

Somnus `Chen 2 Jun 09, 2022
MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity

MinkLoc3D-SI: 3D LiDAR place recognition with sparse convolutions,spherical coordinates, and intensity Introduction The 3D LiDAR place recognition aim

16 Dec 08, 2022
Learning to Adapt Structured Output Space for Semantic Segmentation, CVPR 2018 (spotlight)

Learning to Adapt Structured Output Space for Semantic Segmentation Pytorch implementation of our method for adapting semantic segmentation from the s

Yi-Hsuan Tsai 782 Dec 30, 2022
PyAF is an Open Source Python library for Automatic Time Series Forecasting built on top of popular pydata modules.

PyAF (Python Automatic Forecasting) PyAF is an Open Source Python library for Automatic Forecasting built on top of popular data science python module

CARME Antoine 405 Jan 02, 2023
Embracing Single Stride 3D Object Detector with Sparse Transformer

SST: Single-stride Sparse Transformer This is the official implementation of paper: Embracing Single Stride 3D Object Detector with Sparse Transformer

TuSimple 385 Dec 28, 2022
Company clustering with K-means/GMM and visualization with PCA, t-SNE, using SSAN relation extraction

RE results graph visualization and company clustering Installation pip install -r requirements.txt python -m nltk.downloader stopwords python3.7 main.

Jieun Han 1 Oct 06, 2022
ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representation from common sense knowledge graphs.

ZSL-KG is a general-purpose zero-shot learning framework with a novel transformer graph convolutional network (TrGCN) to learn class representa

Bats Research 94 Nov 21, 2022
CLIP: Connecting Text and Image (Learning Transferable Visual Models From Natural Language Supervision)

CLIP (Contrastive Language–Image Pre-training) Experiments (Evaluation) Model Dataset Acc (%) ViT-B/32 (Paper) CIFAR100 65.1 ViT-B/32 (Our) CIFAR100 6

Myeongjun Kim 52 Jan 07, 2023
AgML is a comprehensive library for agricultural machine learning

AgML is a comprehensive library for agricultural machine learning. Currently, AgML provides access to a wealth of public agricultural datasets for common agricultural deep learning tasks.

Plant AI and Biophysics Lab 1 Jul 07, 2022
Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019)

Adaptive Pyramid Context Network for Semantic Segmentation (APCNet CVPR'2019) Introduction Official implementation of Adaptive Pyramid Context Network

21 Nov 09, 2022
This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?”

This repository accompanies our paper “Do Prompt-Based Models Really Understand the Meaning of Their Prompts?” Usage To replicate our results in Secti

Albert Webson 64 Dec 11, 2022
a morph transfer UGATIT for image translation.

Morph-UGATIT a morph transfer UGATIT for image translation. Introduction 中文技术文档 This is Pytorch implementation of UGATIT, paper "U-GAT-IT: Unsupervise

55 Nov 14, 2022