SiT: Self-supervised vIsion Transformer

Last update: Dec 28, 2022

This repository contains the official PyTorch code for self-supervised pretraining, finetuning, and evaluation of SiT (Self-supervised vIsion Transformer).

The training strategy is adopted from DeiT (Data-efficient Image Transformers).

Usage

Create an environment

conda create -n SiT python=3.8

Activate the environment and install the necessary packages

conda activate SiT

conda install pytorch torchvision torchaudio cudatoolkit=11.0 -c pytorch

pip install -r requirements.txt

Self-supervised pre-training

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 72 --epochs 501 --min-lr 5e-6 --lr 1e-3 --training-mode 'SSL' --data-set 'STL10' --output 'checkpoints/SSL/STL10' --validate-every 10
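The --lr and --min-lr flags in the command above bound the learning rate over the --epochs run. As a rough illustration only, the sketch below assumes a plain cosine decay from --lr to --min-lr with no warmup. Cosine decay is the DeiT default, but whether this repository uses exactly this schedule is an assumption, and the function name cosine_lr is hypothetical.

```python
import math


def cosine_lr(epoch, total_epochs, base_lr, min_lr):
    """Cosine-annealed learning rate for one epoch (assumed schedule, no warmup)."""
    # progress runs from 0.0 at the first epoch to 1.0 at the last epoch
    progress = epoch / max(1, total_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Values taken from the pre-training command: --epochs 501 --lr 1e-3 --min-lr 5e-6
    for epoch in (0, 250, 500):
        print(epoch, cosine_lr(epoch, total_epochs=501, base_lr=1e-3, min_lr=5e-6))
```

Under these assumptions the rate starts at the --lr value and ends at the --min-lr value, passing through roughly their midpoint halfway through training.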

Finetuning

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10' --validate-every 10
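When running on a different number of GPUs, it may help to keep the global batch size constant. The sketch below assumes that --batch-size is the per-process (per-GPU) batch size, as it is in DeiT; this has not been checked against this repository's main.py, and both helper names are hypothetical.

```python
def global_batch_size(per_gpu_batch, num_gpus):
    """Total samples per optimizer step, assuming --batch-size is per GPU."""
    return per_gpu_batch * num_gpus


def per_gpu_batch_for(target_global_batch, num_gpus):
    """Per-GPU batch size that reproduces a target global batch size."""
    if target_global_batch % num_gpus != 0:
        raise ValueError("global batch size must be divisible by the GPU count")
    return target_global_batch // num_gpus


if __name__ == "__main__":
    # Pre-training command: --nproc_per_node=4 --batch-size 72
    print(global_batch_size(72, 4))
    # Finetuning command: --nproc_per_node=4 --batch-size 120
    print(global_batch_size(120, 4))
    # Per-GPU batch size needed to match the pre-training global batch on 2 GPUs
    print(per_gpu_batch_for(global_batch_size(72, 4), 2))
```

Whether a larger per-GPU batch fits in memory depends on the hardware, so treat the computed value as a starting point.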

Linear Evaluation

Linear projection Head

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --lr 1e-3 --weight-decay 5e-4 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10_LE' --validate-every 10 --SiT_LinearEvaluation 1

2-layer MLP projection Head

python -m torch.distributed.launch --nproc_per_node=4 --use_env main.py --batch-size 120 --epochs 501 --lr 1e-3 --weight-decay 5e-4 --min-lr 5e-6 --training-mode 'finetune' --data-set 'STL10' --finetune 'checkpoints/SSL/STL10/checkpoint.pth' --output 'checkpoints/finetune/STL10_LE_hidden' --validate-every 10 --SiT_LinearEvaluation 1 --representation-size 1024
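To give a sense of how the two evaluation heads differ in size, the sketch below counts trainable head parameters. It assumes that --representation-size sets the width of a single hidden layer between the backbone features and the classifier, and it uses a hypothetical backbone embedding dimension of 384; neither is confirmed by this README. STL10 has 10 classes.

```python
def linear_head_params(embed_dim, num_classes):
    """Weights plus biases of a single linear classifier."""
    return embed_dim * num_classes + num_classes


def mlp_head_params(embed_dim, hidden_dim, num_classes):
    """Weights plus biases of a 2-layer MLP (the activation adds no parameters)."""
    hidden_layer = embed_dim * hidden_dim + hidden_dim
    output_layer = hidden_dim * num_classes + num_classes
    return hidden_layer + output_layer


if __name__ == "__main__":
    EMBED_DIM = 384    # hypothetical backbone width, not taken from this repository
    NUM_CLASSES = 10   # STL10
    print("linear head:", linear_head_params(EMBED_DIM, NUM_CLASSES))
    print("MLP head:", mlp_head_params(EMBED_DIM, 1024, NUM_CLASSES))
```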

Note: set the --dataset_location argument to the directory that contains the downloaded dataset.
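None of the commands above pass the dataset location explicitly. As a minimal sketch, the helper below assembles the pre-training command with --dataset_location added. The helper name build_pretrain_command and the path /path/to/datasets are hypothetical placeholders; all other flags are copied from the pre-training command.

```python
import shlex


def build_pretrain_command(dataset_location, num_gpus=4):
    """Return the pre-training command as an argument list."""
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={num_gpus}", "--use_env", "main.py",
        "--batch-size", "72", "--epochs", "501",
        "--min-lr", "5e-6", "--lr", "1e-3",
        "--training-mode", "SSL",
        "--data-set", "STL10",
        "--dataset_location", dataset_location,  # directory holding the downloaded dataset
        "--output", "checkpoints/SSL/STL10",
        "--validate-every", "10",
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line; replace the placeholder path before running it.
    print(shlex.join(build_pretrain_command("/path/to/datasets")))
```

Building the command as a list and quoting it with shlex.join keeps paths that contain spaces intact.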

If you use this code for a paper, please cite:

@article{atito2021sit,
  title={SiT: Self-supervised vIsion Transformer},
  author={Atito, Sara and Awais, Muhammad and Kittler, Josef},
  journal={arXiv preprint arXiv:2104.03602},
  year={2021}
}

License

This repository is released under the GNU General Public License.

Owner

Sara Ahmed
