TrTr: Visual Tracking with Transformer


We propose a novel tracker network based on the Transformer encoder-decoder architecture, a powerful attention mechanism, to capture global and rich contextual interdependencies. In this architecture, features of the template image are processed by a self-attention module in the encoder to learn strong context information, which is then sent to the decoder to compute cross-attention with the search image features processed by another self-attention module. In addition, we design the classification and regression heads using the output of the Transformer to localize the target based on a shape-agnostic anchor. We extensively evaluate our tracker, TrTr, on several benchmarks, and our method performs favorably against state-of-the-art algorithms.

Network architecture of TrTr for visual tracking
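The data flow described above can be illustrated with a toy, dependency-free sketch. It is not the repository's model: it uses a single attention head with no learned projections, and the function names (`attention`, `trtr_like_flow`) are hypothetical. It also assumes the usual Transformer decoder convention, in which the search features act as queries and the encoded template acts as keys and values.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return out


def trtr_like_flow(template_tokens, search_tokens):
    """Toy version of the encoder/decoder flow (assumption, not the real model)."""
    # Encoder side: self-attention over the template features.
    template_ctx = attention(template_tokens, template_tokens, template_tokens)
    # Decoder side: self-attention over the search features ...
    search_ctx = attention(search_tokens, search_tokens, search_tokens)
    # ... then cross-attention: search queries attend to the template memory.
    return attention(search_ctx, template_ctx, template_ctx)


template = [[1.0, 0.0], [0.0, 1.0]]              # 2 template tokens, dim 2
search = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]    # 3 search tokens, dim 2
fused = trtr_like_flow(template, search)
```

The result contains one fused vector per search token. In the real tracker, per-location outputs of this kind are what the classification and regression heads would consume.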

Installation

Install dependencies

$ ./install.sh ~/anaconda3 trtr 

note1: the command above assumes that Anaconda is installed under ~/anaconda3.

note2: please select a cuda-toolkit version that matches your GPU when installing PyTorch from conda. The default is 10.1. For an RTX 3090, select 11.0, in which case the installation command becomes $ ./install.sh ~/anaconda3 trtr 11.0.

Activate conda environment

$ conda activate trtr

Quick Start: Using TrTr

Webcam demo

Offline Model

$ python demo.py --tracker.checkpoint networks/trtr_resnet50.pth --use_baseline_tracker

Online Model

$ python demo.py --tracker.checkpoint networks/trtr_resnet50.pth

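Image sequences (png, jpeg)

Add the option --video_name ${video_dir}, where ${video_dir} is the directory containing the images.

Video (mp4 or avi)

Add the option --video_name ${video_name}, where ${video_name} is the path to the video file.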

Benchmarks

Download testing datasets

Please read this README.md to prepare the dataset.

Basic usage

Test tracker

$ cd benchmark
$ python test.py --cfg_file ../parameters/experiment/vot2018/offline.yaml
  • --cfg_file: the yaml file containing the hyper-parameters for each dataset. Please check ./benchmark/parameters/experiment for more yaml files.
    • online model for VOT2018: python test.py --cfg_file ../parameters/experiment/vot2018/online.yaml
    • online model for OTB: python test.py --cfg_file ../parameters/experiment/otb/online.yaml
  • --result_path: optional parameter that specifies the directory in which to store the tracking results. The default value is results, which generates ./benchmark/results/${dataset_name}
  • --model_name: optional parameter that specifies the tracker name under the result path. The default value is trtr, which yields a tracker directory of ./benchmark/results/${dataset_name}/trtr
  • --vis: visualize tracking
  • --repetition: number of repetitions. For example, you should set --repetition 15 for the VOT benchmarks, following the official evaluation.
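The way the options above combine can be sketched in a few lines of Python. This is only an illustration built from the defaults stated in this list; the helper names (`tracker_result_dir`, `build_test_command`) are hypothetical and do not exist in the repository.

```python
from pathlib import Path


def tracker_result_dir(dataset_name, result_path="results", model_name="trtr",
                       benchmark_root="./benchmark"):
    """Directory where results land, following the defaults described above."""
    return Path(benchmark_root) / result_path / dataset_name / model_name


def build_test_command(cfg_file, result_path=None, model_name=None,
                       repetition=None):
    """Assemble the argument list for test.py (to be run from ./benchmark)."""
    cmd = ["python", "test.py", "--cfg_file", cfg_file]
    if result_path is not None:
        cmd += ["--result_path", result_path]
    if model_name is not None:
        cmd += ["--model_name", model_name]
    if repetition is not None:
        cmd += ["--repetition", str(repetition)]
    return cmd


# Example: VOT2018 offline model with the repetition count suggested for VOT.
cmd = build_test_command("../parameters/experiment/vot2018/offline.yaml",
                         repetition=15)
out_dir = tracker_result_dir("VOT2018")
```

The list in `cmd` can be passed to `subprocess.run(cmd, cwd="benchmark")` if you prefer to drive the benchmark from a script.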

Eval tracker

$ cd benchmark
$ python eval.py
  • --dataset: specifies the benchmark. The default value is VOT2018. Set another benchmark name as needed, e.g., OTB, VOT2019, or UAV.
  • --tracker_path: specifies the result directory. The default value is ./benchmark/results. This corresponds to the --result_path parameter of python test.py.
  • --num: specifies the number of threads used when evaluating multiple tracker results. The default is 1.

(Optional) Hyper-parameter search

$ python hp_search.py --tracker.checkpoint ../networks/trtr_resnet50.pth --tracker.search_sizes 280 --separate --repetition 1  --use_baseline_tracker --tracker.model.transformer_mask True

Train

Download training datasets

Please read this README.md to prepare the training dataset.

Download VOT2018 dataset

  1. Please download the VOT2018 dataset following [this README]. It is needed for testing the model during training.
  2. Alternatively, you can skip this testing step by setting several parameters, which are explained later.

Train with single GPU

$ python main.py  --cfg_file ./parameters/train/default.yaml --output_dir train

note1: please check ./parameters/train/default.yaml for the training parameters.

note2: --output_dir sets the path where the training results are stored. The above command generates ./train.

note3: you may have to raise the open-file limit: ulimit -n 8192. Adding this line to ~/.bashrc may be more convenient.

note4: you can set --benchmark_start_epoch to a larger value than --epochs to skip the benchmark test, e.g., --benchmark_start_epoch 21 and --epochs 20.
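The open-file limit mentioned in note3 can also be inspected and raised from Python with the standard-library `resource` module (Unix only). The following is a minimal sketch, not part of the repository; the helper name `ensure_nofile_limit` is hypothetical, and the soft limit can only be raised up to the hard limit without extra privileges.

```python
import resource


def ensure_nofile_limit(target=8192):
    """Raise the soft open-file limit toward `target`; return the soft limit."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    unlimited = resource.RLIM_INFINITY
    if soft == unlimited or soft >= target:
        return soft  # already high enough
    # Never request more than the hard limit allows.
    new_soft = target if hard == unlimited else min(target, hard)
    try:
        resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    except (ValueError, OSError):
        return soft  # the platform refused; keep the current limit
    return new_soft


current_soft = ensure_nofile_limit(8192)
```

Unlike `ulimit -n` in a shell, this only affects the current Python process and its children.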

debug mode for quick checking the training process:

$ python main.py  --cfg_file ./parameters/train/default.yaml  --batch_size 16 --dataset.paths ./datasets/yt_bb/dataset/Curation  ./datasets/vid/dataset/Curation/ --dataset.video_frame_ranges 3 100  --dataset.num_uses 100 100  --dataset.eval_num_uses 100 100  --resume networks/trtr_resnet50.pth --benchmark_start_epoch 0 --epochs 10

Multi GPUs

multi GPUs in single machine

$ python -m torch.distributed.launch --nproc_per_node=2 --use_env main.py --cfg_file ./parameters/train/default.yaml --output_dir train

--nproc_per_node: the number of GPUs to use on the machine. The above command uses two GPUs on one machine.

multi GPUs in multi machines

Master Machine

$ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=0 --master_addr="${MASTER_IP_ADDRESS}" --master_port=${port} --use_env main.py --cfg_file ./parameters/train/default.yaml --output_dir train  --benchmark_start_epoch 8
  • --nnodes: the number of machines to use. The above command uses two machines.
  • --node_rank: the id of each machine. The master machine must be 0.
  • --master_addr: the IP address of the master machine
  • --master_port: an open port on the master machine (e.g., 8080)

Slave1 Machine

$ python -m torch.distributed.launch --nproc_per_node=2 --nnodes=2 --node_rank=1 --master_addr="${MASTER_IP_ADDRESS}" --master_port=${port} --use_env main.py --cfg_file ./parameters/train/default.yaml
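The two commands above differ only in --node_rank and in the training arguments passed to main.py. A small sketch that assembles the launcher command for each machine makes this explicit. The helper names and the example IP address are hypothetical; the flags are the ones used above.

```python
def build_launch_command(node_rank, nnodes, nproc_per_node,
                         master_addr, master_port, script_args):
    """Assemble the torch.distributed.launch command for one machine."""
    if not 0 <= node_rank < nnodes:
        raise ValueError("node_rank must be in [0, nnodes)")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={nproc_per_node}",
        f"--nnodes={nnodes}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "--use_env", "main.py", *script_args,
    ]


def world_size(nnodes, nproc_per_node):
    """Total number of training processes across all machines."""
    return nnodes * nproc_per_node


train_args = ["--cfg_file", "./parameters/train/default.yaml"]
# "192.168.0.10" is a placeholder address for the master machine.
commands = [
    build_launch_command(rank, 2, 2, "192.168.0.10", 8080, train_args)
    for rank in range(2)
]
total_processes = world_size(2, 2)
```

With two machines and two GPUs each, four processes take part in training, and every machine must be given the same master address and port.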

Owner

趙 漠居 (Zhao, Moju)
Project Lecturer at the University of Tokyo.