Transformer in Transformer

Implementation of Transformer in Transformer, pixel-level attention paired with patch-level attention for image classification, in PyTorch.

Install

$ pip install transformer-in-transformer

Usage

import torch
from transformer_in_transformer import TNT

tnt = TNT(
    image_size = 256,       # size of image
    patch_dim = 512,        # dimension of patch token
    pixel_dim = 24,         # dimension of pixel token
    patch_size = 16,        # patch size
    pixel_size = 4,         # pixel size
    depth = 6,              # depth
    num_classes = 1000,     # output number of classes
    attn_dropout = 0.1,     # attention dropout
    ff_dropout = 0.1        # feedforward dropout
)

img = torch.randn(2, 3, 256, 256)
logits = tnt(img) # (2, 1000)

Citations

@misc{han2021transformer,
    title   = {Transformer in Transformer}, 
    author  = {Kai Han and An Xiao and Enhua Wu and Jianyuan Guo and Chunjing Xu and Yunhe Wang},
    year    = {2021},
    eprint  = {2103.00112},
    archivePrefix = {arXiv},
    primaryClass = {cs.CV}
}
Comments
  • Only works if pixel_size**2 == patch_size?

    Hi, is this only supposed to work if

    pixel_size**2 == patch_size

    ? When setting patch_size to any value that doesn't satisfy this equation, the following error occurs:

    --> 146         pixels += rearrange(self.pixel_pos_emb, 'n d -> () n d')
        147 
        148         for pixel_attn, pixel_ff, pixel_to_patch_residual, patch_attn, patch_ff in self.layers:
    
    RuntimeError: The size of tensor a (4) must match the size of tensor b (64) at non-singleton dimension 1
    

    The error occurred when running:

    tnt = TNT(
        image_size = 128,       # size of image
        patch_dim = 256,        # dimension of patch token
        pixel_dim = 24,         # dimension of pixel token
        patch_size = 16,        # patch size
        pixel_size = 2,         # pixel size
        depth = 6,              # depth
        heads = 1,
        num_classes = 2,     # output number of classes
        attn_dropout = 0.1,     # attention dropout
        ff_dropout = 0.1        # feedforward dropout,
    )
    img = torch.randn(2, 3, 128, 128)
    logits = tnt(img)
    

    Since I am completely new to einops, it's quite hard for me to debug :D Thanks!
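
    For what it's worth, a quick sketch of where the two sizes in the error could come from (assuming the pixel positional embedding is built with pixel_size ** 2 entries while the unfolding produces (patch_size // pixel_size) ** 2 pixel tokens per patch; this is a guess at the shapes, not the actual tnt.py code):

    patch_size, pixel_size = 16, 2
    num_pixel_tokens = (patch_size // pixel_size) ** 2   # 64 -> "tensor b" in the error
    pos_emb_tokens = pixel_size ** 2                      # 4  -> "tensor a" in the error
    # the two only agree when pixel_size ** 2 == (patch_size // pixel_size) ** 2,
    # i.e. when pixel_size ** 2 == patch_size (e.g. patch_size = 16, pixel_size = 4)
    print(num_pixel_tokens, pos_emb_tokens)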

    opened by PhilippMarquardt 1
  • Not sure what is wrong!

    RuntimeError                              Traceback (most recent call last)
    <ipython-input> in <module>
         14
         15 img = torch.randn(1, 3, 256, 256)
    ---> 16 logits = tnt(img) # (2, 1000)

    ~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
       1108         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
       1109                 or _global_forward_hooks or _global_forward_pre_hooks):
    -> 1110             return forward_call(*input, **kwargs)
       1111         # Do not call functions when jit is used
       1112         full_backward_hooks, non_full_backward_hooks = [], []

    ~/opt/anaconda3/envs/ml/lib/python3.8/site-packages/transformer_in_transformer/tnt.py in forward(self, x)
        159         patches = repeat(self.patch_tokens[:(n + 1)], 'n d -> b n d', b = b)
        160
    --> 161         patches += rearrange(self.patch_pos_emb[:(n + 1)], 'n d -> () n d')
        162         pixels += rearrange(self.pixel_pos_emb, 'n d -> () n d')
        163

    RuntimeError: a view of a leaf Variable that requires grad is being used in an in-place operation.
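
    For reference, a minimal sketch of this error class and an out-of-place workaround (a hypothetical reproduction, not the repo's code):

    import torch
    from einops import rearrange

    pos_emb = torch.nn.Parameter(torch.randn(4, 8))  # a leaf parameter that requires grad
    patches = rearrange(pos_emb, 'n d -> () n d')    # for a contiguous tensor this is a view of the leaf

    # patches += 1.0                                 # in-place add on the view raises the RuntimeError above
    patches = patches + 1.0                          # an out-of-place add avoids it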

    opened by RisabBiswas 0
  • patch_tokens vs patch_pos_emb

    Hi!

    I'm trying to understand your TNT implementation, and one thing that got me a bit confused is why there are two parameters, patch_tokens and patch_pos_emb, which seem to have the same purpose: to encode patch position. Isn't one of them redundant?

    self.patch_tokens = nn.Parameter(torch.randn(num_patch_tokens + 1, patch_dim))
    self.patch_pos_emb = nn.Parameter(torch.randn(num_patch_tokens + 1, patch_dim))
    ...
    patches = repeat(self.patch_tokens[:(n + 1)], 'n d -> b n d', b = b)
    patches += rearrange(self.patch_pos_emb[:(n + 1)], 'n d -> () n d')
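
    As an illustration of the question (a hypothetical sketch, not the repo's code): if the two parameters are only ever consumed as their sum, a single parameter of the same shape can express the same function:

    import torch

    patch_tokens = torch.nn.Parameter(torch.randn(5, 8))
    patch_pos_emb = torch.nn.Parameter(torch.randn(5, 8))
    combined = torch.nn.Parameter((patch_tokens + patch_pos_emb).detach())  # one parameter, same forward contribution

    assert torch.allclose(patch_tokens + patch_pos_emb, combined)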
    
    opened by stas-sl 0
  • Inconsistent model params with MindSpore src code

    There's no function or README description of the TNT-S/TNT-B models in this codebase, something like:

    def tnt_b(num_class):
        return TNT(img_size=384,
                   patch_size=16,
                   num_channels=3,
                   embedding_dim=640,
                   num_heads=10,
                   num_layers=12,
                   hidden_dim=640*4,
                   stride=4,
                   num_class=num_class)
    

    Also, the number of heads in the inner block should be 4: https://github.com/lucidrains/transformer-in-transformer/blob/main/transformer_in_transformer/tnt.py#L135

    Wondering if anyone has reproduced the paper's reported results with this codebase?
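
    For comparison, a rough sketch of what a TNT-B-style configuration might look like with this repo's constructor arguments (the dimensions, depth, and head count below follow the paper's TNT-B setup; treating them as drop-in values for this implementation is an assumption):

    tnt_b = TNT(
        image_size = 224,       # 384 for the fine-tuned variant, as in the MindSpore snippet above
        patch_dim = 640,        # outer (patch) embedding dimension for TNT-B
        pixel_dim = 40,         # inner (pixel) embedding dimension for TNT-B
        patch_size = 16,
        pixel_size = 4,
        depth = 12,
        heads = 10,             # outer attention heads for TNT-B
        num_classes = 1000,
        attn_dropout = 0.1,
        ff_dropout = 0.1
    )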

    opened by WongChen 0
  • Why does the loss become NaN?

    It is a great project, and I am very interested in the Transformer in Transformer model. I used your model to train on the Vehicle-1M dataset, a fine-grained visual classification dataset. When I use this model, the loss becomes NaN after some batch iterations. I decreased the learning rate of the Adam optimizer and clipped the gradients with torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=2.0, norm_type=2), but the loss still becomes NaN sometimes. It seems that the gradients are not large, but they point in the same direction for many iterations. How can I solve this?
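
    One generic thing to try for this kind of divergence (a sketch of common practice, not something specific to this repo) is to pair gradient clipping with a learning-rate warmup:

    import torch
    from transformer_in_transformer import TNT

    model = TNT(image_size = 256, patch_dim = 512, pixel_dim = 24,
                patch_size = 16, pixel_size = 4, depth = 6, num_classes = 1000)

    optimizer = torch.optim.AdamW(model.parameters(), lr = 5e-4, weight_decay = 0.05)
    warmup_steps = 1000
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))  # linear warmup

    images = torch.randn(2, 3, 256, 256)              # stand-in batch
    labels = torch.randint(0, 1000, (2,))

    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(images), labels)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm = 1.0)  # as in the comment above
    optimizer.step()
    scheduler.step()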

    opened by yt7589 3
Owner

Phil Wang
Working with Attention. It's all we need.