Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Last update: Nov 24, 2022

Overview

Uniformer - Pytorch

Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Install

$ pip install uniformer-pytorch

Usage

Uniformer-S

import torch
from uniformer_pytorch import Uniformer

model = Uniformer(
    num_classes = 1000,                 # number of output classes
    dims = (64, 128, 256, 512),         # feature dimensions per stage (4 stages)
    depths = (3, 4, 8, 3),              # depth at each stage
    mhsa_types = ('l', 'l', 'g', 'g')   # aggregation type at each stage, 'l' stands for local, 'g' stands for global
)

video = torch.randn(1, 3, 8, 224, 224)  # (batch, channels, time, height, width)

logits = model(video) # (1, 1000)

Uniformer-B

import torch
from uniformer_pytorch import Uniformer

model = Uniformer(
    num_classes = 1000
    depths = (5, 8, 20, 7)
)

Citations

@inproceedings{anonymous2022uniformer,
    title   = {UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning},
    author  = {Anonymous},
    booktitle = {Submitted to The Tenth International Conference on Learning Representations },
    year    = {2022},
    url     = {https://openreview.net/forum?id=nBU_u6DLvoK},
    note    = {under review}
}

Comments

About the re-implementation of Layernorm

Hi @lucidrains ,

Thanks for the implementation of Uniformer. After checking the code, I found that you have re-implemented the layernorm operation in the following. https://github.com/lucidrains/uniformer-pytorch/blob/78397000e647fd4035a7fa2bede999d59d4c3ded/uniformer_pytorch/uniformer_pytorch.py#L13-L23

I wonder if there are any differences between your re-implementation and the official code provided by PyTorch (i.e., torch.nn.Layernorm(dim) ).

Looking forward to your reply. Thx.

opened by ChongjianGE 2
Pretrained weights

Did you pretrain the uniformer on any of the benchmark datasets? (Kinetics, something-something, ...) If you did, would you be able to share the state dicts?

Thanks in advance!

opened by Fritskee 3

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

SVHNClassifier-PyTorch A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks If

182 Jan 3, 2023

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

104 Nov 25, 2022

U-2-Net: U Square Net - Modified for paired image training of style transfer

U2-Net: U Square Net Modified for paired image training of style transfer This is an unofficial repo making use of the code which was made available b

43 Oct 3, 2022

Implementation of TimeSformer, a pure attention-based solution for video classification

TimeSformer - Pytorch Implementation of TimeSformer, a pure and simple attention-based solution for reaching SOTA on video classification.

602 Jan 3, 2023

A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

A PyTorch implementation of V-Net Vnet is a PyTorch implementation of the paper V-Net: Fully Convolutional Neural Networks for Volumetric Medical Imag

606 Dec 21, 2022

An implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional Neural Network"

Retina Blood Vessels Segmentation This is an implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional

23 Aug 20, 2022

U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

U-Net Implementation By Christopher Ley This is my interpretation and implementation of the famous paper "U-Net: Convolutional Networks for Biomedical

1 Jan 6, 2022

RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

RETRO - Pytorch (wip) Implementation of RETRO, Deepmind's Retrieval based Attent

556 Jan 4, 2023

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

🦩 Flamingo - Pytorch Implementation of Flamingo, state-of-the-art few-shot visual question answering attention net, in Pytorch. It will include the p

630 Dec 28, 2022

Implementation of Uniformer, a simple attention and 3d convolutional net that achieved SOTA in a number of video classification tasks

Related tags

Overview

Uniformer - Pytorch

Install

Usage

Citations

You might also like...

A PyTorch implementation of Multi-digit Number Recognition from Street View Imagery using Deep Convolutional Neural Networks

U^2-Net - Portrait matting This repository explores possibilities of using the original u^2-net model for portrait matting.

U-2-Net: U Square Net - Modified for paired image training of style transfer

Implementation of TimeSformer, a pure attention-based solution for video classification

A PyTorch implementation for V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation

An implementation of the research paper "Retina Blood Vessel Segmentation Using A U-Net Based Convolutional Neural Network"

U-Net Implementation: Convolutional Networks for Biomedical Image Segmentation" using the Carvana Image Masking Dataset in PyTorch

RETRO-pytorch - Implementation of RETRO, Deepmind's Retrieval based Attention net, in Pytorch

Implementation of 🦩 Flamingo, state-of-the-art few-shot visual question answering attention net out of Deepmind, in Pytorch

Comments

About the re-implementation of Layernorm

Pretrained weights

Releases(0.0.4)

0.0.4(Apr 22, 2022)

0.0.3(Nov 17, 2021)

0.0.2(Nov 16, 2021)

0.0.1(Nov 16, 2021)

Owner

Phil Wang

Taming Transformers for High-Resolution Image Synthesis

Boostcamp AI Tech 3rd / Basic Paper reading w.r.t Embedding

D²Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

Layered Neural Atlases for Consistent Video Editing

The PyTorch implementation of Directed Graph Contrastive Learning (DiGCL), NeurIPS-2021

RoMA: Robust Model Adaptation for Offline Model-based Optimization

Unsupervised Image-to-Image Translation

D-NeRF: Neural Radiance Fields for Dynamic Scenes

SimplEx - Explaining Latent Representations with a Corpus of Examples

Self-Supervised Generative Style Transfer for One-Shot Medical Image Segmentation

Official PyTorch implementation of U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation

FactSeg: Foreground Activation Driven Small Object Semantic Segmentation in Large-Scale Remote Sensing Imagery (TGRS)

Deep Distributed Control of Port-Hamiltonian Systems

Inhomogeneous Social Recommendation with Hypergraph Convolutional Networks

Transfer Reinforcement Learning for Differing Action Spaces via Q-Network Representations

This library is a location of the LegacyLogger for PyTorch Lightning.

ShapeGlot: Learning Language for Shape Differentiation

Plenoxels: Radiance Fields without Neural Networks

The all new way to turn your boring vector meshes into the new fad in town; Voxels!

Public repository created to store my custom-made tools for Just Dance (UbiArt Engine)