Vision Transformer Segmentation Network

This PyTorch implementation of ViT produces an output of the same size as the input in a simple, straightforward way: it applies the inverse of the patch rearrange operation to all the predicted outputs. This enables convolution-free multi-class segmentation.
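To make the rearrange / inverse-rearrange idea concrete, here is a minimal, dependency-free sketch on nested lists. It is not the repository's code, which works on tensors. The helper names split_into_patches and merge_patches are hypothetical, and the toy 4 x 4 image is chosen only for illustration.

```python
def split_into_patches(image, patch_size):
    """Split an H x W grid (list of rows) into flattened patches, row-major."""
    height, width = len(image), len(image[0])
    assert height % patch_size == 0 and width % patch_size == 0
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([
                image[top + i][left + j]
                for i in range(patch_size)
                for j in range(patch_size)
            ])
    return patches


def merge_patches(patches, height, width, patch_size):
    """Inverse of split_into_patches: put each patch back at its grid position."""
    patches_per_row = width // patch_size
    image = [[None] * width for _ in range(height)]
    for index, patch in enumerate(patches):
        top = (index // patches_per_row) * patch_size
        left = (index % patches_per_row) * patch_size
        for k, value in enumerate(patch):
            i, j = divmod(k, patch_size)
            image[top + i][left + j] = value
    return image


# A toy 4 x 4 single-channel "image" split into 2 x 2 patches.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
patches = split_into_patches(image, patch_size=2)
restored = merge_patches(patches, height=4, width=4, patch_size=2)
```

In a segmentation setting, the vectors passed to the merge step would be the per-patch predictions rather than the input pixels, so the merged result has the same spatial size as the input.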

Most of the code is taken from https://github.com/lucidrains/vit-pytorch/blob/main/vit_pytorch/vit.py

Default Architecture Parameters:

model = ViTSeg( image_size=112, 
                channels=1,
                patch_size=7, 
                num_classes=1, 
                dim=768, 
                depth=6, 
                heads=12, 
                mlp_dim=2048, 
                learned_pos=False, 
                use_token=False)

image_size: An integer or a tuple defining the size of the input image (with some code changes, arbitrary image sizes could be supported)
channels: An integer defining the number of channels in the input image
patch_size: An integer or a tuple defining the size of the patches
num_classes: An integer defining the number of channels in the output
dim: An integer defining the size of the embedding dimension
depth: An integer defining the number of transformer layers
heads: An integer defining the number of attention heads in the transformer layers
mlp_dim: An integer defining the size of the MLP in the transformer layers
learned_pos: A boolean which, if true, switches from fixed positional encodings to learned positional encodings
use_token: A boolean which, if true, adds a CLS token to the input and output
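As a rough sanity check on how these parameters interact, the sketch below computes the token layout implied by the default values. It assumes square images and patches (integer arguments), that image_size must be divisible by patch_size, and that each patch token predicts patch_size * patch_size * num_classes output values. The function token_layout is hypothetical and not part of the repository.

```python
def token_layout(image_size=112, patch_size=7, channels=1,
                 num_classes=1, use_token=False):
    """Derive sequence and per-patch sizes for square images and patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return {
        "patches_per_side": patches_per_side,
        "num_patches": num_patches,
        # Assumption: the CLS token adds exactly one extra sequence position.
        "sequence_length": num_patches + (1 if use_token else 0),
        "input_values_per_patch": patch_size * patch_size * channels,
        # Assumption: one predicted value per pixel and output channel.
        "output_values_per_patch": patch_size * patch_size * num_classes,
    }


default_layout = token_layout()
with_token = token_layout(use_token=True)
```

With the defaults, a 112 x 112 image and 7 x 7 patches give a 16 x 16 grid, i.e. 256 patch tokens of 49 input values each.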

Citation

If you find this repository useful, please consider citing it:

@article{reynaud2021vitseg,
  title={ViTSeg-https://github.com/HReynaud/ViTSeg}, 
  url={https://github.com/HReynaud/ViTSeg},  
  Author={Reynaud, Hadrien}, 
  Year={2021}
}

A simple approach to enable dense segmentation with ViT.


Owner

HReynaud
