DeepMind Perceiver (in PyTorch)

Disclaimer: This is not official and I'm not affiliated with DeepMind.

This is my implementation of Perceiver: General Perception with Iterative Attention. You can read more about the model on DeepMind's website.

I trained an MNIST model, which you can find in models/mnist.pkl or load with perceiver.load_mnist_model(). It reaches 96.02% accuracy on the test set.

Getting started

To run this you need PyTorch installed:

pip3 install torch

From perceiver you can import Perceiver or PerceiverLogits.

You can then use it as follows (or see examples.ipynb):

from perceiver import Perceiver

model = Perceiver(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input in the different dimensions? E.g. (28, 28) for MNIST
    fourier_bands=4, # <- How many bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks follow each cross-attention with the input?
    dropout=0.1, # <- Dropout probability.
    layers=8, # <- This will become two unique layer-blocks: layer 1 and layer 2-8 (using weight sharing).
)
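
The fourier_bands argument controls a Fourier-feature positional encoding. The following is a rough, framework-free sketch of that idea and not this repository's code. The helper names (fourier_features, pixel_to_coords, encode_position) and the max_freq parameter are hypothetical. The frequency spacing (linear from 1 up to half the input resolution) follows the description in the Perceiver paper and is only an assumption about what this implementation does.

```python
import math


def fourier_features(x, num_bands, max_freq):
    """Encode one coordinate x in [-1, 1] as [x, sines..., cosines...].

    Hypothetical helper: frequencies are spaced linearly from 1 to
    max_freq / 2, which is an assumption about the encoding.
    """
    if num_bands == 1:
        freqs = [1.0]
    else:
        step = (max_freq / 2 - 1.0) / (num_bands - 1)
        freqs = [1.0 + k * step for k in range(num_bands)]
    sines = [math.sin(math.pi * f * x) for f in freqs]
    cosines = [math.cos(math.pi * f * x) for f in freqs]
    return [x] + sines + cosines


def pixel_to_coords(index, shape):
    """Map an integer index such as (row, col) to coordinates in [-1, 1]."""
    return [2 * i / (n - 1) - 1 if n > 1 else 0.0 for i, n in zip(index, shape)]


def encode_position(index, shape, num_bands):
    """Concatenate the Fourier features of every input dimension."""
    features = []
    for x, n in zip(pixel_to_coords(index, shape), shape):
        # Use the input resolution along this axis as the maximum frequency.
        features.extend(fourier_features(x, num_bands, max_freq=n))
    return features


# Top-right pixel of a 28x28 image with fourier_bands=4.
encoding = encode_position((0, 27), (28, 28), num_bands=4)
print(len(encoding))  # 2 dimensions * (2 * 4 + 1) = 18
```

Under these assumptions each input dimension contributes 2 * fourier_bands + 1 values, so more bands give the model finer positional detail at the cost of a wider input.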

The Perceiver model outputs the latents after the final layer. If you want logits instead, use PerceiverLogits:

from perceiver import PerceiverLogits

model = PerceiverLogits(
    input_channels, # <- How many channels in the input? E.g. 3 for RGB.
    input_shape, # <- How big is the input in the different dimensions? E.g. (28, 28) for MNIST
    output_features, # <- How many different classes? E.g. 10 for MNIST.
    fourier_bands=4, # <- How many bands should the positional encoding have?
    latents=64, # <- How many latent vectors?
    d_model=32, # <- Model dimensionality. Every pixel/token/latent vector will have this size.
    heads=8, # <- How many heads in self-attention? Cross-attention always has 1 head.
    latent_blocks=6, # <- How many latent self-attention blocks follow each cross-attention with the input?
    dropout=0.1, # <- Dropout probability.
    layers=8, # <- This will become two unique layer-blocks: layer 1 and layer 2-8 (using weight sharing).
)
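
To make the layers, latent_blocks and latents arguments more concrete, here is a small framework-free sketch of the bookkeeping they imply. It is not code from this repository: the function names are hypothetical, and it assumes that each layer consists of one cross-attention followed by latent_blocks self-attention blocks, as the comments above suggest.

```python
def layer_schedule(layers):
    """Return the index of the unique block used by each layer.

    Layer 1 gets its own block (index 0); layers 2..N share block 1.
    """
    if layers < 1:
        return []
    return [0] + [1] * (layers - 1)


def attention_counts(layers, latent_blocks):
    """Count attention passes in one forward pass, assuming each layer is
    one cross-attention followed by `latent_blocks` self-attention blocks."""
    return {"cross_attention": layers, "self_attention": layers * latent_blocks}


def score_matrix_sizes(input_shape, latents):
    """Number of attention scores per head for each kind of attention."""
    inputs = 1
    for n in input_shape:
        inputs *= n
    return {
        "cross_attention": latents * inputs,  # latents attend to the input
        "latent_self_attention": latents * latents,
        "full_input_self_attention": inputs * inputs,  # what the Perceiver avoids
    }


print(layer_schedule(8))       # [0, 1, 1, 1, 1, 1, 1, 1]
print(attention_counts(8, 6))  # {'cross_attention': 8, 'self_attention': 48}
print(score_matrix_sizes((28, 28), 64))
# {'cross_attention': 50176, 'latent_self_attention': 4096, 'full_input_self_attention': 614656}
```

For MNIST with 64 latents, the cross-attention score matrix is roughly 12 times smaller than full self-attention over all 784 pixels would be, which is the main reason for routing the input through a small latent array.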

To use my pre-trained MNIST model (not very good):

from perceiver import load_mnist_model

model = load_mnist_model()
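
A test-set figure like the 96.02% quoted earlier is typically computed by taking the highest logit for each image and comparing it with the label. The sketch below shows that calculation on toy data without any framework. It assumes, without confirmation from this README, that the loaded model returns one row of class logits per image. The names argmax and accuracy are hypothetical helpers, not part of the perceiver package.

```python
def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(logits_batch, labels):
    """Fraction of rows whose highest logit matches the label."""
    if len(logits_batch) != len(labels):
        raise ValueError("need one label per row of logits")
    if not labels:
        raise ValueError("cannot compute accuracy of an empty batch")
    correct = sum(argmax(row) == label for row, label in zip(logits_batch, labels))
    return correct / len(labels)


# Toy stand-in for model outputs: 3 classes, 4 examples.
logits = [
    [2.0, 0.1, -1.0],
    [0.3, 0.2, 1.5],
    [0.0, 0.9, 0.4],
    [1.2, 1.1, 0.0],
]
labels = [0, 2, 1, 1]
print(accuracy(logits, labels))  # 0.75
```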

TODO:

Positional embedding generalized to n dimensions (with fourier features)
Train other models (like CIFAR-100 or something not in the image domain)
Type hints
Unit tests for components of model
Package

Owner

Louis Arge