This repository implements and evaluates convolutional networks on the Möbius strip as toy model instantiations of Coordinate Independent Convolutional Networks.

Overview

Orientation independent Möbius CNNs





Background (tl;dr)

All derivations and a detailed description of the models are found in Section 5 of our paper. What follows is an informal tl;dr, summarizing the central aspects of Möbius CNNs.

Feature fields on the Möbius strip: A key characteristic of the Möbius strip is its topological twist, making it a non-orientable manifold. Convolutional weight sharing on the Möbius strip is therefore only well defined up to a reflection of kernels. To account for the ambiguity of kernel orientations, one needs to demand that the kernel responses (feature vectors) transform in a predictable way when different orientations are chosen. Mathematically, this transformation is specified by a group representation ρ of the reflection group. We implement three different feature field types, each characterized by a choice of group representation:

  • scalar fields are modeled by the trivial representation, ρ(s) = 1, where s denotes the reflection. Scalars stay invariant under reflective gauge transformations.

  • sign-flip fields transform according to the sign-flip representation of the reflection group, ρ(s) = -1. Reflective gauge transformations negate the single numerical coefficient of a sign-flip feature.

  • regular feature fields are associated to the regular representation. For the reflection group, this implies 2-dimensional features whose two values (channels) are swapped by gauge transformations, i.e. (f1, f2) ↦ (f2, f1).
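The three field types can be made concrete with a few lines of NumPy. This is only an illustrative sketch: the dictionary `RHO` and the helper `gauge_transform` are hypothetical names that do not appear in the repository. Each entry stores the matrix with which the reflection acts on a single feature vector of that type.

```python
import numpy as np

# The reflection group has two elements {e, s}. A field type is fixed by the
# matrix rho(s) with which the reflection s acts on a feature vector.
RHO = {
    "scalar": np.array([[1.0]]),                     # trivial representation
    "signflip": np.array([[-1.0]]),                  # sign-flip representation
    "regular": np.array([[0.0, 1.0], [1.0, 0.0]]),   # swaps the two channels
}


def gauge_transform(feature, field_type):
    """Apply the reflective gauge transformation to one feature vector."""
    return RHO[field_type] @ np.asarray(feature, dtype=float)


scalar = gauge_transform([3.0], "scalar")        # unchanged
signflip = gauge_transform([3.0], "signflip")    # negated
regular = gauge_transform([1.0, 2.0], "regular") # channels swapped
```

Since reflecting twice is the identity, every matrix in `RHO` squares to the identity matrix.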

Reflection steerable kernels (gauge equivariance):

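Convolution kernels on the Möbius strip are parameterized maps

    K: ℝ² → ℝ^(c_out × c_in)

whose numbers of input and output channels, c_in and c_out, depend on the types of feature fields between which they map. Since a reflection of a kernel should result in a corresponding transformation of its output feature field, the kernel has to obey certain symmetry constraints. Specifically, kernels have to be reflection steerable (or gauge equivariant), i.e. should satisfy

    K(s.v) = ρ_out(s) K(v) ρ_in(s)^(-1)    for all v in ℝ²,

where s denotes the reflection, s.v the reflected kernel coordinate, and ρ_in, ρ_out the representations of the input and output field types.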

The following table visualizes this symmetry constraint for any pair of input and output field types that we implement:

Similar equivariance constraints are imposed on biases and nonlinearities; see the paper for more details.
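The steerability constraint can be checked numerically. The sketch below is not the repository's method (which expands kernels from a reduced parameter set). It instead projects an unconstrained kernel onto the steerable subspace by averaging it with its steered reflection, which is a swapped-in technique chosen for brevity. All names are hypothetical, and it is an assumption that the reflection flips the second-to-last array axis of the discretized kernel.

```python
import itertools
import numpy as np

RHO = {
    "scalar": np.array([[1.0]]),
    "signflip": np.array([[-1.0]]),
    "regular": np.array([[0.0, 1.0], [1.0, 0.0]]),
}


def reflect_and_steer(kernel, rho_out, rho_in):
    """Return v -> rho_out(s) K(s.v) rho_in(s); note that rho(s)^-1 = rho(s)."""
    reflected = np.flip(kernel, axis=-2)  # assumption: s flips this spatial axis
    return np.einsum("ab,bcij,cd->adij", rho_out, reflected, rho_in)


def project_steerable(kernel, rho_out, rho_in):
    """Average a kernel with its steered reflection to enforce the constraint."""
    return 0.5 * (kernel + reflect_and_steer(kernel, rho_out, rho_in))


rng = np.random.default_rng(0)
violations = {}
for t_in, t_out in itertools.product(RHO, repeat=2):
    rho_in, rho_out = RHO[t_in], RHO[t_out]
    # Kernel array of shape (c_out, c_in, size, size) with size = 5.
    free = rng.normal(size=(len(rho_out), len(rho_in), 5, 5))
    kernel = project_steerable(free, rho_out, rho_in)
    steered = reflect_and_steer(kernel, rho_out, rho_in)
    violations[(t_in, t_out)] = np.abs(kernel - steered).max()
```

A kernel satisfies the constraint exactly when it is left unchanged by `reflect_and_steer`, so every entry of `violations` should vanish up to floating point error. For scalar to sign-flip kernels the projection reduces to antisymmetrization under the reflection.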

Isometry equivariance: Shifts of the Möbius strip along itself are isometries. After one revolution (a shift by 2π), points on the strip do not return to themselves, but end up reflected along the width of the strip:

Such reflections of patterns are explained away by the reflection equivariance of the convolution kernels. Orientation independent convolutions are therefore automatically equivariant w.r.t. the action of such isometries on feature fields. Our empirical results, shown in the table below, confirm that this theoretical guarantee holds in practice. Conventional CNNs, on the other hand, are explicitly coordinate dependent, and are therefore in particular not isometry equivariant.

Implementation

Neural network layers are implemented in nn_layers.py while the models are found in models.py. All individual layers and all models are unit tested in unit_tests.py.

Feature fields: We assume Möbius strips with a locally flat geometry, i.e. strips which can be thought of as being constructed by gluing two opposite ends of a rectangular flat stripe together in a twisted way. Feature fields are therefore discretized on a regular sampling grid on a rectangular domain of pixels. Note that this choice induces a global gauge (frame field), which is discontinuous at the cut.

In practice, a neural network operates on multiple feature fields which are stacked in the channel dimension (a direct sum). Feature spaces are therefore characterized by their feature field multiplicities. For instance, one could have 10 scalar fields, 4 sign-flip fields and 8 regular feature fields, which consume in total 10 + 4 + 2·8 = 30 channels, since each regular field occupies two channels. Denoting the batch size by N, a feature space is encoded by a tensor with a batch axis of size N, a channel axis of size 30 and two spatial axes.
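The channel bookkeeping for such a feature space can be sketched as follows. The helper `field_slices` is hypothetical, and the channel ordering (scalars first, then sign-flip fields, then regular fields) is an assumption made for illustration rather than a statement about the repository's layout.

```python
def field_slices(n_scalar, n_signflip, n_regular):
    """Channel ranges per field type, assuming the order scalar, sign-flip, regular."""
    a = n_scalar
    b = a + n_signflip
    c = b + 2 * n_regular  # each regular field occupies two channels
    return {
        "scalar": slice(0, a),
        "signflip": slice(a, b),
        "regular": slice(b, c),
    }


slices = field_slices(n_scalar=10, n_signflip=4, n_regular=8)
total_channels = slices["regular"].stop  # 10 + 4 + 2 * 8 = 30
```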

The correct transformation law of the feature fields is guaranteed by the coordinate independence (steerability) of the network layers operating on it.

Orientation independent convolutions and bias summation: The class MobiusConv implements orientation independent convolutions and bias summations between input and output feature spaces as specified by the multiplicity constructor arguments in_fields and out_fields, respectively. Kernels are as usual discretized by a grid of size*size pixels. The steerability constraints on convolution kernels and biases are implemented by allocating a reduced number of parameters, from which the symmetric (steerable) kernels and biases are expanded during the forward pass.

Coordinate independent convolutions rely furthermore on parallel transporters of feature vectors, which are implemented as a transport padding operation. This operation pads both sides of the cut with size//2 columns of pixels which are 1) spatially reflected and 2) reflection-steered according to the field types. The stripes are furthermore zero-padded along their width.
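A minimal NumPy sketch of such a transport padding is given below. It is not the repository's implementation. The function names, the array layout (batch, channels, width, length) with the cut between the last and the first column, and the channel ordering (scalars, sign-flips, regular pairs) are all assumptions made for this example.

```python
import numpy as np


def gauge_transform(x, n_scalar, n_signflip, n_regular):
    """Apply rho(s) channel-wise to stacked fields (channels on axis 1)."""
    out = x.copy()
    a, b = n_scalar, n_scalar + n_signflip
    out[:, a:b] = -out[:, a:b]                    # sign-flip fields negate
    pairs = x[:, b:].reshape(x.shape[0], n_regular, 2, *x.shape[2:])
    swapped = pairs[:, :, ::-1]                   # swap the two regular channels
    out[:, b:] = swapped.reshape(x.shape[0], 2 * n_regular, *x.shape[2:])
    return out


def transport_pad(x, fields, size):
    """Pad p = size // 2 pixels on each side of an array (N, C, width, length)."""
    p = size // 2
    if p == 0:
        return x
    # Columns taken from across the cut: 1) reflect along the width,
    # 2) steer the channels according to the field types.
    left = gauge_transform(np.flip(x[..., -p:], axis=2), *fields)
    right = gauge_transform(np.flip(x[..., :p], axis=2), *fields)
    x = np.concatenate([left, x, right], axis=3)
    # Zero-pad along the width of the strip.
    return np.pad(x, ((0, 0), (0, 0), (p, p), (0, 0)))


fields = (1, 1, 1)  # one scalar, one sign-flip and one regular field: 4 channels
x = np.arange(1 * 4 * 3 * 6, dtype=float).reshape(1, 4, 3, 6)
padded = transport_pad(x, fields, size=3)  # shape (1, 4, 5, 8)
```

A conventional convolution without further padding then maps the padded array back to the original spatial resolution.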

The forward pass then proceeds by:

  • expanding steerable kernels and biases from their non-redundant parameter arrays
  • transport padding the input field array
  • running a conventional Euclidean convolution

As the padding added size//2 pixels around the strip, the spatial resolution of the output field agrees with that of the input field.

Orientation independent nonlinearities: Scalar fields and regular feature fields are acted on by conventional ELU nonlinearities, which are equivariant for these field types. Sign-flip fields are processed by adding a learnable bias parameter to their absolute value and applying an ELU nonlinearity to the result. To ensure that the resulting fields again transform according to the sign-flip representation, we subsequently multiply them by the signs of the input features. See the paper and the class EquivNonlin for more details.
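The sign-flip nonlinearity described above can be written as sign(x) · ELU(|x| + b). The following NumPy sketch, with hypothetical function names and a scalar bias chosen for simplicity, illustrates why this construction commutes with negation while a plain ELU does not.

```python
import numpy as np


def elu(x, alpha=1.0):
    """Standard ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0.0)))


def signflip_nonlin(x, bias):
    """ELU on the gauge invariant absolute value, restoring the input sign."""
    return np.sign(x) * elu(np.abs(x) + bias)


x = np.linspace(-3.0, 3.0, 13)
bias = -0.5  # stands in for the learnable parameter
equivariant = np.allclose(signflip_nonlin(-x, bias), -signflip_nonlin(x, bias))
plain_elu_equivariant = np.allclose(elu(-x), -elu(x))
```

Here `equivariant` evaluates to `True`, whereas `plain_elu_equivariant` is `False` because ELU is not an odd function.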

Feature field pooling: The module MobiusPool implements an orientation independent pooling operation with a stride and kernel size of two pixels, thus halving the fields' spatial resolution. Scalar and regular feature fields are pooled with a conventional max pooling operation, which is for these field types coordinate independent. As the coefficients of sign-flip fields negate under gauge transformations, they are pooled based on their (gauge invariant) absolute value.
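Pooling sign-flip fields by absolute value can be sketched as follows: within each 2×2 window, the entry with the largest magnitude is selected and returned with its sign. This is a NumPy illustration with hypothetical function names, not the code of MobiusPool.

```python
import numpy as np


def pool_windows(x):
    """Rearrange (..., H, W) into (..., H//2, W//2, 4) non-overlapping 2x2 windows."""
    lead = x.shape[:-2]
    h, w = x.shape[-2] // 2, x.shape[-1] // 2
    blocks = x[..., :2 * h, :2 * w].reshape(*lead, h, 2, w, 2)
    return np.moveaxis(blocks, -3, -2).reshape(*lead, h, w, 4)


def max_pool(x):
    """Conventional 2x2 max pooling, suitable for scalar and regular fields."""
    return pool_windows(x).max(axis=-1)


def signflip_pool(x):
    """Select the entry of largest absolute value per window, keeping its sign."""
    windows = pool_windows(x)
    idx = np.abs(windows).argmax(axis=-1)
    return np.take_along_axis(windows, idx[..., None], axis=-1)[..., 0]


x = np.array([[1.0, -5.0],
              [2.0, 3.0]])
pooled_signflip = signflip_pool(x)  # selects -5, the entry of largest magnitude
pooled_max = max_pool(x)            # selects 3
```

Because the selection depends only on the absolute values, `signflip_pool(-x)` equals `-signflip_pool(x)`, which is the transformation law required of a sign-flip field.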

While the pooling operation is tested to be exactly gauge equivariant, its spatial subsampling interferes inevitably with its isometry equivariance. Specifically, the pooling operation is only isometry equivariant w.r.t. shifts by an even number of pixels. Note that the same issue applies to conventional Euclidean CNNs as well; see e.g. (Azulay and Weiss, 2019) or (Zhang, 2019).
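The restriction to even shifts can be reproduced in a simplified setting. The sketch below uses periodic shifts of a plain array via `np.roll` and ignores the reflection at the cut of the Möbius strip, so it only illustrates the subsampling effect. The function name is hypothetical.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over the last two axes."""
    lead = x.shape[:-2]
    h, w = x.shape[-2] // 2, x.shape[-1] // 2
    return x[..., :2 * h, :2 * w].reshape(*lead, h, 2, w, 2).max(axis=(-3, -1))


x = np.arange(32, dtype=float).reshape(4, 8)
pooled = max_pool_2x2(x)

# A shift by two input pixels corresponds to a shift by one output pixel.
even_ok = np.array_equal(
    max_pool_2x2(np.roll(x, 2, axis=-1)), np.roll(pooled, 1, axis=-1)
)

# A shift by one input pixel matches no shift of the pooled output.
odd_pooled = max_pool_2x2(np.roll(x, 1, axis=-1))
odd_matches = [
    np.array_equal(odd_pooled, np.roll(pooled, k, axis=-1)) for k in range(4)
]
```

In this example `even_ok` is `True`, while every entry of `odd_matches` is `False`.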

Models: All models are implemented in models.py. The orientation independent models, which differ only in their field type multiplicities but agree in their total number of channels, are implemented as class MobiusGaugeCNN. We furthermore implement conventional CNN baselines, one with the same number of channels and thus more parameters (α=1) and one with the same number of parameters but fewer channels (α=2). Since conventional CNNs are explicitly coordinate dependent, they utilize a naive padding operation (MobiusPadNaive), which performs a spatial reflection of feature maps but does not apply the unspecified gauge transformation. The following table gives an overview of the different models:

Data - Möbius MNIST

We benchmark our models on Möbius MNIST, a simple classification dataset which consists of MNIST digits that are projected onto the Möbius strip. Since MNIST digits are gray-scale images, they are geometrically identified as scalar fields. The size of the training set is by default set to 12000 digits, which agrees with the rotated MNIST dataset.

There are two versions of the training and test sets which consist of centered and shifted digits. All digits in the centered datasets occur at the same location (and the same orientation) of the strip. The isometry shifted digits appear at uniformly sampled locations. Recall that shifts once around the strip lead to a reflection of the digits as visualized above. The following digits show isometry shifted digits (note the reflection at the cut):

To generate the datasets it is sufficient to call convert_mnist.py, which downloads the original MNIST dataset via torchvision and saves the Möbius MNIST datasets in data/mobius_MNIST.npz.

Results

The models can then be trained by calling, for instance,

python train.py --model mobius_regular

For more options and further model types, consult the help message: python train.py -h

The following table gives an overview of the performance of all models in two different settings, averaged over 32 runs:

The setting "shifted train digits" trains and evaluates on isometry shifted digits. To test the isometry equivariance of the models, we train them furthermore on "centered train digits", testing them then out-of-distribution on shifted digits. As one can see, the orientation independent models generalize well over these unseen variations while the conventional coordinate dependent CNNs' performance deteriorates.

Dependencies

This library is based on Python 3.7. It requires the following packages:

numpy
torch>=1.1
torchvision>=0.3

Logging via tensorboard is optional.

Owner
Maurice Weiler
AI researcher with a focus on geometric and equivariant deep learning. PhD candidate under the supervision of Max Welling. Master's degree in Physics.