A denoising diffusion probabilistic model (DDPM) tailored for conditional generation of protein distograms

Last update: Nov 23, 2022

Overview

Denoising Diffusion Probabilistic Model for Proteins

Implementation of a Denoising Diffusion Probabilistic Model in PyTorch. Diffusion models are a newer approach to generative modeling that may rival GANs. The method uses denoising score matching to estimate the gradient of the data distribution, then Langevin sampling to draw samples from the true distribution. This implementation was transcribed from the official TensorFlow version.
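To make the noising side of the model concrete, here is a minimal pure-Python sketch of the forward process from the DDPM paper, q(x_t | x_0) = N(sqrt(alpha_bar_t) x_0, (1 - alpha_bar_t) I). The function names are illustrative and are not part of ddpm_proteins. The linear schedule from 1e-4 to 0.02 is the one used in Ho et al. (2020); whether this repository uses the same schedule is an assumption.

```python
import math
import random

def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (endpoints from Ho et al., 2020)."""
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]

def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) in closed form.

    Returns the noised values and the noise that was added; the noise is
    what the network is trained to predict.
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, noise)]
    return xt, noise

betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)
rng = random.Random(0)          # fixed seed so the sketch is reproducible
x0 = [0.5, -0.25, 1.0]          # hypothetical stand-in for a few pixel values
xt, noise = q_sample(x0, 999, alpha_bars, rng)
```

At the final timestep alpha_bar is close to zero, so x_t is almost pure Gaussian noise; the model learns to reverse this process one step at a time.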

This repository uses a heavily modified version of the U-Net for learning on protein structure, with eventual conditioning on MSA Transformer attention heads.
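For readers unfamiliar with distograms, the following sketch shows one way to turn 3D residue coordinates into a binned pairwise distance map, which can then be treated as a 2D image by a U-Net. It is an illustration only: the helper names, the toy coordinates, and the bin edges are hypothetical, and how this repository actually encodes distograms is not shown here.

```python
import math

def pairwise_distances(coords):
    """Euclidean distance between every pair of 3D points (e.g. C-alpha atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]

def distogram(coords, bin_edges):
    """Map each pairwise distance to a bin index.

    A distance d gets index k, where k is the number of edges <= d, so all
    distances beyond the last edge share the final bin.
    """
    def bin_index(d):
        return sum(1 for edge in bin_edges if d >= edge)
    return [[bin_index(d) for d in row] for row in pairwise_distances(coords)]

# Three residues on a line, spaced 3.8 angstroms apart (toy example)
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
bin_edges = [2.0, 4.0, 6.0, 8.0]   # hypothetical bin boundaries in angstroms
bins = distogram(coords, bin_edges)
```

The result is a symmetric L x L matrix with zeros on the diagonal, where L is the number of residues.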

[Image: sample at around 40k iterations]

Install

$ pip install ddpm-proteins

Training

We use Weights & Biases for experiment tracking.

First, log in:

$ wandb login

Then run the training script:

$ python train.py

Edit train.py to suit your research needs.

Todo

condition on mask
condition on MSA transformers (with caching of tensors in specified directory by protein id)
reach for size 384
all-attention network with uformer https://arxiv.org/abs/2106.03106 (with 1d + 2d conv kernels)
add all improvements from https://arxiv.org/abs/2105.05233 and https://cascaded-diffusion.github.io/

Usage

import torch
from ddpm_proteins import Unet, GaussianDiffusion

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
)

training_images = torch.randn(8, 3, 128, 128)
loss = diffusion(training_images)
loss.backward()
# after a lot of training

sampled_images = diffusion.sample(batch_size = 4)
sampled_images.shape # (4, 3, 128, 128)

Or, if you simply want to pass in a folder name and the desired image dimensions, you can use the Trainer class to easily train a model.

from ddpm_proteins import Unet, GaussianDiffusion, Trainer

model = Unet(
    dim = 64,
    dim_mults = (1, 2, 4, 8)
).cuda()

diffusion = GaussianDiffusion(
    model,
    image_size = 128,
    timesteps = 1000,   # number of steps
    loss_type = 'l1'    # L1 or L2
).cuda()

trainer = Trainer(
    diffusion,
    'path/to/your/images',
    train_batch_size = 32,
    train_lr = 2e-5,
    train_num_steps = 700000,         # total training steps
    gradient_accumulate_every = 2,    # gradient accumulation steps
    ema_decay = 0.995,                # exponential moving average decay
    fp16 = True                       # turn on mixed precision training with apex
)

trainer.train()

Samples and model checkpoints are logged to ./results periodically.
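The ema_decay argument above controls an exponential moving average of the model weights. As a rough sketch of the idea (the function below is illustrative and is not the Trainer's actual code), each averaged parameter is moved a small fraction of the way toward the current parameter after every optimizer step:

```python
def ema_update(ema_params, params, decay=0.995):
    """Return decay * ema + (1 - decay) * current for each parameter."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

params = [1.0, -2.0]             # toy "model weights"
ema_params = list(params)        # the average starts as a copy of the weights
for step in range(3):
    params = [p + 0.1 for p in params]   # stand-in for an optimizer step
    ema_params = ema_update(ema_params, params)
```

With a decay of 0.995 the averaged weights change slowly and smooth out step-to-step noise, which is why diffusion models are commonly sampled from the averaged copy rather than the raw weights.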

Citations

@misc{ho2020denoising,
    title   = {Denoising Diffusion Probabilistic Models},
    author  = {Jonathan Ho and Ajay Jain and Pieter Abbeel},
    year    = {2020},
    eprint  = {2006.11239},
    archivePrefix = {arXiv},
    primaryClass = {cs.LG}
}

@inproceedings{anonymous2021improved,
    title   = {Improved Denoising Diffusion Probabilistic Models},
    author  = {Anonymous},
    booktitle = {Submitted to International Conference on Learning Representations},
    year    = {2021},
    url     = {https://openreview.net/forum?id=-NEXDKk8gZ},
    note    = {under review}
}

@article{Rao2021.02.12.430858,
    author  = {Rao, Roshan and Liu, Jason and Verkuil, Robert and Meier, Joshua and Canny, John F. and Abbeel, Pieter and Sercu, Tom and Rives, Alexander},
    title   = {MSA Transformer},
    year    = {2021},
    publisher = {Cold Spring Harbor Laboratory},
    URL     = {https://www.biorxiv.org/content/early/2021/02/13/2021.02.12.430858},
    journal = {bioRxiv}
}

Releases(0.0.11)

0.0.11(Apr 20, 2022)

0.0.10(Aug 24, 2021)

0.0.9(Jun 22, 2021)

0.0.8(Jun 17, 2021)

0.0.7(Jun 17, 2021)

0.0.6(Jun 15, 2021)

0.0.5(Jun 14, 2021)

0.0.4(Jun 14, 2021)

0.0.2(Jun 14, 2021)

0.0.1b(Jun 14, 2021)

Owner

Phil Wang
