Improved Fitness Optimization Landscapes for Sequence Design

Overview

ReLSO

Improved Fitness Optimization Landscapes for Sequence Design

Description


In recent years, deep learning approaches for determining protein sequence-fitness relationships have gained traction. Advances in high-throughput mutagenesis, directed evolution, and next-generation sequencing have allowed for the accumulation of large amounts of labelled fitness data and consequently, attracted the application of various deep learning methods. Although these methods learn an implicit fitness landscape, there is little work on using the latent encoding directly for protein sequence optimization. Here we show that this latent space representation of a fitness landscape can be made very amenable to latent space optimization through a joint-training process. We also show that this encoding strategy which also provides improvements to generalization over more traditional training strategies. We apply our approach to several biological contexts and show that latent space optimization in a smooth learned folding landscape allows for more accurate and efficient optimization of protein sequences.

Citation

This repo accompanies the following publication:

Egbert Castro, Abhinav Godavarthi, Julien Rubinfien, Smita Krishnaswamy. Guided Generative Protein Design using Regularized Transformers. Nature Machine Intelligence, in review (2021).

How to run


First, install dependencies

# clone project   
git clone https://github.com/KrishnaswamyLab/ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers.git


# install project   
cd ReLSO-Guided-Generative-Protein-Design-using-Regularized-Transformers 
pip install -e .   
pip install -r requirements.txt

Usage

Training models

# run training script
python train_relso.py  --data gifford

*note: if arg option is not relevant to current model selection, it will not be used. See init method of each model to see what's used.

available dataset args:

    gifford, GB1_WU, GFP, TAPE

available auxnetwork args:

    base_reg

Original data sources

You might also like...
An implementation of a sequence to sequence neural network using an encoder-decoder
An implementation of a sequence to sequence neural network using an encoder-decoder

Keras implementation of a sequence to sequence model for time series prediction using an encoder-decoder architecture. I created this post to share a

Sequence lineage information extracted from RKI sequence data repo
Sequence lineage information extracted from RKI sequence data repo

Pango lineage information for German SARS-CoV-2 sequences This repository contains a join of the metadata and pango lineage tables of all German SARS-

Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Official repository of OFA. Paper: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Paper | Blog OFA is a unified multimodal pretrained model that unifies modalities (i.e., cross-modality, vision, language) and tasks (e.g., image gene

Aircraft design optimization made fast through modern automatic differentiation
Aircraft design optimization made fast through modern automatic differentiation

Aircraft design optimization made fast through modern automatic differentiation. Plug-and-play analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)
Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Optimization Algorithm,Immune Algorithm, Artificial Fish Swarm Algorithm, Differential Evolution and TSP(Traveling salesman)

scikit-opt Swarm Intelligence in Python (Genetic Algorithm, Particle Swarm Optimization, Simulated Annealing, Ant Colony Algorithm, Immune Algorithm,A

library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization

NLopt is a library for nonlinear local and global optimization, for functions with and without gradient information. It is designed as a simple, unifi

Racing line optimization algorithm in python that uses Particle Swarm Optimization.
Racing line optimization algorithm in python that uses Particle Swarm Optimization.

Racing Line Optimization with PSO This repository contains a racing line optimization algorithm in python that uses Particle Swarm Optimization. Requi

Puzzle-CAM: Improved localization via matching partial and full features.
Puzzle-CAM: Improved localization via matching partial and full features.

Puzzle-CAM The official implementation of "Puzzle-CAM: Improved localization via matching partial and full features".

[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)
[ECCVW2020] Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DiMP)

Feel free to visit my homepage Robust Long-Term Object Tracking via Improved Discriminative Model Prediction (RLT-DIMP) [ECCVW2020 paper] Presentation

Comments
  • Conda env create not working

    Conda env create not working

    When I type in the command as instructed in how to run, I get this error:

    Warning: you have pip-installed dependencies in your environment file, but you do not list pip itself as one of your conda dependencies. Conda may not use the correct pip to install your packages, and they may end up in the wrong place. Please add an explicit pip dependency. I'm adding one for you, but still nagging you. Collecting package metadata (repodata.json): done Solving environment: failed

    ResolvePackageNotFound:

    • libcxx==12.0.0=h2f01273_0
    • python==3.10.4=hdfd78df_0
    • openssl==1.1.1q=hca72f7f_0
    • ncurses==6.3=hca72f7f_3
    • readline==8.1.2=hca72f7f_1
    • bzip2==1.0.8=h1de35cc_0
    • ca-certificates==2022.07.19=hecd8cb5_0
    • xz==5.2.5=hca72f7f_1
    • libffi==3.3=hb1e8313_2
    • zlib==1.2.12=h4dc903c_2
    • sqlite==3.38.5=h707629a_0
    • tk==8.6.12=h5d9f67b_0
    opened by Pixelatory 1
  • May the internal information of gifford data leads to a bias results given by model?

    May the internal information of gifford data leads to a bias results given by model?

    I'm very intersted in your work and analysize the gifford data. Firstly, I use the CD-HIT( a Cluster tool) split into different clusters.Then, I chose the sequence (comes the Clsuter-1(a cluster subset contaiing similar sequences given by CD-HIT)) with highest enrich value as a baseline, and focus on the residue difference between it and others sequences. Very interstingly, i find those sequences that containg 2 or 3 different residues compared to baseline sequence, usually have high enrichments. In Top-100 high enrichments, it can at 65%. As i know, your work is a multitask that both focus on generation and prediction. **I wonder that whether the JT-VAE tends to produce the new sequences that different from the corresponding baseline sequence with highest enrichment about 2 or 3 different residues , and the prediction neural network may think such sequences are good results.**It means that the model only need to realize the fact that compared to high enrich sequnces,the new sequnces contain 2 or 3 different residues is good enough. Beacuse i not find your results, i hope you can give me some advices.

    opened by chengyunzhang 0
Releases(v1.0)
Owner
Krishnaswamy Lab
Krishnaswamy Lab
🏅 The Most Comprehensive List of Kaggle Solutions and Ideas 🏅

🏅 Collection of Kaggle Solutions and Ideas 🏅

Farid Rashidi 2.3k Jan 08, 2023
SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (CVPR 2020, Oral)

SEAN: Image Synthesis with Semantic Region-Adaptive Normalization (CVPR 2020 Oral) Figure: Face image editing controlled via style images and segmenta

Peihao Zhu 579 Dec 30, 2022
HandFoldingNet ✌️ : A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton

HandFoldingNet ✌️ : A 3D Hand Pose Estimation Network Using Multiscale-Feature Guided Folding of a 2D Hand Skeleton Wencan Cheng, Jae Hyun Park, Jong

cwc1260 23 Oct 21, 2022
AAAI 2022: Stationary diffusion state neural estimation

Stationary Diffusion State Neural Estimation Although many graph-based clustering methods attempt to model the stationary diffusion state in their obj

绽琨 33 Nov 24, 2022
Source code of AAAI 2022 paper "Towards End-to-End Image Compression and Analysis with Transformers".

Towards End-to-End Image Compression and Analysis with Transformers Source code of our AAAI 2022 paper "Towards End-to-End Image Compression and Analy

37 Dec 21, 2022
“英特尔创新大师杯”深度学习挑战赛 赛道3:CCKS2021中文NLP地址相关性任务

基于 bert4keras 的一个baseline 不作任何 数据trick 单模 线上 最高可到 0.7891 # 基础 版 train.py 0.7769 # transformer 各层 cls concat 明神的trick https://xv44586.git

孙永松 7 Dec 28, 2021
RCDNet: A Model-driven Deep Neural Network for Single Image Rain Removal (CVPR2020)

RCDNet: A Model-driven Deep Neural Network for Single Image Rain Removal (CVPR2020) Hong Wang, Qi Xie, Qian Zhao, and Deyu Meng [PDF] [Supplementary M

Hong Wang 6 Sep 27, 2022
🔪 Elimination based Lightweight Neural Net with Pretrained Weights

ELimNet ELimNet: Eliminating Layers in a Neural Network Pretrained with Large Dataset for Downstream Task Removed top layers from pretrained Efficient

snoop2head 4 Jul 12, 2022
This reporistory contains the test-dev data of the paper "xGQA: Cross-lingual Visual Question Answering".

This reporistory contains the test-dev data of the paper "xGQA: Cross-lingual Visual Question Answering".

AdapterHub 18 Dec 09, 2022
Council-GAN - Implementation for our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020)

Council-GAN Implementation of our paper Breaking the Cycle - Colleagues are all you need (CVPR 2020) Paper Ori Nizan , Ayellet Tal, Breaking the Cycle

ori nizan 260 Nov 16, 2022
The Fundamental Clustering Problems Suite (FCPS) summaries 54 state-of-the-art clustering algorithms, common cluster challenges and estimations of the number of clusters as well as the testing for cluster tendency.

FCPS Fundamental Clustering Problems Suite The package provides over sixty state-of-the-art clustering algorithms for unsupervised machine learning pu

9 Nov 27, 2022
PyTorch implementation for ACL 2021 paper "Maria: A Visual Experience Powered Conversational Agent".

Maria: A Visual Experience Powered Conversational Agent This repository is the Pytorch implementation of our paper "Maria: A Visual Experience Powered

Jokie 22 Dec 12, 2022
Fast Soft Color Segmentation

Fast Soft Color Segmentation

3 Oct 29, 2022
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)

AlphaZero-Gomoku This is an implementation of the AlphaZero algorithm for playing the simple board game Gomoku (also called Gobang or Five in a Row) f

Junxiao Song 2.8k Dec 26, 2022
Sample code and notebooks for Vertex AI, the end-to-end machine learning platform on Google Cloud

Google Cloud Vertex AI Samples Welcome to the Google Cloud Vertex AI sample repository. Overview The repository contains notebooks and community conte

Google Cloud Platform 560 Dec 31, 2022
A library for uncertainty representation and training in neural networks.

Epistemic Neural Networks A library for uncertainty representation and training in neural networks. Introduction Many applications in deep learning re

DeepMind 211 Dec 12, 2022
Plotting points that lie on the intersection of the given curves using gradient descent.

Plotting intersection of curves using gradient descent Webapp Link --- What's the app about Why this app Plotting functions and their intersection. A

Divakar Verma 2 Jan 09, 2022
Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec

Personal thermal comfort models using digital twins: Preference prediction with BIM-extracted spatial-temporal proximity data from Build2Vec This repo

Building and Urban Data Science (BUDS) Group 5 Dec 02, 2022
[CVPR'21] FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space

FedDG: Federated Domain Generalization on Medical Image Segmentation via Episodic Learning in Continuous Frequency Space by Quande Liu, Cheng Chen, Ji

Quande Liu 178 Jan 06, 2023
Code for T-Few from "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning"

T-Few This repository contains the official code for the paper: "Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learni

220 Dec 31, 2022