nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

Related tags

Deep LearningnextPARS
Overview

nextPARS, a novel Illumina-based implementation of in-vitro parallel probing of RNA structures.

Here you will find the scripts necessary to produce the scores described in our paper from fastq files obtained during the experiment.

Install Prerequisites

First install git:

sudo apt-get update
sudo apt-get install git-all

Then clone this repository

git clone https://github.com/jwill123/nextPARS.git

Now, ensure the necessary python packages are installed, and can be found in the $PYTHONPATH environment variable by running the script packages_for_nextPARS.sh in the nextPARS directory.

cd nextPARS/conf
chmod 775 packages_for_nextPARS.sh
./packages_for_nextPARS.sh

Convert fastq to tab

In order to go from the fastq outputs of the nextPARS experiments to a format that allows us to calculate scores, first map the reads in the fastq files to a reference using the program of your choice. Once you have obtained a bam file, use PARSParser_0.67.b.jar. This program counts the number of reads beginning at each position (which indicates a cut site for the enzyme in the file name) and outputs it in .tab format (count values for each position are separated by semi-colons).

Example usage:

java -jar PARSParser_0.67.b.jar -a bamFile -b bedFile -out outFile -q 20 -m 5

where the required arguments are:

  • -a gives the bam file of interest
  • -b is the bed file for the reference
  • -out is the name given to the output file in .tab format

Also accepts arguments:

  • -q for minimum mapping quality for reads to be included [default = 0]
  • -m for minimum average counts per position for a given transcript [default = 5.0]

Sample Data

There are sample data files found in the folder nextPARS/data, as well as the necessary fasta files in nextPARS/data/SEQS/PROBES, and the reference structures obtained from PDB in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES There are also 2 folders of sample output files from the PARSParser_0.67.b.jar program that can be used as further examples of the nextPARS score calculations described below. These folders are found in nextPARS/data/PARSParser_outputs. NOTE: these are randomly generated sequences with random enzyme values, so they are just to be used as examples for the usage of the scripts, good results should not be expected with these.

nextPARS Scores

To obtain the scores from nextPARS experiments, use the script get_combined_score.py. Sample data for the 5 PDB control structures can be found in the folder nextPARS/data/

There are a number of different command line options in the script, many of which were experimental or exploratory and are not relevant here. The useful ones in this context are the following:

  • Use the -i option [REQUIRED] to indicate the molecule for which you want scores (all available data files will be included in the calculations -- molecule name must match that in the data file names)

  • Use the -inDir option to indicate the directory containing the .tab files with read counts for each V1 and S1 enzyme cuts

  • Use the -f option to indicate the path to the fasta file for the input molecule

  • Use the -s option to produce an output Structure Preference Profile (SPP) file. Values for each position are separated by semi-colons. Here 0 = paired position, 1 = unpaired position, and NA = position with a score too low to determine its configuration.

  • Use the -o option to output the calculated scores, again with values for each position separated by semi-colons.

  • Use the --nP_only option to output the calculated nextPARS scores before incorporating the RNN classifier, again with values for each position separated by semi-colons.

  • Use the option {-V nextPARS} to produce an output with the scores that is compatible with the structure visualization program VARNA1

  • Use the option {-V spp} to produce an output with the SPP values that is compatible with VARNA.

  • Use the -t option to change the threshold value for scores when determining SPP values [default = 0.8, or -0.8 for negative scores]

  • Use the -c option to change the percentile cap for raw values at the beginning of calculations [default = 95]

  • Use the -v option to print some statistics in the case that there is a reference CT file available ( as with the example molecules, found in nextPARS/data/STRUCTURES/REFERENCE_STRUCTURES ). If not, will still print nextPARS scores and info about the enzyme .tab files included in the calculations.

Example usage:

# to produce an SPP file for the molecule TETp4p6
python get_combined_score.py -i TETp4p6 -s
# to produce a Varna-compatible output with the nextPARS scores for one of the 
# randomly generated example molecules
python get_combined_score.py -i test_37 -inDir nextPARS/data/PARSParser_outputs/test1 \
  -f nextPARS/data/PARSParser_outputs/test1/test1.fasta -V nextPARS

RNN classifier (already incorporated into the nextPARS scores above)

To run the RNN classifier separately, using a different experimental score input (in .tab format), it can be run like so with the predict2.py script:

python predict2.py -f molecule.fasta -p scoreFile.tab -o output.tab

Where the command line options are as follows:

  • the -f option [REQUIRED] is the input fasta file
  • the -p option [REQUIRED] is the input Score tab file
  • the -o option [REQUIRED] is the final Score tab output file.
  • the -w1 option is the weight for the RNN score. [default = 0.5]
  • the -w2 option is the weight for the experimental data score. [default = 0.5]

References:

  1. Darty,K., Denise,A. and Ponty,Y. (2009) VARNA: Interactive drawing and editing of the RNA secondary structure. Bioinforma. Oxf. Engl., 25, 1974–197
Owner
Jesse Willis
Jesse Willis
Official Code for VideoLT: Large-scale Long-tailed Video Recognition (ICCV 2021)

Pytorch Code for VideoLT [Website][Paper] Updates [10/29/2021] Features uploaded to Google Drive, for access please send us an e-mail: zhangxing18 at

Skye 26 Sep 18, 2022
Simple tools for logging and visualizing, loading and training

TNT TNT is a library providing powerful dataloading, logging and visualization utilities for Python. It is closely integrated with PyTorch and is desi

1.5k Jan 02, 2023
Tensorflow 2.x implementation of Panoramic BlitzNet for object detection and semantic segmentation on indoor panoramic images.

Deep neural network for object detection and semantic segmentation on indoor panoramic images. The implementation is based on the papers:

Alejandro de Nova Guerrero 9 Nov 24, 2022
Vision transformers (ViTs) have found only limited practical use in processing images

CXV Convolutional Xformers for Vision Vision transformers (ViTs) have found only limited practical use in processing images, in spite of their state-o

Cloudwalker 23 Sep 10, 2022
Implementation for our ICCV 2021 paper: Dual-Camera Super-Resolution with Aligned Attention Modules

DCSR: Dual Camera Super-Resolution Implementation for our ICCV 2021 oral paper: Dual-Camera Super-Resolution with Aligned Attention Modules paper | pr

Tengfei Wang 110 Dec 20, 2022
code for paper"A High-precision Semantic Segmentation Method Combining Adversarial Learning and Attention Mechanism"

PyTorch implementation of UAGAN(U-net Attention Generative Adversarial Networks) This repository contains the source code for the paper "A High-precis

Tong 8 Apr 25, 2022
Official implementation of NeuralFusion: Online Depth Map Fusion in Latent Space

NeuralFusion This is the official implementation of NeuralFusion: Online Depth Map Fusion in Latent Space. We provide code to train the proposed pipel

53 Jan 01, 2023
Official PyTorch implementation of the paper "Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory (SB-FBSDE)"

Official PyTorch implementation of the paper "Likelihood Training of Schrödinger Bridge using Forward-Backward SDEs Theory (SB-FBSDE)" which introduces a new class of deep generative models that gene

Guan-Horng Liu 43 Jan 03, 2023
Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer.

DocEnTR Description Pytorch implementation of the paper DocEnTr: An End-to-End Document Image Enhancement Transformer. This model is implemented on to

Mohamed Ali Souibgui 74 Jan 07, 2023
ZEBRA: Zero Evidence Biometric Recognition Assessment

ZEBRA: Zero Evidence Biometric Recognition Assessment license: LGPLv3 - please reference our paper version: 2020-06-11 author: Andreas Nautsch (EURECO

Voice Privacy Challenge 2 Dec 12, 2021
Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques

Tackling data scarcity in Speech Translation using zero-shot multilingual Machine Translation techniques This repository is derived from the NMTGMinor

Tu Anh Dinh 1 Sep 07, 2022
NHL 94 AI contests

nhl94-ai The end goals of this project is to: Train Models that play NHL 94 Support AI vs AI contests in NHL 94 Provide an improved AI opponent for NH

Mathieu Poliquin 2 Dec 06, 2021
A Unified Framework and Analysis for Structured Knowledge Grounding

UnifiedSKG 📚 : Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models Code for paper UnifiedSKG: Unifying and Mu

HKU NLP Group 370 Dec 21, 2022
A computational block to solve entity alignment over textual attributes in a knowledge graph creation pipeline.

How to apply? Create your config.ini file following the example provided in config.ini Choose one of the options below to run: Run with Python3 pip in

Scientific Data Management Group 3 Jun 23, 2022
⚾🤖⚾ Automatic baseball pitching overlay in realtime

⚾ Automatically overlaying pitch motion and trajectory with machine learning! This project takes your baseball pitching clips and automatically genera

Tony Chou 240 Dec 05, 2022
Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters.

Joint Unsupervised Learning (JULE) of Deep Representations and Image Clusters. Overview This project is a Torch implementation for our CVPR 2016 paper

Jianwei Yang 278 Dec 25, 2022
Contains code for the paper "Vision Transformers are Robust Learners".

Vision Transformers are Robust Learners This repository contains the code for the paper Vision Transformers are Robust Learners by Sayak Paul* and Pin

Sayak Paul 103 Jan 05, 2023
This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationships.

Auto-Lambda This repository contains the source code of Auto-Lambda and baselines from the paper, Auto-Lambda: Disentangling Dynamic Task Relationship

Shikun Liu 76 Dec 20, 2022
pytorch bert intent classification and slot filling

pytorch_bert_intent_classification_and_slot_filling 基于pytorch的中文意图识别和槽位填充 说明 基本思路就是:分类+序列标注(命名实体识别)同时训练。 使用的预训练模型:hugging face上的chinese-bert-wwm-ext 依

西西嘛呦 33 Dec 15, 2022
SAFL: A Self-Attention Scene Text Recognizer with Focal Loss

SAFL: A Self-Attention Scene Text Recognizer with Focal Loss This repository implements the SAFL in pytorch. Installation conda env create -f environm

6 Aug 24, 2022