GoekeLab/rnome-minimal-nextflow-pipeline

RNome Minimal Modification Analysis Pipeline

Nextflow pipeline used for aligning nanopore modBAM files using minimap2 (preserving modification tags) and summarising modifications using modkit.

Data processing notes:

The output (raw) modBAMs from MinKNOW were first merged and sorted using samtools (1.19.1):

samtools merge -@ 16 - data/bam_pass/*.bam | samtools sort -@ 16 -o PBG54229_pass_2aeee96b_4a0fe7c1_merged.bam
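
After merging, it can be worth confirming that the base-modification tags survived. The helper below is a minimal sketch, not part of the pipeline: `count_mod_tagged` is a hypothetical name, and it assumes the modBAM uses the standard `MM:Z` and `ML:B` SAM tags. It reads SAM text on stdin and reports how many records carry both tags.

```shell
# Hypothetical helper (not part of the pipeline): count SAM records that
# carry both MM:Z and ML:B base-modification tags.
# Reads SAM text on stdin; header lines (starting with @) are skipped.
count_mod_tagged() {
    awk -F'\t' '
        /^@/ { next }
        {
            total++
            has_mm = 0; has_ml = 0
            # Optional SAM tags start at column 12
            for (i = 12; i <= NF; i++) {
                if ($i ~ /^MM:Z:/) has_mm = 1
                if ($i ~ /^ML:B:/) has_ml = 1
            }
            if (has_mm && has_ml) tagged++
        }
        END { printf "%d/%d\n", tagged, total }
    '
}
```

For example, `samtools view PBG54229_pass_2aeee96b_4a0fe7c1_merged.bam | head -n 1000 | count_mod_tagged` prints the tagged/total count for the first 1000 records.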

Overview

The pipeline:

  1. Converts modBAMs to FASTQ files with samtools, preserving the RNA modification tags, and aligns the reads to a reference (genome or transcriptome) with minimap2.
  2. Runs modkit pileup on the merged BAMs to summarise RNA modifications (m6A, m5C, inosine and pseudouridine) and writes the results in bedRmod format.
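
The two steps can be sketched as shell functions. This is an illustration under stated assumptions, not the contents of main.nf: the function names are hypothetical, the minimap2 presets per reference type are a guess at a typical direct-RNA setup, and the modkit options the pipeline uses to produce bedRmod output are not shown. The key idea is that `samtools fastq -T MM,ML` writes the tags into the FASTQ header and `minimap2 -y` copies them back onto the aligned records.

```shell
# Assumed presets per reference_type (not taken from main.nf).
minimap2_preset() {
    case "$1" in
        genome)        echo "-ax splice -uf -k14" ;;  # spliced alignment of direct RNA reads
        transcriptome) echo "-ax map-ont" ;;          # unspliced alignment to transcripts
        *) echo "unknown reference_type: $1" >&2; return 1 ;;
    esac
}

# Step 1: modBAM -> FASTQ (MM/ML tags kept in the read header) -> sorted BAM.
# usage: align_modbam <reference_type> <reference.fa> <in.bam> <out.bam>
align_modbam() {
    preset=$(minimap2_preset "$1") || return 1
    # $preset is intentionally unquoted so it splits into separate options
    samtools fastq -T MM,ML "$3" \
        | minimap2 $preset -y "$2" - \
        | samtools sort -o "$4" -
    samtools index "$4"
}

# Step 2: summarise modifications per position with modkit.
# Options for filtering or bedRmod output are omitted here.
# usage: summarise_mods <aligned.bam> <out.bed>
summarise_mods() {
    modkit pileup "$1" "$2"
}
```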

The pipeline is written to run on an HPC cluster or an AWS EC2 instance, with all data available locally on the server. It requires Nextflow (≥22.04) and Docker.

Running the pipeline

Inputs are specified in a samplesheet with one row per alignment: a sample_id, the reference type, the reference to align against, and the modBAM path. Use unique sample_id values to avoid filename collisions. The same sample can be aligned to different references by giving it one row per reference, each with its own sample_id.

Example samplesheet.csv:

sample_id,reference_type,reference,bam
sample1_gm,genome,/path/to/GRCh38.fa.gz,/path/to/sample1.bam
sample1_tx,transcriptome,/path/to/gencode.fa.gz,/path/to/sample1.bam
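
A samplesheet can be checked before launching a run. The function below is a minimal sketch, not part of the pipeline: `check_samplesheet` is a hypothetical name, and it assumes the header shown above and that reference_type is either genome or transcriptome. It reports problems on stdout and returns non-zero if any are found.

```shell
# Hypothetical pre-flight check (not part of the pipeline).
# usage: check_samplesheet samplesheet.csv
check_samplesheet() {
    awk -F',' '
        NR == 1 {
            if ($0 != "sample_id,reference_type,reference,bam") {
                print "unexpected header: " $0; bad = 1
            }
            next
        }
        NF == 0 { next }    # skip blank lines
        NF != 4 { print "line " NR ": expected 4 columns, found " NF; bad = 1; next }
        # Assumption: only these two reference types are valid
        $2 != "genome" && $2 != "transcriptome" {
            print "line " NR ": unknown reference_type " $2; bad = 1
        }
        # A repeated sample_id would cause output filename collisions
        seen[$1]++ { print "line " NR ": duplicate sample_id " $1; bad = 1 }
        END { exit bad }
    ' "$1"
}
```

For example, `check_samplesheet samplesheet.csv && echo "samplesheet OK"`.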

Run:

nextflow run main.nf \
    -profile aws_ec2,docker \
    --samplesheet samplesheet.csv \
    --outdir results/

Convenience scripts are included to launch the Nextflow command on an EC2 instance (run_ec2.sh) or on an HPC cluster using the Slurm scheduler and Singularity (run_slurm.sh):

bash run_ec2.sh
sbatch run_slurm.sh

Output Structure

results/
├── alignment/              # Individual aligned BAMs
│   ├── sample1_gm.bam
│   └── sample1_tx.bam
├── bedRmod/               # Modification calls
│   ├── sample1_gm_bedRmod.bed.gz
│   └── sample1_tx_bedRmod.bed.gz
├── reference/             # Processed references
└── pipeline_info/         # Execution reports
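
Given the layout above, the files expected for each samplesheet row can be listed and checked after a run. This is a sketch only: `expected_outputs` is a hypothetical helper, and it assumes the naming pattern shown in the tree (`<sample_id>.bam` and `<sample_id>_bedRmod.bed.gz`).

```shell
# Hypothetical helper (not part of the pipeline): print the output paths
# expected for every samplesheet row, following the tree shown above.
# usage: expected_outputs samplesheet.csv results/
expected_outputs() {
    # ${2%/} drops a trailing slash from the output directory
    awk -F',' -v out="${2%/}" '
        NR > 1 && NF == 4 {
            print out "/alignment/" $1 ".bam"
            print out "/bedRmod/" $1 "_bedRmod.bed.gz"
        }
    ' "$1"
}
```

For example, `expected_outputs samplesheet.csv results/ | while read -r f; do [ -e "$f" ] || echo "missing: $f"; done` lists any outputs that were not produced.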
