Nextflow pipeline for aligning nanopore modBAM files with minimap2 (preserving the modification tags) and summarising RNA modifications with modkit.
The raw modBAMs output by MinKNOW were first merged and sorted with samtools (v1.19.1):

samtools merge - data/bam_pass/*.bam -@ 16 | samtools sort -@ 16 -o PBG54229_pass_2aeee96b_4a0fe7c1_merged.bam

The pipeline:
- Converts the modBAMs to FASTQ with samtools, preserving the RNA modification tags, and aligns the reads to a reference (genome or transcriptome) with minimap2.
- Summarises ('pileup') the m6A, m5C, inosine and pseudouridine modifications on the aligned BAMs with modkit and writes the results in bedRMod format.
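The two steps above can be sketched for a single input modBAM as follows. This is a hypothetical illustration, not the pipeline's actual process code: the function names, the minimap2 presets chosen per reference type and the output paths are assumptions. It relies on 'samtools fastq -T MM,ML' to carry the MM/ML modification tags into the FASTQ header comment and on 'minimap2 -y' to copy that comment back onto each alignment.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the per-sample steps (NOT the pipeline's own process code).
# Assumes samtools, minimap2 and modkit are on PATH.
set -euo pipefail

# Pick minimap2 arguments for a reference type (assumed presets):
#   genome        -> spliced alignment tuned for direct RNA
#   transcriptome -> unspliced long-read alignment
minimap2_args() {
  case "$1" in
    genome)        echo "-ax splice -uf -k14" ;;
    transcriptome) echo "-ax map-ont" ;;
    *) echo "unknown reference type: $1" >&2; return 1 ;;
  esac
}

align_and_pileup() {
  local sample_id=$1 reference_type=$2 reference=$3 bam=$4 outdir=$5
  local args
  args=$(minimap2_args "$reference_type")
  mkdir -p "$outdir/alignment" "$outdir/pileup"

  # -T MM,ML appends the modification tags to each FASTQ header comment;
  # minimap2 -y copies that comment onto every SAM record it writes.
  # $args is left unquoted on purpose so it splits into separate flags.
  samtools fastq -T MM,ML "$bam" \
    | minimap2 $args -y "$reference" - \
    | samtools sort -o "$outdir/alignment/${sample_id}.bam" -
  samtools index "$outdir/alignment/${sample_id}.bam"

  # modkit pileup writes bedMethyl by default (it needs the .bai created above).
  modkit pileup "$outdir/alignment/${sample_id}.bam" "$outdir/pileup/${sample_id}.bed"
}
```

For example: align_and_pileup sample1_tx transcriptome /path/to/gencode.fa.gz /path/to/sample1.bam results. Because modkit pileup emits bedMethyl by default, producing bedRMod output is not covered by this sketch.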
The pipeline is designed to run on an HPC cluster or an AWS EC2 instance, with all input data available locally on that machine. It requires Nextflow (≥22.04) and Docker (or Singularity when run on a Slurm cluster).
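A quick pre-flight check for these requirements might look like the following hypothetical helper, which is not shipped with the pipeline. It assumes that 'nextflow -v' prints a line such as "nextflow version 23.10.1.5891" (version in the third field) and compares versions with sort -V.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of the pipeline itself.
set -euo pipefail

# Succeed if version $1 >= version $2 (dot-separated fields, compared by sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

check_requirements() {
  local engine=${1:-docker}   # pass "singularity" on a Slurm/HPC node
  local tool nf_version
  for tool in nextflow "$engine"; do
    command -v "$tool" >/dev/null || { echo "missing: $tool" >&2; return 1; }
  done
  # Assumed output format: "nextflow version <x.y.z...>"
  nf_version=$(nextflow -v | awk '{print $3}')
  version_ge "$nf_version" 22.04 || { echo "Nextflow $nf_version is older than 22.04" >&2; return 1; }
}
```

Run check_requirements on an EC2 instance, or check_requirements singularity on a cluster node.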
Inputs are specified in a CSV samplesheet with one row per alignment: sample_id, reference_type (genome or transcriptome), reference (path to the reference FASTA) and bam (path to the modBAM). sample_id values must be unique, because they are used to name the output files. To align the same sample to several references, add one row per reference, each with its own sample_id.
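Because duplicate sample_id values would make output files collide, it can help to validate the samplesheet before launching. The hypothetical awk-based check below assumes the exact header sample_id,reference_type,reference,bam used in the example samplesheet, no commas inside fields, and that reference_type is either genome or transcriptome.

```shell
#!/usr/bin/env bash
# Hypothetical samplesheet pre-check (not shipped with the pipeline).
set -euo pipefail

validate_samplesheet() {
  awk -F',' '
    NR == 1 {
      if ($0 != "sample_id,reference_type,reference,bam") {
        print "unexpected header: " $0 > "/dev/stderr"; bad = 1
      }
      next
    }
    NF == 0 { next }                       # ignore blank lines
    NF != 4 { print "line " NR ": expected 4 columns" > "/dev/stderr"; bad = 1; next }
    $2 != "genome" && $2 != "transcriptome" {
      print "line " NR ": reference_type must be genome or transcriptome" > "/dev/stderr"; bad = 1
    }
    seen[$1]++ { print "line " NR ": duplicate sample_id " $1 > "/dev/stderr"; bad = 1 }
    END { exit bad }                       # non-zero exit status if any check failed
  ' "$1"
}
```

Usage: validate_samplesheet samplesheet.csv && nextflow run main.nf ...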
Example samplesheet.csv:
sample_id,reference_type,reference,bam
sample1_gm,genome,/path/to/GRCh38.fa.gz,/path/to/sample1.bam
sample1_tx,transcriptome,/path/to/gencode.fa.gz,/path/to/sample1.bam

Run:
nextflow run main.nf \
-profile aws_ec2,docker \
--samplesheet samplesheet.csv \
--outdir results/

Convenience scripts are included for running the Nextflow command on an EC2 instance (run_ec2.sh) or on an HPC cluster using the Slurm scheduler and Singularity (run_slurm.sh):
bash run_ec2.sh
sbatch run_slurm.sh

Output structure:

results/
├── alignment/ # Individual aligned BAMs
│ ├── sample1_gm.bam
│ └── sample1_tx.bam
├── bedRmod/ # Modification calls
│ ├── sample1_gm_bedRmod.bed.gz
│ └── sample1_tx_bedRmod.bed.gz
├── reference/ # Processed references
└── pipeline_info/ # Execution reports
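To get a quick per-modification count from one of the bedRMod files, a hypothetical helper like the one below could be used. It assumes tab-separated records with the modification name in column 4 (the BED 'name' field) and header lines starting with '#'; check these against the files your run actually produces.

```shell
#!/usr/bin/env bash
# Hypothetical helper for inspecting outputs; not part of the pipeline.
set -euo pipefail

# Count records per modification name (column 4) in a gzipped BED-like file,
# skipping '#' header lines and blank lines. Output: "<name>\t<count>", sorted bytewise.
count_mods() {
  gzip -dc "$1" \
    | awk -F'\t' '!/^#/ && NF { n[$4]++ } END { for (m in n) print m "\t" n[m] }' \
    | LC_ALL=C sort
}
```

Usage: count_mods results/bedRmod/sample1_gm_bedRmod.bed.gz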