Structural Variants (SVs), defined as insertions and deletions greater than 50 bp, inversions, and translocations, account for the majority of human genomic variation by total base pair count and have been implicated in genomic disorders.
Pacific Biosciences has recently released two ~30-fold datasets of high-fidelity long reads, with mean insert sizes of 10 and 15 kb and read accuracies above 99% (QV20). Because these high-fidelity reads have a different error profile from PacBio subreads, it is unclear whether long-read SV callers will perform similarly on them. To test this, we cover the parameter space by comparing SV calls (using truvari) across error mode (high-fidelity vs. raw subreads), read length (10 kbp vs. 15 kbp vs. 10-50 kbp), aligner (minimap2 vs. NGM-LR), and variant caller (Sniffles vs. pbsv).
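To make the size of that parameter space concrete, the following sketch enumerates every dataset, aligner, and caller combination. The labels (`CCS_10kb`, `subreads_10-50kb`, and so on) and the function name are illustrative assumptions, not names used by the pipeline itself.

```shell
#!/bin/sh
# Sketch: enumerate the dataset x aligner x caller grid described above.
# All labels are illustrative placeholders, not pipeline identifiers.
set -eu

list_combinations() {
    for dataset in CCS_10kb CCS_15kb subreads_10-50kb; do
        for aligner in minimap2 ngmlr; do
            for caller in sniffles pbsv; do
                # One label per combination, e.g. CCS_10kb.minimap2.sniffles
                echo "${dataset}.${aligner}.${caller}"
            done
        done
    done
}

list_combinations
```

With three datasets, two aligners, and two callers this prints 12 labels, and each coverage level adds another copy of the grid.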
- PacBio HG002 High-Fidelity QV20 10 kbp CCS, ~30-fold coverage
- PacBio HG002 High-Fidelity QV20 15 kbp CCS, ~30-fold coverage
- PacBio HG002 10-50 kbp Subreads, ~70-fold coverage
Starting at ~30-fold, downsample to 20-fold, 10-fold, and 5-fold coverage.
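This README does not say which tool performs the downsampling. As one possible approach, assuming BAM-level subsampling with `samtools view -s` (which takes `SEED.FRACTION`), the sketch below computes the fraction of reads to keep for each target coverage and prints the matching command. The seed, the file names, and the helper `subsample_fraction` are hypothetical.

```shell
#!/bin/sh
# Sketch: derive samtools subsampling arguments for each target coverage.
# Assumption: downsampling is done on BAM files with `samtools view -s`.
set -eu

START_COV=30   # approximate starting coverage, taken from the text
SEED=42        # hypothetical random seed

# Print the fraction of reads to keep when going from $1-fold to $2-fold.
subsample_fraction() {
    awk -v from="$1" -v to="$2" 'BEGIN { printf "%.4f", to / from }'
}

for target in 20 10 5; do
    frac="$(subsample_fraction "${START_COV}" "${target}")"
    # -s expects SEED.FRACTION, so drop the leading 0 of the fraction.
    # sample.bam and the output name are placeholder file names.
    printf 'samtools view -b -s %s%s -o sample.%sx.bam sample.bam\n' \
        "${SEED}" "${frac#0}" "${target}"
done
```

The script only prints the commands, so they can be reviewed before being run. For the ~70-fold subread dataset, `START_COV` would need to be set accordingly.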
NGM-LR alignment:

    ngmlr --presets pacbio \
        --query "${FASTQ}" \
        --reference "${REF}" \
        --rg-id "${SAMPLE}_${FASTQ##*/}" \
        --rg-sm "${SAMPLE}"

minimap2 alignment:

    minimap2 -t 16 \
        -a -k 19 \
        -O 5,56 -E 4,1 \
        -B 5 -z 400,50 -r 2k \
        --eqx --MD -Y \
        --secondary=no \
        -R "@RG\tID:${SAMPLE}_${FASTQ##*/}\tSM:${SAMPLE}" \
        "${REF}" "${FASTQ}"

Sniffles variant calling:

    sniffles -s 3 --skip_parameter_estimation -m !{bam} -v "!{basename}.!{depth}x.Sniffles_s3_ignoreParam.vcf"

pbsv variant calling:

    pbsv discover !{bam} !{basename}.!{depth}x.pbsv.svsig.gz
    pbsv call !{ref} !{basename}.!{depth}x.pbsv.svsig.gz !{basename}.!{depth}x.pbsv.vcf

- The Nextflow pipeline can be run by modifying the run_pipeline.sh script for your local environment.
- For full reproducibility, create a conda environment from conda_environment.txt:
    conda create --name assessv --file conda_environment.txt
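
The truvari comparison mentioned in the introduction is not spelled out in this README. As a rough sketch, assuming a recent truvari release with the `truvari bench` subcommand and a bgzipped, indexed truth set, the command for one call set might be assembled as follows. The truth-set and region file names, the call-set name, and the helper `truvari_bench_cmd` are all hypothetical, and flag names may differ in the truvari version pinned in conda_environment.txt.

```shell
#!/bin/sh
# Sketch: assemble a truvari benchmarking command for one call set.
# Assumption: a recent truvari with the `bench` subcommand is installed.
set -eu

TRUTH_VCF="truth.vcf.gz"            # hypothetical truth set (bgzipped, indexed)
TRUTH_BED="confident_regions.bed"   # hypothetical regions to restrict the comparison

# Print (dry run) the truvari command for call set $1 and output directory $2.
truvari_bench_cmd() {
    printf 'truvari bench -b %s -c %s --includebed %s --passonly -o %s\n' \
        "${TRUTH_VCF}" "$1" "${TRUTH_BED}" "$2"
}

# Example: a pbsv call set at 10-fold coverage (placeholder file names).
truvari_bench_cmd "HG002.10x.pbsv.vcf.gz" "truvari_HG002.10x.pbsv"
```

Printing the command instead of executing it keeps the sketch runnable without truvari installed. In practice the caller VCFs would first need to be sorted, compressed with bgzip, and indexed with tabix.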

