NCBI-Hackathons/AssesSV


Assessing the contribution of depth, read quality, and algorithm to Structural Variation calling.

Structural Variants (SVs), defined as insertions and deletions greater than 50 bp, inversions, and translocations, account for the majority of human genomic variation by total base pair count and have been implicated in genomic disorders.

Pacific Biosciences has recently released two ~30-fold datasets of high-fidelity long reads, with mean insert sizes of 10 and 15 kbp and read accuracies above 99%. Because these high-fidelity reads have a different error profile from PacBio subreads, it is unclear whether they will perform similarly in long-read SV callers. To test this, we cover the parameter space by comparing SV calls (using truvari) across error modes (high-fidelity vs. raw subreads), read lengths (10 kbp vs. 15 kbp vs. 10-50 kbp), aligners (minimap2 vs. NGM-LR), and variant callers (Sniffles vs. pbsv).

Datasets:

  • PacBio HG002 High-Fidelity QV20 10 kbp CCS, ~30-fold coverage

  • PacBio HG002 High-Fidelity QV20 15 kbp CCS, ~30-fold coverage

  • PacBio HG002 10-50 kbp Subreads, ~70-fold coverage

  • HG002 Tier 1 SV calls, version 0.6
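The Tier 1 calls serve as the truth set for the truvari comparison. The exact truvari invocation is not given here, so the following is only a sketch: it assumes a truvari release that provides the `bench` subcommand, and the file names (`reference.fa`, the call set, the output directory) are hypothetical placeholders for local paths. The helper only prints the command so it can be inspected before running.

```shell
#!/usr/bin/env bash
# Sketch only: all paths below are assumed local file names, not taken from the pipeline.
TRUTH_VCF="HG002_SVs_Tier1_v0.6.vcf.gz"   # truth set (assumed local copy)
TRUTH_BED="HG002_SVs_Tier1_v0.6.bed"      # Tier 1 regions (assumed local copy)
REF="reference.fa"                        # hypothetical reference path

# Print a truvari bench command comparing one call set against the truth set.
# truvari expects bgzip-compressed, tabix-indexed VCFs as input.
build_truvari_cmd() {
    local calls_vcf="$1" out_dir="$2"
    printf 'truvari bench -b %s -c %s -f %s --includebed %s --passonly -o %s\n' \
        "$TRUTH_VCF" "$calls_vcf" "$REF" "$TRUTH_BED" "$out_dir"
}

build_truvari_cmd "HG002.10x.pbsv.vcf.gz" "truvari_out"
```

Piping the printed line to `bash` would run the comparison, provided truvari is installed and the inputs exist.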

Conditions:

Depth titration:

Starting at ~30-fold coverage, downsample to 20-fold, 10-fold, and 5-fold coverage.
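One way to realize this titration is random subsampling of the aligned reads with `samtools view -s`, where the fraction kept is simply target depth divided by starting depth. The README does not state how the downsampling was done, so treat this as an assumption; the seed and the BAM file names are hypothetical. The loop prints the commands rather than running them.

```shell
#!/usr/bin/env bash
# Sketch: derive subsampling fractions for the depth titration.
# Assumption: downsampling is done by random read subsampling with samtools.

# Fraction of reads to keep to go from a starting depth to a target depth.
subsample_fraction() {
    local start="$1" target="$2"
    awk -v s="$start" -v t="$target" 'BEGIN { printf "%.4f", t / s }'
}

START_DEPTH=30   # approximate starting coverage from the README
SEED=42          # hypothetical seed for reproducible subsampling

for depth in 20 10 5; do
    frac="$(subsample_fraction "$START_DEPTH" "$depth")"
    # samtools view -s takes SEED.FRACTION: the integer part seeds the RNG,
    # the fractional part is the proportion of reads to keep.
    echo "samtools view -b -s ${SEED}${frac#0} -o sample.${depth}x.bam sample.30x.bam"
done
```

Because the starting coverage is only approximately 30-fold, the resulting depths are approximate as well.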

Aligner:

NGMLR settings:

ngmlr --presets pacbio \
	--query "${FASTQ}" \
	--reference "${REF}" \
	--rg-id "${SAMPLE}_${FASTQ##*/}" \
	--rg-sm "${SAMPLE}"

minimap2 settings:

minimap2 -t 16 \
        -a -k 19 \
        -O 5,56 -E 4,1 \
        -B 5 -z 400,50 -r 2k \
        --eqx --MD -Y \
        --secondary=no \
        -R "@RG\tID:${SAMPLE}_${FASTQ##*/}\tSM:${SAMPLE}" \
        "${REF}" "${FASTQ}"

Caller:

Sniffles settings:

sniffles -s 3 --skip_parameter_estimation -m !{bam} -v "!{basename}.!{depth}x.Sniffles_s3_ignoreParam.vcf"

pbsv settings:

pbsv discover !{bam} !{basename}.!{depth}x.pbsv.svsig.gz
pbsv call !{ref} !{basename}.!{depth}x.pbsv.svsig.gz !{basename}.!{depth}x.pbsv.vcf
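Taken together, the conditions above form a grid of dataset, depth, aligner, and caller. A minimal sketch that enumerates this grid is shown below; the dataset labels and the naming scheme are hypothetical, and it assumes every dataset is evaluated at the same four depths (30, 20, 10, and 5-fold).

```shell
#!/usr/bin/env bash
# Sketch: enumerate every combination of the conditions listed above.
# Labels are hypothetical; they are not the names used by the pipeline.
datasets=(ccs10kb ccs15kb subreads)
depths=(30 20 10 5)
aligners=(minimap2 ngmlr)
callers=(sniffles pbsv)

# Print one label per condition, e.g. ccs10kb.30x.minimap2.sniffles
list_conditions() {
    local d x a c
    for d in "${datasets[@]}"; do
        for x in "${depths[@]}"; do
            for a in "${aligners[@]}"; do
                for c in "${callers[@]}"; do
                    echo "${d}.${x}x.${a}.${c}"
                done
            done
        done
    done
}

list_conditions
```

Under these assumptions the grid has 3 x 4 x 2 x 2 = 48 conditions, each of which yields one VCF to compare against the truth set.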

Workflow:

(Figure: pipeline workflow diagram)

Running the pipeline

  • The Nextflow pipeline can be run by modifying the run_pipeline.sh script for your local environment.
  • For full reproducibility, create a conda environment as follows:
    • conda create --name assessv --file conda_environment.txt
