Snakemake pipeline for running SEDEF on haplotype-resolved assemblies and generating final segmental duplication BED and BigBed files.
This workflow is adapted from DongAhn’s segmental duplication pipeline and is mainly intended for internal use in the lab.
This pipeline takes:
- haplotype FASTA files
- RepeatMasker BED
- TRF BED
and performs:
- FASTA preparation and indexing
- WindowMasker
- repeat BED processing
- FASTA masking
- SEDEF
- downstream filtering
- final BED and BigBed generation
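The repeat BED processing and FASTA masking steps can be pictured as merging the RepeatMasker, TRF, and WindowMasker intervals for a contig and then masking every covered base. The sketch below assumes 0-based, half-open BED coordinates and soft-masking (lowercasing repeats). The helper names are hypothetical, and the pipeline's own scripts and masking mode may differ.

```python
"""Minimal sketch: merge repeat intervals and soft-mask one contig.

Assumption (hypothetical): repeats are soft-masked (lowercased). The
pipeline's actual masking step may differ, e.g. hard-masking with N.
"""
from __future__ import annotations

from typing import Iterable


def merge_intervals(intervals: Iterable[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or book-ended half-open [start, end) intervals."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def soft_mask(seq: str, intervals: Iterable[tuple[int, int]]) -> str:
    """Uppercase the sequence, then lowercase bases covered by the intervals."""
    chars = list(seq.upper())
    for start, end in merge_intervals(intervals):
        # Clip intervals to the sequence bounds before masking.
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = chars[i].lower()
    return "".join(chars)


if __name__ == "__main__":
    rm = [(2, 5)]    # example RepeatMasker interval
    trf = [(4, 7)]   # example TRF interval overlapping the RM call
    wm = [(10, 12)]  # example WindowMasker interval
    print(merge_intervals(rm + trf + wm))                  # [(2, 7), (10, 12)]
    print(soft_mask("ACGTACGTACGTAC", rm + trf + wm))      # ACgtacgTACgtAC
```

Merging first keeps each base masked exactly once, even when RepeatMasker and TRF report overlapping calls.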
The pipeline expects a tab-delimited manifest.tab file in the working directory.
Required columns:
- SAMPLE: sample name
- H1: haplotype 1 FASTA
- H2: haplotype 2 FASTA
- RM: RepeatMasker BED
- TRF: TRF BED
Example:
SAMPLE	H1	H2	RM	TRF
ASM	ASM/hap1.fasta	ASM/hap2.fasta	ASM/RM.bed.gz	ASM/trf.bed.gz
Outputs are written under:
results/{sample}/
Main final outputs:
results/{sample}/final_outputs/beds/{hap}.SDs.bed
results/{sample}/final_outputs/beds/{hap}.SDs.lowid.bed
results/{sample}/final_outputs/bigbeds/{hap}.SDs.bb
results/{sample}/final_outputs/bigbeds/{hap}.SDs.lowid.bb
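As a rough illustration of how manifest.tab drives sample and haplotype expansion, the sketch below reads the manifest, checks for the required columns, and lists the final BED and BigBed paths for each sample. It is a minimal plain-Python sketch, not the pipeline's Snakefile. The {hap} values H1 and H2 are an assumption borrowed from the manifest column names, and the helper names are hypothetical.

```python
"""Sketch: read manifest.tab and expand the final BED/BigBed targets.

Assumption (hypothetical): the {hap} wildcard takes the values "H1" and
"H2", mirroring the manifest columns; the real pipeline may differ.
"""
from __future__ import annotations

import csv
import io

REQUIRED = ["SAMPLE", "H1", "H2", "RM", "TRF"]
HAPS = ["H1", "H2"]  # hypothetical {hap} values


def read_manifest(handle) -> dict[str, dict[str, str]]:
    """Parse a tab-delimited manifest into {sample: row}."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"manifest.tab is missing columns: {missing}")
    return {row["SAMPLE"]: row for row in reader}


def final_targets(samples) -> list[str]:
    """List the four final outputs for every sample/haplotype pair."""
    targets: list[str] = []
    for sample in samples:
        base = f"results/{sample}/final_outputs"
        for hap in HAPS:
            targets += [
                f"{base}/beds/{hap}.SDs.bed",
                f"{base}/beds/{hap}.SDs.lowid.bed",
                f"{base}/bigbeds/{hap}.SDs.bb",
                f"{base}/bigbeds/{hap}.SDs.lowid.bb",
            ]
    return targets


if __name__ == "__main__":
    example = (
        "SAMPLE\tH1\tH2\tRM\tTRF\n"
        "ASM\tASM/hap1.fasta\tASM/hap2.fasta\tASM/RM.bed.gz\tASM/trf.bed.gz\n"
    )
    manifest = read_manifest(io.StringIO(example))
    for path in final_targets(manifest):
        print(path)
```

In a Snakefile, a list like this would typically feed `rule all`; to read the real file, open it with `open("manifest.tab", newline="")` instead of the inline example string.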
Run locally or on the cluster:
./runlocal 8
./runcluster 30
Notes:
- This pipeline is mainly intended for the Eichler lab compute environment.
- Contig names containing # are temporarily rewritten during intermediate steps and restored in the final outputs.
- RM and TRF are expected to be RepeatMasker and TRF BED outputs, which can be generated by the Rhodonite annotation workflow: https://github.com/vollgerlab/Rhodonite
- Sample and haplotype expansion is driven by manifest.tab.
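Because contig names containing # are rewritten and later restored, the substitution has to be reversible. One simple way to guarantee that is to refuse any name that already contains the placeholder. This is a hypothetical sketch: the placeholder string __HASH__, the helper names, and the example contig name are illustrative, and the pipeline's actual rewriting scheme is not described in this README.

```python
"""Sketch of a reversible contig-name rewrite for names containing '#'.

Assumption (hypothetical): '#' is replaced by a fixed placeholder string;
the pipeline's real substitution may differ.
"""
from __future__ import annotations

PLACEHOLDER = "__HASH__"  # hypothetical placeholder


def encode_contig(name: str) -> str:
    """Replace '#' with the placeholder, refusing ambiguous names."""
    if PLACEHOLDER in name:
        # Decoding would not be unique, so fail loudly instead.
        raise ValueError(f"contig name already contains {PLACEHOLDER!r}: {name}")
    return name.replace("#", PLACEHOLDER)


def decode_contig(name: str) -> str:
    """Restore '#' in a previously encoded contig name."""
    return name.replace(PLACEHOLDER, "#")


def restore_bed_line(line: str, name_cols: tuple[int, ...] = (0,)) -> str:
    """Decode only the given contig-name columns of a tab-separated line."""
    fields = line.rstrip("\n").split("\t")
    for i in name_cols:
        fields[i] = decode_contig(fields[i])
    return "\t".join(fields)


if __name__ == "__main__":
    original = "sample#1#contig1"  # made-up example name
    encoded = encode_contig(original)
    print(encoded)                 # sample__HASH__1__HASH__contig1
    print(restore_bed_line(f"{encoded}\t100\t200\n"))
```

Decoding only the named columns avoids touching other fields; for paired-coordinate output with two contig columns, both column indices would be passed in `name_cols`.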
Repository layout:
.
├── Snakefile
├── runlocal
├── runcluster
├── rules/
├── scripts/
├── schema/
└── resources/