The workflow requires a genome reference and annotation to run. All files and settings for a genome are defined in a genome.json file. When launching the workflow, the reference genome is selected by passing the path to a specific genome.json in the --genome parameter.
The genome.json file includes the following fields:
| Field | Description | Required? | Example |
|---|---|---|---|
| name | The name of the species / genome-version | Required | human |
| bsbolt_index | Path to the BSBolt index directory | Required (one index required) | /PATH/TO/bsbolt.ref |
| bwa_index | Path to the bwa-meth (bwa-mem2) index directory | Required (one index required) | /PATH/TO/bwa-meth/bwa-mem2.ref |
| parabricks_index | Path to the parabricks (bwa-meth/bwa-mem) index directory | Required (one index required) | /PATH/TO/bwa-meth/bwa-mem.ref |
| ref_fasta | Name of fasta file within bwa-meth/parabricks index | Required (with bwa_index or parabricks_index) | hg38.fa |
| genomeTiles | Path to a sorted bed file of genome bins for the CG matrix | Optional | /PATH/TO/50kbp.bed |
| genomeTilesCH | Path to a sorted bed file of genome bins for the CH matrix | Optional | /PATH/TO/100kbp.bed |
| filter_chrs | Path to a tsv file labeling chromosomes (mito, filter) to filter from the deduplicated BAM | Required | /PATH/TO/filter_chrs.tsv |
| tssWin | Path to a sorted bed file of 200nt windows centered at TSS | Required unless runTssEnrich: false | /PATH/TO/tss.bed |
| backgroundWin | Path to a sorted bed file of 200nt windows centered 1kb upstream of TSS | Required unless runTssEnrich: false | /PATH/TO/background.bed |
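
The requirement rules in the table can be checked before launching a run. The following is a minimal sketch, not part of the workflow: `check_genome` and its `run_tss_enrich` argument are hypothetical names, and it assumes "one index required" means at least one of the three index fields must be set. The example values are the placeholder paths from the table.

```python
import json

# The three alternative index fields from the table.
INDEX_FIELDS = ("bsbolt_index", "bwa_index", "parabricks_index")


def check_genome(genome, run_tss_enrich=True):
    """Return a list of problems in a genome.json-style dict (empty if none).

    Hypothetical helper: it only mirrors the "Required?" column of the table.
    """
    problems = []
    if not genome.get("name"):
        problems.append("missing required field: name")
    # Assumption: at least one of the index fields has to be present.
    if not any(genome.get(field) for field in INDEX_FIELDS):
        problems.append("at least one of " + ", ".join(INDEX_FIELDS) + " is required")
    if (genome.get("bwa_index") or genome.get("parabricks_index")) and not genome.get("ref_fasta"):
        problems.append("ref_fasta is required with bwa_index or parabricks_index")
    if not genome.get("filter_chrs"):
        problems.append("missing required field: filter_chrs")
    if run_tss_enrich:
        for field in ("tssWin", "backgroundWin"):
            if not genome.get(field):
                problems.append(field + " is required unless runTssEnrich is false")
    return problems


# Example content for a genome.json, using the placeholder paths from the table.
example = {
    "name": "human",
    "bwa_index": "/PATH/TO/bwa-meth/bwa-mem2.ref",
    "ref_fasta": "hg38.fa",
    "genomeTiles": "/PATH/TO/50kbp.bed",
    "filter_chrs": "/PATH/TO/filter_chrs.tsv",
    "tssWin": "/PATH/TO/tss.bed",
    "backgroundWin": "/PATH/TO/background.bed",
}

print(json.dumps(example, indent=2))
print(check_genome(example))  # an empty list means no required field is missing
```

To check an existing file, load it with `json.load` and pass the resulting dict to the same function.
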
- All files (bsbolt_index, bins, ...) can be specified as one of
  - an absolute path (/path/to/genome)
  - a relative path starting from the location of the genome.json file (genes/bins.bed)
  - an AWS S3 URL (s3://path/to/genome)
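
The three path styles above can be illustrated with a short sketch. This is not the workflow's own resolution code, which is not shown here: `resolve_genome_path` is a hypothetical helper, and it assumes the genome.json itself sits on a local POSIX filesystem.

```python
from pathlib import PurePosixPath


def resolve_genome_path(value, genome_json):
    """Resolve one genome.json entry to the location it refers to.

    Hypothetical helper: S3 URLs and absolute paths are returned as given,
    relative paths are taken relative to the folder containing genome.json.
    """
    if value.startswith("s3://"):
        return value
    path = PurePosixPath(value)
    if path.is_absolute():
        return str(path)
    return str(PurePosixPath(genome_json).parent / path)


# /refs/human is a made-up location used only for illustration.
print(resolve_genome_path("genes/bins.bed", "/refs/human/genome.json"))
print(resolve_genome_path("/path/to/genome", "/refs/human/genome.json"))
print(resolve_genome_path("s3://path/to/genome", "/refs/human/genome.json"))
```

Keeping paths relative to the genome.json means an unpacked reference folder can be moved without editing the file.
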
The bwa-meth index needs to be built with bwa-meth version >= 0.2.7 and bwa-mem2 version >= 2.2.1. See the bwa-meth documentation for additional options. An example command would be
bwameth.py index-mem2 {fasta reference}
The parabricks index needs to be built with bwa-meth version >= 0.2.7 and bwa-mem version >= 0.7.19. See the bwa-meth documentation for additional options. An example command would be
bwameth.py index {fasta reference}
The BSBolt index needs to be built with BSBolt version >= 1.5.0. See the BSBolt documentation for additional options. An example command would be
bsbolt Index -G {fasta reference} -DB {database output}
All non-overlapping genomic bins in the provided genomeTiles and genomeTilesCH bed files are used as features for CG and CH methylation matrix generation.
- To use bins of a different size for the CG matrix (default 50kb), pass the path to the bed file in genomeTiles in the genome.json.
- To use bins of a different size for the CH matrix (default 250kb), pass the path to the bed file in genomeTilesCH in the genome.json.
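
A bed file with custom bin sizes can be produced from a list of chromosome lengths. The sketch below is one way to do it, under assumptions: `make_bins` and `to_bed` are hypothetical helpers, the output is a three-column bed (chromosome, 0-based start, exclusive end), and the chromosomes are written in the order supplied. The exact columns and sort order the workflow expects are not specified here, so check the result against one of the pre-built references.

```python
def make_bins(chrom_sizes, bin_size=50_000):
    """Yield (chrom, start, end) for consecutive non-overlapping bins.

    Coordinates follow the bed convention: 0-based start, exclusive end.
    The last bin of each chromosome is truncated at the chromosome end.
    """
    for chrom, size in chrom_sizes:
        for start in range(0, size, bin_size):
            yield chrom, start, min(start + bin_size, size)


def to_bed(bins):
    """Format bins as tab-separated bed lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in bins)


# Toy chromosome lengths for illustration only, not real genome sizes.
toy_sizes = [("chr1", 120_000), ("chr2", 50_000)]
print(to_bed(make_bins(toy_sizes, bin_size=50_000)), end="")
```

Writing the returned text to a file and pointing genomeTiles or genomeTilesCH at it would then select the custom bins.
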
Pre-built reference genomes are available for download:
- Human:
- Mouse:
- bwa-meth (default): https://s3.us-east-2.amazonaws.com/scale.pub/genomes/methyl/mm39-bwa.tgz
- parabricks (GPU): https://s3.us-east-2.amazonaws.com/scale.pub/genomes/methyl/mm39-parabricks.tgz
- bsbolt (legacy): https://s3.us-east-2.amazonaws.com/scale.pub/genomes/methyl/mm39-bsbolt.tgz
- Human/Mouse Barnyard:
- bwa-meth (default): https://s3.us-east-2.amazonaws.com/scale.pub/genomes/methyl/grch38_mm39-bwa.tgz
- parabricks (GPU): https://s3.us-east-2.amazonaws.com/scale.pub/genomes/methyl/grch38_mm39-parabricks.tgz
- bsbolt (legacy): https://s3.us-east-2.amazonaws.com/scale.pub/genomes/methyl/grch38_mm39-bsbolt.tgz
Download these to your analysis server, unpack them and then use e.g.
--genome /PATH/TO/unpacked/folder/genome.json