Nextflow Version: 22.10.6
A HiCCUPS-based pipeline for simultaneous chromatin loop calling and cross-sample comparison, where both steps reinforce each other.
The unionloops pipeline provides:
- Enhanced sensitivity of loop detection using cross-sample evidence.
- Improved loop positional precision relative to CTCF/RAD21 sites.
- Loop annotation via clustering: shared vs. sample-specific.
- Loop strength quantification across all samples.
# Check Conda version
conda --version
# Show current solver configuration ("classic" or "libmamba")
conda config --show solver
# Note: On older Conda versions, if nothing is printed, it defaults to "classic".
# Make sure your current solver configuration is "libmamba", which is much faster than "classic".
# Example: Update to Conda v25.7.0 and force reinstall in the base environment
conda install -n base -c defaults conda=25.7.0 --force-reinstall
# Verify Conda version after update
conda --version
# Check solver again (default on modern Conda is "libmamba")
conda config --show solver
conda env create -f nextflow_env.yml
conda env create -f unionloops_env.yml

Required: prepare a TSV file (e.g., mcool_paths.tsv) with sample names and .mcool file paths:
name path
sample1 /full/path/to/sample1.mcool
sample2 /full/path/to/sample2.mcool
sample3 /full/path/to/sample3.mcool
MEGA /full/path/to/MEGA.mcool # Optional: merged high-resolution map

For optimal performance, also consider including a single merged MEGA.mcool file that combines all samples. The high signal-to-noise ratio of the MEGA map can:
- Improve the rescue of sample-specific loops.
- Enhance the precision of loop detection.

Use .mcool files generated from distiller v0.3.3 (distiller-nf) for best compatibility.
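Before launching the pipeline, it can help to sanity-check the sample table. The following is a minimal sketch, not part of unionloops: the helper name `read_sample_table` is hypothetical, and it assumes the file is tab-separated with a `name` and `path` header row, that paths should be absolute (as in the example above), and that the trailing `# Optional ...` annotation shown above is not present in the real file.

```python
import csv
import io
from pathlib import PurePosixPath


def read_sample_table(handle, required_suffix=".mcool"):
    """Parse a two-column TSV (name, path) and return {name: path}.

    Raises ValueError on an unexpected header, duplicate names,
    relative paths, or an unexpected file extension.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames != ["name", "path"]:
        raise ValueError(f"expected header 'name<TAB>path', got {reader.fieldnames}")
    samples = {}
    for row in reader:
        # Missing cells come back as None; treat them as empty strings.
        name = (row.get("name") or "").strip()
        path = (row.get("path") or "").strip()
        if name in samples:
            raise ValueError(f"duplicate sample name: {name}")
        if not PurePosixPath(path).is_absolute():
            raise ValueError(f"path for {name} is not absolute: {path!r}")
        if not path.endswith(required_suffix):
            raise ValueError(f"path for {name} does not end in {required_suffix}: {path!r}")
        samples[name] = path
    return samples


# Usage with an in-memory table; use open("mcool_paths.tsv") for a real file.
example = io.StringIO(
    "name\tpath\n"
    "sample1\t/full/path/to/sample1.mcool\n"
    "MEGA\t/full/path/to/MEGA.mcool\n"
)
samples = read_sample_table(example)
print(sorted(samples))
```

The same function could be pointed at an external loop table by passing `required_suffix=".bedpe"`.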
Optional: UnionLoops (v1.1.0) now supports merging loops from multiple datasets generated by external loop callers. To use this feature, provide a TSV file (e.g., external_loop_paths.tsv) via the input_loop_paths parameter, containing sample names (matching those in mcool_paths.tsv) and paths to .bedpe files.
name path
sample1 /full/path/to/sample1_loops.bedpe
sample2 /full/path/to/sample2_loops.bedpe
sample3 /full/path/to/sample3_loops.bedpe
MEGA /full/path/to/MEGA_loops.bedpe # Optional: high-confidence loops with a high signal-to-noise ratio

You can launch unionloops using different hardware profiles:
- Default hardware profile (configs/local.config) with your mcool_paths.tsv and conda env unionloops-nf:
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
-ansi-log false \
--input_cooler_paths /full/path/to/mcool_paths.tsv \
--outfilename union_loop_list.tsv \
--conda_env /full/path/to/miniconda3/envs/unionloops-nf
- cluster hardware profile (configs/cluster.config) with your mcool_paths.tsv and conda env unionloops-nf:
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
-profile cluster \
-ansi-log false \
--input_cooler_paths /full/path/to/mcool_paths.tsv \
--outfilename union_loop_list.tsv \
--conda_env /full/path/to/miniconda3/envs/unionloops-nf
- custom hardware profile with your own configuration file, your mcool_paths.tsv, and conda env unionloops-nf:
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
-profile custom --custom_config /full/path/to/your.config \
-ansi-log false \
--input_cooler_paths /full/path/to/mcool_paths.tsv \
--outfilename union_loop_list.tsv \
--conda_env /full/path/to/miniconda3/envs/unionloops-nf

You may override default parameters defined in nextflow.config as needed (see parameters section).
By default, output files will be saved in the results/ directory relative to your working directory.
results/
├── enriched_pixels/ # Enriched pixels per sample (only for built-in HiCCUPS)
│ ├── sample1.enriched.pixels.resolution.10kb.tsv
│ ├── sample2.enriched.pixels.resolution.10kb.tsv
│ ├── sample3.enriched.pixels.resolution.10kb.tsv
│ └── MEGA.enriched.pixels.resolution.10kb.tsv
│
├── clusters/ # Clustering results of pooled enriched pixels across all samples (only for built-in HiCCUPS)
│ ├── centroids_of_clusters_of_enriched_pixels.resolution.10kb.tsv # Without additional filtering
│ ├── clusters_of_enriched_pixels.resolution.10kb.tsv
│ └── enriched_pixels_meta.tsv
│
├── union_loop_list_10kb.tsv # Final union list of loops (for built-in HiCCUPS or external loops)
└── clusters_of_external_loops.resolution.10kb.tsv # Only when external_loop_paths.tsv is provided
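To give an intuition for what the clusters/ files represent, the sketch below pools pixel coordinates from several samples and groups those that fall within a fixed radius of each other, reporting one centroid per group. This is a deliberately simplified greedy scheme written for illustration only. It is an assumption, not the clustering method the pipeline or cooltools actually uses, and the function name and the coordinates are made up.

```python
from math import hypot


def cluster_pixels(pixels, radius):
    """Greedy radius clustering of (start1, start2) pixel coordinates in bp.

    A pixel joins the first cluster whose current centroid lies within
    `radius` (Euclidean distance); otherwise it starts a new cluster.
    Returns a list of (centroid, members) tuples.
    """
    clusters = []  # each entry: [sum_x, sum_y, members]
    for x, y in pixels:
        for c in clusters:
            n = len(c[2])
            cx, cy = c[0] / n, c[1] / n
            if hypot(x - cx, y - cy) <= radius:
                c[0] += x
                c[1] += y
                c[2].append((x, y))
                break
        else:
            # No existing cluster was close enough: open a new one.
            clusters.append([x, y, [(x, y)]])
    return [((c[0] / len(c[2]), c[1] / len(c[2])), c[2]) for c in clusters]


# Illustrative coordinates only (not real data).
pooled = [
    (1_000_000, 1_500_000),  # e.g. called in one sample
    (1_010_000, 1_500_000),  # same loop, shifted by one 10 kb bin in another sample
    (3_000_000, 3_400_000),  # a distinct loop
]
for centroid, members in cluster_pixels(pooled, radius=20_000):
    print(centroid, len(members))
```

With a 20 kb radius, the first two pixels collapse into one cluster and the third stays on its own, which is the behaviour that lets a union list treat slightly shifted calls from different samples as the same loop.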
| Column | Description |
|---|---|
| chr1 | Chromosome of anchor 1 |
| start1 | Start position of anchor 1 |
| end1 | End position of anchor 1 |
| chr2 | Chromosome of anchor 2 |
| start2 | Start position of anchor 2 |
| end2 | End position of anchor 2 |
| sample_name | Detected sample(s); joined with & if multiple |
| sample1 | Loop strength in sample1 |
| sample2 | Loop strength in sample2 |
| sample3 | Loop strength in sample3 |
| MEGA | Loop strength in MEGA |
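The columns above can be read with the standard library alone. The following is a minimal sketch under stated assumptions: the output is tab-separated with a header row using exactly these column names, every column other than the six coordinates and sample_name holds a numeric loop strength, and a loop counts as "shared" when sample_name lists more than one sample (whether MEGA should count toward that is a choice left to the reader). The function name and the example values are made up for illustration.

```python
import csv
import io

COORD_COLUMNS = ["chr1", "start1", "end1", "chr2", "start2", "end2"]


def annotate_loops(handle):
    """Yield one dict per loop with its anchors, detecting samples, and strengths."""
    reader = csv.DictReader(handle, delimiter="\t")
    # Everything that is not a coordinate or sample_name is treated as a strength column.
    strength_columns = [
        c for c in reader.fieldnames if c not in COORD_COLUMNS and c != "sample_name"
    ]
    for row in reader:
        detected_in = row["sample_name"].split("&")
        yield {
            "anchor1": (row["chr1"], int(row["start1"]), int(row["end1"])),
            "anchor2": (row["chr2"], int(row["start2"]), int(row["end2"])),
            "detected_in": detected_in,
            "shared": len(detected_in) > 1,
            "strength": {c: float(row[c]) for c in strength_columns},
        }


# Made-up rows in the documented layout; use open("results/<outfilename>") for real output.
example = io.StringIO(
    "chr1\tstart1\tend1\tchr2\tstart2\tend2\tsample_name\tsample1\tsample2\n"
    "chr2\t1000000\t1010000\tchr2\t1500000\t1510000\tsample1&sample2\t2.5\t3.0\n"
    "chr2\t3000000\t3010000\tchr2\t3400000\t3410000\tsample2\t1.1\t2.2\n"
)
loops = list(annotate_loops(example))
for loop in loops:
    label = "shared" if loop["shared"] else "sample-specific"
    print(loop["anchor1"], loop["detected_in"], label)
```

Because strengths are reported for every sample regardless of where the loop was detected, the `strength` dict can be used to compare a sample-specific loop against the samples in which it was not called.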
| Parameter | Description |
|---|---|
| input_cooler_paths | TSV with sample names and .mcool paths |
| outfilename | Output filename for union loop list |
| conda_env | Path to conda environment for pipeline |
| Parameter | Default | Description |
|---|---|---|
| assembly_name | hg38 | Genome assembly name from UCSC database |
| input_loop_paths | null | TSV with sample names and .bedpe paths for external loop lists |
| resolution | 10000 | Resolution (bp); must be ≥ 4,000 for built-in HiCCUPS, or match the resolution of the provided external loops |
| outdir | results | Output directory |
| custom_config | custom.config | Custom Nextflow config |
| clr_weight_name | weight | Used by cooltools functions |
| max_loci_separation | 10000000 | Maximum loci separation for loop-calling (bp) (built-in HiCCUPS only) |
| max_nans_tolerated | 1 | Used in cooltools.dots() (built-in HiCCUPS only) |
| lambda_bin_fdr | 0.1 | Used in cooltools.dots() (built-in HiCCUPS only) |
| tile_size | 5000000 | Used in cooltools.dots() (built-in HiCCUPS only) |
| nproc | 1 | Number of processes used |
| dots_clustering_radius | 20000 | Clustering radius for HiCCUPS enriched pixels or external loops (bp) |
| flank | 100000 | Flanking region for strength estimation (typically 10×resolution) |
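Two of the constraints in the table interact when the resolution is changed: the 4,000 bp minimum for built-in HiCCUPS, and the note that flank is typically 10×resolution. The helper below is a hypothetical pre-flight check, not part of the pipeline. It assumes that built-in HiCCUPS is used whenever input_loop_paths is left at null, and that flank is not rescaled automatically when resolution changes; both are assumptions, since the table lists flank as a fixed default.

```python
def check_params(resolution=10_000, flank=100_000, input_loop_paths=None):
    """Return a list of human-readable notes about questionable settings.

    Defaults mirror the parameter table; the checks themselves are an
    illustrative helper and not something the pipeline runs.
    """
    notes = []
    # Assumption: built-in HiCCUPS is used when no external loop table is given.
    if input_loop_paths is None and resolution < 4_000:
        notes.append(
            f"resolution={resolution} bp is below the 4000 bp minimum for built-in HiCCUPS"
        )
    # The table describes flank as typically 10 x resolution.
    if flank != 10 * resolution:
        notes.append(
            f"flank={flank} bp differs from the typical 10 x resolution = {10 * resolution} bp"
        )
    return notes


print(check_params())                                # table defaults
print(check_params(resolution=5_000))                # flank left at its default
print(check_params(resolution=2_000, flank=20_000))  # too fine for built-in HiCCUPS
```

An empty list means the settings are consistent with the table; any note is a prompt to double-check the corresponding `--resolution` or `--flank` override, not an error raised by the pipeline.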
- HFF_MicroC
  - Description: Micro-C data from HFF human cells for two chromosomes (hg38) in a multi-resolution mcool format.
  - Source: Krietenstein et al. 2021
  - Downloaded from: https://osf.io/3h9js/download
  - Stored as: test.mcool
  - Original MD5 checksum: e4a0fc25c8dc3d38e9065fd74c565dd1
- hESC_MicroC
  - Description: Micro-C data from human ES cells for two chromosomes (hg38) in a multi-resolution mcool format.
  - Source: Krietenstein et al. 2021
  - Downloaded from: https://osf.io/3kdyj/download
  - Stored as: test_hESC.mcool
  - Original MD5 checksum: ac0e636605505fb76fac25fa08784d5b
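The listed checksums make it possible to confirm that a download completed intact. The following is a minimal sketch using only the standard library. It assumes the files sit in a directory passed as `data_dir` (the default "data" is a guess at the layout relative to test/) and that the stored files are byte-identical to the originals the checksums were computed from; the function names are illustrative.

```python
import hashlib
import os

# Checksums as listed for the two test files, keyed by stored file name.
EXPECTED_MD5 = {
    "test.mcool": "e4a0fc25c8dc3d38e9065fd74c565dd1",
    "test_hESC.mcool": "ac0e636605505fb76fac25fa08784d5b",
}


def md5sum(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(data_dir="data"):
    """Return {file name: 'ok' | 'mismatch' | 'missing'} for each expected file."""
    status = {}
    for name, expected in EXPECTED_MD5.items():
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            status[name] = "missing"
        elif md5sum(path) == expected:
            status[name] = "ok"
        else:
            status[name] = "mismatch"
    return status


print(verify("data"))
```

A "mismatch" usually points to a truncated or corrupted download, in which case fetching the file again is the simplest remedy.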
$ git clone https://github.com/dekkerlab/unionloops-nf.git
$ cd unionloops-nf/test/

Step 2: Download two test .mcool files to test/data/ and generate a test_mcool_paths.tsv file in test/
$ bash ./run_download.sh

Please be patient: for this test example, the Nextflow pipeline may take up to 10 minutes to complete.
Note: You might need to replace ~/miniconda3/envs/unionloops-nf with the path to your unionloops-nf conda environment. You can find it by running:
$ conda env list | grep 'unionloops-nf'
$ conda activate nextflow
$ nextflow run ../unionloops.nf \
> -ansi-log false \
> --input_cooler_paths /full/path/to/test/test_mcool_paths.tsv \
> --outfilename test_union_loop_list.tsv \
> --conda_env ~/miniconda3/envs/unionloops-nf

- Update the provided LSF job script test/run_test_example.sh with the necessary path adjustments:
#BSUB -q short
#BSUB -W 4:00
#BSUB -n 2
#BSUB -J unionloops-nf
#BSUB -R "span[hosts=1]"
#BSUB -R "rusage[mem=8000]"
#BSUB -eo dis.err
#BSUB -oo dis.out
# Load environment
if [ -f "$HOME/.bashrc" ]; then
source "$HOME/.bashrc"
elif [ -f "$HOME/.bash_profile" ]; then
source "$HOME/.bash_profile"
fi
# Activate nextflow conda environment
conda activate nextflow
# Run Nextflow
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
-profile cluster \
-ansi-log false \
--input_cooler_paths /full/path/to/unionloops-nf/test/test_mcool_paths.tsv \
--outfilename test_union_loop_list.tsv \
--conda_env ~/miniconda3/envs/unionloops-nf \
--nproc 2
- Submit the job from the test/ directory:
bsub < run_test_example.sh
$ head results/test_union_loop_list.tsv

- distiller: A modular Hi-C mapping pipeline for reproducible data analysis. https://github.com/open2c/distiller-nf
- HiCCUPS: Rao et al. A 3D Map of the Human Genome at Kilobase Resolution Reveals Principles of Chromatin Looping. Cell, 2014. https://doi.org/10.1016/j.cell.2014.11.021
- Nextflow: Di Tommaso et al. Nextflow enables reproducible computational workflows. Nature Biotechnology, 2017. https://doi.org/10.1038/nbt.3820
- cooltools: Open2C et al. Cooltools: scalable analysis tools for Hi-C and other genome-wide contact maps. PLOS Computational Biology, 2024. https://doi.org/10.1371/journal.pcbi.1012067
- cooler: Abdennur and Mirny. Cooler: scalable storage for Hi-C data and other genomically labeled arrays. Bioinformatics, 2020. https://doi.org/10.1093/bioinformatics/btz540
- bioframe: Open2C et al. Bioframe: Operations on Genomic Intervals in Pandas Dataframes. Bioinformatics, 2024. https://doi.org/10.1093/bioinformatics/btae088