unionloops-nf

Nextflow Version: 22.10.6

A HiCCUPS-based pipeline for simultaneous chromatin loop calling and cross-sample comparison, where both steps reinforce each other.

The unionloops pipeline provides:

  • Enhanced sensitivity of loop detection using cross-sample evidence.
  • Improved loop positional precision relative to CTCF/RAD21 sites.
  • Loop annotation via clustering: shared vs. sample-specific.
  • Loop strength quantification across all samples.

Setup

Step 1: Check your Conda version and solver

1. Show the configured solver

# Check Conda version
conda --version

# Show current solver configuration ("classic" or "libmamba")
conda config --show solver

# Note: On older Conda versions, if nothing is printed, it defaults to "classic".
# Make sure your current solver configuration is "libmamba", which is much faster than "classic".

2. Update Conda to the latest version

# Example: Update to Conda v25.7.0 and force reinstall in the base environment
conda install -n base -c defaults conda=25.7.0 --force-reinstall

3. Double-check versions and solver

# Verify Conda version after update
conda --version

# Check solver again (default on modern Conda is "libmamba")
conda config --show solver
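
If the solver is still reported as "classic", it can be switched with `conda config --set solver libmamba`. The sketch below wraps that in a hypothetical helper, `ensure_libmamba` (not part of this repository), which changes the setting only when needed. It assumes a Conda release that ships the libmamba solver.

```shell
# Hypothetical helper (not part of unionloops-nf): switch the solver to
# libmamba only when the current setting differs.
ensure_libmamba() {
    # `conda config --show solver` prints a line such as "solver: libmamba".
    current=$(conda config --show solver 2>/dev/null | sed -n 's/^solver:[[:space:]]*//p')
    if [ "$current" = "libmamba" ]; then
        echo "solver already set to libmamba"
    else
        # Assumes the installed Conda supports the libmamba solver.
        conda config --set solver libmamba
    fi
}

# Example call:
# ensure_libmamba
```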

Step 2: Create conda environment for Nextflow

conda env create -f nextflow_env.yml

Step 3: Create conda environment for unionloops pipeline

conda env create -f unionloops_env.yml

Input configuration

Required: prepare a TSV file (e.g., mcool_paths.tsv) with sample names and .mcool file paths:

name	path
sample1	/full/path/to/sample1.mcool
sample2	/full/path/to/sample2.mcool
sample3	/full/path/to/sample3.mcool
MEGA	/full/path/to/MEGA.mcool    # Optional: merged high-resolution map

For optimal performance, also consider including a single merged MEGA.mcool file that combines all samples. The high signal-to-noise ratio of the MEGA map can:

  • Improve the rescue of sample-specific loops.
  • Enhance the precision of loop detection.

Use .mcool files generated from distiller v0.3.3 (distiller-nf) for best compatibility.
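
A sample sheet in this layout can also be generated rather than typed by hand. The following is a minimal sketch, assuming all .mcool files sit in one directory and that each sample name is the file's basename without the .mcool suffix. The helper name `make_mcool_sheet` is hypothetical and not part of the pipeline.

```shell
# Hypothetical helper: write a name<TAB>path sheet for every .mcool file
# found in a directory. Sample names are taken from the file basenames.
make_mcool_sheet() {
    mcool_dir=$1
    out_tsv=$2
    printf 'name\tpath\n' > "$out_tsv"
    for f in "$mcool_dir"/*.mcool; do
        [ -e "$f" ] || continue                  # skip the literal glob if nothing matches
        abs_dir=$(cd "$(dirname "$f")" && pwd)   # absolute directory of the file
        base=$(basename "$f")
        printf '%s\t%s\n' "${base%.mcool}" "$abs_dir/$base" >> "$out_tsv"
    done
}

# Example call (paths are placeholders):
# make_mcool_sheet /full/path/to/mcools mcool_paths.tsv
```

Writing absolute paths keeps the sheet valid regardless of the directory from which Nextflow is launched.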

Optional: UnionLoops (v1.1.0) now supports merging loops from multiple datasets generated by external loop callers. Users must provide a TSV file (e.g., external_loop_paths.tsv) containing sample names (matching those in mcool_paths.tsv) and paths to .bedpe files.

name	path
sample1	/full/path/to/sample1_loops.bedpe
sample2	/full/path/to/sample2_loops.bedpe
sample3	/full/path/to/sample3_loops.bedpe
MEGA	/full/path/to/MEGA_loops.bedpe   # Optional: high-confidence loops with a high signal-to-noise ratio
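
Because the sample names in the two sheets have to match, a quick consistency check before launching can save a failed run. The sketch below assumes both files are tab-separated with a header row, and that every name in the external loop sheet should also appear in the .mcool sheet. The helpers `sheet_names` and `check_sheet_names` are hypothetical.

```shell
# Hypothetical helpers: report sample names that appear in the external loop
# sheet but not in the .mcool sheet.
sheet_names() {
    # Print the sorted, de-duplicated first column, skipping the header line.
    tail -n +2 "$1" | cut -f1 | sort -u
}

check_sheet_names() {
    mcool_tsv=$1
    loops_tsv=$2
    names_a=$(mktemp)
    names_b=$(mktemp)
    sheet_names "$mcool_tsv" > "$names_a"
    sheet_names "$loops_tsv" > "$names_b"
    # comm -13 keeps only the lines unique to the second file.
    unmatched=$(comm -13 "$names_a" "$names_b")
    rm -f "$names_a" "$names_b"
    if [ -n "$unmatched" ]; then
        printf 'Names missing from %s:\n%s\n' "$mcool_tsv" "$unmatched" >&2
        return 1
    fi
}

# Example call (file names as used above):
# check_sheet_names mcool_paths.tsv external_loop_paths.tsv
```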

Running the pipeline

You can launch unionloops using different hardware profiles:

  1. Default hardware profile (configs/local.config) with your mcool_paths.tsv and conda env unionloops-nf:
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
    -ansi-log false \
    --input_cooler_paths /full/path/to/mcool_paths.tsv \
    --outfilename union_loop_list.tsv \
    --conda_env /full/path/to/miniconda3/envs/unionloops-nf
  2. Cluster hardware profile (configs/cluster.config) with your mcool_paths.tsv and conda env unionloops-nf:
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
    -profile cluster \
    -ansi-log false \
    --input_cooler_paths /full/path/to/mcool_paths.tsv \
    --outfilename union_loop_list.tsv \
    --conda_env /full/path/to/miniconda3/envs/unionloops-nf
  3. Custom hardware profile, using your own configuration file together with your mcool_paths.tsv and conda env unionloops-nf:
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
    -profile custom --custom_config /full/path/to/your.config \
    -ansi-log false \
    --input_cooler_paths /full/path/to/mcool_paths.tsv \
    --outfilename union_loop_list.tsv \
    --conda_env /full/path/to/miniconda3/envs/unionloops-nf

You may override the default parameters defined in nextflow.config as needed (see the Parameters section).


Example output

By default, output files will be saved in the results/ directory relative to your working directory.

File structure

results/
├── enriched_pixels/ # Enriched pixels per sample (only for built-in HiCCUPS)
│   ├── sample1.enriched.pixels.resolution.10kb.tsv
│   ├── sample2.enriched.pixels.resolution.10kb.tsv
│   ├── sample3.enriched.pixels.resolution.10kb.tsv
│   └── MEGA.enriched.pixels.resolution.10kb.tsv
│
├── clusters/ # Clustering results of pooled enriched pixels across all samples (only for built-in HiCCUPS)
│   ├── centroids_of_clusters_of_enriched_pixels.resolution.10kb.tsv # Without additional filtering
│   ├── clusters_of_enriched_pixels.resolution.10kb.tsv
│   └── enriched_pixels_meta.tsv
│
├── union_loop_list_10kb.tsv # Final union list of loops (for built-in HiCCUPS or external loops)
└── clusters_of_external_loops.resolution.10kb.tsv # Only when external_loop_paths.tsv is provided

Columns in the dataframe of the final union list of loops

Column       Description
chr1         Chromosome of anchor 1
start1       Start position of anchor 1
end1         End position of anchor 1
chr2         Chromosome of anchor 2
start2       Start position of anchor 2
end2         End position of anchor 2
sample_name  Sample(s) in which the loop was detected; joined with & if multiple
sample1      Loop strength in sample1
sample2      Loop strength in sample2
sample3      Loop strength in sample3
MEGA         Loop strength in MEGA
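
Since sample_name joins multiple samples with &, that column is enough to separate shared loops from sample-specific ones. The sketch below assumes the union list is tab-separated with a header row, and it treats any joined name as shared, including combinations that involve MEGA. The helper `count_shared_loops` is hypothetical.

```shell
# Hypothetical helper: count loops detected in several samples vs. one sample.
# Assumes a tab-separated file whose header row contains "sample_name".
count_shared_loops() {
    awk -F '\t' '
        NR == 1 {
            # Locate the sample_name column by header rather than by position.
            for (i = 1; i <= NF; i++) if ($i == "sample_name") col = i
            if (!col) exit 1
            next
        }
        index($col, "&") { shared++; next }   # joined names: several samples
        { specific++ }
        END {
            if (!col) exit 1                  # header missing or file empty
            printf "shared\t%d\nsample_specific\t%d\n", shared, specific
        }
    ' "$1"
}

# Example call (file name as in the file structure above):
# count_shared_loops results/union_loop_list_10kb.tsv
```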

Parameters

Required parameters

Parameter           Description
input_cooler_paths  TSV with sample names and .mcool paths
outfilename         Output filename for union loop list
conda_env           Path to conda environment for pipeline

Optional parameters

Parameter               Default        Description
assembly_name           hg38           Genome assembly name from UCSC database
input_loop_paths        null           TSV with sample names and .bedpe paths for external loop lists
resolution              10000          Resolution (bp); must be ≥ 4,000 for built-in HiCCUPS, or match the resolution of the provided external loops
outdir                  results        Output directory
custom_config           custom.config  Custom Nextflow config
clr_weight_name         weight         Name of the balancing weight column, used by cooltools functions
max_loci_separation     10000000       Maximum loci separation for loop calling (bp) (built-in HiCCUPS only)
max_nans_tolerated      1              Used in cooltools.dots() (built-in HiCCUPS only)
lambda_bin_fdr          0.1            Used in cooltools.dots() (built-in HiCCUPS only)
tile_size               5000000        Used in cooltools.dots() (built-in HiCCUPS only)
nproc                   1              Number of processes used
dots_clustering_radius  20000          Clustering radius for HiCCUPS enriched pixels or external loops (bp)
flank                   100000         Flanking region for strength estimation (bp; typically 10×resolution)
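
When changing resolution, flank usually needs to change with it. The sketch below derives flank as 10×resolution and enforces the 4,000 bp floor that applies to built-in HiCCUPS. The helper `resolution_flags` is hypothetical, and whether other parameters such as dots_clustering_radius should also scale is left open here.

```shell
# Hypothetical helper: print --resolution/--flank flags with flank set to
# 10 x resolution. The 4,000 bp floor applies to built-in HiCCUPS only.
resolution_flags() {
    resolution=$1
    if [ "$resolution" -lt 4000 ]; then
        echo "resolution must be >= 4000 bp for built-in HiCCUPS" >&2
        return 1
    fi
    flank=$((10 * resolution))
    printf '%s %s %s %s\n' --resolution "$resolution" --flank "$flank"
}

# Example: append the flags to a run command (paths are placeholders).
# The substitution is left unquoted on purpose so it splits into four words.
# nextflow run /full/path/to/unionloops-nf/unionloops.nf \
#     --input_cooler_paths /full/path/to/mcool_paths.tsv \
#     --outfilename union_loop_list.tsv \
#     --conda_env /full/path/to/miniconda3/envs/unionloops-nf \
#     $(resolution_flags 5000)
```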

Test example

Test data details

  1. HFF_MicroC

    • Description: Micro-C data from HFF human cells for two chromosomes (hg38) in a multi-resolution mcool format.
    • Source: Krietenstein et al. 2021
    • Downloaded from: https://osf.io/3h9js/download
    • Stored as: test.mcool
    • Original MD5 checksum: e4a0fc25c8dc3d38e9065fd74c565dd1
  2. hESC_MicroC

    • Description: Micro-C data from human ES cells for two chromosomes (hg38) in a multi-resolution mcool format.
    • Source: Krietenstein et al. 2021
    • Downloaded from: https://osf.io/3kdyj/download
    • Stored as: test_hESC.mcool
    • Original MD5 checksum: ac0e636605505fb76fac25fa08784d5b
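
The checksums listed above can be used to confirm that the downloads are intact. This is a minimal sketch, assuming GNU md5sum is installed (on macOS, `md5 -q` is the usual equivalent) and that the files are saved under test/data/ with the names given above. The helper `verify_md5` is hypothetical.

```shell
# Hypothetical helper: compare a file's MD5 digest with an expected value.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <file>"; keep only the digest.
    actual=$(md5sum "$file" | cut -d ' ' -f1)
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file (got $actual)" >&2
        return 1
    fi
}

# Example calls from the test/ directory (the data/ location is an assumption):
# verify_md5 data/test.mcool e4a0fc25c8dc3d38e9065fd74c565dd1
# verify_md5 data/test_hESC.mcool ac0e636605505fb76fac25fa08784d5b
```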

Run the pipeline using test data

Step 1: Clone this repository and change to the test directory

$ git clone https://github.com/dekkerlab/unionloops-nf.git
$ cd unionloops-nf/test/

Step 2: Download two test .mcool files to test/data/ and generate a test_mcool_paths.tsv file in test/

$ bash ./run_download.sh

Step 3: Run the pipeline with the downloaded test data

Please be patient: for this test example, the Nextflow pipeline may take up to 10 minutes to complete.
Note: You might need to replace ~/miniconda3/envs/unionloops-nf with the path to your unionloops-nf conda environment. You can find it by running:

$ conda env list | grep 'unionloops-nf'
Option 1: Run it using the local hardware profile (configs/local.config).
$ conda activate nextflow
$ nextflow run ../unionloops.nf \
>  -ansi-log false \
>  --input_cooler_paths /full/path/to/test/test_mcool_paths.tsv \
>  --outfilename test_union_loop_list.tsv \
>  --conda_env ~/miniconda3/envs/unionloops-nf
Option 2: Run it using the cluster hardware profile (configs/cluster.config).
  1. Update the provided LSF job script test/run_test_example.sh with the necessary path adjustments:
#BSUB -q short
#BSUB -W 4:00
#BSUB -n 2
#BSUB -J unionloops-nf
#BSUB -R "span[hosts=1]"
#BSUB -R "rusage[mem=8000]"
#BSUB -eo dis.err
#BSUB -oo dis.out

# Load environment
if [ -f "$HOME/.bashrc" ]; then
    source "$HOME/.bashrc"
elif [ -f "$HOME/.bash_profile" ]; then
    source "$HOME/.bash_profile"
fi

# Activate nextflow conda environment
conda activate nextflow

# Run Nextflow
nextflow run /full/path/to/unionloops-nf/unionloops.nf \
        -profile cluster \
        -ansi-log false \
        --input_cooler_paths /full/path/to/unionloops-nf/test/test_mcool_paths.tsv \
        --outfilename test_union_loop_list.tsv \
        --conda_env ~/miniconda3/envs/unionloops-nf \
        --nproc 2
  2. Submit the job from the test/ directory:
bsub < run_test_example.sh

Step 4: Take a look at the final union list of loops

$ head results/test_union_loop_list.tsv

Citations
