Merged
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -10,11 +10,14 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- RTGtools update to 3.13 [#261](https://github.com/nf-core/variantbenchmarking/issues/261)
- Transforming local modules with bcftools and tabix to standard nf-core modules [#267](https://github.com/nf-core/variantbenchmarking/pull/267)
- Replace local modules SORT_BED and REFORMAT_HEADER with nf-core ones. [#268](https://github.com/nf-core/variantbenchmarking/pull/268)
- Introducing a new subworkflow to generate a truth VCF with an ensemble approach. Test VCFs are merged and, following the ensemble_truth rule (the minimum number of callers that must agree), a new truth set is created. This approach is especially important for somatic benchmarks, where a truth set is often missing. [#276](https://github.com/nf-core/variantbenchmarking/pull/276)

### `Fixed`

- Increasing font sizes, making labelling optional, and some fixes around plots. Tests are edited to cover the optional plot arguments. [#270](https://github.com/nf-core/variantbenchmarking/pull/270)
- Improving the pipeline towards strict syntax health & adding topic channels - 1 [#272](https://github.com/nf-core/variantbenchmarking/pull/272)
- Fixing the bed file bug in concordance analysis [#275](https://github.com/nf-core/variantbenchmarking/pull/275)
- Missing --sample for meta.id is fixed in BCFTOOLS_REHEADER [#276](https://github.com/nf-core/variantbenchmarking/pull/276)

### `Dependencies`

42 changes: 30 additions & 12 deletions README.md
@@ -27,28 +27,45 @@

The pipeline is built using [Nextflow](https://www.nextflow.io), a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The [Nextflow DSL2](https://www.nextflow.io/docs/latest/dsl2.html) implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from [nf-core/modules](https://github.com/nf-core/modules) in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

<p align="center">
<img title="variantbenchmarking metro map" src="docs/images/variantbenchmarking_metromap.png" width=100%>
</p>
<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/images/variantbenchmarking.svg">
<img alt="nf-core/variantbenchmarking metro map" src="docs/images/variantbenchmarking.svg">
</picture>

The workflow involves several key processes to ensure reliable and reproducible results:

### Standardization and normalization of variants:
### Standardization and normalization of test (query/comparison) variants:

This initial step ensures consistent formatting and alignment of variants in test and truth VCF files for accurate comparison.

- Subsample if input test vcf is multisample ([bcftools view](https://samtools.github.io/bcftools/bcftools.html#view))
- Subsample if input vcf is multisample ([bcftools view](https://samtools.github.io/bcftools/bcftools.html#view))
- Homogenization of multi-allelic variants, MNPs and SVs (including imprecise paired breakends and single breakends) ([variant-extractor](https://github.com/EUCANCan/variant-extractor))
- Reformatting test VCF files from different SV callers ([svync](https://github.com/nvnieuwk/svync))
- Reformatting VCF files from different SV callers ([svync](https://github.com/nvnieuwk/svync))
- Standardize SV variants to BND ([SVTK standardize](https://github.com/broadinstitute/gatk-sv/blob/main/src/svtk/scripts/svtk))
- Decompose SVs to BND ([rtgtools svdecompose](https://cn.animalgenome.org/bioinfo/resources/manuals/RTGOperationsManual.pdf))
- Rename sample names in test and truth VCF files ([bcftools reheader](https://samtools.github.io/bcftools/bcftools.html#reheader))
- Splitting multi-allelic variants in test and truth VCF files ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Deduplication of variants in test and truth VCF files ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Left aligning of variants in test and truth VCF files ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Use prepy in order to normalize test files. This option is only applicable for happy benchmarking of germline analysis ([prepy](https://github.com/Illumina/hap.py/tree/master))
- Rename sample names ([bcftools reheader](https://samtools.github.io/bcftools/bcftools.html#reheader))
- Splitting multi-allelic variants ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Deduplication of variants ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Left aligning of variants ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Use prepy to normalize test files. This option is only applicable to hap.py benchmarking of germline analyses ([prepy](https://github.com/Illumina/hap.py/tree/master))
- Split SNVs and indels if the given test VCF contains both. This is only applicable to somatic analyses ([bcftools view](https://samtools.github.io/bcftools/bcftools.html#view))
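The splitting, deduplication and left-alignment steps above can be sketched in plain Python. This is only an illustrative stand-in for what `bcftools norm` does, using simplified tuples rather than real VCF parsing:

```python
def split_multiallelic(chrom, pos, ref, alts):
    """Split a multi-allelic site into bi-allelic records (cf. bcftools norm -m-)."""
    return [(chrom, pos, ref, alt) for alt in alts.split(",")]

def left_align(seq, pos, ref, alt):
    """Left-align one indel against reference sequence `seq` (0-based `pos`),
    mimicking the shifting that bcftools norm performs."""
    while True:
        # drop a shared trailing base, then extend left when an allele empties
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
        if not ref or not alt:
            if pos == 0:
                break
            pos -= 1
            ref, alt = seq[pos] + ref, seq[pos] + alt
        else:
            break
    # trim a shared leading base while both alleles stay longer than one base
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, a one-base deletion in a homopolymer, `left_align("ATTTTC", 3, "TT", "T")`, shifts to the leftmost representation `(0, "AT", "A")`.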

### Standardization and normalization of truth (baseline) variants:

- Decompose SVs to BND ([rtgtools svdecompose](https://cn.animalgenome.org/bioinfo/resources/manuals/RTGOperationsManual.pdf))
- Rename sample names ([bcftools reheader](https://samtools.github.io/bcftools/bcftools.html#reheader))
- Splitting multi-allelic variants ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Deduplication of variants ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))
- Left aligning of variants ([bcftools norm](https://samtools.github.io/bcftools/bcftools.html#norm))

### Ensemble (majority rule) approach to prepare truth variants:

In cases where a gold-standard truth VCF file is unavailable, a common approach is to create an ensemble of test variants using a majority rule. This method retains variants identified by at least `--ensemble_truth` of the $n$ total variant callers. If `--ensemble_truth` > 0:

- Merge small variants (SNVs and indels) ([bcftools merge](https://samtools.github.io/bcftools/bcftools.html#merge))
- Merge structural variants ([SURVIVOR merge](https://github.com/fritzsedlazeck/SURVIVOR/wiki))
- Filter the merged variants according to `--ensemble_truth`.
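The majority rule can be sketched as follows; `ensemble_truth` here is a hypothetical helper for illustration only, while the actual subworkflow merges VCFs with bcftools/SURVIVOR and filters on genotype counts:

```python
from collections import Counter

def ensemble_truth(call_sets, min_support):
    """Keep variants reported by at least `min_support` callers,
    one vote per caller (cf. the --ensemble_truth threshold)."""
    votes = Counter()
    for calls in call_sets:
        votes.update(set(calls))  # a caller votes once per variant
    return sorted(v for v, n in votes.items() if n >= min_support)

callers = [
    [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")],  # caller 1
    [("chr1", 100, "A", "T")],                           # caller 2
    [("chr1", 100, "A", "T"), ("chr1", 300, "T", "G")],  # caller 3
]
print(ensemble_truth(callers, min_support=2))  # → [('chr1', 100, 'A', 'T')]
```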

### Filtering options:

Applying filters within the benchmarking process itself can make it impossible to compare different benchmarking strategies. Therefore, this subworkflow provides variant filtering options for those who want to compare benchmarking methods.
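Conceptually, such filtering is predicate-based record selection. The sketch below (hypothetical helper names, simplified dict records) mirrors what bcftools `--include`/`--exclude` expressions such as `exclude_expression = 'INFO/SVTYPE="BND"'` achieve:

```python
def apply_filters(records, include=None, exclude=None):
    """Keep records matching `include` and not matching `exclude`,
    mirroring bcftools --include/--exclude at a conceptual level."""
    kept = []
    for rec in records:
        if include is not None and not include(rec):
            continue
        if exclude is not None and exclude(rec):
            continue
        kept.append(rec)
    return kept

records = [
    {"type": "DEL", "svlen": 120},
    {"type": "BND", "svlen": 0},
    {"type": "INS", "svlen": 45},
]
# analogous to excluding BNDs while enforcing a minimum SV size
filtered = apply_filters(
    records,
    include=lambda r: r["svlen"] >= 30,
    exclude=lambda r: r["type"] == "BND",
)
print(filtered)  # → [{'type': 'DEL', 'svlen': 120}, {'type': 'INS', 'svlen': 45}]
```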
@@ -80,7 +97,7 @@ Available methods for germline and somatic _structural variant (SV)_ benchmarkin

- Truvari ([truvari bench](https://github.com/acenglish/truvari/wiki/bench))
- SVanalyzer ([svanalyzer benchmark](https://github.com/nhansen/SVanalyzer/blob/master/docs/svbenchmark.rst))
- Rtgtools (only for BND) ([rtg bndeval](https://realtimegenomics.com/products/rtg-tools))
- RTGtools (only for BND) ([rtg bndeval](https://realtimegenomics.com/products/rtg-tools))

> [!NOTE]
> Please note that there is no somatic specific tool for SV benchmarking in this pipeline.
@@ -201,6 +218,7 @@ We thank the following people for their extensive assistance in the development

- Nicolas Vannieuwkerke ([@nvnieuwk](https://github.com/nvnieuwk))
- Maxime Garcia ([@maxulysse](https://github.com/maxulysse))
- Georgia Kesisoglou ([@georgiakes](https://github.com/georgiakes))
- Sameesh Kher ([@khersameesh24](https://github.com/khersameesh24))
- Florian Heyl ([@heylf](https://github.com/heylf))
- Krešimir Beštak ([@kbestak](https://github.com/kbestak))
100 changes: 86 additions & 14 deletions conf/modules.config
@@ -53,8 +53,27 @@ process {
}

withName: BCFTOOLS_VIEW_FILTERMISSING {
ext.prefix = { vcf.baseName - ".vcf" + ".filtermissing" }
ext.args = """--output-type z --include 'GT="alt"'"""
ext.prefix = { vcf.baseName - ".vcf" + ".filtermissing" }
ext.args = {
if (params.analysis == "somatic" && (meta.caller == "strelka" || meta.caller == "manta")) {
"--output-type z --write-index=tbi --include 'SOMATIC=1'"
} else if (meta.caller == "manta" || meta.caller == "lumpy") {
"--output-type z --write-index=tbi"
} else {
"--output-type z --write-index=tbi --include 'GT=\"alt\"'"
}
}
publishDir = [
path: {"${params.outdir}/${params.variant_type}/${meta.id}/preprocess"},
pattern: "*{.vcf.gz,vcf.gz.tbi}",
mode: params.publish_dir_mode
]
}

withName: INJECT_MISSING_GT {
ext.prefix = { input[0].baseName + '.withGT' }
ext.suffix = "vcf"
ext.args = '-v OFS=\'\\t\' \'/^##/ {print $0; next} /^#CHROM/ {print "##FORMAT=<ID=GT,Number=1,Type=String,Description=\\x22Genotype\\x22>"; print $0; next} { $9 = "GT:" $9; for(i=10; i<=NF; i++) $i = "0/1:" $i; print }\''
publishDir = [
enabled: false
]
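The awk one-liner above is dense; the following Python sketch (illustration only, not pipeline code) performs the same transformation on a list of VCF lines:

```python
def inject_missing_gt(vcf_lines):
    """Declare a GT FORMAT header and prepend GT ('0/1') to the FORMAT
    and sample columns of every record, as the awk one-liner does."""
    gt_header = ('##FORMAT=<ID=GT,Number=1,Type=String,'
                 'Description="Genotype">')
    out = []
    for line in vcf_lines:
        if line.startswith("##"):
            out.append(line)
        elif line.startswith("#CHROM"):
            out.append(gt_header)  # inject the header just before #CHROM
            out.append(line)
        else:
            cols = line.split("\t")
            cols[8] = "GT:" + cols[8]                  # FORMAT ($9 in awk)
            cols[9:] = ["0/1:" + s for s in cols[9:]]  # every sample column
            out.append("\t".join(cols))
    return out
```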
@@ -149,11 +168,11 @@ process {
publishDir = [
enabled: false
]

}

withName: "BCFTOOLS_REHEADER*" {
ext.args2 = {"--output-type z --write-index=tbi" }
ext.args = { "--samples <(echo '${meta.id}')" }
ext.prefix = { vcf.baseName - ".vcf" + ".reheader"}
publishDir = [
enabled: false
@@ -243,7 +262,59 @@
]
}

withName: 'PUBLISH_PROCESSED_VCF' {
// majority rule, ensemble analysis
withName: BCFTOOLS_ENSEMBLE {
ext.prefix = {"${meta.id}.ensemble"}
ext.args = {"--output-type z --write-index=tbi --force-samples"}
publishDir = [
enabled: false
]
}

withName: FILTER_MAJORITY {
ext.prefix = {"${meta.id}.majority"}
ext.args = {"--output-type v -i 'COUNT(GT=\"alt\") >= ${params.ensemble_truth}'"}
publishDir = [
enabled: false
]
}

withName: REFORMAT_TRUTH {
ext.prefix = { input[0].baseName + '.reformatted' }
ext.suffix = "vcf"
ext.args = '-v OFS=\'\\t\' \'/^##/ {print $0; next} /^#CHROM/ {print $1, $2, $3, $4, $5, $6, $7, $8, $9, "TRUTH"; next} {print $1, $2, $3, $4, $5, $6, $7, $8, "GT", "0/1"}\''
publishDir = [
enabled: false
]
}
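For readability, the REFORMAT_TRUTH awk program corresponds roughly to this Python sketch (illustration only): every record is collapsed to a single TRUTH sample with genotype 0/1:

```python
def reformat_truth(vcf_lines):
    """Rewrite a merged VCF so it carries one sample column, TRUTH,
    with genotype 0/1 on every record (cf. the awk program above)."""
    out = []
    for line in vcf_lines:
        if line.startswith("##"):
            out.append(line)
            continue
        cols = line.split("\t")
        if line.startswith("#CHROM"):
            out.append("\t".join(cols[:9] + ["TRUTH"]))      # replace sample names
        else:
            out.append("\t".join(cols[:8] + ["GT", "0/1"]))  # drop per-sample data
    return out
```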

withName: SURVIVOR_ENSEMBLE {
ext.prefix = {"${meta.id}.ensemble"}
publishDir = [
enabled: false
]
}

withName: REFORMAT_TRUTH_SV {
ext.prefix = { input[0].baseName + '.reformatted' }
ext.suffix = "vcf"
ext.args = '-v OFS=\'\\t\' \'/^##INFO=<ID=CIPOS/ {next} /^##INFO=<ID=CIEND/ {next} /^##/ {print $0; next} /^#CHROM/ {print "##INFO=<ID=CIPOS,Number=2,Type=Integer,Description=\\x22Confidence interval around POS\\x22>"; print "##INFO=<ID=CIEND,Number=2,Type=Integer,Description=\\x22Confidence interval around END\\x22>"; print $1, $2, $3, $4, $5, $6, $7, $8, $9, "TRUTH"; next} {print $1, $2, $3, $4, $5, $6, $7, $8, "GT", "0/1"}\''
publishDir = [
enabled: false
]
}

withName: 'TABIX_BGZIPTABIX_SMALL' {
publishDir = [
path: {"${params.outdir}/${params.variant_type}/${meta.id}/preprocess"},
pattern: "*{.vcf.gz,vcf.gz.tbi}",
mode: params.publish_dir_mode
]
}

withName: BCFTOOLS_SORT_SV {
ext.prefix = { vcf.baseName - ".vcf" + ".sort"}
ext.args = {"--output-type z --write-index=tbi" }
publishDir = [
path: {"${params.outdir}/${params.variant_type}/${meta.id}/preprocess"},
pattern: "*{.vcf.gz,vcf.gz.tbi}",
@@ -289,7 +360,7 @@

// squash-ploidy is necessary to be able to match het-hom changes
withName: "RTGTOOLS_VCFEVAL" {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
ext.args = {["--all-record ",
(params.analysis == 'somatic') ? '--squash-ploidy' : ''
].join('').trim()
@@ -302,7 +373,7 @@
}

withName: "RTGTOOLS_BNDEVAL" {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
publishDir = [
path: {"${params.outdir}/${params.variant_type}/${meta.id}/benchmarks/rtgtools"},
pattern: "*{.vcf.gz,vcf.gz.tbi,tsv.gz,txt}",
@@ -311,7 +382,7 @@
}

withName: "HAPPY_HAPPY" {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
//ext.args = {""}
publishDir = [
path: {"${params.outdir}/${params.variant_type}/${meta.id}/benchmarks/happy"},
@@ -321,7 +392,7 @@
}

withName: "HAPPY_SOMPY" {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
ext.args = { meta.caller.contains("strelka") || meta.caller.contains("varscan") || meta.caller.contains("pisces") || meta.caller == "mutect" ? "--feature-table hcc.${meta.caller}.${params.variant_type} --bin-afs" : "--feature-table generic" }
publishDir = [
path: {"${params.outdir}/${params.variant_type}/${meta.id}/benchmarks/sompy"},
@@ -331,22 +402,22 @@
}

withName: "SPLIT_SOMPY_FEATURES" {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
publishDir = [
enabled: false
]
}

withName: "HAPPY_PREPY" {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}.prepy"}
ext.prefix = {"${meta.id}.${meta.caller}.prepy"}
ext.args = {"--fixchr --filter-nonref --bcftools-norm"}
publishDir = [
enabled: false
]
}

withName: "TRUVARI_BENCH" {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
ext.args = {[
"--sizemin 0 --sizefilt 0 --sizemax -1",
(meta.pctsize != null) ? " --pctsize ${meta.pctsize}" : '',
@@ -365,7 +436,7 @@
}

withName: SVANALYZER_SVBENCHMARK {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
ext.args = {[
(meta.normshift != null) ? " -normshift ${meta.normshift}" : '',
(meta.normdist != null) ? " -normdist ${meta.normdist}" : '',
@@ -433,7 +504,7 @@
}

withName: WITTYER {
ext.prefix = {"${meta.id}.${params.truth_id}.${meta.caller}"}
ext.prefix = {params.truth_id ? "${meta.id}.${params.truth_id}.${meta.caller}" : "${meta.id}.truth.${meta.caller}" }
ext.args = {[
"--includedFilters=''",
(meta.evaluationmode ) ? " -em ${meta.evaluationmode}" : '',
@@ -604,6 +675,7 @@
]
}


withName: VCF_TO_CSV {
ext.prefix = {"${meta.id}.${meta.tag}"}
publishDir = [
1 change: 0 additions & 1 deletion conf/tests/germline_sv.config
@@ -55,7 +55,6 @@ params {
variant_type = "structural"
method = 'svanalyzer,truvari,wittyer'
preprocess = "split_multiallelic,normalize,deduplicate"
sv_standardization = "svtk"
exclude_expression = 'INFO/SVTYPE="BND" || INFO/SVTYPE="TRA"'
min_sv_size = 30
truth_id = "HG002"
2 changes: 1 addition & 1 deletion conf/tests/liftover_test.config
@@ -13,7 +13,7 @@ process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
time: '2.h'
]

withName: 'BCFTOOLS_NORM*' {
2 changes: 1 addition & 1 deletion conf/tests/liftover_truth.config
@@ -46,7 +46,7 @@ params {
analysis = 'germline'
variant_type = "small"
method = 'rtgtools,happy'
preprocess = "normalize,deduplicate,prepy"
preprocess = "normalize,deduplicate"
skip_plots = "svlength,upset,metrics"

truth_id = "HG002"
58 changes: 58 additions & 0 deletions conf/tests/somatic_snv_ensemble.config
@@ -0,0 +1,58 @@
/*
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Nextflow config file for running minimal tests
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Defines input files and everything required to run a fast and simple pipeline test.

Use as follows:
nextflow run nf-core/variantbenchmarking -profile somatic_snv_ensemble,<docker/singularity> --outdir <OUTDIR>

----------------------------------------------------------------------------------------
*/

process {
resourceLimits = [
cpus: 4,
memory: '15.GB',
time: '1.h'
]

withName: 'BCFTOOLS_NORM*' {
cpus = { 1 }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
}
withName: 'BCFTOOLS_FILTER*' {
cpus = { 1 }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
}
withName: 'BCFTOOLS_SORT*' {
cpus = { 1 }
memory = { 6.GB * task.attempt }
time = { 4.h * task.attempt }
}
}

params {
config_profile_name = 'Test profile: somatic_snv_ensemble'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data
test_data_base = 'https://raw.githubusercontent.com/nf-core/test-datasets/variantbenchmarking'
input = "${params.test_data_base}/samplesheet/samplesheet_snv_somatic_hg38.csv"
outdir = 'results'

// Genome references
genome = 'GRCh38'
analysis = 'somatic'
method = 'sompy'
preprocess = "normalize,filter_contigs"
include_expression = 'TYPE="snp"'

variant_type = "snv"
ensemble_truth = 2
regions_bed = ""
truth_id = ""

}
2 changes: 1 addition & 1 deletion conf/tests/somatic_sv.config
@@ -36,7 +36,7 @@
}
}
params {
config_profile_name = 'Test profile'
config_profile_name = 'Test profile: somatic_sv'
config_profile_description = 'Minimal test dataset to check pipeline function'

// Input data