[Fix] Make cpWER calculation identical to meeteval by tango4j · Pull Request #15573 · NVIDIA-NeMo/NeMo

tango4j · 2026-04-01T18:04:18Z

What does this PR do?

This PR fixes the cpWER (concatenated minimum-permutation word error rate) calculation in NeMo so that it produces results identical to MeetEval, the de facto reference implementation.

Root cause of the bug: The previous implementation concatenated all words from each speaker pair into a single string and then computed WER on the concatenated result. This allowed edit operations to "cross" speaker boundaries, producing artificially low error rates. For example, hyp=["the cat sat", "on"] vs ref=["the cat", "sat on"] would incorrectly return cpWER=0.0 instead of the correct cpWER=0.5.
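The example above can be reproduced with a small sketch. This is not the NeMo code: `word_edit_distance` is a hypothetical pure-Python helper standing in for the `editdistance` package, and only the identity speaker pairing is scored, which happens to be the minimum-cost pairing for this input.

```python
def word_edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance; every edit operation costs 1."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr.append(min(
                prev[j] + 1,                           # delete a reference word
                curr[j - 1] + 1,                       # insert a hypothesis word
                prev[j - 1] + (ref_word != hyp_word),  # match or substitution
            ))
        prev = curr
    return prev[-1]


ref = ["the cat", "sat on"]
hyp = ["the cat sat", "on"]
num_ref_words = sum(len(r.split()) for r in ref)

# Old behaviour as described above: join all speakers, then score once.
concat_errors = word_edit_distance(" ".join(ref).split(), " ".join(hyp).split())

# Fixed behaviour: score each (ref, hyp) speaker pair on its own and sum.
pair_errors = sum(word_edit_distance(r.split(), h.split()) for r, h in zip(ref, hyp))

print(concat_errors / num_ref_words)  # 0.0
print(pair_errors / num_ref_words)    # 0.5
```

In the concatenated case the word "sat" lines up with itself even though it belongs to different speakers, which is exactly the boundary crossing the fix removes.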

Fix: Each (ref_speaker, hyp_speaker) pair is now scored independently via edit distance, and cpWER = sum(errors) / sum(ref_word_counts). This matches MeetEval's algorithm exactly. The cost matrix is padded to max(num_hyp, num_ref) so the Hungarian algorithm (or brute-force search) handles mismatched speaker counts without special-casing.
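A minimal sketch of the padded cost matrix follows, under two assumptions that the description does not spell out: missing speakers are padded with empty transcripts, and the minimum-cost assignment is found here by exhaustive permutation search rather than the Hungarian algorithm. The names `padded_cost_matrix` and `cpwer_counts` are illustrative, not NeMo's.

```python
from itertools import permutations


def word_edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance; every edit operation costs 1."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ref_word != hyp_word)))
        prev = curr
    return prev[-1]


def padded_cost_matrix(ref, hyp):
    """cost[i][j] = edit distance between ref speaker i and hyp speaker j.

    Assumption: the shorter list is padded with empty transcripts, so an
    unmatched ref speaker costs all deletions and an unmatched hyp speaker
    costs all insertions.
    """
    size = max(len(ref), len(hyp))
    ref_words = [r.split() for r in ref] + [[]] * (size - len(ref))
    hyp_words = [h.split() for h in hyp] + [[]] * (size - len(hyp))
    return [[word_edit_distance(r, h) for h in hyp_words] for r in ref_words]


def cpwer_counts(ref, hyp):
    """Return (minimum total errors, number of reference words)."""
    cost = padded_cost_matrix(ref, hyp)
    size = len(cost)
    errors = min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )
    return errors, sum(len(r.split()) for r in ref)


# Two reference speakers, three hypothesis speakers: no special case needed.
errors, num_ref_words = cpwer_counts(
    ref=["the cat", "sat on"],
    hyp=["the cat", "sat on", "the mat"],
)
print(errors, num_ref_words, errors / num_ref_words)  # 2 4 0.5
```

The extra hypothesis speaker is matched to the padded empty reference row, so its two words count as insertions.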

Collection: ASR / Speaker Tasks

Changelog

- Rewrote calculate_session_cpWER to use per-pair edit distance scoring with a padded square cost matrix + Hungarian algorithm (scipy.optimize.linear_sum_assignment), matching MeetEval's cpWER.
- Rewrote calculate_session_cpWER_bruteforce to use per-pair edit distance scoring (brute-force permutation search), serving as a reference/verification implementation.
- Removed the use_lsa_only parameter from calculate_session_cpWER (it was a no-op and is no longer needed since the square cost matrix naturally handles all speaker-count combinations).
- Added comprehensive test suite (test_cpwer.py) with all expected values pre-verified against MeetEval's cp_word_error_rate.
- Improved variable naming for readability (e.g., N -> num_speakers_padded).
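The relationship between the Hungarian and brute-force variants can be sketched on a precomputed cost matrix. The matrix values and function names below are illustrative only and are not taken from NeMo or its tests. `scipy.optimize.linear_sum_assignment` is the real SciPy call; the fallback to exhaustive search when SciPy is not installed is an addition of this sketch.

```python
from itertools import permutations

try:
    import numpy as np
    from scipy.optimize import linear_sum_assignment
except ImportError:  # sketch-only fallback: use exhaustive search without SciPy
    linear_sum_assignment = None


def min_cost_bruteforce(cost):
    """Try every speaker permutation; O(n!) but trivially correct."""
    size = len(cost)
    return min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )


def min_cost_hungarian(cost):
    """Solve the same assignment problem in polynomial time."""
    if linear_sum_assignment is None:
        return min_cost_bruteforce(cost)
    cost_array = np.asarray(cost)
    rows, cols = linear_sum_assignment(cost_array)
    return int(cost_array[rows, cols].sum())


# Hypothetical padded 3x3 matrix of per-pair edit distances.
cost = [
    [0, 2, 1],
    [2, 0, 2],
    [2, 2, 2],
]
print(min_cost_hungarian(cost), min_cost_bruteforce(cost))  # 2 2
```

Both searches minimise the same objective, so their totals should agree on every square matrix even when the optimal assignment is not unique. That makes the brute-force version a useful check for small speaker counts.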

Usage

from nemo.collections.asr.metrics.der import calculate_session_cpWER

hyp = ["hey how are you we that's nice", "i'm good yes hi is your sister"]
ref = ["hi how are you well that's nice", "i'm good yeah how is your sister"]

cpwer, hyp_transcript, ref_transcript = calculate_session_cpWER(
    spk_hypothesis=hyp, spk_reference=ref
)
# cpwer == 4/14 ≈ 0.2857 (matches MeetEval)

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

Make sure you read and followed Contributor guidelines
Did you write any new necessary tests?
Did you add or update any necessary documentation?
Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
- N/A -- scipy and editdistance are existing required dependencies

PR Type:

New Feature
Bugfix
Documentation

If you haven't finished some of the above items, you can still open a "Draft" PR.

Who can review?

Anyone in NeMo Speech AI ASR

Additional Information

The old tests in test_diar_metrics.py used manually computed (_ins, _del, _sub) counts that matched the old (incorrect) concatenation-based algorithm. The new test_cpwer.py tests use expected values verified directly against MeetEval's cp_word_error_rate.
Breaking change: use_lsa_only parameter removed from calculate_session_cpWER. Any callers passing this argument will need to remove it.
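For callers that must run against NeMo versions from both before and after this change, one possible approach is to filter keyword arguments by the callee's signature. This is a sketch of that idea only: `keep_supported_kwargs`, `old_style` and `new_style` are hypothetical stand-ins, not NeMo functions, and the default value shown for `use_lsa_only` is arbitrary.

```python
import inspect


def keep_supported_kwargs(func, kwargs):
    """Return only the keyword arguments that `func` currently accepts."""
    params = inspect.signature(func).parameters
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)  # func takes **kwargs, so nothing needs dropping
    return {name: value for name, value in kwargs.items() if name in params}


# Stand-ins for the old and new signatures (NOT the real NeMo functions).
def old_style(spk_hypothesis, spk_reference, use_lsa_only=False):
    return "old"


def new_style(spk_hypothesis, spk_reference):
    return "new"


call_kwargs = {
    "spk_hypothesis": ["a b"],
    "spk_reference": ["a b"],
    "use_lsa_only": True,
}
print(new_style(**keep_supported_kwargs(new_style, call_kwargs)))  # new
print(old_style(**keep_supported_kwargs(old_style, call_kwargs)))  # old
```

Where only the new version needs to be supported, simply deleting the use_lsa_only argument at the call site is the simpler option.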

Signed-off-by: taejinp <tango4j@gmail.com>

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

Copilot

Pull request overview

This PR updates NeMo’s session-level cpWER computation to match MeetEval’s reference implementation by scoring each (ref speaker, hyp speaker) pair independently with edit distance and selecting the minimum-cost assignment via the Hungarian algorithm.

Changes:

- Reworked calculate_session_cpWER() to use a padded square edit-distance cost matrix + scipy.optimize.linear_sum_assignment, and compute cpWER = sum(pair_errors) / sum(ref_words).
- Reworked calculate_session_cpWER_bruteforce() to use per-pair edit distance over speaker permutations (reference implementation).
- Added a new dedicated test suite (test_cpwer.py) with MeetEval-verified expected values, plus Hungarian vs brute-force agreement checks.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| `nemo/collections/asr/metrics/der.py` | Reimplements cpWER to match MeetEval using per-pair edit distance and Hungarian assignment; updates brute-force reference implementation accordingly. |
| `tests/collections/speaker_tasks/utils/test_cpwer.py` | Adds comprehensive MeetEval-verified cpWER tests and LSA vs brute-force agreement checks. |

ipmedenn

LGTM

tango4j added 2 commits April 1, 2026 08:54

Adding the new fixed cpWER that is identical to meeteval

f3ace29

Signed-off-by: taejinp <tango4j@gmail.com>

Finalize cpwer fix

f563bb5

Signed-off-by: taejinp <tango4j@gmail.com>

tango4j requested a review from Copilot April 1, 2026 18:04

github-actions bot added the ASR label Apr 1, 2026

Copilot started reviewing on behalf of tango4j April 1, 2026 18:04

Apply isort and black reformatting

51fde9d

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

Copilot AI reviewed Apr 1, 2026

tango4j added 2 commits April 1, 2026 11:13

Adding fixed versions

5ed13da

Signed-off-by: taejinp <tango4j@gmail.com>

Adding fixed versions from Copilot comments and resolved conflicts

81084ff

Signed-off-by: taejinp <tango4j@gmail.com>

tango4j requested review from ipmedenn and weiqingw4ng April 1, 2026 18:15

Apply isort and black reformatting

2835668

Signed-off-by: tango4j <tango4j@users.noreply.github.com>

ipmedenn approved these changes Apr 2, 2026

tango4j enabled auto-merge (squash) April 2, 2026 14:58

tango4j added Run CICD and removed Run CICD labels Apr 2, 2026

tango4j temporarily deployed to test April 2, 2026 23:37 — with GitHub Actions Inactive

Merge branch 'main' into fix_cpwer

48d29ed

tango4j added Run CICD and removed Run CICD labels Apr 3, 2026

chtruong814 added Run CICD and removed Run CICD labels Apr 3, 2026

chtruong814 temporarily deployed to test April 3, 2026 23:46 — with GitHub Actions Inactive

tango4j wants to merge 7 commits into NVIDIA-NeMo:main from tango4j:fix_cpwer