
fix(diarization): deterministic & robust offline VBx re-clustering (K-Means n_init)#735

Open
testfields wants to merge 1 commit into FluidInference:main from testfields:fix/kmeans-ninit-deterministic-reclustering

testfields commented Jun 24, 2026

Summary

The offline diarization speaker-count adjustment, which re-clusters the VBx-detected clusters down to the constrained count (numSpeakers or the min/max bounds), calls KMeansClustering with a random seed and a single initialization. This makes the result both non-deterministic and fragile: small or boundary speakers collapse from run to run.

Reproduction

A 4-speaker Japanese meeting clip (~7 min, 16 kHz mono), processed with process --mode offline --num-speakers 4 --step-ratio 0.1 and run repeatedly on identical audio and config:

| Run | Speaker confusion vs. hand-corrected reference | Recall of smallest speaker (~41 s) |
|---|---|---|
| 1 | ~10.9% | ~80% (kept) |
| 2 | ~32.0% | ~15% (collapsed) |
| 3 | ~10.9% | ~80% (kept) |

The only thing that varies between runs is the K-Means random seed, and the smallest speaker flips between being kept and being merged away.

Root cause

  • KMeansClustering.clusterWithCentroids initializes its RNG as SeededRNG(seed: seed ?? UInt64.random(in: 0...UInt64.max)) → a random seed whenever the caller doesn't pass one.
  • The VBxClustering speaker-count re-clustering (Speaker count N outside bounds […]; re-clustering to K) calls KMeansClustering.clusterWithCentroids(...) without a seed and with a single init (no n_init / best-of-N inertia selection).
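The seed-fallback problem can be sketched in a few lines of Swift. SeededRNG's actual algorithm is not shown in this PR, so the sketch uses SplitMix64 as a hypothetical stand-in; makeRNGBefore, makeRNGAfter, and initialIndices are illustrative names, not FluidAudio APIs. The two factories differ only in what a nil seed turns into:

```swift
// Hypothetical stand-in for SeededRNG: SplitMix64 is an assumption,
// not necessarily the generator FluidAudio uses.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64

    init(seed: UInt64) { state = seed }

    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

// Before the fix: a nil seed silently becomes a fresh random seed on every call.
func makeRNGBefore(seed: UInt64?) -> SplitMix64 {
    SplitMix64(seed: seed ?? UInt64.random(in: 0...UInt64.max))
}

// After the fix, as described in this PR: a nil seed falls back to 0.
func makeRNGAfter(seed: UInt64?) -> SplitMix64 {
    SplitMix64(seed: seed ?? 0)
}

// Hypothetical helper: picks `count` distinct point indices as initial centroids.
func initialIndices(pointCount: Int, count: Int, rng: inout SplitMix64) -> [Int] {
    Array((0..<pointCount).shuffled(using: &rng).prefix(count))
}
```

With makeRNGAfter, two unseeded calls draw identical initial centroids; with makeRNGBefore they generally differ, which is exactly the run-to-run variation shown in the table.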

As a result, the final hard assignment of an over-segmented frame set down to K speakers depends on a single random K-Means initialization, which is unstable when the speaker set is fragile or imbalanced.
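For reference, the standard K-Means objective, using notation introduced here rather than taken from the PR (embeddings x_i, centroids μ_k, hard assignment c(i)), is the inertia J below. Assuming Lloyd-style iterations, each iteration never increases J, but the run stops at a local minimum determined by the starting centroids, so with one initialization the result is whatever that single start reaches. Best-of-N selection instead runs seeds b through b + N - 1 (baseSeed and nInit in the fix) and keeps the converged result (c^(s), μ^(s)) with the lowest inertia:

$$
J(c, \mu) = \sum_{i=1}^{n} \left\lVert x_i - \mu_{c(i)} \right\rVert_2^2,
\qquad
s^\star = \arg\min_{s \in \{b,\, b+1,\, \dots,\, b+N-1\}} J\bigl(c^{(s)}, \mu^{(s)}\bigr)
$$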

This re-clustering path was introduced in #236 (which made the numSpeakers constraint actually apply the K-Means centroids), but it never seeded K-Means or used n_init, so the constrained result remained non-deterministic.

Fix

  • KMeansClustering:
    • The unseeded fallback now uses a fixed seed (0) instead of UInt64.random → deterministic by default.
    • Added clusterWithCentroidsNInit(embeddings:numClusters:maxIterations:nInit:baseSeed:) which runs nInit deterministic initializations (seeds baseSeed … baseSeed+nInit-1) and returns the lowest-inertia result (sklearn-style n_init).
  • VBxClustering: the speaker-count re-clustering now calls clusterWithCentroidsNInit(nInit: 10, baseSeed: 0).
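A minimal Swift sketch of best-of-N selection, assuming plain Lloyd iterations with random-point initialization and a SplitMix64 generator. All names here (kMeans, kMeansNInit, KMeansResult, SplitMix64) are hypothetical, and the real clusterWithCentroidsNInit may initialize and converge differently; kMeansNInit only mirrors the nInit and baseSeed parameters described above:

```swift
// Hypothetical sketch of best-of-N ("n_init") K-Means; names, the generator,
// and the initialization strategy are assumptions, not FluidAudio's code.
struct SplitMix64: RandomNumberGenerator {
    private var state: UInt64
    init(seed: UInt64) { state = seed }
    mutating func next() -> UInt64 {
        state &+= 0x9E37_79B9_7F4A_7C15
        var z = state
        z = (z ^ (z >> 30)) &* 0xBF58_476D_1CE4_E5B9
        z = (z ^ (z >> 27)) &* 0x94D0_49BB_1331_11EB
        return z ^ (z >> 31)
    }
}

struct KMeansResult {
    var assignments: [Int]
    var centroids: [[Double]]
    var inertia: Double  // sum of squared distances to the assigned centroid
}

func squaredDistance(_ a: [Double], _ b: [Double]) -> Double {
    var sum = 0.0
    for (x, y) in zip(a, b) { sum += (x - y) * (x - y) }
    return sum
}

// Index of the nearest centroid and its squared distance.
func nearest(_ p: [Double], _ centroids: [[Double]]) -> (index: Int, distance: Double) {
    var best = (index: 0, distance: Double.infinity)
    for (j, c) in centroids.enumerated() {
        let d = squaredDistance(p, c)
        if d < best.distance { best = (index: j, distance: d) }
    }
    return best
}

// One Lloyd run from a single seed, starting from random distinct points.
func kMeans(_ points: [[Double]], k: Int, maxIterations: Int, seed: UInt64) -> KMeansResult {
    var rng = SplitMix64(seed: seed)
    var centroids = Array(points.shuffled(using: &rng).prefix(k))
    let dim = points[0].count
    for _ in 0..<maxIterations {
        // Assignment step: each point goes to its nearest centroid.
        let labels = points.map { nearest($0, centroids).index }
        // Update step: each centroid moves to its cluster mean (empty clusters stay put).
        var sums = [[Double]](repeating: [Double](repeating: 0, count: dim), count: k)
        var counts = [Int](repeating: 0, count: k)
        for (p, label) in zip(points, labels) {
            counts[label] += 1
            for d in 0..<dim { sums[label][d] += p[d] }
        }
        var updated = centroids
        for j in 0..<k where counts[j] > 0 {
            updated[j] = sums[j].map { $0 / Double(counts[j]) }
        }
        if updated == centroids { break }  // converged: centroids stopped moving
        centroids = updated
    }
    let finalPass = points.map { nearest($0, centroids) }
    return KMeansResult(
        assignments: finalPass.map { $0.index },
        centroids: centroids,
        inertia: finalPass.reduce(0.0) { $0 + $1.distance }
    )
}

// Runs seeds baseSeed ... baseSeed + nInit - 1 and keeps the lowest-inertia result.
func kMeansNInit(_ points: [[Double]], k: Int, maxIterations: Int = 100,
                 nInit: Int = 10, baseSeed: UInt64 = 0) -> KMeansResult {
    precondition(nInit >= 1 && k >= 1 && points.count >= k, "invalid arguments")
    var best = kMeans(points, k: k, maxIterations: maxIterations, seed: baseSeed)
    for offset in 1..<UInt64(nInit) {
        let run = kMeans(points, k: k, maxIterations: maxIterations, seed: baseSeed &+ offset)
        // Strict '<' keeps the earliest seed on ties, so the choice is deterministic.
        if run.inertia < best.inertia { best = run }
    }
    return best
}
```

Because every seed is fixed and ties keep the earliest seed, repeated calls on the same embeddings return identical assignments, and the selected run's inertia is never higher than that of the single baseSeed run.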

No breaking API changes: clusterWithCentroids keeps its signature (only its unseeded path is now deterministic), and clusterWithCentroidsNInit is purely additive.

Result

Re-clustering is now fully deterministic and robustly retains fragile speakers. The 4-speaker clip scores ~9.2% consistently across 5+ CLI runs and on-device (CoreML / ANE, sandboxed macOS app).

Related

…st-of-N)

The offline VBx speaker-count adjustment (re-clustering detected clusters down to the
constrained count) called KMeansClustering with a random seed and a single
initialization. This is both non-deterministic and fragile: small/boundary speakers
collapse run-to-run (observed on a 4-speaker meeting clip, cause-(ii) swinging
~10%↔~30% across runs; the smallest speaker's recall flips 80%↔0%).

- KMeansClustering: default unseeded fallback now uses a fixed seed (0) instead of
  UInt64.random; add clusterWithCentroidsNInit which runs N deterministic
  initializations (seeds base..base+N-1) and returns the lowest-inertia result
  (sklearn-style n_init).
- VBxClustering: the speaker-count re-clustering now uses n_init=10, baseSeed=0.

Result: re-clustering is fully deterministic and robustly keeps fragile speakers
(the 4-speaker clip now scores ~9.2% consistently across 5+ runs and on-device).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>