Add new percentile aggregator and tests for comparing it with Byzantine#1052
mina5rovic wants to merge 1 commit into develop from
Conversation
JulienVig left a comment
The comparison is very useful, thanks!
I have a bunch of questions, mostly due to my superficial knowledge of the algorithms. Overall, we're often getting exact values like 1.0000, which seems a bit fishy. Also, the new aggregator seems worse than the old one both time- and robustness-wise. I believe it should at least be more robust, so maybe some bugs slipped into the implementation?
Finally, could you add a memory test case to ensure that the implementation correctly disposes of unused TF tensors? You can do so by running multiple rounds/aggregations and keeping track of the number of tensors allocated in memory (tf.memory()), which ideally should stay constant throughout training. You can read more on memory management here.
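For concreteness, here is a minimal sketch of such a memory check. The `measureAggregation` helper is the one from this test file; `leaksAfterWarmup` is a hypothetical helper I'm introducing so the leak criterion itself is easy to unit-test, and the commented part assumes `@tensorflow/tfjs` and a vitest/jest-style `expect`:

```typescript
// Pure helper (hypothetical, not part of the PR): given the tensor counts
// recorded after each round, report whether the count keeps changing after
// the first round. Round 1 is treated as warm-up since it may allocate
// long-lived state; afterwards the count should stay constant if every
// intermediate tensor is disposed.
function leaksAfterWarmup(tensorCounts: number[]): boolean {
  const steady = tensorCounts.slice(1); // ignore warm-up allocations
  return steady.some((count) => count !== steady[0]);
}

// In the actual test (requires @tensorflow/tfjs):
//
// const counts: number[] = [];
// for (let round = 0; round < 5; round++) {
//   await measureAggregation(agg, `round ${round}`, peersWithValues);
//   counts.push(tf.memory().numTensors);
// }
// expect(leaksAfterWarmup(counts)).toBe(false);
```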
expect(timingNew.result).toBeLessThan(50);
expect(timingOld.result).toBeLessThan(50);
Both aggregators result in exactly 1; is this expected? Shouldn't it be at least slightly influenced by the Byzantine outlier?
=== Timing Comparison: Simple Outlier Rejection ===
New ByzantineRobust | 14.655ms | result: 1.0000
Old PercentileClipping | 1.197ms | result: 1.0000
console.log(formatTiming([timingNew, timingOld]));
});

it("old aggregator with different percentiles", async () => {
So for this test I'm getting these results:
=== Timing Comparison: Old Aggregator with Different Percentiles ===
tau=0.05 | 2.350ms | result: 1.0000
tau=0.1 | 1.645ms | result: 1.0000
tau=0.2 | 2.607ms | result: 1.0000
tau=0.5 | 1.826ms | result: 1.0000
=== Timing Comparison: New Aggregator with Different Clipping Radii ===
radius=0.5 | 2.293ms | result: 0.5000
radius=1 | 0.949ms | result: 1.0000
radius=2 | 0.922ms | result: 1.4000
radius=5 | 0.749ms | result: 2.6000
If I understand correctly, the old aggregator is more robust than the new one (but slower) in this case? Also, why aren't the results changing with different tau values for the old aggregator?
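For illustration, here is a hypothetical reconstruction of nearest-rank percentile clipping (an assumption about how the old aggregator works, not its actual code) that would explain why tau has no effect here: with nine honest peers at 1.0 and one outlier, every tau up to 0.9 selects 1.0 as the clipping threshold, so the outlier is always clipped to 1.0 and the mean comes out as exactly 1.0.

```typescript
// Hypothetical nearest-rank percentile clipping over scalar updates.
// Threshold = tau-quantile of absolute values; everything above it is
// clipped to the threshold, then the clipped values are averaged.
function percentileClip(values: number[], tau: number): number {
  const sorted = [...values].map(Math.abs).sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.floor(tau * sorted.length));
  const threshold = sorted[idx];
  const clipped = values.map(
    (v) => Math.sign(v) * Math.min(Math.abs(v), threshold),
  );
  return clipped.reduce((a, b) => a + b, 0) / clipped.length;
}

// With [1,1,1,1,1,1,1,1,1,100], any tau <= 0.9 picks threshold 1.0,
// so the result is exactly 1.0 regardless of tau.
```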
// With centering on previous (5.0), updates to 10.0 should result in something close to 10.0
expect(timingRound2.result).toBeGreaterThan(5.0);

console.log("\n=== Timing Comparison: State Preservation Across Rounds ===");
console.log(formatTiming([timingRound1, timingRound2]));
});
I'm getting these:
=== Timing Comparison: State Preservation Across Rounds ===
Round 1 | 0.290ms | result: 5.0000
Round 2 | 0.305ms | result: 10.0000
Shouldn't it be a bit lower than 10 because of the previous round? It seems that round 1's state has been entirely discarded.
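For reference, if round state were carried over as an exponential moving average (an assumption about the intended design, not the PR's confirmed behavior), round 2 would land strictly below 10 for any positive weight on the previous round:

```typescript
// Hypothetical EMA-style state carry-over: blend round 1's aggregate
// (previous) into round 2's updates (current) with weight beta.
// Any beta > 0 pulls the round-2 result strictly below 10.0.
function emaBlend(previous: number, current: number, beta: number): number {
  return beta * previous + (1 - beta) * current;
}

// emaBlend(5.0, 10.0, 0.1) -> 9.5, not the observed 10.0000
```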
for (const iter of iterations) {
  const agg = new ByzantineRobustAggregator(0, 5, "absolute", 1.0, iter, 0);
  agg.setNodes(Set(allPeers));

  const timing = await measureAggregation(agg, `iterations=${iter}`, peersWithValues);
  timings.push(timing);
}

console.log("\n=== Performance Impact of Iterative Refinement ===");
=== Performance Impact of Iterative Refinement ===
iterations=1 | 0.762ms | result: 1.0000
iterations=2 | 1.233ms | result: 1.4000
iterations=5 | 5.417ms | result: 1.6496
iterations=10 | 7.004ms | result: 1.6665
Intuitively, I expected more iterations to improve the result, yet it seems to stray further from the optimal value. Is this behavior correct?
console.log("\n=== Byzantine Robustness: High Ratio Attack (4/10 = 40% malicious) ===");
console.log(formatTiming([timingNew, timingOld]));
console.log(` Result gap: new=${timingNew.result.toFixed(4)}, old=${timingOld.result.toFixed(4)}`);
console.log(` Winner: ${timingNew.result < timingOld.result ? "NEW (closer to honest 1.0)" : "OLD (closer to honest 1.0)"}`);
It's missing the case where both results are equal and it's a tie.
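A possible fix, sketched under the assumption that `result` is a plain number and the honest value is 1.0 (`winnerLabel` is a hypothetical helper, not part of the PR). Note that the original ternary also compares raw results rather than each result's distance to the honest value, which would mislabel the winner whenever an aggregator undershoots 1.0:

```typescript
// Label the winner by distance to the honest value, with an explicit tie case.
function winnerLabel(newResult: number, oldResult: number, honest = 1.0): string {
  const newGap = Math.abs(newResult - honest);
  const oldGap = Math.abs(oldResult - honest);
  if (newGap === oldGap) return "TIE (equal distance to honest value)";
  return newGap < oldGap
    ? "NEW (closer to honest 1.0)"
    : "OLD (closer to honest 1.0)";
}
```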
console.log("\n=== Gradient Poisoning Attack (coordinated Byzantine values) ===");
console.log(formatTiming([timingNew, timingOld]));
console.log(` Result gap: new=${timingNew.result.toFixed(4)}, old=${timingOld.result.toFixed(4)}`);
console.log(` Expected honest value: 1.0000`);
=== Gradient Poisoning Attack (coordinated Byzantine values) ===
New (5 iterations) | 8.113ms | result: 2.3094
Old (tau=0.2) | 0.473ms | result: 1.0000
Result gap: new=2.3094, old=1.0000
Expected honest value: 1.0000
Winner: OLD (closer to honest)
So the old aggregator is both faster and more robust? Doesn't this contradict the theoretical robustness guarantees?
console.log(formatTiming([timing2New, timing2Old]));
console.log(` New result: ${timing2New.result.toFixed(4)} (expected ~10.0)`);
console.log(` Old result: ${timing2Old.result.toFixed(4)} (expected ~10.0)`);
console.log(` Winner: ${Math.abs(timing2New.result - 10.0) < Math.abs(timing2Old.result - 10.0) ? "NEW (better rejects attack)" : "OLD (better rejects attack)"}`);

const timing2New = await measureAggregation(newAgg, "New Round 2 (attack)", round2Values);
const timing2Old = await measureAggregation(oldAgg, "Old Round 2 (attack)", round2Values);
=== Adaptive Multi-Round Attack ===
Round 1 (cooperation):
New Round 1 | 4.476ms | result: 5.0000
Old Round 1 | 2.309ms | result: 5.0000
Round 2 (adaptive Byzantine attack):
New Round 2 (attack) | 32.874ms | result: 10.0000
Old Round 2 (attack) | 17.731ms | result: 10.0000
New result: 10.0000 (expected ~10.0)
Old result: 10.0000 (expected ~10.0)
Winner: OLD (better rejects attack)
It smells fishy that the results are exactly the expected value, no?
const newError = Math.abs((arrNew[0][0] - 100) / 100) + Math.abs((arrNew[0][1] - 10) / 10) + Math.abs(arrNew[0][2] - 1);
const oldError = Math.abs((arrOld[0][0] - 100) / 100) + Math.abs((arrOld[0][1] - 10) / 10) + Math.abs(arrOld[0][2] - 1);
console.log(`\nTotal relative error: new=${newError.toFixed(3)}, old=${oldError.toFixed(3)}`);
console.log(`Winner: ${newError < oldError ? "NEW (better handles multi-scale)" : "OLD (better handles multi-scale)"}`);
console.log("\n=== Heterogeneous Gradients: Multi-Tensor Federated Model ===");
console.log("Gradient structure: [layer1=100×value, layer2=10×value, layer3=value]");
console.log("Honest peers send: [100, 10, 1]");
console.log("Byzantine sends: [10000, 1000, 100]");
console.log(`\nNew aggregator (${timeNew.toFixed(2)}ms):`);
console.log(` Layer 1: ${arrNew[0][0].toFixed(2)} (expected ~100)`);
console.log(` Layer 2: ${arrNew[0][1].toFixed(2)} (expected ~10)`);
console.log(` Layer 3: ${arrNew[0][2].toFixed(2)} (expected ~1)`);
console.log(`\nOld aggregator (${timeOld.toFixed(2)}ms):`);
console.log(` Layer 1: ${arrOld[0][0].toFixed(2)} (expected ~100)`);
console.log(` Layer 2: ${arrOld[0][1].toFixed(2)} (expected ~10)`);
console.log(` Layer 3: ${arrOld[0][2].toFixed(2)} (expected ~1)`);
=== Heterogeneous Gradients: Multi-Tensor Federated Model ===
Gradient structure: [layer1=100×value, layer2=10×value, layer3=value]
Honest peers send: [100, 10, 1]
Byzantine sends: [10000, 1000, 100]
New aggregator (1.98ms):
Layer 1: 14.92 (expected ~100)
Layer 2: 1.49 (expected ~10)
Layer 3: 0.15 (expected ~1)
Old aggregator (0.45ms):
Layer 1: 100.00 (expected ~100)
Layer 2: 10.00 (expected ~10)
Layer 3: 1.00 (expected ~1)
Total relative error: new=2.552, old=0.000
Winner: OLD (better handles multi-scale)
The new aggregator is quite far off the expected values; there may be an implementation issue to fix there.
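One plausible cause, purely an assumption from the output (every layer is shrunk by the same ~0.15 factor): a single global clipping radius applied to the concatenated gradient rescales the whole vector by one factor, instead of clipping each tensor relative to its own scale. A sketch of that effect (hypothetical helper, not the PR's API):

```typescript
// Global norm clipping rescales the entire vector by one factor, so a
// radius tuned for the small layer shrinks the large layers identically.
// This matches the observed uniform shrinkage of [100, 10, 1] in the test.
function clipGlobal(vec: number[], radius: number): number[] {
  const norm = Math.sqrt(vec.reduce((s, v) => s + v * v, 0));
  const scale = Math.min(1, radius / norm);
  return vec.map((v) => v * scale);
}

// A vector inside the radius is untouched; one outside is scaled
// uniformly, distorting the relative magnitudes across layers by
// the same factor everywhere.
```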
Implementation of a new percentile aggregator based on the previously implemented Byzantine one (from a previous version of disco). This one works faster and, in some cases, better than the Byzantine one. Also implemented tests comparing the two.