Add new percentile aggregator and tests for comparing it with Byzantine#1052

Open
mina5rovic wants to merge 1 commit into develop from percentile-aggregation

Conversation

@mina5rovic (Collaborator)

Implementation of a new percentile aggregator based on the previously implemented Byzantine one (from the previous version of disco). This one works faster and, in some cases, better than the Byzantine aggregator. Also implemented tests comparing the two.

@JulienVig JulienVig left a comment


The comparison is very useful, thanks!

I have a bunch of questions, mostly due to my superficial knowledge of the algorithms. Overall, we're often getting exact values like 1.0000, which seems a bit fishy. Also, the new aggregator seems worse than the old one both time- and robustness-wise. I believe it should at least be more robust, so maybe some bugs slipped into the implementation?

Finally, could you add a memory test case to ensure that the implementation is correctly disposing of unused TF tensors? You can do so by running multiple rounds/aggregations and keeping track of the number of tensors allocated in memory (tf.memory()), which ideally should stay constant throughout the training. You can read more on memory management here.
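The suggested check could look something like the sketch below. Note that `tf` here is a minimal stub standing in for `@tensorflow/tfjs` (an assumption, so the shape of the invariant is clear without the real library), and `runAggregationRound` is a hypothetical stand-in for the PR's actual aggregation code:

```typescript
// Stub for tf.memory().numTensors; the real test would import @tensorflow/tfjs.
let liveTensors = 0;
const tf = { memory: () => ({ numTensors: liveTensors }) };

// Stand-in for one aggregation round; a leaky implementation
// forgets to dispose the tensors it allocated.
function runAggregationRound(leak: boolean): void {
  liveTensors += 10;            // tensors allocated during the round
  if (!leak) liveTensors -= 10; // disposed via tf.tidy() / tensor.dispose()
}

// The invariant to assert: the live-tensor count is the same
// before and after running several rounds.
function tensorsStayConstant(rounds: number, leak: boolean): boolean {
  const before = tf.memory().numTensors;
  for (let i = 0; i < rounds; i++) runAggregationRound(leak);
  return tf.memory().numTensors === before;
}

console.log(tensorsStayConstant(5, false)); // well-behaved rounds: true
console.log(tensorsStayConstant(5, true));  // leaky rounds: false
```

In a real Jest/Vitest test this would be a single `expect(tf.memory().numTensors).toBe(before)` after the rounds, with the actual aggregator doing the allocation.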

Comment on lines +86 to +87
expect(timingNew.result).toBeLessThan(50);
expect(timingOld.result).toBeLessThan(50);

Both aggregators result in exactly 1; is this expected? Shouldn't the result at least be slightly influenced by the Byzantine outlier?

=== Timing Comparison: Simple Outlier Rejection ===
  New ByzantineRobust    | 14.655ms | result: 1.0000
  Old PercentileClipping | 1.197ms | result: 1.0000

console.log(formatTiming([timingNew, timingOld]));
});

it("old aggregator with different percentiles", async () => {

So for this test I'm getting these results:

=== Timing Comparison: Old Aggregator with Different Percentiles ===
  tau=0.05 | 2.350ms | result: 1.0000
  tau=0.1  | 1.645ms | result: 1.0000
  tau=0.2  | 2.607ms | result: 1.0000
  tau=0.5  | 1.826ms | result: 1.0000

=== Timing Comparison: New Aggregator with Different Clipping Radii ===
  radius=0.5 | 2.293ms | result: 0.5000
  radius=1   | 0.949ms | result: 1.0000
  radius=2   | 0.922ms | result: 1.4000
  radius=5   | 0.749ms | result: 2.6000

If I understand correctly, the old aggregator is more robust (but slower) than the new one in this case? And why aren't the results changing with different tau for the old aggregator?

Comment on lines +172 to +177
// With centering on previous (5.0), updates to 10.0 should result in something close to 10.0
expect(timingRound2.result).toBeGreaterThan(5.0);

console.log("\n=== Timing Comparison: State Preservation Across Rounds ===");
console.log(formatTiming([timingRound1, timingRound2]));
});

I'm getting these:

=== Timing Comparison: State Preservation Across Rounds ===
  Round 1 | 0.290ms | result: 5.0000
  Round 2 | 0.305ms | result: 10.0000

Shouldn't it be a bit lower than 10 because of the previous round? It seems that round 1's state has been entirely discarded.

Comment on lines +217 to +225
for (const iter of iterations) {
const agg = new ByzantineRobustAggregator(0, 5, "absolute", 1.0, iter, 0);
agg.setNodes(Set(allPeers));

const timing = await measureAggregation(agg, `iterations=${iter}`, peersWithValues);
timings.push(timing);
}

console.log("\n=== Performance Impact of Iterative Refinement ===");

=== Performance Impact of Iterative Refinement ===
  iterations=1  | 0.762ms | result: 1.0000
  iterations=2  | 1.233ms | result: 1.4000
  iterations=5  | 5.417ms | result: 1.6496
  iterations=10 | 7.004ms | result: 1.6665

Intuitively I expected that more iterations would improve the result, yet it seems to stray further from the optimal result; is this behavior correct?

console.log("\n=== Byzantine Robustness: High Ratio Attack (4/10 = 40% malicious) ===");
console.log(formatTiming([timingNew, timingOld]));
console.log(` Result gap: new=${timingNew.result.toFixed(4)}, old=${timingOld.result.toFixed(4)}`);
console.log(` Winner: ${timingNew.result < timingOld.result ? "NEW (closer to honest 1.0)" : "OLD (closer to honest 1.0)"}`);

It's missing the case where both results are equal and it's a tie
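One way to handle it, sketched as a small helper. Assumptions: the criterion is distance to the honest value 1.0 (rather than comparing the raw results directly, which the current ternary does), and the `HONEST`/`eps` names are placeholders:

```typescript
// Distance to the assumed honest value; ties within eps are reported as such.
const HONEST = 1.0;

function winner(newResult: number, oldResult: number, eps = 1e-9): string {
  const dNew = Math.abs(newResult - HONEST);
  const dOld = Math.abs(oldResult - HONEST);
  if (Math.abs(dNew - dOld) < eps) return "TIE (both equally close to honest 1.0)";
  return dNew < dOld ? "NEW (closer to honest 1.0)" : "OLD (closer to honest 1.0)";
}

console.log(` Winner: ${winner(1.0, 1.0)}`); // both exact → TIE
```

This would also fix the existing comparison for cases where one result overshoots 1.0 and the other undershoots it.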

Comment on lines +309 to +312
console.log("\n=== Gradient Poisoning Attack (coordinated Byzantine values) ===");
console.log(formatTiming([timingNew, timingOld]));
console.log(` Result gap: new=${timingNew.result.toFixed(4)}, old=${timingOld.result.toFixed(4)}`);
console.log(` Expected honest value: 1.0000`);

=== Gradient Poisoning Attack (coordinated Byzantine values) ===
  New (5 iterations) | 8.113ms | result: 2.3094
  Old (tau=0.2)      | 0.473ms | result: 1.0000
  Result gap: new=2.3094, old=1.0000
  Expected honest value: 1.0000
  Winner: OLD (closer to honest)

So the old aggregator is both faster and more robust? Doesn't this go against the robustness theory?

console.log(formatTiming([timing2New, timing2Old]));
console.log(` New result: ${timing2New.result.toFixed(4)} (expected ~10.0)`);
console.log(` Old result: ${timing2Old.result.toFixed(4)} (expected ~10.0)`);
console.log(` Winner: ${Math.abs(timing2New.result - 10.0) < Math.abs(timing2Old.result - 10.0) ? "NEW (better rejects attack)" : "OLD (better rejects attack)"}`);

Address the potential tie here as well.


const timing2New = await measureAggregation(newAgg, "New Round 2 (attack)", round2Values);
const timing2Old = await measureAggregation(oldAgg, "Old Round 2 (attack)", round2Values);


=== Adaptive Multi-Round Attack ===
Round 1 (cooperation):
  New Round 1 | 4.476ms | result: 5.0000
  Old Round 1 | 2.309ms | result: 5.0000

Round 2 (adaptive Byzantine attack):
  New Round 2 (attack) | 32.874ms | result: 10.0000
  Old Round 2 (attack) | 17.731ms | result: 10.0000
  New result: 10.0000 (expected ~10.0)
  Old result: 10.0000 (expected ~10.0)
  Winner: OLD (better rejects attack)

It smells fishy that the results are exactly the expected value, no?

const newError = Math.abs((arrNew[0][0] - 100) / 100) + Math.abs((arrNew[0][1] - 10) / 10) + Math.abs(arrNew[0][2] - 1);
const oldError = Math.abs((arrOld[0][0] - 100) / 100) + Math.abs((arrOld[0][1] - 10) / 10) + Math.abs(arrOld[0][2] - 1);
console.log(`\nTotal relative error: new=${newError.toFixed(3)}, old=${oldError.toFixed(3)}`);
console.log(`Winner: ${newError < oldError ? "NEW (better handles multi-scale)" : "OLD (better handles multi-scale)"}`);

Address the potential tie.

Comment on lines +404 to +415
console.log("\n=== Heterogeneous Gradients: Multi-Tensor Federated Model ===");
console.log("Gradient structure: [layer1=100×value, layer2=10×value, layer3=value]");
console.log("Honest peers send: [100, 10, 1]");
console.log("Byzantine sends: [10000, 1000, 100]");
console.log(`\nNew aggregator (${timeNew.toFixed(2)}ms):`);
console.log(` Layer 1: ${arrNew[0][0].toFixed(2)} (expected ~100)`);
console.log(` Layer 2: ${arrNew[0][1].toFixed(2)} (expected ~10)`);
console.log(` Layer 3: ${arrNew[0][2].toFixed(2)} (expected ~1)`);
console.log(`\nOld aggregator (${timeOld.toFixed(2)}ms):`);
console.log(` Layer 1: ${arrOld[0][0].toFixed(2)} (expected ~100)`);
console.log(` Layer 2: ${arrOld[0][1].toFixed(2)} (expected ~10)`);
console.log(` Layer 3: ${arrOld[0][2].toFixed(2)} (expected ~1)`);

=== Heterogeneous Gradients: Multi-Tensor Federated Model ===
Gradient structure: [layer1=100×value, layer2=10×value, layer3=value]
Honest peers send: [100, 10, 1]
Byzantine sends: [10000, 1000, 100]

New aggregator (1.98ms):
  Layer 1: 14.92 (expected ~100)
  Layer 2: 1.49 (expected ~10)
  Layer 3: 0.15 (expected ~1)

Old aggregator (0.45ms):
  Layer 1: 100.00 (expected ~100)
  Layer 2: 10.00 (expected ~10)
  Layer 3: 1.00 (expected ~1)

Total relative error: new=2.552, old=0.000
Winner: OLD (better handles multi-scale)

The new aggregator is quite far off the expected values; there is potentially an implementation issue to fix there.
