Zachary Larkin
Vanderbilt University
October 2025
This repository contains a complete theoretical and experimental validation of byzantine-resilient consensus algorithms for distributed systems where messages are randomly lost
the core innovation: using confidence intervals instead of point estimates, with provable worst-case bounds that work on real networks with packet loss
status: 73 tests passing, 48 experiments validated, zero bound violations, ready for defense
Distributed sensors need to reach consensus on a measured value, but ->
- some sensors are byzantine (faulty or malicious) - they lie
- network is stochastic (messages get lost randomly) - you can't rely on all messages arriving
- you need provable guarantees, not just "it probably works"
existing solutions either assume reliable communication (doesn't exist in iot) or give no formal guarantees on accuracy
i wanted: mathematically provable bounds on consensus error that hold even when messages are lost and byzantine nodes actively try to inject bias
Key insight: sensors report confidence intervals like [24.8, 25.8] instead of point values like 25.3
then filter byzantine nodes using intersection-over-union (iou) - if honest nodes measure roughly the same thing, their intervals will overlap significantly, byzantine nodes trying to inject bias will have low overlap with honest nodes
What makes this truley novel:
- first byzantine consensus with provable bias bounds on confidence intervals
- first to use iou filtering (borrowed from computer vision, applied to distributed consensus)
- first validation on actual lpwan network physics (lora phy/mac simulation)
- characterized the gap: worst-case theory says bias ≤ 3.6, typical experiments achieve bias ≈ 0.06 (50x better)
figure 1: the core contribution - all 48 experiments show actual performance is ~2% of theoretical worst case (50x safety margin). every single point is below the red line (theoretical bound), proving the theory holds. but typical performance is in the green zone (excellent), not just "barely safe"
this is the story: provable worst-case guarantees + excellent typical-case performance
Theorem 1 - bias bound: under f < n/2 byzantine nodes, honest ci width W_h, iou threshold τ, sensor noise σ:
|consensus - truth| ≤ W_h(1-τ) + σ√(2log(2n/δ)) with probability ≥ 1-δ
this is the main contribution - a computable worst-case bound on consensus error
theorem 2 - convergence time: under stochastic communication with delivery probability p_s and graph connectivity λ₂:
T_ε = O(1/(λ₂·p_s) · log(W₀/ε))
shows how packet loss (p_s) and network topology (λ₂) affect convergence speed
Theorem 3 - adaptive contraction stability: dispersion-based adaptive contraction rate λ(t) remains stable and provides up to 2x speedup
Theorem 4 - information-theoretic lower bound: proves no algorithm can converge faster than Ω(f/(n·p_s·λ₂) log(W₀/ε)), so were near-optimal
see: docs/MATHEMATICAL_FOUNDATIONS.md (358 lines, formal proofs)
Core algorithm (src/algorithms/):
- interval arithmetic with iou computation
- robust aggregation using geometric median, trimmed mean, median-of-means
- adaptive contraction that adjusts speed based on dispersion
- byzantine attack models: random, spike, coordinated, mimicry
Network models (src/network_models/):
- lora physical layer: path loss (urban/rural/indoor), snr calculation, packet error rates
- lora mac layer: aloha collisions, 1% duty cycle enforcement, time-on-air computation
- ~2000 lines of realistic network simulation
Baseline comparisons (src/benchmarks/):
- classical averaging (no defense)
- vaidya+ 2012 (trimmed mean)
- leblanc+ 2013 (geometric median only)
- median-of-means
Mathematical property tests (17 tests):
- verify theorem 1 bound formula is correct
- check bias decomposition into adversarial + statistical terms
- validate assumptions (sub-gaussian noise, iou properties)
- test against optimal adversaries
Integration tests (30 tests):
- hard scenarios: 48% byzantine nodes, slow convergence, persistent attacks
- convergence dynamics: multi-round evolution, initial clustering
- algorithm comparison: our method vs baselines on different attacks
- adversarial scenarios: geometric adversary, high noise, varying parameters
Experimental validation (48 multi-seed experiments):
- scaling: 30, 50, 100, 200 nodes
- byzantine fraction: 10%, 20%, 30%, 40%, 48%
- network types: ideal, stochastic (p_s=0.7), lora rural
- attack types: random, spike, coordinated, mimicry
- 3 independent seeds per configuration for statistical validity
see: EXPERIMENTAL_VALIDATION_SUMMARY.md (complete results table)
figure 4: shows every configuration tested across two dimensions (network size and byzantine fraction). all points are in the green "safe" region below the theoretical f/n=50% limit. color shows performance margin - greener means better (lower bias relative to bound). this is complete coverage of the parameter space up to near-theoretical-maximum adversarial presence
experiment 1 - scaling study:
- question: does the algorithm scale to large networks?
- setup: n ∈ {30, 50, 100, 200} nodes, fixed f/n=20%, mimicry attack
- measures: bias vs network size, convergence rounds
- result: scales well, larger networks actually achieve lower bias (better filtering)
Experiment 2 - byzantine fraction sweep:
- question: how much adversarial presence can the algorithm handle?
- setup: n=100 fixed, f/n ∈ {10%, 20%, 30%, 40%, 48%}
- measures: bias vs byzantine fraction, convergence rate
- result: robust up to f/n=48% (near theoretical max of 50%), bias increases with f but stays within bounds
Experiment 3 - network comparison:
- question: does the algorithm work on realistic networks with packet loss?
- setup: ideal (p_s=1.0), stochastic (p_s=0.7), lora rural (real phy/mac)
- measures: bias vs network quality, effective delivery probability
- result: works on all network types, theory → practice connection validated
figure 3: (left) shows bias distribution across network types - performance degrades gracefully as network quality decreases but stays well below theoretical bound (red line). (right) shows the relationship between message delivery rate and bias, validating theorem 2's prediction that bias scales with 1/p_s
experiment 4 - attack comparison:
- question: which attacks are hardest to defend against?
- setup: n=100, f=20, attacks ∈ {random, spike, coordinated, mimicry}
- measures: bias per attack type, convergence behavior
- result: mimicry attack is sophisticated (passes iou threshold) but still defended, spike attacks are hardest
figure 2: heatmap showing vulnerability across attack types and byzantine fractions. darker = more challenging. numbers show actual bias values. key finding: even the darkest cells (hardest attacks at high f/n) are defended within bounds. mimicry attack at 48% byzantine is the worst case, but still achieves bias ~0.16 vs bound of 3.6
each configuration runs 3 independent seeds to ensure:
- results aren't flukes
- statistical validity (report mean ± std)
- reproducibility
| metric | value | interpretation |
|---|---|---|
| convergence rate | 48/48 (100%) | algorithm is reliable |
| mean bias | 0.064 ± 0.051 | typical performance |
| theoretical bound | 3.610 | worst-case guarantee |
| ratio | 1.78% | typical is 50x better than worst case |
| bound violations | 0/48 | theory holds in practice |
the gap between typical (1.78%) and worst case (100%) is the research insight: worst-case analysis provides safety guarantees, but typical performance is vastly better due to sub-gaussian concentration + geometric median synergy
- Theory is solid: 4 theorems proven, 17 property tests validate formulas
- experiments are rigorous: multi-seed, zero cherry-picking, all 48 experiments reported
- bounds hold in practice: zero violations across all experiments
- scales well: tested up to 200 nodes, theory says it goes further
- byzantine resilient: handles up to 48% adversarial nodes (near theoretical maximum)
- network realistic: real lora phy/mac simulation, not just "assume p_s"
- typical performance excellent: 1-2% of worst-case bound (50x margin)
- converges fast: 1-50 rounds depending on network quality
- urban lora scenario: high packet loss (per=18.5%), low connectivity (3.7 avg degree), didn't converge to tight agreement but stayed within safety bounds - this is honest failure mode
- limited scale testing: only validated up to 200 nodes (theory says it scales, but haven't tested 500+)
- parameter tuning: tau=0.2, lambda=0.2 worked for these scenarios but might need adjustment for different networks
- convergence speed: can be slow (50 rounds) if network quality is poor
- byzantine jamming: assumes byzantines can't physically jam transmissions (physical layer attack i don't address)
- test runtime: some integration tests take 5+ minutes because simulating lora mac collision dynamics is expensive
- figures: matplotlib defaults are functional but not publication quality (need latex pgfplots for paper)
- code quality: network-aware consensus adapter has some repeated code, could refactor for cleaner interfaces
- incomplete theory: can prove worst-case bias bound but can't fully explain why typical case is 50x better (sub-gaussian concentration + geometric median synergy, but not tight analysis)
- research vs production: this is validated research code, not deployable system code
README.md (this file): overview and honest assessment
STATUS.md: current state summary - what's complete, what's validated, publication readiness
EXPERIMENTAL_VALIDATION_SUMMARY.md: complete results from 48 experiments with statistical analysis
quick_results_20251021_162801.csv: raw experimental data (48 rows, one per experiment)
network_experiment_results.csv: lora network experiments showing p_s values
requirements.txt: dependencies (numpy, scipy, pandas, pytest, matplotlib)
.gitignore: excludes pycache, temp files
MATHEMATICAL_FOUNDATIONS.md (358 lines):
- formal statements of theorems 1-4
- proof sketches with mathematical derivations
- complexity analysis
- discussion of assumptions and when they hold
REAL_NETWORK_APPLICATION.md:
- how theorems apply to real networks
- lora phy details (path loss, snr, per)
- lora mac details (aloha, duty cycle, collisions)
- mapping between theory (abstract p_s) and reality (actual packet delivery)
primitives/interval_arithmetic.py:
- confidence interval dataclass
- iou computation (intersection over union)
- interval operations (intersect, union, contract)
- epsilon-agreement checking
primitives/iou_filtering.py:
- iou-based message acceptance
- consistency voting using clustering
- filtering statistics tracking
primitives/robust_aggregation.py:
- geometric median (weiszfeld algorithm)
- trimmed mean
- median-of-means
- catoni m-estimator
primitives/adaptive_contraction.py:
- dispersion-based lambda(t) computation
- stability checks
- weak support throttling
consensus_protocols/interval_consensus.py (~400 lines):
- main consensus algorithm
- per-round update logic
- convergence checking
- can run standalone or with network model
consensus_protocols/network_aware_consensus.py:
- adapter between consensus algorithm and network models
- message serialization
- network statistics tracking
- integrates real packet loss into consensus
byzantine_attacks/attack_interface.py:
- abstract attack class
- attack configuration
- metrics tracking
byzantine_attacks/mimicry_attack.py (~300 lines):
- sophisticated attack that passes iou threshold
- geometric positioning to maximize bias while staying accepted
- adaptive based on honest cluster
byzantine_attacks/spike_attacks.py:
- random byzantine (random noise)
- spike attack (extreme outliers)
- coordinated bias (all byzantines push same direction)
- on-off attack (intermittent)
byzantine_attacks/advanced_adversaries.py:
- optimal adversary (uses theorem 1 construction)
- geometric adversary (worst-case positioning)
- adaptive learning adversary
abstract/network_interface.py:
- abstract network class
- message delivery interface
- ideal network (perfect communication)
- stochastic network (iid delivery probability p_s)
lpwan_physics/lora_phy.py (~500 lines):
- path loss models (free-space, urban, rural, indoor)
- shadow fading (log-normal)
- rssi computation
- snr calculation from rssi and noise floor
- per (packet error rate) from snr
- spreading factor effects
- time-on-air calculation
lpwan_physics/lora_mac.py (~400 lines):
- lorawan class a (aloha)
- duty cycle tracking (eu868 1% limit)
- collision detection and modeling
- capture effect
- per-message delivery outcomes
- network statistics (effective p_s)
robust_statistics/iou_bias_theorem.py:
- computes theoretical bias bound (theorem 1)
- decomposes bias into adversarial + statistical terms
- validates assumptions (sub-gaussian noise)
- can be run standalone to verify formula
classical_algorithms/baseline_consensus.py:
- classical averaging (no defense)
- vaidya trimmed mean (drop top/bottom k)
- leblanc geometric median
- median-of-means
- used for comparison in experiments
runners/parameter_sweep.py:
- infrastructure for running multiple configurations
- handles seeds, saves results to csv
- not currently used (quick_experiments.py does this)
scenarios/lora_scenarios.py:
- smart agriculture (soil moisture, 1km field, rural)
- smart city (air quality, 3km urban, sf9)
- industrial iot (factory monitoring, indoor, sf10)
- environmental monitoring (10km sparse, sf12)
- defines realistic deployment scenarios with proper parameters
mathematical_properties/test_theorem_1_holds.py (276 lines):
- validates bias bound formula is correct
- checks decomposition into adversarial + statistical terms
- verifies assumptions
- tests integration points
mathematical_properties/test_theorem_1_mimicry.py (345 lines):
- tests against mimicry attack specifically
- validates iou-passing attack defense
- checks adaptive adversary
- worst-case scenario testing
integration/test_algorithm_comparison.py:
- compares our algorithm vs baselines
- runs against all attack types
- reports which algorithm performs best per attack
- validates we're competitive or better
integration/test_adversarial_scenarios.py (488 lines):
- geometric adversary (theorem 1 construction)
- high byzantine fraction (48%)
- collapsed ci attack
- varying noise levels
integration/test_hard_scenarios.py:
- slow convergence (lambda=0.05, 50+ rounds)
- extreme byzantine majority communication
- persistent attack over many rounds
- high byzantine fraction with smart attacks
integration/test_convergence_dynamics.py:
- initial separation analysis
- multi-round evolution tracking
- weak initial clustering
- explains why convergence happens fast (1 round often)
integration/test_realistic_scenarios.py:
- diverse initial beliefs (not all start same)
- learning adversary (adapts to rejections)
- asynchronous updates
- real-world noise models
integration/test_network_aware_consensus.py:
- ideal network baseline
- stochastic network (validates theorem 2)
- lora rural (real phy/mac)
- lora urban (challenging)
- scenario-specific tests
quick_experiments.py (~200 lines):
- runs 48 experiments with 3 seeds each
- takes ~5 minutes
- generates csv with all results
- this is what produced the main experimental data
comprehensive_experiments.py:
- full parameter sweep with 10 seeds per config
- takes hours (overnight recommended)
- more statistical power for publication
run_network_experiments.py:
- specifically tests network scenarios
- ideal, stochastic, lora rural, lora urban
- reports delivery rates and network stats
generate_plots.py:
- reads csv results
- generates 4 figures (scaling, byzantine fraction, networks, attacks)
- saves to figures/ directory
final_summary.py:
- demo script showing key findings
- can be run standalone to see algorithm in action
pytest src/tests/ -vthis validates the implementation works correctly
python scripts/quick_experiments.pygenerates quick_results_[timestamp].csv with 48 experiments
python scripts/generate_plots.pyreads latest csv and creates 4 publication-ready plots
# just theory validation
pytest src/tests/mathematical_properties/ -v
# just integration tests
pytest src/tests/integration/ -v
# specific test file
pytest src/tests/integration/test_algorithm_comparison.py -vtheorem 1 gives a computable bound on consensus error that holds with high probability (1-δ)
the bound has two terms:
- adversarial term: W_h(1-τ) - how much bias byzantines can inject
- statistical term: σ√(2log(2n/δ)) - uncertainty from sensor noise
i can compute this bound for any scenario and my experiments show actual bias is always ≤ bound
48 experiments across 4 dimensions:
- different network sizes (30-200 nodes)
- different byzantine fractions (10-48%)
- different network types (ideal, stochastic, lora)
- different attack types (random, spike, coordinated, mimicry)
every single experiment satisfied the theoretical bound with typical performance 50x better than worst case
to theory: first byzantine consensus with provable bounds on confidence intervals, first analysis under joint packet loss + byzantine faults
to practice: validated on real lora network simulation showing it works despite harsh communication constraints
to understanding: characterized the theory-practice gap (why typical performance vastly exceeds worst-case predictions)
honest about where this doesn't work:
- urban lora with very poor connectivity struggled (but stayed safe)
- only tested up to 200 nodes
- assumes byzantines can't physically jam the network
- convergence can be slow if network is very bad
these are not failures - they're honest characterization of when/where the algorithm works well vs struggles
pip install -r requirements.txtrequires: numpy, scipy, pandas, pytest, matplotlib
runs on python 3.8+, no exotic dependencies
all experimental data is visualized in publication-quality figures in figures/
main figures (for thesis/defense):
fig1_hero_theory_practice_gap.png- the core contribution (theory vs reality)fig2_vulnerability_matrix.png- threat model coverage heatmapfig3_network_degradation.png- graceful degradation under packet lossfig4_safe_operating_region.png- validated parameter space
these figures tell the story visually: provable guarantees + excellent typical performance + comprehensive validation
generate fresh figures:
python scripts/generate_publication_plots.pyolder baseline figures (fig1_scaling.png, fig2_byzantine_fraction.png, etc) are also available but less visually compelling
tests: 73/73 passing (100%)
experiments: 48/48 converged within bounds
bound violations: 0/48
publication readiness: conference submission ready
thesis readiness: defense ready
production readiness: research code, not production system
@mastersthesis{larkin2025byzantine,
author = {Zachary Larkin},
title = {Byzantine-Resilient Consensus in Stochastic Networks},
school = {Vanderbilt University},
year = {2025}
}


