A catalogue of failure modes discovered while building and debugging complex systems.
Most notes start with a simple engineering question:
what invariant is supposed to hold here, and how could it fail?
My work often puts me close to systems where correctness matters: cryptographic protocols, distributed infrastructure, financial workflows, and increasingly AI systems.
Across these environments the pattern is consistent:
systems rarely fail where designers expect.
Failures usually appear as small violations of invariants:
- hidden assumptions
- ambiguous state transitions
- adversarial or malformed inputs
- edge conditions under load
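One of the simplest of these, an ambiguous state transition, can be sketched as a guardrail. This is a minimal illustration, not code from any note in this repository; the order lifecycle and its states are hypothetical:

```python
# Hypothetical order lifecycle. The invariant: an order may only move
# along explicitly allowed transitions; anything else is a violation.
ALLOWED = {
    "created": {"paid", "cancelled"},
    "paid": {"shipped", "refunded"},
    "shipped": {"delivered"},
}

def transition(state: str, new_state: str) -> str:
    """Guardrail: reject any transition not explicitly whitelisted."""
    if new_state not in ALLOWED.get(state, set()):
        raise ValueError(f"invalid transition: {state} -> {new_state}")
    return new_state
```

Making the transition table explicit turns a silent assumption into a checkable invariant: the ambiguous case fails loudly at the boundary instead of corrupting state downstream.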
This repository records short engineering investigations into those failures.
Each note follows the same structure:
- Question — what assumption might fail?
- Context — where the assumption appears
- Invariant — the property that must hold
- Observation — evidence suggesting a violation
- Failure Scenario — how the invariant could break
- Detection — how the violation could be detected
- Guardrail — the engineering change that prevents it
The goal is simple:
make failure modes visible before they become incidents.
These failures appear in different domains but follow similar patterns:
- distributed infrastructure
- cryptographic protocols
- permission and identity systems
- AI model behavior and evaluation
- adversarial system interactions
Failure modes are grouped by system type:
- ai-systems — model behavior and evaluation failures
- distributed-systems — consensus, identity, and coordination failures
- application-systems — lifecycle, permissions, and state management failures
- protocols — transport and API boundary failures
Failure is inevitable.
Correctness is engineered.