FailproofAI · nk-ag · Jan 22, 2026 · Jan 21, 2026 · Jan 22, 2026
diff --git a/README.md b/README.md
@@ -3,56 +3,163 @@
 [![License: CC BY 4.0](https://img.shields.io/badge/License-CC%20BY%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by/4.0/)
 [![Status: Draft](https://img.shields.io/badge/Status-Draft%20v0.1-orange)]()
 
-> **Architectural standards and best practices for building reliable AI Agents and LLM workflows. Defining the framework for AI Reliability Engineering (AIRE).**
+> **An open implementation guide for building reliable AI Agents at scale. Defining the practices for AI Reliability Engineering (AIRE).**
 
 ---
 
 ## Introduction
 
-As AI systems move from "experimental" prototypes to "mission-critical" production environments, reliability has emerged as the single biggest barrier to adoption. 
+As AI systems move from "experimental" prototypes to "mission-critical" production environments, reliability has emerged as the single biggest barrier to adoption.
 
-This repository serves as the **Open Standard for AI Reliability Engineering (AIRE)**. It documents the architectural patterns, testing frameworks, and guardrails that engineering teams use to achieve 99.9% reliability in non-deterministic systems and what does it even mean to be reliable in a non-deterministic system?. Further akin to SRE principles, this repository also documents principles for AI Reliability Engineering (AIRE)
+This repository serves as the **Open Standard for AI Reliability Engineering (AIRE)**. It documents the architectural patterns, testing frameworks, and operational practices that engineering teams use to achieve production-grade reliability in non-deterministic systems.
 
-It is not a theoretical academic paper. It is a living collection of **"Success Patterns"** gathered from the top 1% of engineering teams currently running agents at scale.
+It is not a theoretical academic paper. It is a living collection of **"Success Patterns"** gathered from practitioners running agents at scale.
+
+---
+
+## AIRE Principles
+
+*Guiding tenets inspired by SRE:*
+
+These five principles define the philosophical foundation of AIRE. They inform the practices detailed in the five pillars and help teams make trade-off decisions when designing reliable AI systems.
+
+### 1. Embrace Non-Determinism
+
+Accept that identical inputs will produce variable outputs. Design systems that succeed despite variance, not systems that assume consistency.
+
+**Key Insight:** AI systems are probabilistic reasoners. Don't try to make them deterministic-build resilience around their non-determinism through structured outputs, guardrails, and fallback paths.
+
+### 2. Reliability is a Feature
+
+Reliability competes with velocity for engineering resources. Treat it as a first-class product requirement with explicit budgets, not an afterthought.
+
+**Key Insight:** Allocate dedicated engineering time (e.g., 20% of sprints) to reliability work: golden dataset updates, eval pipeline maintenance, incident reviews.
+
+### 3. Measure, Don't Assume
+
+If you cannot quantify the reliability of your AI system, you do not have a reliable AI system. Intuition is not evidence.
+
+**Key Insight:** Track concrete metrics (hallucination rate <0.1%, HITL rate <10%, uptime >99.9%). Block deployments if metrics degrade.
+
+### 4. Fail Gracefully, Fail Informatively
+
+Every failure should preserve context, enable recovery, and generate learnings. Silent failures are unacceptable.
+
+**Key Insight:** Save checkpoints, log Chain of Thought reasoning, return user-friendly errors, and ensure workflows can resume after crashes.
+
+### 5. Humans as Fallback, Not Crutch
+
+Design for autonomous operation. Human escalation is a safety net for edge cases, not a substitute for robust engineering.
+
+**Key Insight:** Reduce HITL rate over time through active learning. Start at 100% human review, target <10% through continuous improvement.
+
+📖 **[Read the detailed AIRE Principles guide →](docs/principles.md)**
+
+---
 
 ## Core Pillars of AIRE
 
-We define the stability of an Agentic System through four core pillars:
+We define the reliability of an Agentic System through five core pillars:
+
+### 1. Resilient Architecture
+
+*Building systems that gracefully handle failures, scale under load, and recover from errors.*
+
+Resilient architecture establishes the structural foundation for reliable AI systems. It encompasses:
+
+- **Elastic Auto-Scaling** - Horizontal and vertical scaling strategies for unpredictable AI workloads
+- **State Management** - Checkpoint-based recovery enabling workflows to resume from last checkpoint after failures (not restart from scratch)
+- **Circuit Breakers** - Fault tolerance patterns that prevent cascading failures by failing fast when services degrade
+- **Fallback Paths** - Multi-tier fallback strategies (GPT-4 → GPT-3.5 → Rules → Human)
+- **The Reliability Stack Pattern** - Separating probabilistic reasoning (LLM) from deterministic safety (guardrails)
+
+**Key Metrics:** Resumability Rate >99%, Circuit Breaker Activations <10/day, Fallback Usage Rate <15%, MTTR <5 minutes
+
+📖 **[Read the full Resilient Architecture guide →](docs/pillars/resilient-architecture.md)**
 
-### 1. The Reliability Stack (Architecture)
-*Separating the "Brain" from the "Governor".*
+---
+
+### 2. Cognitive Reliability
+
+*Ensuring AI agents produce accurate, consistent, and trustworthy outputs.*
+
+Cognitive reliability addresses the correctness problem - ensuring outputs are grounded, validated, and trustworthy:
 
-* **The Core Stack:** Probabilistic components (LLMs, Prompts) focused on reasoning.
-* **The Reliability Stack:** Deterministic components (Guardrails, Durable Queues, Verifiers) focused on safety.
-* **Principle:** Never rely on the LLM to police itself.
+- **Self-Reflection & Correction** - Chain-of-thought with reflection, multi-agent debate for high-stakes decisions
+- **Structured Outputs** - JSON schema validation, forced choice enums, regex-constrained generation
+- **Human-in-the-Loop (HITL) Protocols** - Confidence-based escalation with design patterns to reduce HITL over time through active learning
+- **Drift Detection** - Input drift (distribution changes), output drift (confidence shifts), model drift (version changes)
 
-### 2. Eval-Driven Development (EDD)
-*Moving from "Vibes" to Engineering.*
+**Key Metrics:** Hallucination Rate <0.1%, Groundedness >95%, HITL Rate <10%, Confidence Calibration within 10%
 
-* **Golden Datasets:** Regression suites of 100+ questions run before every deploy.
-* **Unit Testing Agents:** Synthetic data tests for specific skills (e.g., API calling syntax).
-* **Metrics:** Standardized scoring for Hallucination Rate (<0.1%) and Groundedness.
+📖 **[Read the full Cognitive Reliability guide →](docs/pillars/cognitive-reliability.md)**
 
-### 3. Durable Execution & State
-*Fault tolerance for long-running workflows.*
+---
+
+### 3. Quality & Lifecycle
 
-* **Resumability:** If an agent crashes on Step 4 of 10, it must resume at Step 4, not restart.
-* **Graceful Degradation:** Protocols for handing off to humans with full context when confidence drops.
+*Moving from "vibes-based" development to rigorous testing and continuous improvement.*
 
-### 4. Observability 2.0
-*Tracing the "Thought" Process.*
+Quality & Lifecycle practices define how to test, deploy, and continuously improve AI systems:
 
-* **Chain of Thought (CoT) Logging:** Tracing logic, not just I/O.
-* **Cost Observability:** Real-time token tracking per tenant/workflow.
+- **Evals-Driven Deployments** - CI/CD gates with golden datasets, staged rollouts (canary → gradual → full), automatic rollback triggers
+- **Golden Datasets** - Curated regression suites (60% core capabilities, 30% edge cases, 10% adversarial), versioned in Git, continuously updated
+- **Unit Testing Agents** - Tool calling tests, prompt adherence tests, synthetic data tests
+- **Online vs Offline Evals** - Pre-deployment regression testing (offline) + post-deployment drift detection (online)
+- **Feedback Loops** - Production failures → HITL corrections → golden dataset updates → model retraining
 
-### 5. Principles of AIRE
-*Guiding tenets for AI Reliability Engineering, inspired by SRE.*
+**Key Metrics:** Golden Dataset Accuracy >95%, Deployment Success Rate >90%, User Satisfaction >80%, Feedback Loop Latency <7 days
 
-* **Embrace Non-Determinism:** Accept that identical inputs will produce variable outputs. Design systems that succeed despite variance, not systems that assume consistency.
-* **Reliability is a Feature:** Reliability competes with velocity for engineering resources. Treat it as a first-class product requirement with explicit budgets, not an afterthought.
-* **Measure, Don't Assume:** If you cannot quantify the reliability of your AI system, you do not have a reliable AI system. Intuition is not evidence.
-* **Fail Gracefully, Fail Informatively:** Every failure should preserve context, enable recovery, and generate learnings. Silent failures are unacceptable.
-* **Humans as Fallback, Not Crutch:** Design for autonomous operation. Human escalation is a safety net for edge cases, not a substitute for robust engineering.
+📖 **[Read the full Quality & Lifecycle guide →](docs/pillars/quality-lifecycle.md)**
+
+---
+
+### 4. Security
+
+*Protecting systems, data, and users from risks introduced by autonomous agents.*
+
+Security for AI agents differs from traditional software-agents are autonomous decision-makers that can be manipulated to exceed intended authority:
+
+- **Just-in-Time (JIT) Privilege Access** - Scoped tokens (action + resourceId) with automatic expiration (<5 minutes), step-up authentication for high-risk actions
+- **Audit Logs for Internal Thinking** - Logging reasoning (Chain of Thought), not just inputs/outputs; structured logs for incident investigation
+- **Guardrails** - Deterministic hard stops at three layers: input guardrails (prompt injection detection, PII redaction), output guardrails (sensitive data leakage prevention), action guardrails (rate limits, monetary limits)
+- **Prompt Injection Defenses** - Instruction hierarchy, input sanitization, multi-model validation, sandboxing
+- **Data Privacy in Context Windows** - Context isolation per session, PII redaction, ephemeral context for sensitive data, encryption at rest, GDPR compliance
+
+**Key Metrics:** Prompt Injection Attempts <10/day, Jailbreak Success Rate <0.1%, PII Leakage Incidents 0, MTTD <5 minutes
+
+📖 **[Read the full Security guide →](docs/pillars/security.md)**
+
+---
+
+### 5. Operational Excellence & Team Culture
+
+*Establishing SLAs, error budgets, team structures, and operational practices that enable reliable AI systems to scale.*
+
+Operational Excellence bridges the gap between technical architecture and organizational culture. While the first four pillars define *what* to build, this pillar defines *how* teams operate, measure, and continuously improve AI systems at scale:
+
+- **AI-Specific SLAs & Error Budgets** - Service Level Objectives for availability, latency, quality, safety, and efficiency; error budget policies for balancing reliability with innovation velocity
+- **Team Structure & Shared Responsibility** - Product teams own agents end-to-end; embedded AI Reliability Engineers (AIREs) with 20% time allocation; central platform team provides infrastructure
+- **Progressive Autonomy Maturity Model** - Five levels of agent autonomy (L0: Human-Driven → L4: Autonomous), reducing HITL rate from 100% to <5% over time
+- **Reliability Reviews** - Weekly metric reviews, monthly postmortems, error budget tracking, SLO compliance monitoring
+
+**Key Metrics:** SLO Compliance >95%, Error Budget Remaining >25%, HITL Rate <10%, Autonomy Level L3+, Time to Autonomy <6 months
+
+📖 **[Read the full Operational Excellence guide →](docs/pillars/operational-excellence.md)**
+
+---
+
+## Getting Started
+
+**New to AIRE?** Start with the **[Getting Started Guide →](docs/getting-started.md)** for a step-by-step adoption roadmap:
+
+- **Phase 1 (Week 1-2):** Assess current state, measure baseline metrics
+- **Phase 2 (Month 1):** Quick wins - golden dataset, guardrails, audit logging
+- **Phase 3 (Month 2-3):** Foundation - circuit breakers, state persistence, CI/CD evals
+- **Phase 4 (Month 4-6):** Maturity - feedback loops, drift detection, JIT access
+- **Phase 5 (Month 6+):** Excellence - hallucination rate <0.1%, HITL rate <10%, uptime 99.9%+
+
+**Want to dive deep?** Explore the [complete documentation →](https://aire.exosphere.host)
 
 ---
 
@@ -71,16 +178,29 @@ You get to shape the future of AI reliability engineering and get recognized for
 | Benefit | Details |
 |---------|---------|
 | **Shape the Standard** | Your operational insights become codified best practices. Influence how the industry approaches AI reliability for years to come |
-| **Industry Recognition** | Listed in the [Contributors Registry](CONTRIBUTORS.md) as a contributor to the standards of AI relibility |
+| **Industry Recognition** | Listed in the [Contributors Registry](CONTRIBUTORS.md) as a contributor to the standards of AI reliability |
 | **Peer Network** | Join a private forum of engineering leaders exchanging reliability patterns across enterprises |
 | **Early Access** | Preview new sections and reference architectures before public release |
 | **Thank you gift** | We will send you a gift hamper courtesy to our sponsors |
 
 ---
 
-## Repository Structure (Coming Soon)
-
-We are actively populating this repository with the success patterns from the study and the playbook.
+## Repository Structure
+
+```
+docs/
+├── getting-started.md             # Adoption roadmap for organizations
+├── pillars/
+│   ├── resilient-architecture.md  # Pillar 1: Fault tolerance, scaling, recovery
+│   ├── cognitive-reliability.md   # Pillar 2: Accuracy, consistency, drift detection
+│   ├── quality-lifecycle.md       # Pillar 3: Testing, deployment, feedback loops
+│   ├── security.md                # Pillar 4: JIT access, guardrails, audit logs
+│   └── operational-excellence.md  # Pillar 5: SLAs, team structure, progressive autonomy
+└── appendix/
+    ├── principles.md              # AIRE Principles (5 guiding tenets)
+    ├── metrics-framework.md       # Three-tier metrics framework
+    └── glossary.md                # Key terms and definitions
+```
 
 ---
 
@@ -97,10 +217,10 @@ We welcome Pull Requests (PRs) from engineers who have solved specific reliabili
 
 <a href="https://exosphere.host"><img src="./assets/sponsors/exosphere.png" alt="ExosphereHost Inc." width="75"></a>
 
-Contact nivedit@exosphere.host to sponsor this work.
+Contact nikita@exosphere.host to sponsor this work.        
 
 ## License
 
 This work is licensed under a [Creative Commons Attribution 4.0 International License](http://creativecommons.org/licenses/by/4.0/).
 
-You are free to share and adapt this material for any purpose, even commercially, as long as you give appropriate credit.
+You are free to share and adapt this material for any purpose, even commercially, as long as you give appropriate credit.