feat: Selective context loading via ContextRouter hook to reduce 83KB base context #690

@jlacour-git

Problem

Related: #678

PAI v3.0 injects ~83KB of context at SessionStart via LoadContext.hook.ts. The Algorithm spec alone (v1.5.0.md) is 78,606 bytes. On Opus with a 200K token window, this means ~25K tokens (~12-15%) consumed before the user types anything. With hook outputs, steering rules, relationship context, and active work listings, the real baseline is 25-33%.
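The percentages above come from a bytes-to-tokens estimate. A rough sketch of that arithmetic, assuming about 3.3 bytes per token (the ratio implied by "83KB ≈ 25K tokens", not a measured tokenizer figure) and decimal kilobytes; `estimateTokens` and `windowShare` are hypothetical helper names:

```typescript
// Assumption: ~3.3 bytes per token, back-derived from the figures in this issue.
const BYTES_PER_TOKEN = 3.3;
// 200K-token context window, as stated above.
const CONTEXT_WINDOW_TOKENS = 200_000;

// Rough token estimate for a payload of the given size in bytes.
function estimateTokens(bytes: number): number {
  return Math.round(bytes / BYTES_PER_TOKEN);
}

// Fraction of the context window consumed by a payload of the given size.
function windowShare(bytes: number): number {
  return estimateTokens(bytes) / CONTEXT_WINDOW_TOKENS;
}

const baseContextBytes = 83_000; // ~83KB injected at SessionStart
console.log(estimateTokens(baseContextBytes), windowShare(baseContextBytes));
```

Under these assumptions the base context lands at roughly 25K tokens, or about an eighth of the window, before any hook output or steering rules are counted.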

This is a significant regression from v2.5, where the Algorithm spec was much smaller. Issue #678 reports users seeing 78% context usage after a single interaction.

The core issue: every interaction pays the full 83KB cost, whether it is a greeting, a skill invocation, or a deep algorithm task.

Proposal: Tiered Context Loading via a ContextRouter Hook

Split context loading into two stages:

Stage 1: SessionStart — Core Context Only (~10-15KB)

LoadContext.hook.ts injects only what every interaction needs:

  • Identity (principal name, DA name, voice ID)
  • Minimal format rules (the 3 format modes: FULL / ITERATION / MINIMAL)
  • Effort level selection table
  • Phase names (OBSERVE → LEARN) without full phase details
  • AI Steering Rules
  • A routing instruction: "Full Algorithm details will be provided based on task classification"

Stage 2: UserPromptSubmit — Selective Injection (~0-70KB)

A new ContextRouter.hook.ts classifies each prompt and injects only the relevant Algorithm sections:

| Classification | Extra Context Loaded | Estimated Size |
|---|---|---|
| Greeting / acknowledgment | Nothing | 0 KB |
| Skill invocation | Skill system docs | ~3 KB |
| Standard algorithm task | ISC rules + phase details | ~25 KB |
| Extended+ algorithm task | Full Algorithm + PRD + capabilities | ~70 KB |
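One way the table above could be encoded is as a static lookup from classification to section files. This is only a sketch: `skill-system.md` and `algorithm-phases.md` are hypothetical file names, and the `algorithm-*.md` names simply reuse the example split suggested under Implementation Path.

```typescript
type Tier = "greeting" | "skill" | "standard" | "extended";

// Hypothetical file names; the real split may use different names and boundaries.
const SECTIONS_BY_TIER: Record<Tier, readonly string[]> = {
  greeting: [], // nothing beyond the core context
  skill: ["skill-system.md"],
  standard: ["algorithm-isc.md", "algorithm-phases.md"],
  extended: [
    "algorithm-isc.md",
    "algorithm-phases.md",
    "algorithm-prd.md",
    "algorithm-capabilities.md",
    "algorithm-loop.md",
    "algorithm-teams.md",
  ],
};

// Returns the extra section files to inject for a classified prompt.
function sectionsFor(tier: Tier): readonly string[] {
  return SECTIONS_BY_TIER[tier];
}
```

Keeping the mapping as data rather than branching logic would make it easy to adjust tiers without touching the hook itself.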

Classification can be keyword-based (fast, no AI inference needed):

  • Starts with greeting words → greeting
  • Contains /skillname → skill invocation
  • Contains "Extended"/"Advanced"/"Deep", or is otherwise judged complex (the complexity heuristic is left open here) → full spec
  • Default → standard
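The keyword rules above might look roughly like the following. The word lists, the 40-character greeting cutoff, and the rule ordering are all assumptions made for illustration; the "is complex" check is not implemented here because the issue does not define it.

```typescript
type PromptClass = "greeting" | "skill" | "standard" | "extended";

// Assumed greeting/acknowledgment words; the issue does not list them.
const GREETING_RE =
  /^\s*(hi|hello|hey|thanks|thank you|ok|okay|good (morning|afternoon|evening))\b/i;
// A "/skillname" token at the start of the prompt or after whitespace.
const SKILL_RE = /(^|\s)\/[a-z][\w-]*/i;
// Effort keywords named in the rules above.
const EXTENDED_RE = /\b(extended|advanced|deep)\b/i;
// Hypothetical cutoff so "Hi, please refactor the whole module" is not a greeting.
const MAX_GREETING_LENGTH = 40;

function classifyPrompt(prompt: string): PromptClass {
  if (SKILL_RE.test(prompt)) return "skill";
  if (EXTENDED_RE.test(prompt)) return "extended";
  if (GREETING_RE.test(prompt) && prompt.trim().length <= MAX_GREETING_LENGTH) {
    return "greeting";
  }
  return "standard"; // default tier, per the rules above
}
```

Skill and effort keywords are checked before the greeting rule so that a prompt opening with "hey" but asking for real work is not under-classified. A prompt mentioning an absolute path such as `/etc/hosts` would be misread as a skill invocation by this pattern, so a real router would likely check the token against the list of installed skills.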

Implementation Path

  1. Split the Algorithm monolith. The v1.5.0.md component (78KB) contains conceptually modular sections that are already separated by headers: ISC system, phase definitions, PRD template, loop mode, agent teams, capability registry. Split these into separate component files (e.g., algorithm-core.md, algorithm-isc.md, algorithm-prd.md, algorithm-capabilities.md, algorithm-loop.md, algorithm-teams.md).

  2. Modify RebuildPAI.ts to produce multiple output files (or one combined file plus individual section files) instead of a single monolithic SKILL.md.

  3. Create ContextRouter.hook.ts as a UserPromptSubmit hook that classifies and injects.

  4. Slim down LoadContext.hook.ts to inject only the ~10-15KB core described in Stage 1.
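Step 1 relies on the spec already being separated by headers. A minimal sketch of such a split, assuming the top-level sections start with level-2 markdown headers (`## `); the actual header level in v1.5.0.md may differ, and `splitByHeaders` is a hypothetical helper:

```typescript
interface Section {
  title: string;
  body: string;
}

// Assumption: each modular section begins with a "## " header line.
function splitByHeaders(markdown: string): Section[] {
  const sections: Section[] = [];
  let current: Section | null = null;
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // Track fenced code blocks so a "## " inside code is not treated as a header.
    if (line.startsWith("```")) inFence = !inFence;

    const match = inFence ? null : /^## +(.+?)\s*$/.exec(line);
    if (match) {
      current = { title: match[1], body: "" };
      sections.push(current);
    } else if (current) {
      current.body += line + "\n";
    }
    // Lines before the first header (any preamble) are dropped in this sketch.
  }
  return sections;
}
```

Each returned section could then be written to its own component file by RebuildPAI.ts, with the title-to-filename mapping decided separately.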

Estimated Impact

| Interaction Type | Current | Proposed | Reduction |
|---|---|---|---|
| Greeting | 83 KB | ~10 KB | 88% |
| Skill invocation | 83 KB | ~13 KB | 84% |
| Standard task | 83 KB | ~35 KB | 58% |
| Extended+ task | 83 KB | ~80 KB | ~4% |
| Weighted average | 83 KB | ~25 KB | ~70% |
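The per-row reductions follow directly from the sizes, while the weighted average depends on how often each interaction type occurs. The issue does not state that mix, so the shares below are purely hypothetical values chosen to illustrate the calculation:

```typescript
interface Row {
  kind: string;
  proposedKB: number; // from the table above
  share: number; // HYPOTHETICAL fraction of interactions, not measured data
}

const CURRENT_KB = 83;

const rows: Row[] = [
  { kind: "greeting", proposedKB: 10, share: 0.3 },
  { kind: "skill", proposedKB: 13, share: 0.2 },
  { kind: "standard", proposedKB: 35, share: 0.45 },
  { kind: "extended", proposedKB: 80, share: 0.05 },
];

// Fractional reduction relative to the current 83 KB baseline.
function reduction(proposedKB: number): number {
  return 1 - proposedKB / CURRENT_KB;
}

// Share-weighted mean of the proposed sizes (normalized in case shares do not sum to 1).
function weightedAverageKB(mix: Row[]): number {
  const totalShare = mix.reduce((sum, r) => sum + r.share, 0);
  return mix.reduce((sum, r) => sum + r.proposedKB * r.share, 0) / totalShare;
}

console.log(weightedAverageKB(rows).toFixed(1), reduction(weightedAverageKB(rows)).toFixed(2));
```

With this assumed mix the average comes out near 25 KB; a workload dominated by Extended+ tasks would see far less benefit, so real usage data would be needed to confirm the ~70% figure.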

Additional Benefit: Deterministic Documentation Loading

Currently, SKILL.md's "Context Loading" section instructs the LLM to read additional files (PAISYSTEMARCHITECTURE.md, MEMORYSYSTEM.md, etc.) when relevant. This is non-deterministic — the LLM may or may not follow the instruction. Moving these reads into the ContextRouter hook makes them deterministic: if the prompt mentions "memory", the hook injects MEMORYSYSTEM.md content directly.
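A sketch of how the hook could make these reads deterministic. The "memory" → MEMORYSYSTEM.md trigger comes from the paragraph above; the "architecture" trigger is an assumed example, and the file reader is passed in as a parameter so the sketch does not presume where the docs live on disk.

```typescript
// Keyword pattern -> documentation file to inject.
const DOC_TRIGGERS: ReadonlyArray<{ pattern: RegExp; file: string }> = [
  { pattern: /\bmemory\b/i, file: "MEMORYSYSTEM.md" },
  // Assumed trigger word for illustration only.
  { pattern: /\barchitecture\b/i, file: "PAISYSTEMARCHITECTURE.md" },
];

// Lists the doc files whose trigger keyword appears in the prompt.
function docsForPrompt(prompt: string): string[] {
  return DOC_TRIGGERS.filter((t) => t.pattern.test(prompt)).map((t) => t.file);
}

// Builds the text to inject; readDoc is supplied by the caller (e.g. a file read).
function buildDocContext(
  prompt: string,
  readDoc: (file: string) => string,
): string {
  return docsForPrompt(prompt)
    .map((file) => `<!-- ${file} -->\n${readDoc(file)}`)
    .join("\n\n");
}
```

Because the decision is made by pattern matching in the hook, the same prompt always yields the same injected files, independent of whether the model chooses to follow a "read this file" instruction.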

Risks & Mitigations

  • Misclassification: If the router under-classifies, the LLM lacks needed context. Mitigation: default to "standard" (not minimal), and include an escape hatch instruction in core context: "If you need full Algorithm details not present in your current context, state this and they will be provided on the next turn."
  • Cross-references: Algorithm sections reference each other (ISC rules reference PRD template). Mitigation: include brief summaries/pointers in each section, and load dependent sections together.
  • Latency: Keyword matching adds negligible latency (<5ms). AI-based classification would add ~300ms — not recommended for v1.
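For the cross-reference mitigation, "load dependent sections together" amounts to expanding the requested sections by their dependencies. A minimal sketch, where the only edge taken from this issue is ISC → PRD; the section keys and any further edges are assumptions that would need to be read off the actual spec:

```typescript
// Section -> sections it references. Only isc -> prd is stated in this issue.
const DEPENDS_ON: Record<string, string[]> = {
  isc: ["prd"],
  prd: [],
};

// Returns the requested sections plus everything they transitively reference.
function withDependencies(
  requested: string[],
  deps: Record<string, string[]> = DEPENDS_ON,
): string[] {
  const seen = new Set<string>();
  const stack = [...requested];
  while (stack.length > 0) {
    const name = stack.pop()!;
    if (seen.has(name)) continue; // also guards against reference cycles
    seen.add(name);
    stack.push(...(deps[name] ?? []));
  }
  return Array.from(seen);
}
```

Running the router's section list through an expansion like this would keep a section from being injected without the material it points to, at the cost of somewhat larger payloads for the lower tiers.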

Happy to submit a PR for this if there's interest.
