Problem
Related: #678
PAI v3.0 injects ~83KB of context at SessionStart via LoadContext.hook.ts. The Algorithm spec alone (v1.5.0.md) is 78,606 bytes. On Opus with a 200K token window, this means ~25K tokens (~12-15%) consumed before the user types anything. With hook outputs, steering rules, relationship context, and active work listings, the real baseline is 25-33%.
This is a significant regression from v2.5, where the Algorithm spec was much smaller. Issue #678 reports users seeing 78% context usage after a single interaction.
The core issue: every interaction pays the full 83KB cost, whether the user sends a greeting, invokes a skill, or starts a deep algorithm task.
Proposal: Tiered Context Loading via a ContextRouter Hook
Split context loading into two stages:
Stage 1: SessionStart — Core Context Only (~10-15KB)
LoadContext.hook.ts injects only what every interaction needs:
- Identity (principal name, DA name, voice ID)
- Minimal format rules (the 3 format modes: FULL / ITERATION / MINIMAL)
- Effort level selection table
- Phase names (OBSERVE → LEARN) without full phase details
- AI Steering Rules
- A routing instruction: "Full Algorithm details will be provided based on task classification"
Stage 2: UserPromptSubmit — Selective Injection (~0-70KB)
A new ContextRouter.hook.ts classifies each prompt and injects only the relevant Algorithm sections:
| Classification | Extra Context Loaded | Estimated Size |
|---|---|---|
| Greeting / acknowledgment | Nothing | 0 KB |
| Skill invocation | Skill system docs | ~3 KB |
| Standard algorithm task | ISC rules + phase details | ~25 KB |
| Extended+ algorithm task | Full Algorithm + PRD + capabilities | ~70 KB |
Classification can be keyword-based (fast, no AI inference needed):
- Starts with greeting words → greeting
- Contains /skillname → skill invocation
- Contains "Extended"/"Advanced"/"Deep" or is complex → full spec
- Default → standard
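The keyword rules above can be sketched as a small pure function. This is illustrative only: the names, greeting list, and thresholds below are assumptions, not code from the PAI repo, and the defaults err toward "standard" rather than minimal context.

```typescript
type Classification = "greeting" | "skill" | "standard" | "extended";

// Greeting words and length cutoffs are illustrative choices.
const GREETINGS = ["hi", "hello", "hey", "thanks", "thank you", "good morning"];
const DEPTH_MARKERS = /\b(extended|advanced|deep)\b/i;

function classifyPrompt(prompt: string): Classification {
  const trimmed = prompt.trim().toLowerCase();
  // Short prompts that start with a greeting word need no Algorithm context.
  if (GREETINGS.some((g) => trimmed.startsWith(g)) && trimmed.length < 40) {
    return "greeting";
  }
  // A leading /skillname token marks a skill invocation.
  if (/^\/[a-z][\w-]*/i.test(prompt.trim())) {
    return "skill";
  }
  // Explicit depth keywords (or very long prompts) get the full spec.
  if (DEPTH_MARKERS.test(prompt) || prompt.length > 2000) {
    return "extended";
  }
  // When in doubt, load standard context rather than under-classify.
  return "standard";
}
```

Note the ordering: the greeting check runs first but is gated by length, so "hello, please do a deep refactor of the repo" still falls through to the depth check.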
Implementation Path
1. Split the Algorithm monolith. The v1.5.0.md component (78KB) contains conceptually modular sections that are already separated by headers: ISC system, phase definitions, PRD template, loop mode, agent teams, capability registry. Split these into separate component files (e.g., algorithm-core.md, algorithm-isc.md, algorithm-prd.md, algorithm-capabilities.md, algorithm-loop.md, algorithm-teams.md).
2. Modify RebuildPAI.ts to produce multiple output files (or one combined file plus individual section files) instead of a single monolithic SKILL.md.
3. Create ContextRouter.hook.ts as a UserPromptSubmit hook that classifies and injects.
4. Slim down LoadContext.hook.ts to only inject the core ~10KB.
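The router's two core pieces can be sketched as below. The section file names follow step 1, but the classification-to-section mapping is an assumption, and the hook output shape (a `hookSpecificOutput.additionalContext` field on stdout for UserPromptSubmit hooks) should be verified against the Claude Code hooks documentation for your version.

```typescript
type Classification = "greeting" | "skill" | "standard" | "extended";

// Map each classification to the Algorithm section files it needs
// (mapping is illustrative; file names follow the proposed split).
function sectionsFor(c: Classification): string[] {
  switch (c) {
    case "greeting":
      return [];
    case "skill":
      return ["skill-system.md"];
    case "standard":
      return ["algorithm-core.md", "algorithm-isc.md"];
    case "extended":
      return [
        "algorithm-core.md",
        "algorithm-isc.md",
        "algorithm-prd.md",
        "algorithm-capabilities.md",
        "algorithm-loop.md",
        "algorithm-teams.md",
      ];
  }
}

// UserPromptSubmit hooks can emit JSON on stdout whose
// hookSpecificOutput.additionalContext is appended to the model's context.
function buildHookOutput(context: string): string {
  return JSON.stringify({
    hookSpecificOutput: {
      hookEventName: "UserPromptSubmit",
      additionalContext: context,
    },
  });
}
```

The hook body then just reads the prompt from stdin, classifies it, concatenates the mapped files, and prints the JSON payload.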
Estimated Impact
| Interaction Type | Current | Proposed | Reduction |
|---|---|---|---|
| Greeting | 83 KB | ~10 KB | 88% |
| Skill invocation | 83 KB | ~13 KB | 84% |
| Standard task | 83 KB | ~35 KB | 58% |
| Extended+ task | 83 KB | ~80 KB | ~4% |
| Weighted average | 83 KB | ~25 KB | ~70% |
Additional Benefit: Deterministic Documentation Loading
Currently, SKILL.md's "Context Loading" section instructs the LLM to read additional files (PAISYSTEMARCHITECTURE.md, MEMORYSYSTEM.md, etc.) when relevant. This is non-deterministic — the LLM may or may not follow the instruction. Moving these reads into the ContextRouter hook makes them deterministic: if the prompt mentions "memory", the hook injects MEMORYSYSTEM.md content directly.
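This deterministic routing is just a keyword-to-file table. The file names come from the issue text; the matching rules and function name below are illustrative:

```typescript
// Each route pairs a trigger pattern with the doc file to inject.
const DOC_ROUTES: Array<[RegExp, string]> = [
  [/\bmemor(y|ies)\b/i, "MEMORYSYSTEM.md"],
  [/\barchitecture\b/i, "PAISYSTEMARCHITECTURE.md"],
];

function docsFor(prompt: string): string[] {
  // Every matching route is injected; no LLM judgment involved.
  return DOC_ROUTES.filter(([re]) => re.test(prompt)).map(([, file]) => file);
}
```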
Risks & Mitigations
- Misclassification: If the router under-classifies, the LLM lacks needed context. Mitigation: default to "standard" (not minimal), and include an escape hatch instruction in core context: "If you need full Algorithm details not present in your current context, state this and they will be provided on the next turn."
- Cross-references: Algorithm sections reference each other (ISC rules reference PRD template). Mitigation: include brief summaries/pointers in each section, and load dependent sections together.
- Latency: Keyword matching adds negligible latency (<5ms). AI-based classification would add ~300ms — not recommended for v1.
Happy to submit a PR for this if there's interest.