Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
182 commits
Select commit Hold shift + click to select a range
f7d4c6c
docs(experimental): add native protocol roadmap
Mirrowel May 30, 2026
58415f5
feat(protocols): add native protocol core
Mirrowel May 30, 2026
004d471
feat(protocols): add OpenAI chat adapter
Mirrowel May 30, 2026
1a6b564
feat(protocols): add Anthropic messages adapter
Mirrowel May 30, 2026
7df4ae9
feat(protocols): add Gemini adapter
Mirrowel May 30, 2026
aaefc64
feat(protocols): add Responses adapter
Mirrowel May 30, 2026
8a25309
fix(protocols): harden Phase 1 review findings
Mirrowel May 30, 2026
cb66027
fix(protocols): preserve native block and output fidelity
Mirrowel May 30, 2026
46696d5
docs(experimental): plan transform trace logging
Mirrowel May 30, 2026
1b84321
feat(logging): add transform trace writer
Mirrowel May 30, 2026
78fac93
feat(logging): trace transaction transform passes
Mirrowel May 30, 2026
f201626
fix(logging): harden transform trace correlation
Mirrowel May 30, 2026
ef3cc19
docs(experimental): plan adapters and field cache
Mirrowel May 30, 2026
80f4506
feat(adapters): add payload adapter registry
Mirrowel May 30, 2026
eef4383
feat(field-cache): add rules and path helpers
Mirrowel May 30, 2026
ce9d660
feat(field-cache): add scoped cache engine
Mirrowel May 30, 2026
790e52e
test(field-cache): cover adapter and cache trace passes
Mirrowel May 30, 2026
bcb4a58
feat(providers): declare protocol adapter hooks
Mirrowel May 30, 2026
27aa7c1
fix(field-cache): close Phase 3 review gaps
Mirrowel May 30, 2026
4caf56a
docs(experimental): plan responses api
Mirrowel May 30, 2026
c881602
feat(responses): add response storage
Mirrowel May 30, 2026
f3541ab
feat(responses): add chat bridge
Mirrowel May 30, 2026
f1839cd
feat(responses): add response service
Mirrowel May 30, 2026
e7f0e7d
feat(responses): add non-stream routes
Mirrowel May 30, 2026
1b02727
feat(responses): stream HTTP SSE events
Mirrowel May 30, 2026
05108ff
fix(responses): close Phase 4 review gaps
Mirrowel May 30, 2026
456a6bb
docs(experimental): plan provider protocol overhaul
Mirrowel May 30, 2026
d3028ad
feat(native-provider): add opt-in executor foundation
Mirrowel May 30, 2026
6eef5fa
feat(native-provider): add streaming foundation
Mirrowel May 30, 2026
34483ef
feat(providers): add Claude Code native skeleton
Mirrowel May 30, 2026
d0abac0
feat(providers): add Codex native skeleton
Mirrowel May 30, 2026
eec7544
feat(providers): add Copilot native skeleton
Mirrowel May 30, 2026
ae7dbab
feat(providers): restore Antigravity native skeleton
Mirrowel May 30, 2026
4c2b34b
feat(providers): declare Gemini CLI native protocol metadata
Mirrowel May 30, 2026
42168ce
fix(native-provider): align provider adapters and trace passes
Mirrowel May 30, 2026
b38054c
docs(experimental): plan routing fallback groups
Mirrowel May 30, 2026
ce9d085
feat(routing): add fallback group primitives
Mirrowel May 30, 2026
036eecf
feat(routing): add target context cloning
Mirrowel May 30, 2026
589603c
feat(routing): add fallback attempt runner
Mirrowel May 30, 2026
9293035
feat(routing): integrate non-streaming fallback attempts
Mirrowel May 30, 2026
744cf59
feat(routing): select native custom and fallback execution
Mirrowel May 30, 2026
9fdaf31
feat(routing): add streaming fallback policy
Mirrowel May 30, 2026
94d800f
fix(routing): wire resolver and align fallback errors
Mirrowel May 30, 2026
2fbfca9
docs(experimental): plan retry cooldown failover cleanup
Mirrowel May 31, 2026
6498eef
feat(retry): add retry policy helpers
Mirrowel May 31, 2026
3998038
fix(cooldown): preserve longer provider cooldowns
Mirrowel May 31, 2026
0ac6c50
feat(retry): activate provider cooldowns
Mirrowel May 31, 2026
207b7a2
feat(routing): honor fallback group policies
Mirrowel May 31, 2026
f197d2e
feat(routing): summarize fallback target failures
Mirrowel May 31, 2026
f954668
fix(retry): activate cooldowns for stream failures
Mirrowel May 31, 2026
4f5d650
docs(experimental): plan streaming library upgrade
Mirrowel May 31, 2026
9bd1ef7
feat(streaming): add stream event primitives
Mirrowel May 31, 2026
e756795
feat(streaming): centralize stream retry policy
Mirrowel May 31, 2026
b6760d3
feat(streaming): add stream error decisions
Mirrowel May 31, 2026
1eb3868
feat(streaming): trace stream lifecycle metrics
Mirrowel May 31, 2026
1374388
feat(streaming): add native streaming opt-in seam
Mirrowel May 31, 2026
1c67bb8
feat(streaming): trace responses stream metrics
Mirrowel May 31, 2026
052a62c
docs(experimental): plan usage quota cost
Mirrowel May 31, 2026
ec38cc0
feat(usage): add normalized usage records
Mirrowel May 31, 2026
e2fd825
feat(usage): add advisory cost calculator
Mirrowel May 31, 2026
4d194e4
feat(usage): account for executor responses
Mirrowel May 31, 2026
cf964f2
feat(usage): account for stream usage
Mirrowel May 31, 2026
665b14a
feat(usage): add quota snapshots
Mirrowel May 31, 2026
07c7a21
feat(usage): trace responses and native usage
Mirrowel May 31, 2026
f047bf1
fix(usage): close Phase 9 review gaps
Mirrowel May 31, 2026
751ef72
chore(usage): remove unused accounting wrapper
Mirrowel May 31, 2026
a9f5729
docs(experimental): plan config polish
Mirrowel May 31, 2026
3059fcf
feat(config): add experimental config loader
Mirrowel May 31, 2026
3996cab
feat(config): support json routing config
Mirrowel May 31, 2026
c2d5dbb
feat(config): add configured model pricing
Mirrowel May 31, 2026
422f8fb
feat(config): add stream runtime settings
Mirrowel May 31, 2026
f417c08
docs(config): document experimental knobs
Mirrowel May 31, 2026
b65f948
test(config): cover runtime pricing config
Mirrowel May 31, 2026
6e641ae
docs(experimental): plan protocol breadth correction
Mirrowel May 31, 2026
fd2f6a2
feat(protocols): add operation model
Mirrowel May 31, 2026
93487be
feat(protocols): add non-chat protocol adapters
Mirrowel May 31, 2026
41b6284
fix(protocols): stamp operation metadata consistently
Mirrowel May 31, 2026
90e26ce
fix(protocols): close audio and ollama review gaps
Mirrowel May 31, 2026
495904e
fix(protocols): harden count tokens and ollama semantics
Mirrowel May 31, 2026
77f7666
fix(protocols): close final operation hardening notes
Mirrowel May 31, 2026
6099709
docs(experimental): plan transform trace coverage correction
Mirrowel May 31, 2026
bd4a904
feat(logging): trace provider transform boundaries
Mirrowel May 31, 2026
25e631d
feat(logging): trace executor response boundaries
Mirrowel May 31, 2026
4642916
feat(logging): trace native adapter cache boundaries
Mirrowel May 31, 2026
6b261b7
feat(logging): align responses trace boundaries
Mirrowel May 31, 2026
c0e745c
fix(logging): close transform trace review gaps
Mirrowel May 31, 2026
55d6737
fix(logging): harden trace redaction and done events
Mirrowel May 31, 2026
0880df1
fix(logging): close phase 2b acceptance gaps
Mirrowel May 31, 2026
1cc1aba
fix(logging): avoid responses trace work when disabled
Mirrowel May 31, 2026
0810c18
docs(experimental): plan field cache runtime correction
Mirrowel May 31, 2026
ba79340
fix(field-cache): persist native cache per executor
Mirrowel May 31, 2026
8d60abd
feat(field-cache): implement runtime cache modes
Mirrowel May 31, 2026
b810c5c
feat(field-cache): merge configured native rules
Mirrowel May 31, 2026
626f2d0
fix(field-cache): close runtime integration gaps
Mirrowel May 31, 2026
3ff29ac
fix(field-cache): harden native trace and config semantics
Mirrowel May 31, 2026
7ea21ec
fix(field-cache): preserve config errors and provider-state redaction
Mirrowel May 31, 2026
f00b48a
fix(field-cache): redact executor native response traces
Mirrowel May 31, 2026
6740ed3
fix(field-cache): redact native stream traces
Mirrowel May 31, 2026
7c969aa
fix(field-cache): redact stream-event envelope paths
Mirrowel May 31, 2026
3055003
docs(experimental): plan responses correction
Mirrowel May 31, 2026
41cfede
feat(responses): add internal session continuation hints
Mirrowel May 31, 2026
ab70413
feat(responses): add storage policy controls
Mirrowel May 31, 2026
9995558
feat(responses): emit transport neutral stream events
Mirrowel May 31, 2026
d647c60
fix(responses): harden continuation hints and stream failures
Mirrowel May 31, 2026
ad417b7
fix(responses): bind continuations to response anchors
Mirrowel May 31, 2026
12cbc01
fix(responses): align continuation anchor namespace
Mirrowel May 31, 2026
86d3503
docs(experimental): plan provider native correction
Mirrowel May 31, 2026
d366b84
feat(providers): resolve native operations per provider
Mirrowel May 31, 2026
60fd7c1
test(providers): cover priority native execution paths
Mirrowel May 31, 2026
95b9135
test(providers): preserve gemini cli custom path
Mirrowel May 31, 2026
2166844
fix(providers): harden native request safety
Mirrowel May 31, 2026
4247fa5
fix(providers): close native streaming and antigravity gaps
Mirrowel May 31, 2026
f0b9920
fix(providers): make antigravity envelope idempotence explicit
Mirrowel May 31, 2026
124141c
docs(experimental): plan routing fallback correction
Mirrowel May 31, 2026
a5d3464
fix(routing): enforce hard-stop fallback policy
Mirrowel May 31, 2026
681f2dc
fix(routing): harden structured fallback decisions
Mirrowel May 31, 2026
e15c185
fix(streaming): classify control frames before fallback
Mirrowel May 31, 2026
a957f08
fix(routing): close fallback review gaps
Mirrowel May 31, 2026
bbe704d
fix(routing): preserve stream hard-stop semantics
Mirrowel May 31, 2026
79c9633
fix(routing): complete structured error classification
Mirrowel May 31, 2026
49e07f8
fix(routing): preserve context-window dict classification
Mirrowel May 31, 2026
e29b15d
docs(experimental): plan retry cooldown correction
Mirrowel May 31, 2026
4181413
feat(cooldown): add provider model scoped cooldowns
Mirrowel May 31, 2026
9535bae
feat(retry): add scoped cooldown decisions
Mirrowel May 31, 2026
309f5a1
feat(retry): wire scoped cooldowns into executor
Mirrowel May 31, 2026
0b8551c
feat(streaming): expose scoped cooldown decisions
Mirrowel May 31, 2026
45ebfa9
fix(retry): guard stream retries after visible output
Mirrowel May 31, 2026
b1e0ade
fix(retry): latch streaming policy decisions
Mirrowel May 31, 2026
6b791ca
docs(experimental): plan streaming hardening correction
Mirrowel May 31, 2026
508d13b
feat(streaming): add heartbeat runtime settings
Mirrowel May 31, 2026
664e322
feat(streaming): harden stream lifecycle handling
Mirrowel May 31, 2026
ec7b1aa
feat(native): support httpx stream transport
Mirrowel May 31, 2026
1946c75
fix(streaming): preserve retry safety around heartbeats
Mirrowel May 31, 2026
24e988f
fix(streaming): harden native stream sentinels
Mirrowel May 31, 2026
9e082a5
fix(streaming): initialize stream timeout state
Mirrowel May 31, 2026
f13a3b8
docs(experimental): plan usage cost correction
Mirrowel May 31, 2026
5d8da9c
feat(usage): preserve provider reported costs
Mirrowel May 31, 2026
aa9ba90
feat(usage): prefer reported provider costs
Mirrowel May 31, 2026
bf13508
feat(streaming): account for SSE cost events
Mirrowel May 31, 2026
ac2e920
feat(usage): trace reported costs across surfaces
Mirrowel May 31, 2026
d127a08
fix(usage): preserve streaming and native costs
Mirrowel May 31, 2026
41eebe8
docs(experimental): plan config wiring correction
Mirrowel May 31, 2026
6859ec5
feat(config): add retry and responses settings
Mirrowel May 31, 2026
bbeba37
feat(config): wire runtime settings surfaces
Mirrowel May 31, 2026
2f47569
fix(config): harden experimental settings parsing
Mirrowel May 31, 2026
36c11ce
fix(config): reject more secret-like keys
Mirrowel May 31, 2026
8b1f1bc
docs(experimental): record third-pass audit findings
Mirrowel May 31, 2026
c947273
docs(experimental): plan protocol guardrail fixes
Mirrowel May 31, 2026
5d2113f
fix(protocols): format public usage shapes
Mirrowel May 31, 2026
bfcfa33
fix(protocols): close Phase 1c review gaps
Mirrowel May 31, 2026
9d69ce1
docs(experimental): plan transform trace completion
Mirrowel May 31, 2026
8d2c2fd
fix(trace): complete transform trace coverage
Mirrowel May 31, 2026
3ea6113
fix(trace): address Phase 2c review gaps
Mirrowel May 31, 2026
005685e
fix(trace): trace responses stream store failures
Mirrowel May 31, 2026
d3e2bfd
fix(trace): normalize dict responses before final trace
Mirrowel May 31, 2026
5a4d42b
docs(experimental): plan field-cache runtime completion
Mirrowel May 31, 2026
e29cd9e
fix(field-cache): complete native runtime coverage
Mirrowel May 31, 2026
58ea43a
fix(field-cache): close Phase 3c review gaps
Mirrowel May 31, 2026
855486c
docs(experimental): plan responses correction
Mirrowel May 31, 2026
fdabe10
fix(responses): wire storage and lineage
Mirrowel May 31, 2026
8d0c88f
fix(responses): preserve tool-call lineage
Mirrowel May 31, 2026
4d727a3
fix(responses): replay tool-result lineage
Mirrowel May 31, 2026
6ee91b9
docs(experimental): plan provider-native correction
Mirrowel May 31, 2026
342a62e
fix(providers): return client protocol from native calls
Mirrowel May 31, 2026
f53cb86
fix(providers): harden native selection and streaming
Mirrowel May 31, 2026
ff51efe
fix(providers): close native streaming and cache gaps
Mirrowel May 31, 2026
29d66ff
fix(providers): preserve plain antigravity preview alias
Mirrowel May 31, 2026
c225c26
docs(experimental): plan routing fallback correction
Mirrowel May 31, 2026
1fe74bf
fix(routing): harden fallback selection
Mirrowel May 31, 2026
bb2c497
fix(routing): honor native stream opt outs
Mirrowel May 31, 2026
c40c6f2
docs(experimental): plan retry cooldown correction
Mirrowel May 31, 2026
f7a6aea
fix(retry): enforce cooldown backoff semantics
Mirrowel May 31, 2026
81984ed
docs(experimental): plan streaming hardening correction
Mirrowel May 31, 2026
f8600a6
fix(streaming): harden responses runtime streams
Mirrowel May 31, 2026
177e4f4
fix(streaming): preserve response timeout deadlines
Mirrowel May 31, 2026
f458603
fix(streaming): consume completed heartbeat tasks
Mirrowel May 31, 2026
a5f2731
fix(streaming): keep responses ttfb end-to-end
Mirrowel May 31, 2026
46e4015
docs(experimental): plan usage cost correction
Mirrowel May 31, 2026
d8ad65c
fix(usage): preserve provider cost metadata
Mirrowel May 31, 2026
ee7b9b5
fix(usage): handle reference stream cost shapes
Mirrowel May 31, 2026
41e60e0
fix(usage): preserve all stream cost metadata
Mirrowel May 31, 2026
96ef2f6
fix(usage): normalize estimated cost fields
Mirrowel May 31, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
75 changes: 75 additions & 0 deletions .env.example
Original file line number Diff line number Diff line change
Expand Up @@ -333,6 +333,81 @@
# Default: false
# STREAM_RETRY_ON_REASONING_ONLY=false

# --- Optional Structured Config File ---
# Optional JSON config for structured experimental settings such as fallback
# groups, model routes, advisory pricing, streaming observability, and field
# cache rules. Environment variables override JSON config. Do not put API keys,
# OAuth tokens, bearer tokens, or authorization headers in this file.
# Supported top-level sections include: routing, pricing, streaming, retry,
# responses, field_cache, and providers. The providers section is for non-secret
# provider tuning only; credentials must stay in env or provider credential files.
# LLM_PROXY_CONFIG_FILE=./config/llm-proxy.json

# --- Fallback Groups and Model Routes ---
# Ordered fallback groups try targets in order when the previous target fails
# with a retryable provider/category error. Execution suffixes:
# @auto Let the provider choose custom/native/LiteLLM behavior
# @native Require native protocol execution
# @custom Require provider custom execution
# @litellm_fallback Explicitly use LiteLLM fallback
# FALLBACK_GROUPS=code_chain
# FALLBACK_GROUP_CODE_CHAIN=codex/gpt-5.1-codex@native,openai/gpt-5.1@litellm_fallback
# MODEL_ROUTE_CODE=group:code_chain

# --- Provider Cooldown Activation ---
# Provider-level cooldown is conservative and only intended for large/global
# retry-after events, not every per-credential quota error. Model-capacity
# errors can start a model-scoped cooldown without blocking unrelated models.
# PROVIDER_COOLDOWN_MIN_SECONDS=10
# PROVIDER_COOLDOWN_DEFAULT_SECONDS=30
# PROVIDER_COOLDOWN_ON_QUOTA=false
# Repeated transient provider/model failures can increase bounded backoff.
# PROVIDER_BACKOFF_WINDOW_SECONDS=60
# PROVIDER_BACKOFF_THRESHOLD=3
# PROVIDER_BACKOFF_BASE_SECONDS=0 # 0/unset means use provider cooldown default
# PROVIDER_BACKOFF_MAX_SECONDS=300
# FAILURE_HISTORY_MAX_ENTRIES=200

# --- Responses API Store Policy ---
# Responses are stored in-memory by default. Use provider_cache for durable JSON
# storage via the existing provider-cache layer. TTL/max limits are disabled
# when unset or <= 0. Failed stream responses are stored by default; in-progress
# streaming state is disabled unless explicitly enabled.
# RESPONSES_STORE_BACKEND=memory
# RESPONSES_STORE_CACHE_NAME=responses
# RESPONSES_STORE_CACHE_PREFIX=responses
# RESPONSES_STORE_CACHE_DIR=
# RESPONSES_STORE_CACHE_MEMORY_TTL_SECONDS=3600
# RESPONSES_STORE_CACHE_DISK_TTL_SECONDS=172800
# RESPONSES_STORE_TTL_SECONDS=0
# RESPONSES_STORE_MAX_ITEMS=0
# RESPONSES_STORE_FAILED=true
# RESPONSES_STORE_IN_PROGRESS=false

# --- Streaming Observability ---
# Stream lifecycle metrics are traced by default. TTFB/stall timeout values are
# active when set to >0; keep at 0 to disable. Heartbeats are SSE comments and
# do not count as visible output.
# STREAM_TRACE_METRICS=true
# STREAM_TTFB_TIMEOUT_SECONDS=0
# STREAM_STALL_TIMEOUT_SECONDS=0
# STREAM_HEARTBEAT_INTERVAL_SECONDS=0
# STREAM_HEARTBEAT_SECONDS=0 # Legacy alias
# STREAM_CANCEL_UPSTREAM_ON_DISCONNECT=true

# --- Advisory Model Pricing ---
# Per-token advisory prices used only when providers do not report actual cost.
# Precedence: skip-cost provider setting > provider-reported cost/SSE cost event
# > provider explicit pricing > env pricing > JSON pricing > LiteLLM metadata.
# Streaming providers can report actual cost with `: cost {...}` comments or
# `event: cost` frames.
# Env names sanitize provider/model by replacing non-alphanumerics with `_`.
# MODEL_PRICE_OPENAI_GPT_5_1_INPUT=0.000001
# MODEL_PRICE_OPENAI_GPT_5_1_OUTPUT=0.00001
# MODEL_PRICE_OPENAI_GPT_5_1_CACHE_READ=0.0000001
# MODEL_PRICE_OPENAI_GPT_5_1_CACHE_WRITE=0.000001
# MODEL_PRICE_OPENAI_GPT_5_1_REASONING=0.00001

# ------------------------------------------------------------------------------
# | [ADVANCED] HTTP Timeout Configuration |
# ------------------------------------------------------------------------------
Expand Down
132 changes: 132 additions & 0 deletions docs/experimental/00-master-plan.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# Experimental Native Protocol Roadmap

This branch is for a long-running experimental rewrite that makes native protocol support the first-class extension point of `rotator_library`, while preserving the existing credential rotation, quota, fair-cycle, session tracking, and provider plugin strengths.

## Operating Rules

- Work only on the `experimental` branch.
- Keep all repository work inside `C:\Projects\test\LLM-API-Key-Proxy` and child paths.
- Treat commits as checkpoints. A phase may contain many commits.
- Commit messages must include a body describing what changed, why, tests run, and follow-up considerations.
- Do not commit phase reports written for the user unless explicitly requested. Planning docs under `docs/experimental/` are committed.
- Before each phase implementation, first produce a fresh exhaustive phase plan in conversation text, based on the current code state. Only after that plan is settled should it be written to `docs/experimental/phase-N-*.md`.
- After each phase implementation, call both `explore` and `explore-heavy` agents to review the work against the phase plan, external reference areas, and current proxy behavior. Fix findings and re-review as needed.
- Keep LiteLLM as a fallback path for protocols/providers that are not natively covered yet. Native protocol support should be preferred when available.

## Strategic Goal

The target architecture is:

```text
client API request
-> protocol parse into unified representation
-> field-cache injection
-> adapter chain
-> provider override hooks
-> provider-native request build
-> provider execution and credential rotation
-> provider-native response/stream parse
-> field-cache extraction
-> adapter chain
-> protocol formatting for the client
-> transaction logging for every transform state
```

Providers should be able to declare an existing protocol and only override the parts that are genuinely provider-specific. A custom provider should usually be configurable through protocol choice, adapters, field-cache rules, auth strategy, and model options rather than requiring a large bespoke provider implementation.

## Priority Order

1. Native protocol foundations, unified types, transformers, adapters, and field-cache rules.
2. OpenAI Responses API support, including future WebSocket extension points.
3. Provider work following the protocol layer: Claude Code, Codex, Copilot, Antigravity, and Gemini CLI parity review.
4. Routing and fallback groups, with optional target-group selectors later.
5. Retry, provider/model cooldown, and failover cleanup.
6. Protocol-aware quota, usage, and cost normalization.
7. Streaming library hardening: SSE now, WebSocket-ready later.
8. Config polish using `.env` and optional JSON. No SQLite dependency for now.
9. Extensive staged tests and review-agent verification.

## Non-Goals For This Branch

- Do not make the proxy a full multi-user admin product yet.
- Do not require SQLite or Postgres for the main feature set.
- Do not remove LiteLLM before native coverage exists.
- Do not replace the existing `UsageManager`, fair-cycle, custom caps, or evidence-based `SessionTracker`.
- Do not port frontend/UI work from the external reference gateway.

## Current Strengths To Preserve

- Credential-level rotation and priority-aware selection.
- Fair cycle and custom caps.
- Windowed quota tracking and quota groups.
- Evidence-based session tracking with compaction handling.
- Provider plugin discovery.
- Gemini CLI provider behavior unless a reviewed change is clearly better.
- Resilient file/JSON state writing.
- Dynamic OpenAI-compatible provider discovery.

## Reference Gateway Ideas To Import Carefully

- Unified protocol/transformer style.
- Adapter registry and configurable provider/model adapters.
- Target groups and direct routing syntax, adapted into fallback-first routing.
- Responses API transformer and storage concepts.
- Stream TTFB/stall detection concepts, implemented with Python-native async primitives.
- Provider/model cooldown and retry-history concepts.
- Usage/cost normalization and provider-reported cost extraction.
- Broader provider support patterns for Claude Code, Codex, Copilot, and Antigravity.

## Phase Index

1. Protocol Core.
2. Transform Pass Logging.
3. Adapter and Field Cache System.
4. Responses API and WebSocket-Ready Transport Shape.
5. Provider Protocol Overhaul.
6. Routing and Fallback Groups.
7. Retry/Cooldown/Failover Cleanup.
8. Streaming Library Upgrade.
9. Usage, Quota, and Cost Accuracy.
10. Config Polish.

Each phase may be subdivided if implementation scope becomes too large.

## Completeness Matrix

This matrix exists so the branch does not lose any requested scope while phases evolve. The phase plans are still refreshed before implementation, but every item below must remain accounted for.

| Requested area | Planned coverage |
| --- | --- |
| Protocols are priority #1 | Phases 1 and 4 create native protocol foundations and Responses support before provider work. |
| Protocols are bases, not gospel | Phase 1 requires override-friendly protocol methods, subclassing, copy/mutate registration, and provider-specific overrides. |
| Move away from LiteLLM | Phase 1 adds a `litellm_fallback` protocol path; later providers should prefer native protocols and use LiteLLM only for unsupported coverage. |
| Add protocols automatically like providers | Phase 1 adds protocol auto-discovery and registry behavior modeled after provider discovery. |
| Cover current providers and reference providers | Phase 1 protocols must cover shapes used by current providers; Phase 5 covers Claude Code, Codex, Copilot, Antigravity, and Gemini CLI parity. |
| Responses API is very needed | Phase 4 is dedicated to Responses, `previous_response_id`, storage, SSE, and WebSocket-ready transport shape. |
| WebSocket support later | Phases 1, 4, and 8 require transport separation so WebSocket can be added without rewriting protocol logic. |
| Adapters/transformers tied to protocols | Phases 1, 2, and 3 define protocol parse/build plus transform tracing, adapter registry, and field-cache rules. |
| Cache and return provider fields | Phase 3 implements configurable extraction/injection rules for request, response, and stream fields with scope and mode controls. |
| Reasoning content and similar fields | Phase 3 explicitly covers reasoning content, thinking signatures, prompt cache keys, response IDs, and provider session IDs. |
| Return all possible or last user/assistant use | Phase 3 modes include `last`, `all`, `last_user_turn`, `last_assistant_turn`, and `per_tool_call`. |
| Per-model custom provider behavior | Phases 3, 5, and 10 cover provider/model field cache rules, adapters, model options, and optional JSON config. |
| Transaction logging after every transform | Phase 2 adds ordered request, response, and stream transform trace passes and integrates them with transaction logging. |
| Comments, docstrings, and key decisions | All implementation phases require docstrings for public abstractions and comments for non-obvious transform, protocol, and future-extension decisions. |
| Providers are priority #2 | Phase 5 follows protocol foundations with Claude Code, Codex, Copilot, Antigravity, and Gemini CLI parity review. |
| Antigravity comparison | Phase 5 explicitly compares the reference Antigravity behavior against `src/rotator_library/providers/_retired/`. |
| Routing is interesting | Phase 6 implements fallback chains first, with target-group selectors later if useful. |
| Fallback groups preferred over target groups | Phase 6 starts with ordered fallback groups and only adds target-group-style selectors after that base works. |
| Retry/cooldown/failover cleanup | Phase 7 makes provider/model cooldown real, adds retry history, backoff, retry-after precedence, and success reset. |
| Quota/usage/cost improvements | Phase 9 adds protocol-aware normalizers, provider-reported cost extraction, structured cost fields, and checker abstractions while keeping existing usage engines. |
| Streaming as library capability | Phase 8 hardens streaming below the proxy route layer with TTFB, TTFT, stall detection, cancellation, and transport-aware stream events. |
| Config via env/json, no SQLite | Phase 10 adds optional JSON config with env overrides and validation. SQLite remains out of scope. |
| Multi-user proxy later | The branch keeps multi-user/admin features as a future expansion and only preserves extension points where natural. |
| Exhaustive tests in stages | Every phase requires tests alongside implementation and phase-end review by both `explore` and `explore-heavy`. |
| Reports are for the user, not git | `06-phase-workflow.md` says planning docs are committed, but phase reports are not committed by default. |

## Code Quality Expectations

- Public protocol, adapter, transport, field-cache, and provider-extension classes must have docstrings that explain intent, override points, and future expansion hooks.
- Non-obvious transformations must have comments explaining why data is changed, preserved, reordered, or intentionally dropped.
- Lossy protocol conversions must be documented at the conversion site.
- Future WebSocket, target-group, and multi-user extension seams should be noted in comments where they affect today's design.
- Tests should prefer golden fixtures for protocol shapes and focused unit tests for transform edge cases.
132 changes: 132 additions & 0 deletions docs/experimental/01-protocol-architecture.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
# Native Protocol Architecture

Protocols are reusable bases, not rigid gospel. Providers can subclass, wrap, copy, or override protocol behavior when a provider deviates from an otherwise standard protocol.

## Why Protocols First

The current code relies heavily on LiteLLM and provider-specific transforms. That works, but it makes new protocols hard to reason about and makes debugging transformations difficult. The experimental goal is to make a provider mostly declarative:

```text
provider = protocol + auth + adapters + field cache rules + model options + quota behavior
```

If a provider needs custom behavior, it should override a narrow protocol method instead of forcing an entirely bespoke request path.

## Auto-Discovery

Protocols should follow the provider plugin style:

- protocol modules live under `src/rotator_library/protocols/`.
- modules register concrete protocol classes by name.
- a registry exposes names such as `openai_chat`, `anthropic_messages`, `gemini`, `responses`, and `litellm_fallback`.
- third-party or local protocol modules can be added later with minimal registry changes.

## Core Types

The unified representation should be explicit enough to cover all existing providers and the external reference protocols without losing important data.

Suggested types:

- `UnifiedRequest`
- `UnifiedResponse`
- `UnifiedStreamEvent`
- `UnifiedMessage`
- `ContentBlock`
- `ToolDefinition`
- `ToolCall`
- `ToolResult`
- `ReasoningBlock`
- `Usage`
- `CostDetails`
- `ProtocolMetadata`

These types should retain unknown provider-specific metadata in explicit extension dictionaries instead of dropping it. Robustness matters more than a narrow perfect schema.

## Protocol Interface

The base protocol should provide default methods that can be overridden:

- `parse_request(raw_request, context) -> UnifiedRequest`
- `build_request(unified_request, context) -> raw_provider_request`
- `parse_response(raw_response, context) -> UnifiedResponse`
- `format_response(unified_response, context) -> raw_client_response`
- `parse_stream_event(raw_event, context) -> UnifiedStreamEvent`
- `format_stream_event(unified_event, context) -> raw_stream_payload`
- `extract_usage(raw_or_unified, context) -> Usage | None`
- `supports_transport(transport_name) -> bool`

Provider-specific overrides should receive context that includes provider name, model, credential identity, source protocol, target protocol, request ID, and session tracking information.

## Initial Protocols

### OpenAI Chat

Must support:

- chat completions request/response.
- stream chunks.
- tools and tool calls.
- function-call legacy shapes.
- reasoning fields from OpenAI-compatible providers.
- cached token and reasoning token usage details.

### Anthropic Messages

Must support:

- messages request/response.
- system content extraction.
- text, image, thinking, redacted thinking, tool_use, tool_result blocks.
- stream lifecycle events.
- count_tokens path later if needed.

### Gemini

Must support:

- generateContent and streamGenerateContent shapes.
- content parts.
- functionCall/functionResponse.
- thought signatures.
- safety settings passthrough without unsafe auto-injection.
- Google/Gemini usage metadata.

### Responses

Must support:

- OpenAI Responses request/response.
- `previous_response_id`.
- output items.
- event streams.
- storage-friendly response objects.
- future WebSocket transport.

### LiteLLM Fallback

Must preserve existing behavior for providers/protocols not yet native. This path should be explicit and transaction-logged as a fallback, not hidden.

## Transport Separation

Protocol formatting must not be tied only to HTTP SSE. Define a transport boundary so the same unified stream events can be emitted through:

- non-streaming HTTP JSON.
- HTTP SSE.
- future WebSocket.

The Responses phase should leave clear extension points for WebSocket even if WebSocket is implemented later.

## Error Handling

Protocols should preserve provider error bodies where safe, but format client-facing errors consistently. Parsing errors should include transform-pass names and request IDs to make transaction logs useful.

## Docstrings And Comments

Protocol code should include docstrings explaining:

- which external API shape it models.
- what fields are intentionally preserved in metadata.
- where provider overrides are expected.
- future expansion hooks.

Comments should explain non-obvious transformations, especially lossy conversions between protocols.
Loading
Loading