Experimental Native Protocol Rewrite by Mirrowel · Pull Request #162 · Mirrowel/LLM-API-Key-Proxy

Mirrowel · 2026-05-31T10:47:19Z

Experimental Native Protocol Roadmap

This branch is for a long-running experimental rewrite that makes native protocol support the first-class extension point of rotator_library, while preserving the existing credential rotation, quota, fair-cycle, session tracking, and provider plugin strengths.

Operating Rules

Work only on the experimental branch.
Keep all repository work inside C:\Projects\test\LLM-API-Key-Proxy and child paths.
Treat commits as checkpoints. A phase may contain many commits.
Commit messages must include a body describing what changed, why, tests run, and follow-up considerations.
Do not commit phase reports written for the user unless explicitly requested. Planning docs under docs/experimental/ are committed.
Before each phase implementation, first produce a fresh exhaustive phase plan in conversation text, based on the current code state. Only after that plan is settled should it be written to docs/experimental/phase-N-*.md.
After each phase implementation, call both explore and explore-heavy agents to review the work against the phase plan, external reference areas, and current proxy behavior. Fix findings and re-review as needed.
Keep LiteLLM as a fallback path for protocols/providers that are not natively covered yet. Native protocol support should be preferred when available.

Strategic Goal

The target architecture is:

client API request
  -> protocol parse into unified representation
  -> field-cache injection
  -> adapter chain
  -> provider override hooks
  -> provider-native request build
  -> provider execution and credential rotation
  -> provider-native response/stream parse
  -> field-cache extraction
  -> adapter chain
  -> protocol formatting for the client
  -> transaction logging for every transform state

Providers should be able to declare an existing protocol and only override the parts that are genuinely provider-specific. A custom provider should usually be configurable through protocol choice, adapters, field-cache rules, auth strategy, and model options rather than requiring a large bespoke provider implementation.

Priority Order

Native protocol foundations, unified types, transformers, adapters, and field-cache rules.
OpenAI Responses API support, including future WebSocket extension points.
Provider work following the protocol layer: Claude Code, Codex, Copilot, Antigravity, and Gemini CLI parity review.
Routing and fallback groups, with optional target-group selectors later.
Retry, provider/model cooldown, and failover cleanup.
Protocol-aware quota, usage, and cost normalization.
Streaming library hardening: SSE now, WebSocket-ready later.
Config polish using .env and optional JSON. No SQLite dependency for now.
Extensive staged tests and review-agent verification.

Non-Goals For This Branch

Do not make the proxy a full multi-user admin product yet.
Do not require SQLite or Postgres for the main feature set.
Do not remove LiteLLM before native coverage exists.
Do not replace the existing UsageManager, fair-cycle, custom caps, or evidence-based SessionTracker.
Do not port frontend/UI work from the external reference gateway.

Current Strengths To Preserve

Credential-level rotation and priority-aware selection.
Fair cycle and custom caps.
Windowed quota tracking and quota groups.
Evidence-based session tracking with compaction handling.
Provider plugin discovery.
Gemini CLI provider behavior unless a reviewed change is clearly better.
Resilient file/JSON state writing.
Dynamic OpenAI-compatible provider discovery.

Reference Gateway Ideas To Import Carefully

Unified protocol/transformer style.
Adapter registry and configurable provider/model adapters.
Target groups and direct routing syntax, adapted into fallback-first routing.
Responses API transformer and storage concepts.
Stream TTFB/stall detection concepts, implemented with Python-native async primitives.
Provider/model cooldown and retry-history concepts.
Usage/cost normalization and provider-reported cost extraction.
Broader provider support patterns for Claude Code, Codex, Copilot, and Antigravity.

Phase Index

Protocol Core.
Transform Pass Logging.
Adapter and Field Cache System.
Responses API and WebSocket-Ready Transport Shape.
Provider Protocol Overhaul.
Routing and Fallback Groups.
Retry/Cooldown/Failover Cleanup.
Streaming Library Upgrade.
Usage, Quota, and Cost Accuracy.
Config Polish.

Each phase may be subdivided if implementation scope becomes too large.

Completeness Matrix

This matrix exists so the branch does not lose any requested scope while phases evolve. The phase plans are still refreshed before implementation, but every item below must remain accounted for.

Requested area	Planned coverage
Protocols are priority #1	Phases 1 and 4 create native protocol foundations and Responses support before provider work.
Protocols are bases, not gospel	Phase 1 requires override-friendly protocol methods, subclassing, copy/mutate registration, and provider-specific overrides.
Move away from LiteLLM	Phase 1 adds a `litellm_fallback` protocol path; later providers should prefer native protocols and use LiteLLM only for unsupported coverage.
Add protocols automatically like providers	Phase 1 adds protocol auto-discovery and registry behavior modeled after provider discovery.
Cover current providers and reference providers	Phase 1 protocols must cover shapes used by current providers; Phase 5 covers Claude Code, Codex, Copilot, Antigravity, and Gemini CLI parity.
Responses API is very needed	Phase 4 is dedicated to Responses, `previous_response_id`, storage, SSE, and WebSocket-ready transport shape.
WebSocket support later	Phases 1, 4, and 8 require transport separation so WebSocket can be added without rewriting protocol logic.
Adapters/transformers tied to protocols	Phases 1, 2, and 3 define protocol parse/build plus transform tracing, adapter registry, and field-cache rules.
Cache and return provider fields	Phase 3 implements configurable extraction/injection rules for request, response, and stream fields with scope and mode controls.
Reasoning content and similar fields	Phase 3 explicitly covers reasoning content, thinking signatures, prompt cache keys, response IDs, and provider session IDs.
Return all possible or last user/assistant use	Phase 3 modes include `last`, `all`, `last_user_turn`, `last_assistant_turn`, and `per_tool_call`.
Per-model custom provider behavior	Phases 3, 5, and 10 cover provider/model field cache rules, adapters, model options, and optional JSON config.
Transaction logging after every transform	Phase 2 adds ordered request, response, and stream transform trace passes and integrates them with transaction logging.
Comments, docstrings, and key decisions	All implementation phases require docstrings for public abstractions and comments for non-obvious transform, protocol, and future-extension decisions.
Providers are priority #2	Phase 5 follows protocol foundations with Claude Code, Codex, Copilot, Antigravity, and Gemini CLI parity review.
Antigravity comparison	Phase 5 explicitly compares the reference Antigravity behavior against `src/rotator_library/providers/_retired/`.
Routing is interesting	Phase 6 implements fallback chains first, with target-group selectors later if useful.
Fallback groups preferred over target groups	Phase 6 starts with ordered fallback groups and only adds target-group-style selectors after that base works.
Retry/cooldown/failover cleanup	Phase 7 makes provider/model cooldown real, adds retry history, backoff, retry-after precedence, and success reset.
Quota/usage/cost improvements	Phase 9 adds protocol-aware normalizers, provider-reported cost extraction, structured cost fields, and checker abstractions while keeping existing usage engines.
Streaming as library capability	Phase 8 hardens streaming below the proxy route layer with TTFB, TTFT, stall detection, cancellation, and transport-aware stream events.
Config via env/json, no SQLite	Phase 10 adds optional JSON config with env overrides and validation. SQLite remains out of scope.
Multi-user proxy later	The branch keeps multi-user/admin features as a future expansion and only preserves extension points where natural.
Exhaustive tests in stages	Every phase requires tests alongside implementation and phase-end review by both `explore` and `explore-heavy`.
Reports are for the user, not git	`06-phase-workflow.md` says planning docs are committed, but phase reports are not committed by default.

Code Quality Expectations

Public protocol, adapter, transport, field-cache, and provider-extension classes must have docstrings that explain intent, override points, and future expansion hooks.
Non-obvious transformations must have comments explaining why data is changed, preserved, reordered, or intentionally dropped.
Lossy protocol conversions must be documented at the conversion site.
Future WebSocket, target-group, and multi-user extension seams should be noted in comments where they affect today's design.
Tests should prefer golden fixtures for protocol shapes and focused unit tests for transform edge cases.

Captures the experimental branch workflow, protocol architecture goals, transform logging requirements, field-cache rules, provider priorities, routing, retry, usage, streaming, and config direction. Documents that every phase must be freshly planned in conversation, written as planning docs, reviewed by explore and explore-heavy agents, and reported to the user without committing reports by default.

Introduces protocol-neutral request, response, stream event, content, tool, reasoning, usage, cost, and context dataclasses with JSON-safe serialization for future transform tracing. Adds the override-friendly ProtocolAdapter base and auto-discovery registry with alias handling, duplicate detection, shared stateless instances, and tests for serialization, default preservation, registration, aliases, and protocol errors. Tests: python -m pytest tests/test_protocol_registry.py

Adds the explicit LiteLLM fallback protocol marker and a native OpenAI Chat Completions adapter for request parsing/building, response parsing/formatting, usage and provider-reported cost extraction, reasoning preservation, tool calls, multimodal content blocks, and SSE chunk parsing. The adapter remains isolated from runtime execution and preserves unknown extension fields for future adapter, field-cache, and transform logging phases. Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py

Adds a native Anthropic Messages protocol adapter for request parsing/building, response formatting, stream event parsing, tool_use/tool_result blocks, thinking and redacted-thinking signature preservation, and cache usage normalization. The existing compatibility routes remain untouched; this adapter is an isolated base for later native provider execution, field-cache rules, and transform logging. Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py

Adds a native Gemini generateContent adapter for request parsing/building, response formatting, stream event parsing, content parts, function calls/responses, thought signatures, generation config, safety settings, tools, and Gemini usage metadata. The adapter preserves raw Gemini-native fields and remains isolated from runtime execution so provider migration can happen in later checkpoints. Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py

Adds a native Responses protocol adapter for request parsing/building, response formatting, event-stream parsing, previous_response_id preservation, input and output item handling, reasoning items, function calls, usage details, provider-reported costs, and a WebSocket-ready transport capability flag. Routes, storage, and runtime wiring remain deferred to later checkpoints; this commit only adds the reusable protocol base and tests. Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py tests/test_protocol_responses.py

Addresses Phase 1 review findings by treating raw payloads as provenance instead of stale formatting authority in native adapters, tightening registry alias/name collision handling, adding JSON-safe serialization fallbacks, and avoiding default reasoning-token double counting. Adds nested raw preservation for tool, result, and reasoning structures, exposes WebSocket as a future Responses transport seam rather than current formatting support, expands Gemini tool declaration parsing, and switches protocol tests to the public package import path through a local test path fixture. Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py tests/test_protocol_responses.py Tests: python -m pytest tests/test_session_tracking.py tests/test_selection_engine.py

Preserves Anthropic system block shape and metadata during rebuilds, keeps unknown Responses output items while still applying unified-message mutations, and groups Gemini multi-declaration tools back into their original native tool container. Adds tests for Anthropic system cache metadata, Responses future output-item preservation, and Gemini multi-declaration rebuild fidelity. Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py tests/test_protocol_responses.py Tests: python -m pytest tests/test_session_tracking.py tests/test_selection_engine.py

Adds the Phase 2 plan for additive transform-pass transaction logging, including trace entry shape, writer behavior, request/response/stream pass names, sanitization, TransactionLogger and ProviderLogger integration, tests, risks, and review checkpoints. The Phase 1 report remains uncommitted for user-facing review only.

Introduces transform trace entries, a local-sequence JSONL/snapshot writer, recursive key-based redaction, filesystem-safe snapshot names, and JSON-safe payload serialization for future protocol and adapter pass logging. The trace writer is observability-only and isolated from runtime transaction logging in this checkpoint. Tests: python -m pytest tests/test_transform_trace.py

Wires the transform trace writer into TransactionLogger and ProviderLogger while preserving legacy request, transformed request, response, streaming chunk, metadata, and provider log files. Adds trace entries for raw client requests, prepared provider requests, raw and parsed stream chunks, assembled stream responses, final client responses, provider request payloads, provider raw stream chunks, provider final responses, and provider errors. Includes transaction logger tests for legacy compatibility, redaction, equality-skipped transformed requests, provider traces, streaming wrapper traces, and disabled logging. Tests: python -m pytest tests/test_transform_trace.py tests/test_transaction_logger_transform_trace.py Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py tests/test_protocol_responses.py Tests: python -m pytest tests/test_session_tracking.py tests/test_selection_engine.py

Hardens Phase 2 tracing after review by adding request, session, scope, classifier, exact model, and credential correlation to trace entries where available. Expands redaction for cookies and credential-bearing headers, extracts structured fields from SDK-like objects before repr fallback, scrubs header-like secrets from provider error text, and adds a standardized transform_log_error helper. Prevents provider snapshot collisions by namespacing provider writer snapshots while keeping stream chunks in JSONL only. Tests: python -m pytest tests/test_transform_trace.py tests/test_transaction_logger_transform_trace.py Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py tests/test_protocol_responses.py Tests: python -m pytest tests/test_session_tracking.py tests/test_selection_engine.py

Adds the Phase 3 plan for the adapter registry, built-in adapter bases, field-cache rule schema, path engine, store abstractions, scoped key behavior, transform trace integration, tests, risks, and review checkpoints. Reports remain uncommitted for user-facing review only.

Adds the Phase 3 adapter foundation with an override-friendly async base adapter, adapter context, ordered chain runner, auto-discovered registry, aliases, duplicate collision checks, and built-in base adapters for no-op, model override, developer-role suppression, and reasoning content normalization. Runtime request execution is not wired to the adapter chain yet; this checkpoint keeps behavior unchanged while establishing the extension point for native protocols and providers. Tests: python -m pytest tests/test_adapter_registry.py

Adds field-cache rule and injection dataclasses, cache context scope values, default provider/model/classifier/session scoping, and a small JSON-path-like engine for extraction and predictable injection. The path helper supports dict keys, list indexes, wildcard extraction, tail indexes, missing-path no-ops, and explicit errors for malformed paths or wildcard injection. Tests: python -m pytest tests/test_field_cache_paths.py

Adds async field-cache stores, a ProviderCache-backed wrapper, scoped cache key construction, and the extraction/injection engine for last, all, turn-compatible, stream-event, and per-tool-call-validated rules. The engine copies payloads by default, isolates values by provider/model/session/classifier/credential scope, skips missing required session scope, and emits transform trace metadata when a transaction logger is supplied. Tests: python -m pytest tests/test_field_cache_engine.py tests/test_field_cache_paths.py

Adds the missing before_field_cache_extraction and before_field_cache_injection trace passes so field-cache operations now emit both before and after states. Adds trace-focused tests for adapter chains, field-cache extraction/injection, rule metadata, cache hits, mutation flags, and transform_log_error emission on failed injection. Tests: python -m pytest tests/test_field_cache_trace.py tests/test_field_cache_engine.py tests/test_adapter_registry.py

Adds optional provider declarations for native protocol name, ordered adapter names, adapter config, and field-cache rules, all defaulting to empty/no-op behavior so existing providers remain on the current execution path until they opt in. These methods are the Phase 3 bridge that later provider work will use to attach native protocols, adapter chains, and provider-specific field-cache rules per model. Tests: python -m pytest tests/test_provider_protocol_declarations.py tests/test_adapter_registry.py tests/test_field_cache_engine.py tests/test_field_cache_paths.py tests/test_field_cache_trace.py

Adds the planned field_rename adapter, fixes field-cache trace direction for stream-sourced request injection, caps trace sample values, and documents the current limits of turn/tool-cache modes. Expands coverage for credential/provider scope isolation, stream-sourced injection trace direction, large sample truncation, field_rename behavior, and plain provider no-op protocol defaults. Tests: python -m pytest tests/test_adapter_registry.py tests/test_field_cache_paths.py tests/test_field_cache_engine.py tests/test_field_cache_trace.py tests/test_provider_protocol_declarations.py Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py tests/test_protocol_responses.py tests/test_transform_trace.py tests/test_transaction_logger_transform_trace.py tests/test_session_tracking.py tests/test_selection_engine.py

Adds the Phase 4 plan for Responses routes, response storage, previous_response_id continuation, bridge execution through the current client path, HTTP SSE conversion, WebSocket extension seams, tests, risks, and review checkpoints. Reports remain uncommitted for user-facing review only.

Adds the Phase 4 Responses storage foundation with StoredResponse, local response ID generation, an in-memory store, and a ProviderCache-backed wrapper that accepts an injected cache instead of constructing one globally. The store supports save, get, delete, and input item listing, preserves JSON-safe response metadata for previous_response_id continuation, and avoids SQLite or new persistence dependencies. Tests: python -m pytest tests/test_responses_store.py

Adds the temporary Responses-to-chat bridge for Phase 4, converting parsed Responses requests into current chat-completions kwargs and converting chat-completion responses back into Responses objects. The bridge preserves previous_response_id metadata, parent response messages, tool definitions, generation parameters, and unsupported extension fields for trace/debugging until native provider execution is wired in later phases. Tests: python -m pytest tests/test_responses_bridge.py tests/test_responses_store.py

Adds the non-streaming Responses service around the protocol adapter, bridge, and response store with validation, previous_response_id loading, get/delete/input-items helpers, and transform trace passes. The service keeps Phase 4 runtime conservative by bridging through the existing chat completion client path while preserving response storage and lineage metadata for later native provider work. Tests: python -m pytest tests/test_responses_service.py tests/test_responses_bridge.py tests/test_responses_store.py

Adds FastAPI routes for POST /v1/responses, GET /v1/responses/{id}, DELETE /v1/responses/{id}, and GET /v1/responses/{id}/input_items using the Phase 4 ResponsesService. The create route currently handles non-streaming requests through the bridge and returns a documented 501 for streaming until the SSE checkpoint lands next. Tests: python -m pytest tests/test_responses_routes.py tests/test_responses_service.py tests/test_responses_bridge.py tests/test_responses_store.py

Adds Responses HTTP SSE formatting, chat-stream conversion, streamed response accumulation/storage, response.failed events on stream errors, and a WebSocket formatter seam that is explicit but not exposed as a runtime route. Updates POST /v1/responses to return text/event-stream for stream=true while preserving the existing non-stream route behavior. Tests: python -m pytest tests/test_responses_streaming.py tests/test_responses_routes.py tests/test_responses_service.py tests/test_responses_bridge.py tests/test_responses_store.py

Wires Responses routes into the transform trace logger when request logging is enabled, adds coverage for unsupported Responses fields preserved in bridge metadata, and strengthens streaming tests to assert SSE event order. Tests: python -m pytest tests/test_responses_store.py tests/test_responses_bridge.py tests/test_responses_service.py tests/test_responses_routes.py tests/test_responses_streaming.py Tests: python -m pytest tests/test_protocol_registry.py tests/test_protocol_openai_chat.py tests/test_protocol_anthropic_messages.py tests/test_protocol_gemini.py tests/test_protocol_responses.py tests/test_transform_trace.py tests/test_transaction_logger_transform_trace.py tests/test_adapter_registry.py tests/test_field_cache_paths.py tests/test_field_cache_engine.py tests/test_field_cache_trace.py tests/test_provider_protocol_declarations.py tests/test_session_tracking.py tests/test_selection_engine.py

Adds the Phase 5 plan for native provider execution, provider declarations, HTTP and streaming seams, priority provider order, Antigravity restoration constraints, Gemini CLI parity review, fallback policy, transform tracing, field-cache rules, tests, and review checkpoints. Reports remain uncommitted for user-facing review only.

Adds the Phase 5 native provider foundation with execution context, HTTP transport wrapper, and non-streaming executor that runs protocol selection, adapter chains, field-cache injection/extraction, provider HTTP calls, and transform tracing. The foundation is not wired into live request execution yet, preserving current provider behavior while giving priority provider work a testable native path. Tests: python -m pytest tests/test_native_provider_executor.py tests/test_responses_store.py tests/test_responses_bridge.py tests/test_responses_service.py tests/test_responses_routes.py tests/test_responses_streaming.py

Adds opt-in native provider streaming support with streaming-capable transport seam, raw chunk tracing, protocol stream parsing, field-cache stream extraction, formatted client stream events, and transform error logging. The streaming foundation remains isolated from live provider routing so existing providers keep current behavior while Phase 5 provider implementations gain a mocked native stream path. Tests: python -m pytest tests/test_native_provider_executor.py tests/test_native_provider_streaming.py tests/test_responses_store.py tests/test_responses_bridge.py tests/test_responses_service.py tests/test_responses_routes.py tests/test_responses_streaming.py tests/test_protocol_responses.py tests/test_transform_trace.py tests/test_transaction_logger_transform_trace.py

Adds the first priority Phase 5 provider as an explicit native integration skeleton with Anthropic Messages protocol declaration, adapter config, thinking-signature field-cache rule, native header/endpoint helpers, and mock-friendly model discovery. This does not assume undocumented live behavior or wire the provider into the runtime native executor yet; it establishes a tested provider declaration path for later native routing. Tests: python -m pytest tests/test_claude_code_provider.py tests/test_provider_protocol_declarations.py tests/test_native_provider_executor.py tests/test_native_provider_streaming.py tests/test_protocol_anthropic_messages.py tests/test_adapter_registry.py tests/test_field_cache_engine.py

mirrobot-agent · 2026-05-31T22:22:13Z

Starting my review of the third-pass audit remediation changes — covering protocol output correctness, native operation guardrails, streaming hardening, Responses runtime improvements, cost accounting, routing parity, and much more across ~100 files. I'll analyze each subsystem and report back with a bundled review.

mirrobot-agent

Overall Assessment

This is a massive third-pass audit remediation that addresses findings from the explore and explore-heavy reviewers across all 10 phases of the experimental rewrite. The changeset spans ~9,000 lines across 100 files and is impressively well-scoped — each change maps back to a specific finding in third-pass-audit-findings.md.

The code quality is high. The streaming hardening (TTFB/stall timeouts, heartbeats, upstream close safety, CancelledError handling) is thorough and consistent across both the chat-completions StreamingHandler and the Responses stream_events path. The protocol output correctness fixes (OpenAI Chat/Responses usage formatting, legacy function_call handling, Ollama format_response, native operation guardrails) are all well-tested. The cost accounting improvements (provider-reported cost preservation, cache-write double-count fix, SSE cost comment parsing, structured cost breakdowns) are comprehensive and correctly prioritized.

Status of Previous Feedback

All 12 issues from the initial review and 5 suggestions from the follow-up have been addressed in this changeset:

service.py CancelledError: Now caught by the Responses finally block and the Anthropic wrapper's finally block.
Streaming adapter chain gap: The NativeProviderExecutor.stream() now runs stream-event adapter chains and extracts usage/cost.
Bare except Exception: pass: Replaced with structured error handling and _can_retry_stream_after_error with explicit emitted_output tracking.
Multi-turn tool use: _parent_output_to_messages now handles function_call and custom_tool_call items, and _message_to_chat handles tool role messages.
Auth header mismatch: ClaudeCodeProvider now supports auto/x-api-key/bearer modes via CLAUDE_CODE_AUTH_HEADER.
Hard litellm dependency: Not directly addressed in this diff, but the cost system now has provider-reported cost that bypasses litellm pricing — worth verifying this is tracked.

Key Suggestions

Model-scoped cooldown bypasses failure-history threshold — contradicts Phase 7c plan (see inline)
finally block recovery can raise — acquire_task.result() can replace the original exception (see inline)
Polling logic duplication — next_upstream_chunk and acquire_upstream_stream share ~90% of their structure (see inline)
Disconnect task leak — if is_disconnected() raises, the task is never cleaned up (see inline)
Gemini cache-write double-count inconsistency — OpenAI path fixed but Gemini path not (see inline)
Private _save_to_disk coupling — fragile dependency on provider cache internals (see inline)

Architectural Notes

The new streaming runtime settings system (get_stream_runtime_settings()) is well-designed — lazy-loaded via _retry_settings() to avoid import cycles, with env overriding JSON. The routing attempt history (_append_routing_attempt_history) is a solid observability addition that correctly sanitizes error types and tracks timing. The native provider cross-protocol formatting (client_protocol_name on NativeProviderContext) is a clean solution to the "responses formatted in provider protocol" blocker.

One architectural observation: the Responses streaming path now has its own independent polling/timeout/heartbeat implementation rather than reusing the StreamingHandler infrastructure. This is understandable given the different event model (Responses events vs SSE chunks), but it means streaming policy changes must be applied in two places.

This review was generated by an AI assistant.

mirrobot-agent · 2026-05-31T22:24:37Z

+    if error_type in {"server_error", "api_connection"} and default_duration >= provider_cooldown_min_seconds:
+        backoff_level = 0
+        duration = int(default_duration)
+        if scope == "model":
+            return ProviderCooldownDecision(True, duration=duration, reason="model_capacity_cooldown", scope=scope, model=model, backoff_level=backoff_level)
+        if failure_history is None:
+            return ProviderCooldownDecision(False, reason="missing_failure_history", scope=scope, model=model if scope == "model" else None)
+        backoff = failure_history.backoff_for(provider=provider, error_type=error_type, scope=scope, model=model if scope == "model" else None, default_duration=duration)


This hard early return for scope == "model" bypasses the failure-history threshold check entirely:

if scope == "model": return ProviderCooldownDecision(True, duration=duration, reason="model_capacity_cooldown", scope=scope, model=model, backoff_level=backoff_level)

This means a single server_error without retry_after will always start a model-scoped cooldown immediately, even if it's the first transient failure. The Phase 7c plan states: "No-retry_after server_error and api_connection should start cooldown only after FailureHistory.backoff_for() crosses the configured threshold." The provider-scoped path (below) correctly checks the threshold, but the model-scoped path does not.

If this is intentional (model-capacity errors are more actionable than generic transients), consider adding a comment explaining the policy difference. Otherwise, the model-scoped path should also check backoff_level > 0 before returning True.

Suggested change

if error_type in {"server_error", "api_connection"} and default_duration >= provider_cooldown_min_seconds:

backoff_level = 0

duration = int(default_duration)

if scope == "model":

return ProviderCooldownDecision(True, duration=duration, reason="model_capacity_cooldown", scope=scope, model=model, backoff_level=backoff_level)

if failure_history is None:

return ProviderCooldownDecision(False, reason="missing_failure_history", scope=scope, model=model if scope == "model" else None)

backoff = failure_history.backoff_for(provider=provider, error_type=error_type, scope=scope, model=model if scope == "model" else None, default_duration=duration)

if scope == "model":

if backoff_level <= 0 and failure_history is not None:

return ProviderCooldownDecision(False, reason="transient_backoff_threshold_not_met", scope=scope, model=model)

return ProviderCooldownDecision(True, duration=duration, reason="model_capacity_cooldown", scope=scope, model=model, backoff_level=backoff_level)

mirrobot-agent · 2026-05-31T22:24:38Z

+            yield ResponsesStreamEvent("response.failed", failed)
+            self._trace(transaction_logger, "stream_done_event", {"raw": "done"}, direction="stream", stage="final", metadata={"transport": transport, "failed": True})
+            yield ResponsesStreamEvent("done", {}, terminal=True)
+        finally:
+            if chat_stream is None and acquire_task is not None and acquire_task.done() and not acquire_task.cancelled():
+                try:
+                    chat_stream = acquire_task.result()
+                    stream_iterator = chat_stream.__aiter__()
+                except Exception:
+                    chat_stream = None
+            await cancel_task(pending_next_task)


This recovery path in the finally block can itself raise an exception, replacing the original one:

if chat_stream is None and acquire_task is not None and acquire_task.done() and not acquire_task.cancelled(): chat_stream = acquire_task.result() # Can raise!

If acquire_task completed with a ResponsesServiceError (e.g., from a TTFB timeout), calling .result() will re-raise that error inside the finally block, masking whatever exception triggered the finally in the first place.

Suggested change

yield ResponsesStreamEvent("response.failed", failed)

self._trace(transaction_logger, "stream_done_event", {"raw": "done"}, direction="stream", stage="final", metadata={"transport": transport, "failed": True})

yield ResponsesStreamEvent("done", {}, terminal=True)

finally:

if chat_stream is None and acquire_task is not None and acquire_task.done() and not acquire_task.cancelled():

try:

chat_stream = acquire_task.result()

stream_iterator = chat_stream.__aiter__()

except Exception:

chat_stream = None

await cancel_task(pending_next_task)

if chat_stream is None and acquire_task is not None and acquire_task.done() and not acquire_task.cancelled():

try:

chat_stream = acquire_task.result()

stream_iterator = chat_stream.__aiter__()

except Exception:

chat_stream = None

mirrobot-agent · 2026-05-31T22:24:38Z

+        usage = None
+        item_started = False
+        monitor = StreamMonitor(clock=time.monotonic)
+        stream_settings = get_stream_runtime_settings()
+        chat_stream = None
+        stream_iterator = None
+        upstream_closed = False
+        pending_next_task = None
+        pending_next_started_at = None
+        pending_next_last_heartbeat_at = None
+        acquire_task = None
+        acquire_started_at = None
+        acquire_last_heartbeat_at = None
+        ttfb_started_at = time.monotonic()
+
+        async def cancel_task(task: Any) -> None:
+            """Cancel and await an in-flight stream task before closing its source."""
+
+            if task is None or task.done():
+                return
+            task.cancel()
+            try:
+                await task
+            except (asyncio.CancelledError, StopAsyncIteration):
+                return
+            except Exception:
+                return
+
+        async def close_upstream(reason: str) -> None:
+            """Best-effort close for upstream Responses bridge streams."""
+
+            nonlocal upstream_closed
+            if upstream_closed:
+                return
+            attempted = False
+            for candidate in (stream_iterator, chat_stream):
+                if candidate is None:
+                    continue
+                attempted = True
+                try:
+                    closer = getattr(candidate, "aclose", None)
+                    if callable(closer):
+                        await closer()
+                        upstream_closed = True
+                        self._trace(transaction_logger, "responses_stream_upstream_closed", {"reason": reason}, direction="stream", stage="provider", metadata={"transport": transport})
+                        return
+                    closer = getattr(candidate, "close", None)
+                    if callable(closer):
+                        closer()
+                        upstream_closed = True
+                        self._trace(transaction_logger, "responses_stream_upstream_closed", {"reason": reason}, direction="stream", stage="provider", metadata={"transport": transport})
+                        return
+                except Exception as exc:
+                    self._trace(transaction_logger, "responses_stream_upstream_close_failed", {"reason": reason, "error_type": type(exc).__name__}, direction="stream", stage="provider", metadata={"transport": transport})
+                    continue
+            if attempted:
+                self._trace(transaction_logger, "responses_stream_upstream_close_failed", {"reason": reason, "error_type": "no_close_method"}, direction="stream", stage="provider", metadata={"transport": transport})
+
+        async def next_upstream_chunk(*, first: bool) -> tuple[str, Any]:
+            """Return the next upstream chunk or a control marker."""
+
+            timeout = stream_settings.ttfb_timeout_seconds if first else stream_settings.stall_timeout_seconds
+            heartbeat = stream_settings.heartbeat_seconds
+            nonlocal pending_next_task, pending_next_started_at, pending_next_last_heartbeat_at
+            if pending_next_task is None:
+                pending_next_task = asyncio.create_task(stream_iterator.__anext__())
+                pending_next_started_at = time.monotonic()
+                pending_next_last_heartbeat_at = pending_next_started_at
+            next_task = pending_next_task
+            started_at = ttfb_started_at if first else (pending_next_started_at or time.monotonic())
+            while True:
+                if next_task.done():
+                    pending_next_task = None
+                    pending_next_started_at = None
+                    pending_next_last_heartbeat_at = None
+                    return "chunk", next_task.result()
+                if request is not None and await request.is_disconnected():
+                    self._trace(transaction_logger, "responses_stream_disconnected", {"reason": "client_disconnected"}, direction="stream", stage="client", metadata={"transport": transport})
+                    if stream_settings.cancel_upstream_on_disconnect:
+                        await cancel_task(next_task)
+                        pending_next_task = None
+                        await close_upstream("client_disconnected")
+                    return "disconnect", None
+                elapsed = time.monotonic() - started_at
+                waits = []
+                if timeout is not None:
+                    remaining_timeout = timeout - elapsed
+                    if remaining_timeout <= 0:
+                        await cancel_task(next_task)
+                        pending_next_task = None
+                        await close_upstream("ttfb_timeout" if first else "stall_timeout")
+                        raise ResponsesServiceError(
+                            f"Responses stream {'TTFB' if first else 'stall'} timeout",
+                            status_code=504,
+                            error_type="api_connection",
+                        )
+                    waits.append(remaining_timeout)
+                if heartbeat is not None:
+                    last_heartbeat_at = pending_next_last_heartbeat_at or started_at
+                    remaining_heartbeat = heartbeat - (time.monotonic() - last_heartbeat_at)
+                    if remaining_heartbeat <= 0:
+                        pending_next_last_heartbeat_at = time.monotonic()
+                        return "heartbeat", None
+                    waits.append(remaining_heartbeat)
+                wait_timeout = min(waits) if waits else None
+                if wait_timeout is None:
+                    chunk = await next_task
+                    pending_next_task = None
+                    pending_next_started_at = None
+                    pending_next_last_heartbeat_at = None
+                    return "chunk", chunk
+                done, _ = await asyncio.wait({next_task}, timeout=wait_timeout)
+                if done:
+                    pending_next_task = None
+                    pending_next_started_at = None
+                    pending_next_last_heartbeat_at = None
+                    return "chunk", next_task.result()
+                last_heartbeat_at = pending_next_last_heartbeat_at or started_at
+                if heartbeat is not None and time.monotonic() - last_heartbeat_at >= heartbeat:
+                    pending_next_last_heartbeat_at = time.monotonic()
+                    return "heartbeat", None
+
+        async def acquire_upstream_stream() -> tuple[str, Any]:
+            """Acquire the upstream stream under the same TTFB/disconnect policy."""
+
+            nonlocal acquire_task, acquire_started_at, acquire_last_heartbeat_at
+            if acquire_task is None:
+                acquire_task = asyncio.create_task(client.acompletion(request=request, **chat_kwargs))
+                acquire_started_at = ttfb_started_at
+                acquire_last_heartbeat_at = acquire_started_at
+            task = acquire_task
+            started_at = acquire_started_at or time.monotonic()
+            timeout = stream_settings.ttfb_timeout_seconds
+            heartbeat = stream_settings.heartbeat_seconds
+            while True:
+                if task.done():
+                    acquire_task = None
+                    acquire_started_at = None
+                    acquire_last_heartbeat_at = None
+                    return "stream", task.result()
+                if request is not None and await request.is_disconnected():
+                    self._trace(transaction_logger, "responses_stream_disconnected", {"reason": "client_disconnected", "phase": "acquire"}, direction="stream", stage="client", metadata={"transport": transport})
+                    await cancel_task(task)
+                    acquire_task = None
+                    acquire_started_at = None
+                    acquire_last_heartbeat_at = None
+                    return "disconnect", None
+                waits = []
+                elapsed = time.monotonic() - started_at
+                if timeout is not None:
+                    remaining_timeout = timeout - elapsed
+                    if remaining_timeout <= 0:
+                        await cancel_task(task)
+                        acquire_task = None
+                        acquire_started_at = None
+                        acquire_last_heartbeat_at = None
+                        raise ResponsesServiceError("Responses stream TTFB timeout", status_code=504, error_type="api_connection")
+                    waits.append(remaining_timeout)
+                if heartbeat is not None:
+                    last_heartbeat_at = acquire_last_heartbeat_at or started_at
+                    remaining_heartbeat = heartbeat - (time.monotonic() - last_heartbeat_at)
+                    if remaining_heartbeat <= 0:
+                        acquire_last_heartbeat_at = time.monotonic()
+                        return "heartbeat", None
+                    waits.append(remaining_heartbeat)
+                wait_timeout = min(waits) if waits else None
+                if wait_timeout is None:
+                    stream = await task
+                    acquire_task = None
+                    acquire_started_at = None
+                    acquire_last_heartbeat_at = None
+                    return "stream", stream
+                done, _ = await asyncio.wait({task}, timeout=wait_timeout)
+                if done:
+                    acquire_task = None
+                    acquire_started_at = None
+                    acquire_last_heartbeat_at = None
+                    return "stream", task.result()
+                last_heartbeat_at = acquire_last_heartbeat_at or started_at
+                if heartbeat is not None and time.monotonic() - last_heartbeat_at >= heartbeat:
+                    acquire_last_heartbeat_at = time.monotonic()
+                    return "heartbeat", None
+
+        if transaction_logger:
+            self._trace(
+                transaction_logger,
+                "stream_started",
+                {"event": StreamEvent("started", protocol="responses").to_dict(), "metrics": monitor.metrics.to_dict()},
+                direction="stream",
+                stage="client",


next_upstream_chunk (~90 lines) and acquire_upstream_stream (~55 lines) share nearly identical polling/disconnect/timeout/heartbeat logic. The structural pattern is the same: create task -> loop with asyncio.wait -> check disconnect -> check timeout -> emit heartbeat -> get result.

Any future change to the polling pattern (e.g., adding a new policy check) must be applied in both places. Consider extracting a shared _poll_upstream_async(task_factory, *, first: bool) helper that both call sites use, with the task factory being the only difference (stream_iterator.__anext__ vs client.acompletion).

mirrobot-agent · 2026-05-31T22:24:38Z

+                            done, _ = await asyncio.wait(wait_tasks, timeout=wait_seconds)
+                            if disconnect_task is not None:
+                                if disconnect_task in done and disconnect_task.result():
+                                    stream_cancelled = True
+                                    next_task.cancel()
+                                    with contextlib.suppress(asyncio.CancelledError, StopAsyncIteration):
+                                        await next_task
+                                    await close_upstream("client_disconnect")
+                                    return
+                                if not disconnect_task.done():
+                                    disconnect_task.cancel()
+                                    with contextlib.suppress(asyncio.CancelledError):
+                                        await disconnect_task
+                            if next_task in done:
+                                chunk = next_task.result()
+                                break
+
+                            timeout_error = _stream_timeout_error(monitor, stream_settings)
+                            if timeout_error:
+                                next_task.cancel()
+                                with contextlib.suppress(asyncio.CancelledError, StopAsyncIteration):
+                                    await next_task
+                                await close_upstream(timeout_error[0], force=True)


If request.is_disconnected() raises an unexpected exception, the disconnect_task created via asyncio.create_task(request.is_disconnected()) will never be awaited or cancelled. Python 3.9+ warns about unretrieved task exceptions, and Python 3.12+ logs Task was destroyed but it is pending! warnings.

Consider wrapping the disconnect task creation and cleanup in a try/finally pattern within the wait loop, or cancelling it explicitly in all exit paths (including the except Exception re-raise below).

Suggested change

done, _ = await asyncio.wait(wait_tasks, timeout=wait_seconds)

if disconnect_task is not None:

if disconnect_task in done and disconnect_task.result():

stream_cancelled = True

next_task.cancel()

with contextlib.suppress(asyncio.CancelledError, StopAsyncIteration):

await next_task

await close_upstream("client_disconnect")

return

if not disconnect_task.done():

disconnect_task.cancel()

with contextlib.suppress(asyncio.CancelledError):

await disconnect_task

if next_task in done:

chunk = next_task.result()

break

timeout_error = _stream_timeout_error(monitor, stream_settings)

if timeout_error:

next_task.cancel()

with contextlib.suppress(asyncio.CancelledError, StopAsyncIteration):

await next_task

await close_upstream(timeout_error[0], force=True)

disconnect_task = None

try:

if request is not None:

disconnect_task = asyncio.create_task(request.is_disconnected())

wait_tasks.add(disconnect_task)

done, _ = await asyncio.wait(wait_tasks, timeout=wait_seconds)

if disconnect_task is not None:

if disconnect_task in done and disconnect_task.result():

stream_cancelled = True

next_task.cancel()

with contextlib.suppress(asyncio.CancelledError, StopAsyncIteration):

await next_task

await close_upstream("client_disconnect")

return

if not disconnect_task.done():

disconnect_task.cancel()

with contextlib.suppress(asyncio.CancelledError):

await disconnect_task

finally:

if disconnect_task is not None and not disconnect_task.done():

disconnect_task.cancel()

with contextlib.suppress(asyncio.CancelledError):

await disconnect_task

mirrobot-agent · 2026-05-31T22:24:38Z

+    )
+    if reasoning and completion_tokens >= reasoning:
+        completion_tokens -= reasoning


The OpenAI-like path now correctly subtracts both cache_read and cache_write from prompt_tokens to avoid double-counting:

input_tokens = max(0, prompt_tokens - cache_read - cache_write)

However, _from_gemini_usage still only subtracts cache_read:

input_tokens=max(0, prompt_tokens - cache_read)

If a Gemini provider starts reporting cache_write_tokens (or cachedWriteContentTokens), those tokens would be double-counted in the normalized total. The Gemini protocol already parses cache_write_tokens from the response, so this is a latent consistency gap rather than a hypothetical.

mirrobot-agent · 2026-05-31T22:24:38Z

+
+    async def save(self, response: StoredResponse) -> None:
+        await self._cache.store_async(self._key(response.id), json.dumps(response.to_dict(), ensure_ascii=False))
+        flush = getattr(self._cache, "_save_to_disk", None)
+        if callable(flush):
+            await flush()
+


This directly accesses a private method on the provider cache:

flush = getattr(self._cache, "_save_to_disk", None) if callable(flush): await flush()

If the provider cache implementation refactors _save_to_disk to a different name (e.g., _flush_to_disk, _persist), saves will silently stop flushing to disk, and the Responses store will appear to lose data between restarts.

Consider either: (a) adding a public flush() or sync() method to the cache interface that this code calls, or (b) adding a comment documenting the coupling so future maintainers know to update this call site.

Adds the Phase 10c remediation plan for startup secret masking, provider config wiring, pricing env tolerance, routing validation, and JSON secret-key hardening.

Masks proxy API key display, validates and wires safe providers JSON fields into provider runtime behavior, merges JSON quota groups before env overrides, tolerates malformed pricing env values, validates direct routing targets at config load, rejects generic credential keys, and documents the safe provider config surface. Tests: pytest tests/test_startup_display.py tests/test_provider_runtime_config.py tests/test_experimental_config.py tests/test_config_pricing.py tests/test_config_routing_json.py tests/test_env_example_experimental_config.py tests/test_config_stream_settings.py tests/test_responses_streaming.py tests/test_responses_store.py tests/test_routing_config.py tests/test_fallback_resolver.py tests/test_fallback_policy.py tests/test_request_executor_native_routing.py tests/test_claude_code_provider.py tests/test_codex_provider.py tests/test_copilot_provider.py tests/test_antigravity_provider_restore.py tests/test_provider_protocol_declarations.py

Validates provider protocol/adapter/native-streaming values at config load, merges JSON quota groups with provider defaults before env overrides, and ensures priority provider overrides respect accepted JSON adapter/native-streaming config instead of silently ignoring it. Tests: pytest tests/test_provider_runtime_config.py tests/test_experimental_config.py tests/test_config_pricing.py tests/test_config_routing_json.py tests/test_startup_display.py; pytest tests/test_startup_display.py tests/test_provider_runtime_config.py tests/test_experimental_config.py tests/test_config_pricing.py tests/test_config_routing_json.py tests/test_env_example_experimental_config.py tests/test_config_stream_settings.py tests/test_responses_streaming.py tests/test_responses_store.py tests/test_routing_config.py tests/test_fallback_resolver.py tests/test_fallback_policy.py tests/test_request_executor_native_routing.py tests/test_claude_code_provider.py tests/test_codex_provider.py tests/test_copilot_provider.py tests/test_antigravity_provider_restore.py tests/test_provider_protocol_declarations.py

greptile-apps · 2026-06-16T22:22:36Z

Too many files changed for review. (231 files found, 100 file limit)

Bypass the limit by tagging @greptile-apps to review.

mirrobot-agent

Follow-up Review: Phase 10c Config Completion

This incremental changeset delivers Phase 10c — startup secret masking, provider runtime config wiring (protocol/adapter/field-cache/quota-groups from JSON), pricing tolerance for invalid env values, routing target validation, cost accounting double-count fix, and the bot-setup workflow rework. Well-scoped with good test coverage (new tests for provider runtime config, startup masking, pricing fallback, routing validation, and cost accounting).

Assessment of New Changes:

Startup secret masking (startup_display.py, main.py): Clean implementation. The mask_secret_for_display helper correctly redacts short values fully and exposes only edge fragments for longer ones — a solid improvement over printing the raw PROXY_API_KEY.
Cost accounting fix (accounting.py): The _sum_cost_breakdown change correctly prevents double-counting when both upstream_inference_cost (total) and its split fields are present. Good test coverage with test_upstream_inference_total_wins_over_split_fields.
Provider runtime config wiring (provider_interface.py, config/experimental.py): The JSON override layer is well-designed — provider declarations remain the base, JSON config overrides, and env vars take final precedence for quota groups. The subclass streaming override pattern (check JSON → delegate to base or return False) is a reasonable safety gate. Test coverage is thorough.
Pricing tolerance (config/experimental.py): The try/except ExperimentalConfigError: continue in _pricing_from_env correctly makes invalid values non-fatal while preserving valid components.
Routing validation (routing/config.py): Direct route targets are now validated via parse_route_target at config load — good early failure.
Bot-setup rework (.github/actions/bot-setup/action.yml): Moving model inputs to env: context (avoiding ${{ }} interpolation in run script) and adding chmod 600 on the generated config are both security improvements. The jq -c . validation on the base config is a good guardrail.

Performance Concern (see inline comments):

The new _get_runtime_config() on ProviderInterface loads the full JSON config from disk on every call, with no caching. Since it's invoked from 6+ provider methods that are each called multiple times per request, a single request path can trigger 5+ file reads + JSON parses + validations. Two additional redundancy points compound this (double validation, throwaway config round-trip for field cache parsing).

Status of Previous Feedback:

The 6 suggestions from the third-pass review remain unaddressed in this incremental diff (none of the affected files — retry_policy.py, responses/service.py, client/streaming.py, responses/store.py, and the Gemini _from_gemini_usage path in accounting.py — were modified here). The most impactful still-open items are the model-scoped cooldown threshold bypass, the finally block recovery that can mask exceptions, and the disconnect task leak in streaming.

This review was generated by an AI assistant.

mirrobot-agent · 2026-06-16T22:54:56Z

+    def _get_runtime_config(self, model: str = "") -> Any:
+        """Return optional JSON runtime config for this provider.
+
+        The helper keeps config loading lazy so provider imports do not depend on
+        the experimental config layer during startup discovery.
+        """
+
+        from ..config.experimental import get_provider_runtime_config
+
+        return get_provider_runtime_config(self._provider_config_key(), model)


Performance concern: _get_runtime_config() triggers load_experimental_config() on every invocation — a full file read + JSON parse + validation cycle. This method is called from get_protocol_name, get_adapter_names, get_adapter_config, get_field_cache_rules, supports_native_streaming, and _get_effective_quota_groups. A single call to should_use_native_protocol() alone triggers 3+ disk reads (it chains supports_native_streaming → _get_runtime_config, then get_protocol_name → _get_runtime_config, then supports_native_operation → get_protocol_name → _get_runtime_config).

Since the JSON config file rarely changes at runtime, consider memoizing the config load per process (or with a short TTL/mtime check).

Suggested change

def _get_runtime_config(self, model: str = "") -> Any:

"""Return optional JSON runtime config for this provider.

The helper keeps config loading lazy so provider imports do not depend on

the experimental config layer during startup discovery.

"""

from ..config.experimental import get_provider_runtime_config

return get_provider_runtime_config(self._provider_config_key(), model)

_runtime_config_cache: Any = None

def _get_runtime_config(self, model: str = "") -> Any:

"""Return optional JSON runtime config for this provider.

The helper keeps config loading lazy so provider imports do not depend on

the experimental config layer during startup discovery.

"""

if type(self)._runtime_config_cache is not None:

return type(self)._runtime_config_cache

from ..config.experimental import get_provider_runtime_config

type(self)._runtime_config_cache = get_provider_runtime_config(self._provider_config_key(), model)

return type(self)._runtime_config_cache

mirrobot-agent · 2026-06-16T22:54:56Z

+    raw = providers.get(provider, {})
+    if not isinstance(raw, Mapping) or not raw:
+        return ProviderRuntimeConfig()
+    _validate_provider_sections({provider: raw})


This _validate_provider_sections({provider: raw}) call is redundant: load_experimental_config (line 127) already calls _validate_provider_sections on ALL providers, including this one. Since get_provider_runtime_config is called on every provider method invocation (see the caching concern on ProviderInterface._get_runtime_config), this re-runs protocol/adapter registry lookups unnecessarily. Consider removing this line since the config was already validated at load time.

mirrobot-agent · 2026-06-16T22:54:56Z

+def _configured_provider_field_cache(provider: str, model: str, value: Any) -> tuple[FieldCacheRule, ...]:
+    if value in (None, {}, []):
+        return ()
+    if isinstance(value, list):
+        section = {provider: {"*": value}}
+    elif isinstance(value, Mapping):
+        section = {provider: dict(value)}
+    else:
+        raise ExperimentalConfigError("providers.field_cache must be an object or list")
+    return parse_field_cache_rules(load_config_from_mapping({"field_cache": section}), provider, model)


_configured_provider_field_cache creates a throwaway ExperimentalConfig via load_config_from_mapping just to call parse_field_cache_rules. This triggers another _reject_secret_keys scan and _validate_provider_sections pass on data that was already validated when get_provider_runtime_config loaded the full config. Consider refactoring parse_field_cache_rules to accept raw rule data directly, or extracting the rule-parsing logic into a helper that doesn't require a full config round-trip.

- Introduce `_make_json_safe` to recursively convert complex objects like LiteLLM/Pydantic models, dataclasses, paths, and timestamps into JSON-serializable formats. - Implement circular reference tracking to prevent infinite loops and serialization failures. - Apply the safe conversion helper to stream chunks, final client responses, headers, and general file logging. - Add type guards in metadata extraction to safely process non-dictionary response and usage structures.

…et time When an authoritative quota API reports an exhausted bucket without a reset timestamp, the system previously logged an error and retried repeatedly. This commonly occurs when an account lacks entitlement to a specific model group. This commit introduces a configurable policy to handle such scenarios, allowing providers to opt into a scoped fallback cooldown instead of continuous retries. - Add `no_reset_exhaustion_policy` (warn_only, cooldown, disable_scope) and `no_reset_exhaustion_cooldown_seconds` to provider usage config, overridable via environment variables. - Introduce the `exhaustion_reason` attribute to the quota tracking pipeline to identify buckets with no reset time. - Implement the `_handle_no_reset_quota_exhaustion` method in the usage manager to apply the configured fallback cooldown. - Configure Gemini CLI provider to default to a 24-hour cooldown for these cases.

Document observed false-compaction patterns and define structural replacement semantics, response-event provenance, replay binding, persistence invariants, and the complete verification matrix. Explicitly preserve ordinary continuity when only a minority or middle portion of history changes.

Relocate the Gemini CLI OAuth provider, its Google OAuth base, and supporting modules (quota tracker, credential manager, tool handler, shared utils) into `providers/_retired/` so active startup paths stop importing OAuth credential flows. The API-key Gemini provider remains active outside this folder. - Empty `PROVIDER_MAP` in `provider_factory.py` and the OAuth provider maps in `credential_manager.py`, `settings_tool.py`, `credential_tool.py`, and `launcher_tui.py`. - Generalize docstrings, env-var examples, and the export/combine credential menus that previously hardcoded `gemini_cli` to operate over dynamic provider lists. - Update internal imports inside the retired modules to reflect their new directory depth. - Drop the dedicated `test_gemini_cli_protocol_declarations.py` suite and the `gemini_cli` alias from `ModelInfoService`. BREAKING CHANGE: The `gemini_cli` OAuth provider is no longer registered or discoverable. Deployments relying on Gemini CLI OAuth credentials must migrate to the API-key Gemini provider. The `GEMINI_CLI_*` environment variables and the `gemini_cli` provider alias are no longer honored by the active provider factory, credential manager, settings tool, or export menu.

…persistence Reject false-positive compaction lineages captured in production incidents (Mistral/DeepSeek classifier prompts and ordinary long histories) and tighten the safety invariants for optional session persistence. - Require structural replacement of more than half of the parent's high-water request history before creating a compaction descendant; minority, half, or middle-only replacement remains ordinary continuity on the parent session. - Restrict size-only compaction probes to early user/system/developer messages; assistant, tool, and function-result history is never treated as a size-only probe. - Require unmarked summaries to overlap at least two distinct completed response events so a single quoted response cannot trigger lineage. - Add opaque compaction replay anchors so an exact resend reuses the validated child session instead of spawning another descendant; changing non-probe history invalidates the replay key. - Preserve the first live owner of shared content anchors instead of letting an auxiliary request steal them, and keep response-event provenance when assistant content returns as ordinary request history. - Record streaming response identity only after an explicit provider completion signal (usage-backed final chunk, finish_reason with usage, or `[DONE]`); bare iterator EOF no longer establishes identity because some transports surface truncation as normal EOF. - Harden SSE parsing to accept event-prefixed and spaceless `data:` frames and to safely ignore non-object JSON, malformed choices/delta payloads, and duplicate streamed tool-call IDs. - Bump persistence schema to 2 storing hashed high-water history profiles, scoped anchor provenance, response-event groups, and replay bindings; raw message content is never persisted. - Rebuild anchor ownership from validated records on load, rejecting malformed containers, non-finite timestamps, expired sessions, orphan anchors, namespace mismatches, invalid strengths, and unsupported schemas; per-session and global caps are enforced without orphaning either side. - Serialize disk writes outside the main tracker lock, retain dirty state on failed or exceptional writes for retry, and reject stale delayed generations from overwriting newer snapshots. - Evict weak/ordinary evidence before strong replay/tool identity during trimming and TTL pruning; late responses cannot resurrect an expired session. Also in this commit: - docs(session): add Phase 11 hardening report and update the tracking plan and DOCUMENTATION.md to describe the new compaction decision contract, streaming completion requirements, and schema-2 persistence format.

…ineage diagnostics Phase 11 follow-up hardening that closes additional false-positive compaction paths discovered during stateful-scenario review and adds diagnostic visibility for the validation period. - Add evidence-bearing context binding so post-compaction requests with changed or extended tails continue the validated child instead of forking a new session; context anchors are minted only from probe groups that actually matched parent response evidence. - Require request-side parent evidence for unmarked compaction, so output aggregation prompts quoting two responses cannot become compaction lineage, and shared long system/user harnesses cannot bind by position alone. - Downgrade raw tool-call IDs from strong to weak because deterministic or counter-like IDs can be reused across independent conversations. - Give trusted explicit and provider identity precedence over replay/context bindings and suppress unrelated compaction lineage when authoritative identity is present. - Normalize fallback response callbacks to the session's original namespace, preserving continuity without migrating anchors across provider/model/usage scopes; enforce session namespace immutability on refresh. - Add deterministic tie-breaking to per-session and global anchor eviction so weak/ordinary evidence is evicted before replay/context identity. - Emit temporary warning-level `Session tracker decision` lines for every inference (new, continue, compaction_child, compaction_replay, compaction_continue, untracked) with selected/matched/candidate/parent IDs, namespace, confidence, score, persistence origin, and compaction evidence. - Expose `SESSION_PERSISTENCE_ENABLED` and `SESSION_PERSISTENCE_FLUSH_INTERVAL_SECONDS` environment settings on RotatingClient so restart behavior can be exercised through the live API. Also in this commit: - docs: refresh session-stickiness reference, Phase 11 plan, and hardening report - test: add stateful agentic tool loop, long-form, roleplay, namespace normalization, raw tool-ID collision, context TTL/cap/scope, and lineage-logging regression coverage

…ict isolation domains The logical session ID now crosses providers and models inside one strict caller/credential isolation domain. Public traffic, each named classifier, and every ad hoc private credential bundle maintain separate domains that never share anchors, compaction lineage, replay bindings, or trusted IDs. - Add `derive_session_isolation_key()` for provider-independent caller/credential domain keys - Replace `scope:{key}:provider:{p}:model:{m}` namespaces with immutable `session-domain:{key}` - Qualify provider-native evidence by provider + session_scope so opaque IDs never collide cross-provider - Clear provider affinity on cross-provider fallback while preserving the global logical session - Add proxy-owned global hint channel (typed `SessionTrackingHints.global_*_anchors`) for Responses IDs Closed tool event evidence for sparse agentic sessions: - Pair each assistant call one-to-one with a matching tool/function result in the same request - Hash ID + function name + canonicalized arguments as medium evidence - Enforce request-local closure; persisted evidence cannot upgrade later unpaired calls - Reconstruct streamed tool calls by provider choice index + tool index with incremental/cumulative argument support Responses API isolation and security: - Key stored responses by composite `(session_domain, response_id)` - Return `X-Proxy-Session-Domain` at creation; require it for non-public GET/DELETE/input-items - Validate every stored ancestor belongs to the requesting domain - Strip routing credentials (api_keys, providers) from stored requests and diagnostic traces Streaming and provider improvements: - Split mixed reasoning/content deltas for DiffusionGemma (NVIDIA) - Add DiffusionGemma `enable_thinking` template toggle - Accept cumulative argument snapshots that replace incremental prefixes Persistence schema 3: - Hash all external identifiers (explicit IDs, provider anchors, tool IDs, response IDs) - Reject schema-2 state instead of merging across old provider/model namespaces - Enforce file size, session count, and string length bounds on load - Default sticky binding TTL reduced from 3600s to 300s; TTL=0 disables per provider Also in this commit: - fix(streaming): reconstruct streamed tool-call arguments by choice and tool index - fix(nvidia): add DiffusionGemma thinking toggle via chat_template_kwargs - docs: update architecture, structure, and session tracking documentation BREAKING CHANGE: Session tracking namespaces changed from `scope:{key}:provider:{p}:model:{m}` to `session-domain:{key}`. Persistence schema bumped to version 3; existing schema-2 state files are rejected on load. Default sticky binding TTL reduced from 3600s to 300s. Responses API retrieval, deletion, and input-items for non-public responses now require the `X-Proxy-Session-Domain` header returned at creation time.

…sponse capabilities The previous `X-Proxy-Session-Domain` header was derived deterministically from classifier labels or credential bundle hashes. Anyone who could compute that hash could retrieve, delete, continue, or list input items for any non-public stored response. - Generate a per-response random capability (`{scope_key}.{secrets.token_urlsafe(32)}`) returned to HTTP clients; only its SHA-256 hash is stored with the response. - Validate the capability via `hmac.compare_digest` on retrieval, deletion, input-items listing, and `previous_response_id` continuation. Public responses still resolve via the `public` literal. - Add `ProviderCache.delete_async` plus deleted-key and clear-on-save tracking so durable storage removes entries on delete and clear instead of relying on TTL expiry alone. - Migrate legacy public ProviderCache entries on exact ID/scope match and delete the old key; ambiguous legacy collisions are rejected. - Add `ResponsesStore.close()` to the protocol and implementations so the proxy lifespan can shut down the injected cache's background tasks. - Redact authentication, cookie, and capability headers in `RawIOLogger` before persisting request and response data to disk. - Strip `reasoning_effort` from NVIDIA payloads when thinking is disabled so the parameter does not leak to providers. Also in this commit: - refactor(core): remove dead `session_possible_compaction` and `session_lineage_parent_id` fields from `RequestContext` and trim the `_namespace()` signature accordingly. - chore(proxy_app): drop duplicate and unused imports in `main.py`. BREAKING CHANGE: Pre-capability private Responses entries intentionally reject their old deterministic `X-Proxy-Session-Domain` header because accepting it would restore forgeable access. Those records expire under the configured Responses store TTL. Clients must use the new capability returned at creation.

mirrobot-agent

Follow-up Review: Phase 11 Session-Tracking Hardening

This increment is large and mostly lands Phase 11 — a substantial rewrite of session inference (structural compaction detection, replay/context bindings, global logical session domains, schema-3 persistence), plus the Responses-API unforgeable access-capability model, retiring the Gemini CLI OAuth provider, DiffusionGemma reasoning handling, the no-reset quota exhaustion policy, streaming response-evidence hardening, and auth-header redaction in the raw/transaction loggers.

Assessment of New Changes:

Session-tracker rewrite is well-reasoned and conservatively scoped. Compaction now requires structural replacement of more than half the parent high-water history (_retained_history_ratio), probes are restricted to early user/system/developer roles, unmarked summaries need ≥2 distinct response events plus request-side evidence, and authoritative identity overrides replay/context bindings. The eviction ranking (_anchor_eviction_key) keeping compaction_context/compaction_replay above ordinary evidence is a nice touch. Test coverage is exceptional (119+ tests including restart, concurrency, and stateful multi-provider simulations).
Responses capability tokens are a real security improvement. Composite (domain, response_id) storage, per-response random nonces with only the hash persisted, hmac.compare_digest verification, recursive credential redaction from stored requests and traces, and legacy-entry migration/rejection all close the forgeable-deterministic-header gap. The streaming in-progress state correctly carries scope_access_hash so it is immediately capability-accessible.
Auth-header redaction (_redact_headers, _make_json_safe) and the move to datetime.now(timezone.utc) are solid, low-risk hardening.
No-reset quota policy is a clean, opt-in fallback with good env validation and per-provider defaults.

Key Suggestions (see inline):

Ad-hoc bundle: domains now each mint a dedicated, never-evicted UsageManager + usage file — a latent unbounded-growth path under high-cardinality private traffic.
The no-reset policy guard relies on implicit and/or precedence; one explicit pair of parens would make it regression-proof.
The DiffusionGemma model list is duplicated between the streaming handler and nvidia_provider.py.

Nitpick: The disable_scope policy name implies whole-scope disabling, but _handle_no_reset_quota_exhaustion just applies a (very long) model/group cooldown identical to the cooldown path — the only difference is the 1-year default duration. If true scope-level disabling isn't planned, a clearer name (e.g. cooldown_long) would reduce confusion.

Status of Previous Feedback:
The third-pass items remain unaddressed in this diff (none of retry_policy.py, the responses/service.py streaming poll/finally block, or usage/accounting.py's Gemini cache-write path were touched here, and responses/store.py:119 still reaches into the private _save_to_disk). Still-open: model-scoped cooldown bypassing the failure-history threshold, acquire_task.result() able to mask the original exception in the finally block, the duplicated next_upstream_chunk/acquire_upstream_stream polling logic, the disconnect-task leak in client/streaming.py, the Gemini cache-write double-count, and the private _save_to_disk coupling. None are blockers for this experimental branch, but the auth/streaming ones are worth circling back to.

Overall Status: Strong, well-tested increment. No blocking issues; the inline items are non-blocking improvements.

This review was generated by an AI assistant.

mirrobot-agent · 2026-07-15T23:47:09Z

+        session_domain = derive_session_isolation_key(
+            classifier,
+            request_api_keys,
+            request_providers,
+            private,
+        )
+        # Ad hoc bundles need their own usage manager as well as their own
+        # session namespace. Otherwise cooldowns, sticky entries, and credential
+        # registries from unrelated private bundles would share "default".
+        scope_name = (
+            session_domain
+            if session_domain.startswith("bundle:")
+            else (classifier or "default")
+        )


Isolating each ad-hoc bundle: domain into its own usage manager (and its own classifiers/bundle_<hash>/usage_<provider>.json file) is the right call for correctness, but UsageManagerRegistry.managers has no eviction. Under high-cardinality ad-hoc traffic (callers passing unique api_keys + private=true per request), every distinct bundle mints a long-lived UsageManager plus a new on-disk usage file, so memory and file counts grow without bound for the life of the process.

Named classifiers are bounded by the operator, but ad-hoc bundles are not. Consider an LRU/TTL cap (or a dedicated registry for ad-hoc bundles) so stale private-bundle managers and their files are reaped after idle.

mirrobot-agent · 2026-07-15T23:47:09Z

This guard relies on Python's and binding tighter than or:

policy == "warn_only" or configured_duration <= 0 and policy == "cooldown"

parses as warn_only or (duration <= 0 and cooldown). That happens to be intended (a cooldown policy with zero duration falls back to warn-only, while disable_scope with zero duration still reaches the 1-year default below), but it is easy to misread and brittle if the terms are ever reordered. Making the grouping explicit prevents a future regression.

Suggested change

if policy == "warn_only" or configured_duration <= 0 and policy == "cooldown":

if policy == "warn_only" or (configured_duration <= 0 and policy == "cooldown"):

mirrobot-agent · 2026-07-15T23:47:09Z

+_MIXED_REASONING_CONTENT_MODELS = frozenset(
+    {
+        "google/diffusiongemma-26b-a4b-it",
+        "nvidia_nim/google/diffusiongemma-26b-a4b-it",
+    }


_MIXED_REASONING_CONTENT_MODELS duplicates the DiffusionGemma identity already declared as DIFFUSION_GEMMA_MODEL_EXACT in nvidia_provider.py. The two lists must stay in sync manually, and model.lower() in _MIXED_REASONING_CONTENT_MODELS only matches two hard-coded prefixed forms, so a differently-prefixed alias silently skips the reasoning/content split.

Consider exposing the canonical model set from the provider (e.g. a shared constant or a provider hook like needs_mixed_reasoning_split(model)) so there is a single source of truth.

Add first-class Gemini generateContent/countTokens ingress and independent output protocol selection across Chat, Anthropic, Responses, and Gemini clients. Allow startup-configured custom providers to declare one of the four generative wire protocols, safe same-origin endpoints, authentication mode, models, adapters, field-cache rules, streaming capability, and output defaults without embedding credentials. Preserve no-auth operation across global and isolated scopes. Harden native non-success handling so status, provider bodies, retry timing, aggregate credential fallback, Responses storage, and selected-protocol errors remain correct. Cross-protocol streaming stays explicitly rejected until the canonical streaming phase. Cover the public surface, configuration validation, provider snapshots, scope isolation, native transport errors, and fallback behavior; the tracked local suite passes 889 tests and 18 subtests.

Normalize Chat, Anthropic, Responses, and Gemini provider streams through one stateful canonical lifecycle, then format an independently selected client protocol after existing timeout, retry, accounting, and session handling. Add native and in-band error rotation, lifecycle-safe retries, multiline SSE decoding, hidden stream-state extraction, Responses store-before-reformat behavior, Gemini streamGenerateContent routes, and native streaming for Antigravity, Claude Code, Codex, and Copilot. Cover the 4x4 semantic and native stream matrices, LiteLLM/custom execution, tool fragments, usage, incomplete termination, fallback visibility, hidden signatures, and provider errors with the full local suite.

Unify streaming and non-streaming signature/continuation rules under validated logical cache identities. Hash every scope component, enforce safe configured overrides and semantic continuation suppression, snapshot runtime config, and reject incompatible singleton rebinding. Bound every cache mode by count/bytes/TTL/LRU and make ProviderCache append atomic without masking legacy store errors. Add real Antigravity, Claude Code, Codex, and Copilot contracts across output protocols, mixed-mode turns, isolation dimensions, and local continuation handling.

Render proxy-side validation, context, authentication, timeout, and internal failures in the independently selected generative protocol across Chat, Responses, Anthropic, and Gemini routes. Classify common context-length errors consistently, reject non-generative provider protocol overrides at config validation, and align Anthropic open-access authentication. Remove the obsolete returned-error dictionary path and add public regressions for selected output errors, malformed Responses JSON, invalid selectors, and provider config declarations. Record the completed Phase H whole-system audit and beta-readiness verdict with 1046 passing tests plus 18 subtests.

mirrobot-agent · 2026-07-17T03:02:57Z

Starting my review of the Protocol Interoperability correction — the new canonical streaming converter, cross-protocol request/response formatting, Gemini client routes, structured error propagation, and dynamic provider config. This is a large architectural increment; I'll analyze each subsystem and report back with a bundled review.

mirrobot-agent

Overall Assessment

This increment delivers the Protocol Interoperability correction — making client, provider, and output protocols independently selectable across all four generative APIs (OpenAI Chat, Responses, Anthropic Messages, Gemini). This is the most architecturally significant change since the initial protocol layer was introduced.

Strengths:

The new canonical.py module is excellent — it centralizes stop-reason mapping, tool-choice normalization, structured-output conversion, tool-result resolution, and message coalescing into one well-documented location that all four protocols share. The de-duplication logic in message_tool_calls / message_reasoning correctly handles the dual representation (message-level + content-block-level).
The validation.py module enforces meaning-changing field preservation before transport: it rejects cross-protocol conversions that would silently drop tool calls, media without identity, unsupported modalities, or provider-bound Responses controls. This is the right place to fail early.
StructuredAPIResponseError with to_protocol_payload() is a clean solution for formatting terminal errors in the selected client protocol without leaking internal state.
The field-cache store improvements (bounded appends with max_values/max_bytes, LRU eviction on InMemoryFieldCacheStore, atomic update_async on ProviderCacheFieldStore) directly address the previously flagged unbounded-growth concerns.
New Gemini client routes (generateContent, streamGenerateContent, countTokens) with proper auth handling (x-goog-api-key, query key, and Bearer fallback) close the "no public Gemini client route" gap.
The config validation for custom providers (_configured_api_base, _configured_endpoint_paths, _configured_auth_mode) is thorough — rejecting credential-bearing URLs, fragments, and cross-origin endpoints.

Architectural Notes

The streaming architecture now has a multi-stage conversion pipeline:

native provider SSE → native parse_stream_event → canonical event
  → Chat format_stream_event → Chat SSE string (operational format)
  → Chat parse_stream_event → canonical event
  → output format_stream_event → client SSE string

This round-trips through Chat SSE as the operational lingua franca, which is architecturally clean but means canonical events are serialized and re-parsed twice for cross-protocol output. The _stream_native_response double-parse finding below is a specific instance of this.

Key Suggestions

Gemini stream usage drops cache_write_tokens from promptTokenCount — see inline.
Double stream frame parsing in Responses native streaming — see inline.
Three-way merge in _canonical_request_for_native can silently strip source-native fields on fallback — see inline.
_in_band_error_payload may false-positive on non-fatal provider warning frames — see inline.
Gemini ProtocolError propagation in streaming should emit a graceful terminal frame — see inline.

Nitpicks

_tool_result_text in openai_chat.py duplicates tool_result_text from canonical.py — consider reusing the canonical helper.
_responses_item_delta / _responses_item_done use list(state.item_ids).index(key) (O(n)) for output index lookup — negligible for typical streams but could use a parallel index map.
get_models on DynamicOpenAICompatibleProvider changed from sync to async — verify all callers are updated.

Status of Previous Feedback

The following previously-open items remain unaddressed in this diff (none of the affected files were modified here):

retry_policy.py — model-scoped cooldown bypasses failure-history threshold
responses/service.py — acquire_task.result() can mask original exception in finally block; duplicated polling logic between next_upstream_chunk / acquire_upstream_stream
client/streaming.py — disconnect-task leak if is_disconnected() raises
usage/accounting.py — Gemini _from_gemini_usage cache-write double-count (the streaming converter has the same class of issue — see inline)
responses/store.py — private _save_to_disk coupling

Overall Status: Strong, well-structured increment. No blocking issues; the inline items are non-blocking improvements for correctness and robustness.

This review was generated by an AI assistant.

mirrobot-agent · 2026-07-17T03:05:30Z

+        return None
+    return {
+        "promptTokenCount": usage.input_tokens + usage.cache_read_tokens,
+        "candidatesTokenCount": usage.output_tokens,


_gemini_usage computes promptTokenCount as input_tokens + cache_read_tokens but omits cache_write_tokens. The OpenAI (_openai_usage) and Responses (_responses_usage) formatters in this same file both add cache_read_tokens + cache_write_tokens. If a source provider reported cache-write tokens, the Gemini-formatted prompt count will be too low, and totalTokenCount will no longer equal promptTokenCount + candidatesTokenCount.

Suggested change

return None

return {

"promptTokenCount": usage.input_tokens + usage.cache_read_tokens,

"candidatesTokenCount": usage.output_tokens,

"promptTokenCount": usage.input_tokens + usage.cache_read_tokens + usage.cache_write_tokens,

mirrobot-agent · 2026-07-17T03:05:30Z

+        async for raw_frame in response_stream:
+            if isinstance(raw_frame, str) and raw_frame.lstrip().startswith(":"):
+                yield raw_frame
+                continue
+            event = self.protocol.parse_stream_event(raw_frame, response_context)
+            payload = event.extra.get("payload") if isinstance(event.extra, dict) else None
+            response_payload = payload.get("response") if isinstance(payload, dict) and isinstance(payload.get("response"), dict) else None
+            if response_payload and event.type in {"response.completed", "response.failed", "response.incomplete"}:
+                completed = True
+                _record_responses_session_anchor(session_info, response_payload)
+                self._trace_responses_usage(transaction_logger, response_payload, unified.model, source="responses_stream")
+                stored = await self._store_stream_response(
+                    stream_request,
+                    response_payload,
+                    parent,
+                    failed=event.type == "response.failed",
+                    transaction_logger=transaction_logger,
+                    session_info=session_info,
+                )
+                self._trace(
+                    transaction_logger,
+                    "responses_stored_stream_response" if stored else "responses_store_skipped",
+                    response_payload if stored else {"response_id": response_payload.get("id")},
+                    direction="metadata",
+                    stage="final",


Each raw_frame is parsed twice: once at line ~476 via self.protocol.parse_stream_event(raw_frame, response_context) for session/usage tracking, and again inside converter.convert(raw_frame) via source_protocol.parse_stream_event. For high-throughput streams this doubles JSON decode + parse work per frame. Consider exposing a method on ProtocolStreamConverter that accepts the already-parsed UnifiedStreamEvent (e.g., converter.format_event(event)) so the tracking pass and the formatting pass share one parse.

mirrobot-agent · 2026-07-17T03:05:30Z

+    def _canonical_request_for_native(
+        context: RequestContext,
+        request_payload: Dict[str, Any],
+    ) -> Any:
+        """Overlay protocol-safe attempt mutations onto the canonical request.
+
+        Existing transforms and callbacks operate on the Chat execution view.
+        Only canonical fields changed relative to that view are overlaid, so
+        source-native semantics that Chat cannot represent remain intact.
+        """
+
+        canonical_request = deepcopy(context.unified_request)
+        chat_protocol = get_protocol("openai_chat")
+        baseline = chat_protocol.parse_request(context.kwargs)
+        attempted = chat_protocol.parse_request(request_payload)
+        original_instructions = [message for message in canonical_request.messages if message.role in {"system", "developer"}]
+        original_conversation = [message for message in canonical_request.messages if message.role not in {"system", "developer"}]
+        baseline_instructions = [message for message in baseline.messages if message.role in {"system", "developer"}]
+        baseline_conversation = [message for message in baseline.messages if message.role not in {"system", "developer"}]
+        attempted_instructions = [message for message in attempted.messages if message.role in {"system", "developer"}]
+        attempted_conversation = [message for message in attempted.messages if message.role not in {"system", "developer"}]
+        if baseline_conversation != attempted_conversation:
+            original_conversation = _merge_canonical_sequence(
+                original_conversation,
+                baseline_conversation,
+                attempted_conversation,
+                _merge_canonical_message,
+                lambda message: (message.role, message.tool_call_id, message.name),
+            )
+        if baseline_instructions != attempted_instructions:
+            if canonical_request.system and not original_instructions:
+                baseline_blocks = [block for message in baseline_instructions for block in message.content]
+                attempted_blocks = [block for message in attempted_instructions for block in message.content]
+                canonical_request.system = _merge_canonical_sequence(
+                    canonical_request.system,
+                    baseline_blocks,
+                    attempted_blocks,
+                    _merge_canonical_content_block,
+                    lambda block: block.type,
+                )
+            else:
+                original_instructions = _merge_canonical_sequence(
+                    original_instructions,
+                    baseline_instructions,
+                    attempted_instructions,
+                    _merge_canonical_message,
+                    lambda message: (message.role, message.tool_call_id, message.name),
+                )
+        canonical_request.messages = original_instructions + original_conversation
+        if baseline.tools != attempted.tools:
+            canonical_request.tools = _merge_canonical_sequence(
+                canonical_request.tools,
+                baseline.tools,
+                attempted.tools,
+                _merge_canonical_tool,
+                lambda tool: (tool.type, tool.name),
+            )
+        for field_name in ("stream", "modalities"):
+            if getattr(baseline, field_name) != getattr(attempted, field_name):
+                setattr(canonical_request, field_name, deepcopy(getattr(attempted, field_name)))
+        metadata_keys = set(baseline.metadata) | set(attempted.metadata)
+        for key in metadata_keys:
+            if baseline.metadata.get(key) == attempted.metadata.get(key):
+                continue
+            if key in attempted.metadata:
+                canonical_request.metadata[key] = deepcopy(attempted.metadata[key])
+            else:
+                canonical_request.metadata.pop(key, None)
+        if baseline.response_format != attempted.response_format:
+            canonical_request.response_format = deepcopy(attempted.response_format)
+        generation_keys = set(baseline.generation_params) | set(attempted.generation_params)
+        for key in generation_keys:
+            baseline_value = baseline.generation_params.get(key)
+            attempted_value = attempted.generation_params.get(key)
+            if baseline_value == attempted_value:
+                continue


_canonical_request_for_native performs a three-way merge (original canonical × Chat baseline × Chat attempted) to overlay transform/callback mutations onto the source-native request. The merge relies on positional zip alignment and identity-key matching, which can silently misapply edits if transforms reorder messages, insert/delete turns, or change block types. When an item can't be matched, _without_source_artifacts strips raw and extra — losing source-native fields the design intends to preserve. Consider emitting a trace warning when the fallback path is taken so silent data loss is diagnosable.

mirrobot-agent · 2026-07-17T03:05:30Z

+        return ClassifiedError(
+            error_type=str(explicit_error_type).strip().lower().replace("-", "_").replace(" ", "_"),
+            original_exception=e,
+            status_code=getattr(e, "status_code", None),
+            retry_after=get_retry_after(e),
+        )
+


classify_error now calls get_retry_after(e) for explicit error types. Previously, an explicit error_type short-circuited without checking retry-after. This changes retry behavior: a StructuredAPIResponseError with error_type="rate_limit" will now pick up retry-after from its embedded response/headers dicts (added to get_retry_after at lines 700–716). This is likely intentional, but verify it doesn't cause unexpected same-key retries for structured errors that previously triggered immediate credential rotation.

mirrobot-agent · 2026-07-17T03:05:30Z

+                error = self._in_band_error_payload(chunk)
+                if error is not None:
+                    raise StreamedAPIError(
+                        str(error.get("message") or error.get("type") or "Provider stream failed"),
+                        data={"error": error},
+                    )


_in_band_error_payload is called at the top of _process_chunk and raises StreamedAPIError for any chunk containing an "error" key or error event type. Some providers emit non-fatal warning/deprecation frames with an error-like shape alongside valid data. This would terminate the stream on the first such frame. Consider gating on explicit terminal semantics (e.g., HTTP 4xx/5xx status codes, response.failed events, or a configurable allow-list) rather than any "error" key presence.

mirrobot-agent · 2026-07-17T03:05:30Z

+    if _is_terminal(event):
+        for key in state.tool_names:
+            if key in state.emitted_tools:
+                continue
+            arguments_text = state.tool_arguments.get(key, "")
+            if arguments_text and not _is_json(arguments_text):
+                raise ProtocolError(
+                    "Gemini cannot emit an incomplete streamed tool-call argument object",
+                    protocol="gemini",
+                    pass_name="format_stream_event",
+                    payload={"tool_call": key},
+                )


_format_gemini raises ProtocolError when terminal tool-call arguments aren't valid JSON. This is architecturally correct (Gemini requires complete function-call objects), but the exception propagates through convert_protocol_stream unhandled — the executor's generic except Exception will emit a generic "Streaming request failed" error rather than a protocol-specific terminal frame. Consider catching this inside the converter and emitting a terminal error event in the output protocol so the client receives a meaningful error.

Mirrowel added 30 commits May 30, 2026 22:35

mirrobot-agent Bot reviewed May 31, 2026

View reviewed changes

Mirrowel added 3 commits June 1, 2026 00:28

docs(config): plan phase 10c completion

80b9683

Adds the Phase 10c remediation plan for startup secret masking, provider config wiring, pricing env tolerance, routing validation, and JSON secret-key hardening.

Mirrowel mentioned this pull request Jun 16, 2026

test: add pytest test suite for rotator_library core modules (64 tests) #163

Open

5 tasks

Mirrowel marked this pull request as ready for review June 16, 2026 22:22

Merge branch 'dev' into experimental

5c17330

mirrobot-agent Bot reviewed Jun 16, 2026

View reviewed changes

Mirrowel added 8 commits June 17, 2026 08:11

mirrobot-agent Bot reviewed Jul 15, 2026

View reviewed changes

Mirrowel added 3 commits July 16, 2026 03:20

docs(protocols): 📚 define interoperability correction contract

455de69

feat(protocols): ✨ add canonical cross-protocol conversion

1f5325f

feat(protocols): ✨ wire independent protocol runtime

4412be1

This was referenced Jul 16, 2026

feat: add MiniMax model and endpoint support #166

Open

Add native Player2 provider with assisted login (local app detection + device code flow) #165

Open

Mirrowel added 4 commits July 16, 2026 22:02

mirrobot-agent Bot reviewed Jul 17, 2026

View reviewed changes

	if policy == "warn_only" or configured_duration <= 0 and policy == "cooldown":
	if policy == "warn_only" or (configured_duration <= 0 and policy == "cooldown"):

Uh oh!

Conversation

Mirrowel commented May 31, 2026

Experimental Native Protocol Roadmap

Operating Rules

Strategic Goal

Priority Order

Non-Goals For This Branch

Current Strengths To Preserve

Reference Gateway Ideas To Import Carefully

Phase Index

Completeness Matrix

Code Quality Expectations

Uh oh!

mirrobot-agent Bot commented May 31, 2026

Uh oh!

mirrobot-agent Bot left a comment

Choose a reason for hiding this comment

Overall Assessment

Status of Previous Feedback

Key Suggestions

Architectural Notes

Uh oh!

mirrobot-agent Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot May 31, 2026

Choose a reason for hiding this comment

Uh oh!

greptile-apps Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mirrobot-agent Bot left a comment

Choose a reason for hiding this comment

Follow-up Review: Phase 10c Config Completion

Uh oh!

mirrobot-agent Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot left a comment

Choose a reason for hiding this comment

Follow-up Review: Phase 11 Session-Tracking Hardening

Uh oh!

mirrobot-agent Bot Jul 15, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot Jul 15, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot Jul 15, 2026

Choose a reason for hiding this comment

Uh oh!

mirrobot-agent Bot commented Jul 17, 2026

Uh oh!

mirrobot-agent Bot left a comment

Choose a reason for hiding this comment

Overall Assessment

Architectural Notes

Key Suggestions

Nitpicks

Status of Previous Feedback

Uh oh!

greptile-apps Bot commented Jun 16, 2026 •

edited

Loading