Production Issue (2026-01-25): After stopping OpenCode clients, the proxy continued processing new incoming HTTP requests with identical payloads. This indicated that client-side retry logic kept firing even after shutdown.
- ❌ Cost waste: Each zombie retry consumed backend API quota (87k+ tokens per request)
- ❌ Log pollution: Made debugging difficult
- ❌ False metrics: Inflated usage statistics
- ❌ Session confusion: Interleaved with legitimate requests
The proxy itself was working correctly: it processed all incoming HTTP requests as expected. The real issues were:
- Streaming bypass: Deduplication was disabled for all streaming requests
- Client bugs: OpenCode's retry logic didn't clear when stopped
- No status tracking: Couldn't distinguish zombie retries from legitimate 429 retries
Enhanced RequestDeduplicationService to track request completion status and make intelligent duplicate decisions:
| Original Status | Duplicate Arrives | Behavior | Reason |
|---|---|---|---|
| IN_FLIGHT | Any time | ❌ BLOCKED | True parallel duplicate |
| SUCCESS (200) | Within window | ❌ BLOCKED | Zombie retry after success |
| RETRIABLE_ERROR (429, 503, 502, 504, 408) | ANY TIME | ✅ ALLOWED | Legitimate retry |
| CLIENT_DISCONNECT | Within window | ❌ BLOCKED | Zombie retry after disconnect |
| Any status | After window expires | ✅ ALLOWED | Expired, treat as new |
Retries after 429/503 errors are NEVER blocked, regardless of timing.
This ensures the fix doesn't interfere with legitimate retry workflows while preventing zombie request waste.
```python
@dataclass
class TrackedRequest:
    timestamp: float
    status: RequestStatus  # IN_FLIGHT, SUCCESS, RETRIABLE_ERROR, CLIENT_DISCONNECT
    status_code: int | None = None


async def check_and_register(request, session_id):
    # ... hash computation and lookup of tracked/age/window elided ...

    # CRITICAL: Always allow retries after retriable errors
    if tracked.status == RequestStatus.RETRIABLE_ERROR:
        return (False, request_hash)  # Not a duplicate

    # Block duplicates of in-flight, successful, or disconnected requests
    if age < window and tracked.status in (
        RequestStatus.IN_FLIGHT,
        RequestStatus.SUCCESS,
        RequestStatus.CLIENT_DISCONNECT,
    ):
        return (True, request_hash)  # Is a duplicate
```

- Removed streaming bypass (now dedups all requests)
- Calls `mark_request_complete()` with the status code after each request completes
- Handles client disconnects (`asyncio.CancelledError`)
- Preserves the `x-llmproxy-no-dedup` header for opt-out
Before:

```python
if request.stream:
    return True  # Bypass deduplication
```

After:

```python
# Only bypass if explicitly requested via header
if headers.get("x-llmproxy-no-dedup") == "true":
    return True
return False  # Apply deduplication
```

No configuration changes are required. The existing deduplication settings apply:
```yaml
# Default values (configured via DI registration)
deduplication:
  window_seconds: 3.0   # Block duplicates within 3 seconds
  enabled: true         # Now applies to streaming too
  max_cache_size: 10000
```

To opt out (for specific clients):
```shell
curl -H "x-llmproxy-no-dedup: true" ...
```

- ✅ `test_retry_after_429_always_allowed` - Never blocks 429 retries
- ✅ `test_retry_after_503_allowed` - Allows service-unavailable retries
- ✅ `test_retry_after_success_blocked` - Blocks zombie retries after 200
- ✅ `test_retry_after_client_disconnect_blocked` - Blocks zombie retries after disconnect
- ✅ `test_parallel_duplicate_blocked` - Blocks true parallel duplicates
- ✅ `test_multiple_retries_after_429_allowed` - Allows retry loops
- ✅ `test_zombie_pattern_detection` - Reproduces the production scenario
- ✅ `test_streaming_dedup_enabled_for_streaming_requests` - Streaming now dedups
- ✅ `test_streaming_dedup_bypass_via_header` - Opt-out still works
- ✅ Backend request manager integration tests
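The zombie-pattern and retry-after-429 cases can be exercised against a toy in-memory implementation of the decision table. All names here (`ToyDedup`, `Tracked`, `Status`) are illustrative, not the actual `RequestDeduplicationService` API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    IN_FLIGHT = auto()
    SUCCESS = auto()
    RETRIABLE_ERROR = auto()
    CLIENT_DISCONNECT = auto()


@dataclass
class Tracked:
    timestamp: float
    status: Status


class ToyDedup:
    """Minimal model of the dedup decision table (time injected for testability)."""

    def __init__(self, window: float = 3.0):
        self.window = window
        self.cache: dict[str, Tracked] = {}

    def check_and_register(self, key: str, now: float) -> bool:
        """Return True if the request is a blocked duplicate, else register it."""
        tracked = self.cache.get(key)
        if tracked is not None:
            if tracked.status is Status.RETRIABLE_ERROR:
                pass  # retries after retriable errors are always allowed
            elif now - tracked.timestamp < self.window:
                return True  # blocked: in-flight, success, or disconnect within window
        self.cache[key] = Tracked(now, Status.IN_FLIGHT)
        return False

    def complete(self, key: str, status: Status) -> None:
        self.cache[key].status = status


# Zombie pattern: disconnect, then an immediate identical retry
svc = ToyDedup()
assert svc.check_and_register("req", now=0.0) is False  # first request allowed
svc.complete("req", Status.CLIENT_DISCONNECT)
assert svc.check_and_register("req", now=1.0) is True   # zombie retry blocked

# Legitimate retry after 429 is never blocked, even inside the window
svc2 = ToyDedup()
svc2.check_and_register("req", now=0.0)
svc2.complete("req", Status.RETRIABLE_ERROR)
assert svc2.check_and_register("req", now=0.5) is False  # retry allowed
```

Injecting `now` as a parameter keeps the table deterministic to test; the real service presumably reads the clock itself.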
- ✅ Legitimate 429 retries: Unaffected (always allowed)
- ✅ Normal workflows: Unaffected (dedups only identical requests)
- ✅ Opt-out header: Still works (`x-llmproxy-no-dedup: true`)

⚠️ Breaking: Streaming requests are now deduplicated (previously bypassed).

Migration: If clients rely on the streaming bypass, add the `x-llmproxy-no-dedup: true` header.
The fix prevents the exact scenario captured in the production logs:

```
# Before fix:
01:53:37 - Client sends request (120 messages)
01:53:41 - Client disconnects
01:53:42 - Client sends SAME request again  ← Zombie retry
01:53:42 - Client disconnects
01:53:42 - Client sends SAME request again  ← Zombie retry
...continues indefinitely

# After fix:
01:53:37 - Client sends request (120 messages)
01:53:41 - Client disconnects → marked as CLIENT_DISCONNECT
01:53:42 - Client sends SAME request → BLOCKED (duplicate)
```
Check deduplication stats via the diagnostics endpoint:

```python
stats = dedup_service.get_stats()
print(f"Retries after errors: {stats.extra['retries_after_error_allowed']}")
print(f"Zombies blocked: {stats.duplicates_blocked}")
```

A high `duplicates_blocked` count combined with a low `retries_after_error_allowed` count indicates zombie request patterns.
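That heuristic can be turned into a simple alerting check. The function name, thresholds, and ratio below are all illustrative assumptions, not part of the proxy:

```python
def looks_like_zombie_pattern(
    blocked: int,
    retries_allowed: int,
    min_blocked: int = 50,   # ignore low-traffic noise (illustrative threshold)
    ratio: float = 10.0,     # blocked must dwarf legitimate retries (illustrative)
) -> bool:
    """Flag when blocked duplicates vastly outnumber legitimate error retries."""
    if blocked < min_blocked:
        return False  # too few events to judge
    return blocked >= ratio * max(retries_allowed, 1)
```

Feeding it the two stats fields from the snippet above would give a periodic health signal without parsing logs.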