Production Issue (2026-01-25): After stopping OpenCode clients, the proxy continued processing new incoming HTTP requests with identical payloads. This indicated that client-side retry logic kept firing even after shutdown.
- ❌ Cost waste: Each zombie retry consumed backend API quota (87k+ tokens per request)
- ❌ Log pollution: Made debugging difficult
- ❌ False metrics: Inflated usage statistics
- ❌ Session confusion: Interleaved with legitimate requests
The proxy itself was working correctly: it processed all incoming HTTP requests as expected. The real issues were:
- Streaming bypass: Deduplication was disabled for all streaming requests
- Client bugs: OpenCode's retry logic didn't clear when stopped
- No status tracking: Couldn't distinguish zombie retries from legitimate 429 retries
Enhanced RequestDeduplicationService to track request completion status and make intelligent duplicate decisions:
| Original Status | Duplicate Arrives | Behavior | Reason |
|---|---|---|---|
| IN_FLIGHT | Any time | ❌ BLOCKED | True parallel duplicate |
| SUCCESS (200) | Within window | ❌ BLOCKED | Zombie retry after success |
| RETRIABLE_ERROR (429, 503, 502, 504, 408) | ANY TIME | ✅ ALLOWED | Legitimate retry |
| CLIENT_DISCONNECT | Within window | ❌ BLOCKED | Zombie retry after disconnect |
| Any status | After window expires | ✅ ALLOWED | Expired, treat as new |
Retries after 429/503 errors are NEVER blocked, regardless of timing.
This ensures the fix doesn't interfere with legitimate retry workflows while preventing zombie request waste.
```python
@dataclass
class TrackedRequest:
    timestamp: float
    status: RequestStatus  # IN_FLIGHT, SUCCESS, RETRIABLE_ERROR, CLIENT_DISCONNECT
    status_code: int | None = None


async def check_and_register(request, session_id):
    # ... hash computation and lookup of tracked/age/window elided ...

    # CRITICAL: Always allow retries after retriable errors
    if tracked.status == RequestStatus.RETRIABLE_ERROR:
        return (False, request_hash)  # Not a duplicate

    # Block duplicates of in-flight, successful, or disconnected requests
    if age < window and tracked.status in (
        RequestStatus.IN_FLIGHT,
        RequestStatus.SUCCESS,
        RequestStatus.CLIENT_DISCONNECT,
    ):
        return (True, request_hash)  # Is a duplicate
```

- Removed streaming bypass (now dedups all requests)
- Calls `mark_request_complete()` with the status code after each request completes
- Handles client disconnects (`asyncio.CancelledError`)
- Preserves the `x-llmproxy-no-dedup` header for opt-out
Before:

```python
if request.stream:
    return True  # Bypass deduplication
```

After:

```python
# Only bypass if explicitly requested via header
if headers.get("x-llmproxy-no-dedup") == "true":
    return True
return False  # Apply deduplication
```

No configuration changes are required. The existing deduplication settings apply:
```yaml
# Default values (configured via DI registration)
deduplication:
  window_seconds: 3.0   # Block duplicates within 3 seconds
  enabled: true         # Now applies to streaming too
  max_cache_size: 10000
```

To opt out (for specific clients):
```shell
curl -H "x-llmproxy-no-dedup: true" ...
```

- ✅ `test_retry_after_429_always_allowed` - Never blocks 429 retries
- ✅ `test_retry_after_503_allowed` - Allows service-unavailable retries
- ✅ `test_retry_after_success_blocked` - Blocks zombie retries after 200
- ✅ `test_retry_after_client_disconnect_blocked` - Blocks zombie retries after disconnect
- ✅ `test_parallel_duplicate_blocked` - Blocks true parallel duplicates
- ✅ `test_multiple_retries_after_429_allowed` - Allows retry loops
- ✅ `test_zombie_pattern_detection` - Reproduces the production scenario
- ✅ `test_streaming_dedup_enabled_for_streaming_requests` - Streaming now dedups
- ✅ `test_streaming_dedup_bypass_via_header` - Opt-out still works
- ✅ Backend request manager integration tests
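The zombie-pattern and retry-after-429 cases can be exercised against a toy in-memory implementation of the decision table. All names here (`ToyDedup`, `Tracked`, `Status`) are illustrative, not the actual `RequestDeduplicationService` API.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Status(Enum):
    IN_FLIGHT = auto()
    SUCCESS = auto()
    RETRIABLE_ERROR = auto()
    CLIENT_DISCONNECT = auto()


@dataclass
class Tracked:
    timestamp: float
    status: Status


class ToyDedup:
    """Minimal model of the dedup decision table (time injected for testability)."""

    def __init__(self, window: float = 3.0):
        self.window = window
        self.cache: dict[str, Tracked] = {}

    def check_and_register(self, key: str, now: float) -> bool:
        """Return True if the request is a blocked duplicate, else register it."""
        tracked = self.cache.get(key)
        if tracked is not None:
            if tracked.status is Status.RETRIABLE_ERROR:
                pass  # retries after retriable errors are always allowed
            elif now - tracked.timestamp < self.window:
                return True  # blocked: in-flight, success, or disconnect within window
        self.cache[key] = Tracked(now, Status.IN_FLIGHT)
        return False

    def complete(self, key: str, status: Status) -> None:
        self.cache[key].status = status


# Zombie pattern: disconnect, then an immediate identical retry
svc = ToyDedup()
assert svc.check_and_register("req", now=0.0) is False  # first request allowed
svc.complete("req", Status.CLIENT_DISCONNECT)
assert svc.check_and_register("req", now=1.0) is True   # zombie retry blocked

# Legitimate retry after 429 is never blocked, even inside the window
svc2 = ToyDedup()
svc2.check_and_register("req", now=0.0)
svc2.complete("req", Status.RETRIABLE_ERROR)
assert svc2.check_and_register("req", now=0.5) is False  # retry allowed
```

Injecting `now` as a parameter keeps the table deterministic to test; the real service presumably reads the clock itself.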
- ✅ Legitimate 429 retries: Unaffected (always allowed)
- ✅ Normal workflows: Unaffected (dedups only identical requests)
- ✅ Opt-out header: Still works (`x-llmproxy-no-dedup: true`)

⚠️ Breaking: Streaming requests are now deduplicated (previously bypassed).

Migration: If clients rely on the streaming bypass, add the `x-llmproxy-no-dedup: true` header.
The fix prevents the exact scenario captured in the production logs:

```
# Before fix:
01:53:37 - Client sends request (120 messages)
01:53:41 - Client disconnects
01:53:42 - Client sends SAME request again  ← Zombie retry
01:53:42 - Client disconnects
01:53:42 - Client sends SAME request again  ← Zombie retry
...continues indefinitely

# After fix:
01:53:37 - Client sends request (120 messages)
01:53:41 - Client disconnects → marked as CLIENT_DISCONNECT
01:53:42 - Client sends SAME request → BLOCKED (duplicate)
```
Check deduplication stats via the diagnostics endpoint:

```python
stats = dedup_service.get_stats()
print(f"Retries after errors: {stats.extra['retries_after_error_allowed']}")
print(f"Zombies blocked: {stats.duplicates_blocked}")
```

A high `duplicates_blocked` count combined with a low `retries_after_error_allowed` count indicates zombie request patterns.
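That heuristic can be turned into a simple alerting check. The function name, thresholds, and ratio below are all illustrative assumptions, not part of the proxy:

```python
def looks_like_zombie_pattern(
    blocked: int,
    retries_allowed: int,
    min_blocked: int = 50,   # ignore low-traffic noise (illustrative threshold)
    ratio: float = 10.0,     # blocked must dwarf legitimate retries (illustrative)
) -> bool:
    """Flag when blocked duplicates vastly outnumber legitimate error retries."""
    if blocked < min_blocked:
        return False  # too few events to judge
    return blocked >= ratio * max(retries_allowed, 1)
```

Feeding it the two stats fields from the snippet above would give a periodic health signal without parsing logs.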