Probabilistically replace user-specified backend:model pairs with alternative models to improve session diversity and provide resilience when specific models encounter difficulties with certain problems.
The Random Model/Backend Replacement feature enables probabilistic swapping of backend:model pairs during a session. When activated, the system routes requests to an alternative model for a configurable number of turns before returning to the original model. This provides session diversity and can help when a specific model struggles with particular types of problems that alternative models might solve more effectively.
- Replacement Metrics for monitoring activation rates and opt-outs
- Monitoring Overview for dashboard and endpoint context
- Probabilistic Activation: Configure the likelihood (0.0-1.0) of triggering replacement for each session
- Multi-Turn Persistence: Replacement remains active for a configurable number of consecutive turns
- OAuth-Auto Support: Optional support for multi-account `oauth-auto` rotating backends via explicit override
- Per-Session State: Each session maintains independent replacement state
- Opt-Out Support: Disable replacement via request headers or session-level configuration
- Transparent Operation: Works seamlessly with existing features like tool filtering, wire capture, and usage accounting
- Streaming Compatible: Full support for streaming responses with replacement models
- Improved Resilience: When a model encounters difficulties with a specific problem, an alternative model might provide a fresh perspective or different approach
- Session Diversity: Probabilistic replacement introduces variety in model responses, which can be valuable for testing and development
- Automatic Fallback: No manual intervention required - the system automatically tries alternative models based on configured probability
- Cost Optimization: Configure replacement to use more cost-effective models for certain types of tasks
- Testing and Validation: Useful for comparing model behaviors and validating that your application works with multiple backends
Configuration follows standard precedence: CLI > Environment Variables > YAML
Add to your config.yaml:
replacement:
  enabled: true
  probability: 0.3                      # 30% chance of replacement
  replacement_rules:
    - from_pattern: "*"                 # Wildcard: matches all models
      to_backend: "qwen-oauth"
      to_model: "qwen3-coder-plus"
    - from_pattern: "gpt-4"             # Partial match: matches any model containing "gpt-4"
      to_backend: "openai"
      to_model: "gpt-3.5-turbo"
    - from_pattern: "openai:gpt-4"      # Exact match: matches specific backend:model
      to_backend: "anthropic"
      to_model: "claude-3-5-sonnet"
  turn_count: 3                         # Stay with replacement for 3 turns
  allow_oauth_auto_replacement: true    # Allow replacement for oauth-auto backends

Legacy format (automatically converted to a wildcard rule):
replacement:
  enabled: true
  probability: 0.3
  backend_model: "qwen-oauth:qwen3-coder-plus"  # Deprecated
  turn_count: 3

Configure via environment variables:

# Enable/disable the feature
REPLACEMENT_ENABLED=true
# Set replacement probability (0.0-1.0)
REPLACEMENT_PROBABILITY=0.3
# Specify replacement rules as JSON array
REPLACEMENT_RULES='[{"from_pattern":"*","to_backend":"qwen-oauth","to_model":"qwen3-coder-plus"},{"from_pattern":"gpt-4","to_backend":"openai","to_model":"gpt-3.5-turbo"}]'
# Legacy format (deprecated)
REPLACEMENT_BACKEND_MODEL=qwen-oauth:qwen3-coder-plus
# Set number of turns to use replacement
REPLACEMENT_TURN_COUNT=3
# Allow replacement for oauth-auto backends
ALLOW_OAUTH_AUTO_REPLACEMENT=true

CLI flags:

--enable-replacement
--replacement-probability FLOAT
--random-model-replacement-from-to "<from>=<to>" # Can be specified multiple times
--replacement-backend-model BACKEND:MODEL # Deprecated: use --random-model-replacement-from-to instead
--replacement-turn-count N
--allow-oauth-auto-replacement            # Allow replacement for oauth-auto backends

The replacement feature supports conditional rules that specify when to replace (which models) and what to replace them with. Rules are evaluated in order, and the first matching rule is used.
Each rule has a from_pattern that matches against the original model:
- Wildcard (`*`): Matches all models from any backend, e.g. `from_pattern: "*"`
- Partial Match (`model-name`): Matches any model whose name contains the substring (case-sensitive), e.g. `from_pattern: "gpt-4"` matches "gpt-4", "gpt-4-turbo", "gpt-4o", etc.
- Exact Match (`backend:model`): Matches a specific fully qualified model identifier, e.g. `from_pattern: "openai:gpt-4"` matches only "openai:gpt-4"
Rules are evaluated in the order they are specified. Place more specific rules before wildcard rules:
replacement_rules:
  - from_pattern: "openai:gpt-4"   # Specific rule first
    to_backend: "anthropic"
    to_model: "claude-3-5-sonnet"
  - from_pattern: "gpt-4"          # Partial match (catches other gpt-4 variants)
    to_backend: "openai"
    to_model: "gpt-3.5-turbo"
  - from_pattern: "*"              # Wildcard last (catches everything else)
    to_backend: "qwen-oauth"
    to_model: "qwen3-coder-plus"

Enable replacement with 30% probability using a wildcard rule:
python -m src.core.cli \
--default-backend openai \
--enable-replacement \
--replacement-probability 0.3 \
--random-model-replacement-from-to "*=qwen-oauth:qwen3-coder-plus" \
--replacement-turn-count 3

Replace specific models with different targets:
python -m src.core.cli \
--default-backend openai \
--enable-replacement \
--replacement-probability 0.3 \
--random-model-replacement-from-to "gpt-4=openai:gpt-3.5-turbo" \
--random-model-replacement-from-to "claude=anthropic:claude-3-haiku" \
--random-model-replacement-from-to "*=qwen-oauth:qwen3-coder-plus" \
--replacement-turn-count 3

Use replacement frequently for testing:
python -m src.core.cli \
--default-backend anthropic \
--enable-replacement \
--replacement-probability 0.8 \
--random-model-replacement-from-to "*=openai:gpt-4" \
--replacement-turn-count 5

Replace for just one turn to get a quick alternative perspective:
python -m src.core.cli \
--default-backend openai \
--enable-replacement \
--replacement-probability 0.5 \
--random-model-replacement-from-to "*=anthropic:claude-3-5-sonnet" \
--replacement-turn-count 1

Set up replacement via environment variables:
export REPLACEMENT_ENABLED=true
export REPLACEMENT_PROBABILITY=0.3
export REPLACEMENT_RULES='[{"from_pattern":"*","to_backend":"qwen-oauth","to_model":"qwen3-coder-plus"}]'
export REPLACEMENT_TURN_COUNT=3
python -m src.core.cli --default-backend openai

Test your application with multiple models to ensure compatibility:
replacement:
  enabled: true
  probability: 0.5       # 50% of sessions use alternative model
  replacement_rules:
    - from_pattern: "*"
      to_backend: "anthropic"
      to_model: "claude-3-5-sonnet"
  turn_count: 10         # Use alternative for extended testing

Probabilistically route expensive models to more cost-effective alternatives:
python -m src.core.cli \
--default-backend openai \
--default-model gpt-4 \
--enable-replacement \
--replacement-probability 0.4 \
--random-model-replacement-from-to "gpt-4=openai:gpt-3.5-turbo" \
--random-model-replacement-from-to "claude-3-5-sonnet=anthropic:claude-3-haiku" \
--replacement-turn-count 5

When a model struggles, automatically try an alternative:
replacement:
  enabled: true
  probability: 0.3
  replacement_rules:
    - from_pattern: "*"
      to_backend: "qwen-oauth"
      to_model: "qwen3-coder-plus"
  turn_count: 3          # Give alternative model a few turns to help

Compare behaviors across different models:
# Run multiple sessions to see how different models handle the same tasks
python -m src.core.cli \
--default-backend anthropic \
--enable-replacement \
--replacement-probability 0.5 \
--random-model-replacement-from-to "*=openai:gpt-4" \
--replacement-turn-count 5

Replace only specific models while leaving others unchanged:
replacement:
  enabled: true
  probability: 0.3
  replacement_rules:
    # Only replace GPT-4 models with GPT-3.5
    - from_pattern: "gpt-4"
      to_backend: "openai"
      to_model: "gpt-3.5-turbo"
    # Other models are not replaced (no wildcard rule)
  turn_count: 3

Session Start
      |
      v
Check Probability --> Random < Threshold? --Yes--> Activate Replacement
      |                                                  |
      No                                                 v
      |                                        Use Replacement Model
      v                                                  |
Use Original Model                                       v
                                             Decrement Turn Counter
                                                         |
                                                         v
                                             Counter = 0? --Yes--> Deactivate
                                                         |
                                                         No
                                                         |
                                                         v
                                             Continue Replacement
- At the start of each session (when replacement is not already active), the system generates a random number between 0.0 and 1.0
- If the random number is less than the configured probability, replacement activates
- Once activated, all requests for that session use the replacement backend:model
- The turn counter decrements after each completed turn
- When the counter reaches 0, the system deactivates replacement and returns to the original model
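The activation and turn-counting steps above can be sketched as follows. This is a minimal illustration; the state class and function names are assumptions, not the actual implementation:

```python
import random
from dataclasses import dataclass


@dataclass
class ReplacementState:
    """Per-session replacement state (illustrative field names)."""
    active: bool = False
    turns_remaining: int = 0


def maybe_activate(state: ReplacementState, probability: float, turn_count: int) -> bool:
    # Roll the dice only when replacement is not already active for this session.
    if not state.active and random.random() < probability:
        state.active = True
        state.turns_remaining = turn_count
    return state.active


def complete_turn(state: ReplacementState) -> None:
    # Decrement after each completed turn; deactivate when the counter reaches 0.
    if state.active:
        state.turns_remaining -= 1
        if state.turns_remaining == 0:
            state.active = False
```

With `probability: 1.0` and `turn_count: 3`, every session activates and returns to the original model after three completed turns.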
- Per-Session State: Each session maintains independent replacement state
- Turn Counting: The system tracks remaining turns for each active replacement
- Automatic Deactivation: Replacement automatically deactivates when the turn counter expires
- State Persistence: Replacement state is maintained across the session lifetime
Disable replacement for a specific request:
curl -X POST http://localhost:8000/v1/chat/completions \
-H "X-Disable-Replacement: true" \
-H "Content-Type: application/json" \
-d '{"model": "gpt-4", "messages": [...]}'

Disable replacement for an entire session programmatically via the replacement service API.
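The header-based opt-out check might look like this sketch, assuming the documented semantics (case-insensitive header name, string value `true`; accepting any value casing is this sketch's assumption):

```python
def replacement_opted_out(headers: dict[str, str]) -> bool:
    """Return True if the request opted out of replacement via header (sketch)."""
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-disable-replacement":
            return value.strip().lower() == "true"
    return False
```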
- enabled: Enable or disable the replacement feature (default: `false`)
- probability: Probability (0.0-1.0) of triggering replacement (default: `0.0`)
  - `0.0` = never replace
  - `0.3` = 30% chance of replacement
  - `1.0` = always replace
- replacement_rules: List of conditional replacement rules (required when enabled)
  - Each rule specifies `from_pattern`, `to_backend`, and `to_model`
  - Rules are evaluated in order; the first matching rule is used
  - At least one rule is required when enabled
- backend_model: Deprecated - legacy format for backward compatibility. Automatically converted to a wildcard replacement rule if `replacement_rules` is empty.
- turn_count: Number of consecutive turns to use the replacement model (default: `1`, minimum: `1`)
- allow_oauth_auto_replacement: Allow replacement for multi-account `oauth-auto` backends (default: `false`). Enabling this is recommended for "diversity of views" research but may bypass capacity protections.
Each replacement rule has three fields:
- from_pattern: Pattern to match against original models
  - `"*"` - wildcard (matches all models)
  - `"model-name"` - partial match (substring in model name)
  - `"backend:model"` - exact match (fully qualified identifier)
- to_backend: Target backend identifier (e.g., `"qwen-oauth"`, `"openai"`)
- to_model: Target model identifier (e.g., `"qwen3-coder-plus"`, `"gpt-3.5-turbo"`)
The system validates configuration at startup:
- probability must be between 0.0 and 1.0 inclusive
- replacement_rules must be a non-empty list when enabled
- Each rule's `to_backend` and `to_model` must be non-empty strings
- Each rule's combined `to_backend`:`to_model` must be in "backend:model" format
- The `to_backend` specified in each rule must be registered in the backend registry
- turn_count must be a positive integer (>= 1)
- backend_model (legacy), if provided, must be in "backend:model" format with exactly one colon
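The startup checks above could be mirrored like this. The function name and config-dict shape are illustrative assumptions, not the service's actual API:

```python
def validate_replacement_config(cfg: dict) -> None:
    """Raise ValueError on invalid replacement config (illustrative sketch)."""
    if not 0.0 <= cfg.get("probability", 0.0) <= 1.0:
        raise ValueError("probability must be between 0.0 and 1.0 inclusive")
    rules = cfg.get("replacement_rules") or []
    if cfg.get("enabled") and not rules:
        raise ValueError("replacement_rules must be a non-empty list when enabled")
    for rule in rules:
        # Both target fields must be non-empty strings.
        if not rule.get("to_backend") or not rule.get("to_model"):
            raise ValueError("each rule needs non-empty to_backend and to_model")
    if cfg.get("turn_count", 1) < 1:
        raise ValueError("turn_count must be a positive integer (>= 1)")
    legacy = cfg.get("backend_model")
    if legacy is not None and legacy.count(":") != 1:
        raise ValueError('backend_model must be in "backend:model" format')
```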
- Tool Filtering: Tool filtering is applied to replacement models
- Wire Capture: Both original and replacement requests/responses are captured
- Usage Accounting: Usage is correctly attributed to the actual model used
- Agent Configuration: Agent settings are preserved when using replacement models
- Streaming: Full support for streaming responses
- Command Processing: Replacement logic executes after command processing
Replacement logic executes in the request processing pipeline:
- Session resolution
- Command prefix processing
- Model replacement evaluation (this feature)
- Backend request routing
- Response processing
Problem: Replacement never activates even though it's enabled.
Solutions:
- Verify `enabled` is set to `true`
- Check that `probability` is greater than 0.0
- Ensure `replacement_rules` (or the legacy `backend_model`, format: `backend:model`) is configured correctly
- Check logs for probability evaluation messages (DEBUG level)
- Verify the replacement backend is registered and available
Problem: Configuration validation fails with "backend not registered" error.
Solutions:
- Verify the backend name in each rule's `to_backend` (or legacy `backend_model`) matches a registered backend
- Check your backend configuration in the `backends` section
- Ensure required API keys are set for the replacement backend
- Review available backends in your configuration
Problem: Replacement continues beyond the configured turn count.
Solutions:
- Verify `turn_count` is set correctly
- Check logs for turn completion messages
- Ensure requests are completing successfully (errors may prevent turn counting)
- Review session state to confirm turn counter is decrementing
Problem: Replacement still activates despite opt-out header.
Solutions:
- Verify the header name is `X-Disable-Replacement` (matched case-insensitively)
- Ensure the header value is the string `true`
- Check that the header is being passed through any proxies or load balancers
- Review logs for opt-out detection messages (DEBUG level)
Problem: Replacement conflicts with other model override features.
Solutions:
- Check if Planning Phase is also enabled and potentially conflicting
- Review the order of middleware in the request pipeline
- Consider the precedence of different model override mechanisms
- Check logs for multiple override attempts
The replacement service logs key events at different levels:

At INFO level:
- Service initialization with configuration summary
- Replacement activation (session_id, original model, replacement model, turn count)
- Replacement deactivation (session_id, return to original model)
- Session-level opt-out actions

At DEBUG level:
- Probability evaluation (session_id, random value, threshold, result)
- Routing decisions (session_id, effective backend:model)
- Header-based opt-out detection
- Planning Phase - Use strong models for initial planning
- Model Name Rewrites - Dynamically rewrite model names using regex rules
- Hybrid Backend - Use different backends for different phases
- Session Management - Intelligent session state management
- URI Model Parameters - Override model parameters via URI
The replacement feature is implemented as a service (ModelReplacementService) that integrates into the request processing pipeline. It maintains per-session state and uses probability-based decision making to determine when to activate replacement.
- Replacement state is stored per session in memory
- State includes: active flag, turns remaining, original model, replacement model
- State is automatically cleaned up when sessions end
The replacement service uses asyncio locks to ensure thread-safe state updates in concurrent environments.
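The lock-guarded update pattern can be sketched as below. Class, method, and field names are illustrative assumptions, not the actual `ModelReplacementService` internals:

```python
import asyncio


class ReplacementStateStore:
    """Illustrative per-session turn-counter store guarded by an asyncio.Lock."""

    def __init__(self) -> None:
        self._turns: dict[str, int] = {}   # session_id -> turns remaining
        self._lock = asyncio.Lock()

    async def activate(self, session_id: str, turn_count: int) -> None:
        async with self._lock:
            self._turns[session_id] = turn_count

    async def complete_turn(self, session_id: str) -> bool:
        """Decrement the counter; return True while replacement stays active."""
        async with self._lock:
            remaining = self._turns.get(session_id, 0)
            if remaining <= 1:
                self._turns.pop(session_id, None)  # deactivate and clean up state
                return False
            self._turns[session_id] = remaining - 1
            return True
```

Holding a single lock across the read-modify-write keeps concurrent turn completions for the same session from racing on the counter.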
- Minimal overhead: O(1) state lookup per request
- Efficient random number generation
- No blocking operations in the critical path