
Testing

This guide covers the testing strategy, test execution, and best practices for the LLM Interactive Proxy.

Testing Philosophy

The project follows Test-Driven Development (TDD) principles:

  1. Write tests first
  2. Run tests to confirm they fail
  3. Implement minimal code to pass tests
  4. Refactor while keeping tests green
  5. Repeat
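Condensed into code, one pass through the loop might look like this (using a throwaway `add` helper for illustration, not from this codebase):

```python
# Steps 1-2: the test exists before the implementation, so it fails at first
# (NameError: add is not defined).
def test_add():
    assert add(2, 3) == 5

# Step 3: the minimal implementation that makes the test pass.
def add(a: int, b: int) -> int:
    return a + b

# Step 4: rerun the test; once it is green, refactor freely with the test
# as a safety net.
test_add()
```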

Test Structure

Tests are organized by type and mirror the source code structure:

tests/
├── unit/                   # Unit tests (fast, isolated)
├── integration/            # Integration tests (slower, multiple components)
├── property/               # Property-based tests (Hypothesis)
├── regression/             # Regression tests (bug fixes)
├── simulation/             # Simulation tests (wire capture replay)
├── behavior/               # Behavior tests (end-to-end scenarios)
├── performance/            # Performance tests (benchmarks)
├── fixtures/               # Test fixtures
├── mocks/                  # Mock objects
├── helpers/                # Test helpers
└── conftest.py             # Pytest configuration

Running Tests

All Tests

# Run all tests
./.venv/Scripts/python.exe -m pytest

# Run with verbose output
./.venv/Scripts/python.exe -m pytest -v

# Run with coverage
./.venv/Scripts/python.exe -m pytest --cov=src --cov-report=html

Incremental Runs (testmon)

This project enables pytest-testmon by default, so repeated pytest runs only execute tests affected by your code changes.

# Force a full run (still updates testmon data)
./.venv/Scripts/python.exe -m pytest --testmon-noselect

# Disable testmon entirely for a run
./.venv/Scripts/python.exe -m pytest -p no:testmon

# Include default-excluded suites without using `-m ...`
./.venv/Scripts/python.exe -m pytest --run-slow
./.venv/Scripts/python.exe -m pytest --run-codex

Specific Test Types

# Run only unit tests
./.venv/Scripts/python.exe -m pytest tests/unit/

# Run only integration tests
./.venv/Scripts/python.exe -m pytest tests/integration/

# Run only property-based tests
./.venv/Scripts/python.exe -m pytest tests/property/

Specific Test Files or Functions

# Run specific test file
./.venv/Scripts/python.exe -m pytest tests/unit/test_file.py

# Run specific test function
./.venv/Scripts/python.exe -m pytest tests/unit/test_file.py::test_function_name

# Run tests matching pattern
./.venv/Scripts/python.exe -m pytest -k "test_pattern"

Test Options

# Stop on first failure
./.venv/Scripts/python.exe -m pytest -x

# Show local variables on failure
./.venv/Scripts/python.exe -m pytest -l

# Run tests in parallel (requires pytest-xdist)
./.venv/Scripts/python.exe -m pytest -n auto

# Show slowest tests
./.venv/Scripts/python.exe -m pytest --durations=10

Test Types

Unit Tests

Unit tests verify individual functions, classes, or modules in isolation.

Characteristics:

  • Fast execution (< 1 second per test)
  • No external dependencies
  • Use mocks for dependencies
  • Test single responsibility

Example:

def test_model_name_rewrite():
    """Test that model name rewriting works correctly."""
    rewriter = ModelNameRewriter(rules=[
        {"pattern": "gpt-4", "replacement": "gpt-4-turbo"}
    ])
    
    result = rewriter.rewrite("gpt-4")
    
    assert result == "gpt-4-turbo"
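The `ModelNameRewriter` used above could be sketched as follows (hypothetical; the real class may differ):

```python
import re

class ModelNameRewriter:
    """Rewrite model names using the first matching rule (illustrative sketch)."""

    def __init__(self, rules: list[dict[str, str]]):
        self.rules = rules

    def rewrite(self, model: str) -> str:
        # Apply the first rule whose pattern matches the whole name.
        for rule in self.rules:
            if re.fullmatch(rule["pattern"], model):
                return rule["replacement"]
        return model  # no rule matched; keep the name unchanged
```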

Location: tests/unit/

Integration Tests

Integration tests verify that multiple components work together correctly.

Characteristics:

  • Slower execution (1-10 seconds per test)
  • May use real dependencies
  • Test component interactions
  • May require setup/teardown

Example:

@pytest.mark.asyncio
async def test_backend_connector_integration():
    """Test that backend connector integrates with HTTP client."""
    connector = OpenAIConnector(api_key="test-key")
    
    async with httpx.AsyncClient() as client:
        response = await connector.chat_completion(
            client=client,
            messages=[{"role": "user", "content": "Hello"}]
        )
    
    assert response.choices[0].message.content

Location: tests/integration/

Property-Based Tests

Property-based tests use Hypothesis to generate test cases and verify universal properties.

Characteristics:

  • Generates many test cases automatically
  • Finds edge cases you might miss
  • Tests properties that should always hold
  • Provides minimal failing examples

Example:

from hypothesis import given, strategies as st

@given(st.text())
def test_api_key_redaction_property(api_key: str):
    """Test that API key redaction works for any string."""
    redacted = redact_api_key(api_key)
    
    # Property: redacted string should not contain original key
    assert api_key not in redacted or len(api_key) < 8
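A redaction helper consistent with this property could be sketched as (hypothetical; not the project's actual implementation):

```python
def redact_api_key(api_key: str) -> str:
    """Mask an API key, keeping at most a short prefix (illustrative sketch)."""
    if len(api_key) < 8:
        return "*" * len(api_key)  # too short to show any prefix safely
    # The result is always 7 characters, so a key of 8+ characters can
    # never appear as a substring of it -- satisfying the property above.
    return api_key[:4] + "***"
```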

Location: tests/property/

Regression Tests

Regression tests verify that previously fixed bugs don't reoccur.

Characteristics:

  • Test specific bug scenarios
  • Include issue/PR references
  • Prevent regressions
  • Document historical issues

Example:

def test_issue_123_model_override_bug():
    """
    Regression test for issue #123.
    
    Bug: Model override was not applied when using force-model flag.
    Fixed in PR #124.
    """
    config = Config(force_model="gpt-4")
    request = ChatRequest(model="gpt-3.5-turbo")
    
    resolved_model = resolve_model(request, config)
    
    assert resolved_model == "gpt-4"

Location: tests/regression/

Simulation Tests

Simulation tests replay captured wire traffic to verify behavior.

Characteristics:

  • Use real captured traffic
  • Test end-to-end flows
  • Verify protocol compatibility
  • Useful for debugging

Example:

def test_replay_openai_request():
    """Test replaying captured OpenAI request."""
    capture = load_capture("var/wire_captures_cbor/openai_chat.cbor")
    
    response = replay_request(capture.entries[0])
    
    assert response.status_code == 200

Location: tests/simulation/

Test Fixtures

Fixtures provide reusable test data and setup:

# conftest.py
import httpx
import pytest

@pytest.fixture
def sample_chat_request():
    """Provide a sample chat request for tests."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "user", "content": "Hello"}
        ]
    }

@pytest.fixture
async def http_client():
    """Provide an HTTP client for tests (async fixtures require pytest-asyncio)."""
    async with httpx.AsyncClient() as client:
        yield client

Usage:

def test_with_fixture(sample_chat_request):
    """Test using fixture."""
    assert sample_chat_request["model"] == "gpt-4"

Mocking

Use mocks to isolate components and control dependencies:

from unittest.mock import Mock, patch

def test_with_mock():
    """Test using mock."""
    mock_client = Mock()
    mock_client.post.return_value = Mock(status_code=200)
    
    connector = OpenAIConnector(api_key="test")
    response = connector.send_request(mock_client, data={})
    
    assert response.status_code == 200
    mock_client.post.assert_called_once()

Patching:

from unittest.mock import AsyncMock, patch

import pytest

@pytest.mark.asyncio
@patch('src.connectors.openai.httpx.AsyncClient')
async def test_with_patch(mock_client_class):
    """Test using patch (AsyncMock so awaited calls work)."""
    mock_client = AsyncMock()
    mock_client_class.return_value.__aenter__.return_value = mock_client
    
    connector = OpenAIConnector(api_key="test")
    await connector.chat_completion(messages=[])
    
    mock_client.post.assert_called()

Test Coverage

Measuring Coverage

# Run tests with coverage
./.venv/Scripts/python.exe -m pytest --cov=src

# Generate HTML coverage report
./.venv/Scripts/python.exe -m pytest --cov=src --cov-report=html

# View coverage report
# Open htmlcov/index.html in browser

Coverage Goals

  • Overall: Aim for 80%+ coverage
  • Core Logic: Aim for 90%+ coverage
  • Critical Paths: Aim for 100% coverage
  • UI/CLI: 60%+ coverage acceptable

Coverage Exclusions

Some code is excluded from coverage requirements:

  • Debug code
  • Type checking blocks (if TYPE_CHECKING:)
  • Abstract methods
  • Platform-specific code
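Such exclusions are usually declared in the coverage configuration; a sketch using coverage.py's `exclude_lines` option (the exact patterns depend on this project's actual config):

```toml
# pyproject.toml
[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "@abstractmethod",
    "raise NotImplementedError",
]
```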

Test Best Practices

Writing Good Tests

  1. One Assertion Per Test: Each test should verify one thing
  2. Descriptive Names: Test names should describe what they test
  3. Arrange-Act-Assert: Structure tests clearly
  4. Independent Tests: Tests should not depend on each other
  5. Fast Tests: Keep unit tests fast (< 1 second)

Test Naming

# Good test names
def test_model_rewrite_applies_first_matching_rule():
    pass

def test_api_key_validation_rejects_empty_string():
    pass

def test_streaming_response_yields_chunks_in_order():
    pass

# Bad test names
def test_model():
    pass

def test_api_key():
    pass

def test_streaming():
    pass

Arrange-Act-Assert Pattern

def test_example():
    # Arrange: Set up test data and dependencies
    config = Config(force_model="gpt-4")
    request = ChatRequest(model="gpt-3.5-turbo")
    
    # Act: Execute the code being tested
    result = resolve_model(request, config)
    
    # Assert: Verify the result
    assert result == "gpt-4"

Async Tests

Use pytest.mark.asyncio for async tests:

import pytest

@pytest.mark.asyncio
async def test_async_function():
    """Test async function."""
    result = await some_async_function()
    assert result == expected_value

Debugging Tests

Running Tests in Debug Mode

# Drop into pdb on failure
./.venv/Scripts/python.exe -m pytest --pdb

# Use the IPython debugger instead of the standard pdb
./.venv/Scripts/python.exe -m pytest --pdb --pdbcls=IPython.terminal.debugger:TerminalPdb

Print Debugging

def test_with_debug_output():
    """Test with debug output."""
    result = some_function()
    print(f"Debug: result = {result}")  # Will show in pytest output with -s
    assert result == expected

# Show print statements
./.venv/Scripts/python.exe -m pytest -s

Logging in Tests

import logging

def test_with_logging(caplog):
    """Test with logging."""
    with caplog.at_level(logging.DEBUG):
        some_function_that_logs()
    
    assert "Expected log message" in caplog.text

Continuous Integration

Tests run automatically on every push via GitHub Actions:

  • Unit Tests: Run on every commit
  • Integration Tests: Run on every commit
  • Coverage: Tracked with Codecov
  • Linting: Ruff checks on every commit
  • Type Checking: Mypy checks on every commit

See .github/workflows/ci.yml for CI configuration.

Test Development Workflow

  1. Write Failing Test:

    # Create test file
    # tests/unit/test_new_feature.py
    
    def test_new_feature():
        result = new_feature()
        assert result == expected
  2. Run Test (Should Fail):

    ./.venv/Scripts/python.exe -m pytest tests/unit/test_new_feature.py::test_new_feature
  3. Implement Feature:

    # src/module.py
    
    def new_feature():
        return expected
  4. Run Test (Should Pass):

    ./.venv/Scripts/python.exe -m pytest tests/unit/test_new_feature.py::test_new_feature
  5. Run All Tests:

    ./.venv/Scripts/python.exe -m pytest
  6. Check Coverage:

    ./.venv/Scripts/python.exe -m pytest --cov=src

Troubleshooting

Tests Failing Locally But Passing in CI

  • Check Python version matches CI
  • Check dependency versions
  • Check for platform-specific code
  • Check for timing issues

Tests Passing Locally But Failing in CI

  • Check for test interdependencies
  • Check for file system assumptions
  • Check for network dependencies
  • Check for race conditions

Slow Tests

  • Profile tests with --durations=10
  • Move slow tests to integration suite
  • Use mocks to avoid real I/O
  • Run tests in parallel with -n auto

Flaky Tests

  • Identify with pytest --count=100 (requires pytest-repeat)
  • Check for race conditions
  • Check for timing assumptions
  • Add retries for network tests

Related Documentation