This guide covers the testing strategy, test execution, and best practices for the LLM Interactive Proxy.
The project follows Test-Driven Development (TDD) principles:
- Write tests first
- Run tests to confirm they fail
- Implement minimal code to pass tests
- Refactor while keeping tests green
- Repeat
Tests are organized by type and mirror the source code structure:
```
tests/
├── unit/           # Unit tests (fast, isolated)
├── integration/    # Integration tests (slower, multiple components)
├── property/       # Property-based tests (Hypothesis)
├── regression/     # Regression tests (bug fixes)
├── simulation/     # Simulation tests (wire capture replay)
├── behavior/       # Behavior tests (end-to-end scenarios)
├── performance/    # Performance tests (benchmarks)
├── fixtures/       # Test fixtures
├── mocks/          # Mock objects
├── helpers/        # Test helpers
└── conftest.py     # Pytest configuration
```
```shell
# Run all tests
./.venv/Scripts/python.exe -m pytest

# Run with verbose output
./.venv/Scripts/python.exe -m pytest -v

# Run with coverage
./.venv/Scripts/python.exe -m pytest --cov=src --cov-report=html
```

This project enables pytest-testmon by default, so repeated pytest runs only execute tests affected by your code changes.

```shell
# Force a full run (still updates testmon data)
./.venv/Scripts/python.exe -m pytest --testmon-noselect

# Disable testmon entirely for a run
./.venv/Scripts/python.exe -m pytest -p no:testmon

# Include default-excluded suites without using `-m ...`
./.venv/Scripts/python.exe -m pytest --run-slow
./.venv/Scripts/python.exe -m pytest --run-codex
```

```shell
# Run only unit tests
./.venv/Scripts/python.exe -m pytest tests/unit/

# Run only integration tests
./.venv/Scripts/python.exe -m pytest tests/integration/

# Run only property-based tests
./.venv/Scripts/python.exe -m pytest tests/property/
```

```shell
# Run a specific test file
./.venv/Scripts/python.exe -m pytest tests/unit/test_file.py

# Run a specific test function
./.venv/Scripts/python.exe -m pytest tests/unit/test_file.py::test_function_name

# Run tests matching a pattern
./.venv/Scripts/python.exe -m pytest -k "test_pattern"
```

```shell
# Stop on first failure
./.venv/Scripts/python.exe -m pytest -x

# Show local variables on failure
./.venv/Scripts/python.exe -m pytest -l

# Run tests in parallel (requires pytest-xdist)
./.venv/Scripts/python.exe -m pytest -n auto

# Show slowest tests
./.venv/Scripts/python.exe -m pytest --durations=10
```

Unit tests verify individual functions, classes, or modules in isolation.
Characteristics:
- Fast execution (< 1 second per test)
- No external dependencies
- Use mocks for dependencies
- Test single responsibility
Example:
```python
def test_model_name_rewrite():
    """Test that model name rewriting works correctly."""
    rewriter = ModelNameRewriter(rules=[
        {"pattern": "gpt-4", "replacement": "gpt-4-turbo"}
    ])
    result = rewriter.rewrite("gpt-4")
    assert result == "gpt-4-turbo"
```

Location: tests/unit/
Integration tests verify that multiple components work together correctly.
Characteristics:
- Slower execution (1-10 seconds per test)
- May use real dependencies
- Test component interactions
- May require setup/teardown
Example:
```python
import httpx
import pytest

@pytest.mark.asyncio
async def test_backend_connector_integration():
    """Test that backend connector integrates with HTTP client."""
    connector = OpenAIConnector(api_key="test-key")
    async with httpx.AsyncClient() as client:
        response = await connector.chat_completion(
            client=client,
            messages=[{"role": "user", "content": "Hello"}]
        )
    assert response.choices[0].message.content
```

Location: tests/integration/
Property-based tests use Hypothesis to generate test cases and verify universal properties.
Characteristics:
- Generates many test cases automatically
- Finds edge cases you might miss
- Tests properties that should always hold
- Provides minimal failing examples
Example:
```python
from hypothesis import given, strategies as st

@given(st.text())
def test_api_key_redaction_property(api_key: str):
    """Test that API key redaction works for any string."""
    redacted = redact_api_key(api_key)
    # Property: the redacted string should not contain the original key
    assert api_key not in redacted or len(api_key) < 8
```

Location: tests/property/
Regression tests verify that previously fixed bugs don't reoccur.
Characteristics:
- Test specific bug scenarios
- Include issue/PR references
- Prevent regressions
- Document historical issues
Example:
```python
def test_issue_123_model_override_bug():
    """
    Regression test for issue #123.

    Bug: Model override was not applied when using force-model flag.
    Fixed in PR #124.
    """
    config = Config(force_model="gpt-4")
    request = ChatRequest(model="gpt-3.5-turbo")
    resolved_model = resolve_model(request, config)
    assert resolved_model == "gpt-4"
```

Location: tests/regression/
Simulation tests replay captured wire traffic to verify behavior.
Characteristics:
- Use real captured traffic
- Test end-to-end flows
- Verify protocol compatibility
- Useful for debugging
Example:
```python
def test_replay_openai_request():
    """Test replaying a captured OpenAI request."""
    capture = load_capture("var/wire_captures_cbor/openai_chat.cbor")
    response = replay_request(capture.entries[0])
    assert response.status_code == 200
```

Location: tests/simulation/
Fixtures provide reusable test data and setup:
```python
# conftest.py
import httpx
import pytest

@pytest.fixture
def sample_chat_request():
    """Provide a sample chat request for tests."""
    return {
        "model": "gpt-4",
        "messages": [
            {"role": "user", "content": "Hello"}
        ]
    }

@pytest.fixture
async def http_client():
    """Provide an HTTP client for tests."""
    async with httpx.AsyncClient() as client:
        yield client
```

Usage:

```python
def test_with_fixture(sample_chat_request):
    """Test using a fixture."""
    assert sample_chat_request["model"] == "gpt-4"
```

Use mocks to isolate components and control dependencies:
```python
from unittest.mock import Mock, patch

def test_with_mock():
    """Test using a mock."""
    mock_client = Mock()
    mock_client.post.return_value = Mock(status_code=200)
    connector = OpenAIConnector(api_key="test")
    response = connector.send_request(mock_client, data={})
    assert response.status_code == 200
    mock_client.post.assert_called_once()
```

Patching:

```python
@patch('src.connectors.openai.httpx.AsyncClient')
async def test_with_patch(mock_client_class):
    """Test using patch."""
    mock_client = Mock()
    mock_client_class.return_value.__aenter__.return_value = mock_client
    connector = OpenAIConnector(api_key="test")
    await connector.chat_completion(messages=[])
    mock_client.post.assert_called()
```

```shell
# Run tests with coverage
./.venv/Scripts/python.exe -m pytest --cov=src

# Generate HTML coverage report
./.venv/Scripts/python.exe -m pytest --cov=src --cov-report=html

# View coverage report: open htmlcov/index.html in a browser
```

- Overall: Aim for 80%+ coverage
- Core Logic: Aim for 90%+ coverage
- Critical Paths: Aim for 100% coverage
- UI/CLI: 60%+ coverage acceptable
Some code is excluded from coverage requirements:
- Debug code
- Type checking blocks (`if TYPE_CHECKING:`)
- Abstract methods
- Platform-specific code
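For example, a `TYPE_CHECKING` block is evaluated only by static type checkers, never at runtime, so tests cannot exercise it and coverage tools are typically configured to skip it. A generic illustration (not code from this project):

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only evaluated by type checkers such as mypy, never at runtime,
    # so coverage tools are configured to exclude this block.
    from collections.abc import Sequence

def first(items: "Sequence[str]") -> str:
    """Return the first item; the annotation relies on the guarded import."""
    return items[0]
```

The string annotation keeps the guarded import usable for type checking without importing it at runtime.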
- One Assertion Per Test: Each test should verify one thing
- Descriptive Names: Test names should describe what they test
- Arrange-Act-Assert: Structure tests clearly
- Independent Tests: Tests should not depend on each other
- Fast Tests: Keep unit tests fast (< 1 second)
```python
# Good test names
def test_model_rewrite_applies_first_matching_rule():
    pass

def test_api_key_validation_rejects_empty_string():
    pass

def test_streaming_response_yields_chunks_in_order():
    pass

# Bad test names
def test_model():
    pass

def test_api_key():
    pass

def test_streaming():
    pass
```

```python
def test_example():
    # Arrange: Set up test data and dependencies
    config = Config(force_model="gpt-4")
    request = ChatRequest(model="gpt-3.5-turbo")

    # Act: Execute the code being tested
    result = resolve_model(request, config)

    # Assert: Verify the result
    assert result == "gpt-4"
```

Use `pytest.mark.asyncio` for async tests:
```python
import pytest

@pytest.mark.asyncio
async def test_async_function():
    """Test an async function."""
    result = await some_async_function()
    assert result == expected_value
```

```shell
# Drop into the Python debugger on failure
./.venv/Scripts/python.exe -m pytest --pdb

# Drop into the IPython debugger on failure
./.venv/Scripts/python.exe -m pytest --pdb --pdbcls=IPython.terminal.debugger:Pdb
```

```python
def test_with_debug_output():
    """Test with debug output."""
    result = some_function()
    print(f"Debug: result = {result}")  # Will show in pytest output with -s
    assert result == expected
```

```shell
# Show print statements
./.venv/Scripts/python.exe -m pytest -s
```

```python
import logging

def test_with_logging(caplog):
    """Test with logging."""
    with caplog.at_level(logging.DEBUG):
        some_function_that_logs()
    assert "Expected log message" in caplog.text
```

Tests run automatically on every push via GitHub Actions:
- Unit Tests: Run on every commit
- Integration Tests: Run on every commit
- Coverage: Tracked with Codecov
- Linting: Ruff checks on every commit
- Type Checking: Mypy checks on every commit
See .github/workflows/ci.yml for CI configuration.
- Write Failing Test:

  ```python
  # Create test file: tests/unit/test_new_feature.py
  def test_new_feature():
      result = new_feature()
      assert result == expected
  ```

- Run Test (Should Fail):

  ```shell
  ./.venv/Scripts/python.exe -m pytest tests/unit/test_new_feature.py::test_new_feature
  ```

- Implement Feature:

  ```python
  # src/module.py
  def new_feature():
      return expected
  ```

- Run Test (Should Pass):

  ```shell
  ./.venv/Scripts/python.exe -m pytest tests/unit/test_new_feature.py::test_new_feature
  ```

- Run All Tests:

  ```shell
  ./.venv/Scripts/python.exe -m pytest
  ```

- Check Coverage:

  ```shell
  ./.venv/Scripts/python.exe -m pytest --cov=src
  ```
- Check Python version matches CI
- Check dependency versions
- Check for platform-specific code
- Check for timing issues
- Check for test interdependencies
- Check for file system assumptions
- Check for network dependencies
- Check for race conditions
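Several of these checks (file system assumptions, test interdependencies) can be avoided with pytest's built-in `tmp_path` fixture, which hands each test its own fresh temporary directory. A minimal sketch, not taken from this project's suite:

```python
import json

def test_writes_config_file(tmp_path):
    """Each test receives its own temporary directory via `tmp_path`,
    so nothing depends on the CI machine's file system layout."""
    config_file = tmp_path / "config.json"
    config_file.write_text(json.dumps({"force_model": "gpt-4"}))

    loaded = json.loads(config_file.read_text())
    assert loaded["force_model"] == "gpt-4"
```

Because pytest creates a unique directory per test, tests using `tmp_path` cannot leak state into each other, which also removes one common source of order-dependent failures.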
- Profile tests with `--durations=10`
- Move slow tests to integration suite
- Use mocks to avoid real I/O
- Run tests in parallel with `-n auto`
- Identify flaky tests with `pytest --count=100` (requires pytest-repeat)
- Check for race conditions
- Check for timing assumptions
- Add retries for network tests
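For retries, plugins such as pytest-rerunfailures provide a `@pytest.mark.flaky(reruns=3)` marker; the underlying idea is simple enough to sketch by hand. The `retry` helper and `flaky_fetch` below are illustrative, not part of this project:

```python
import time

def retry(func, attempts=3, delay=0.01):
    """Call `func`, retrying on failure up to `attempts` times."""
    last_error = None
    for _ in range(attempts):
        try:
            return func()
        except Exception as exc:  # in real tests, catch something narrower
            last_error = exc
            time.sleep(delay)
    raise last_error

def test_network_call_with_retry():
    # `flaky_fetch` stands in for a network call that sometimes times out:
    # it fails twice, then succeeds on the third attempt.
    calls = {"n": 0}

    def flaky_fetch():
        calls["n"] += 1
        if calls["n"] < 3:
            raise TimeoutError("transient network failure")
        return "ok"

    assert retry(flaky_fetch) == "ok"
```

Prefer fixing the underlying race or timing assumption when possible; retries should be a last resort for genuinely unreliable external dependencies.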
- Building: See building.md for build instructions
- Contributing: See contributing.md for contribution workflow
- Code Organization: See code-organization.md for project structure
- Coding Standards: See AGENTS.md for coding standards and testing guidelines