We welcome contributions to the LLM Interactive Proxy! This guide provides an overview of the development workflow, architectural guidelines, and best practices for contributing to the project.
- Clone the repository:

  ```bash
  git clone https://github.com/matdev83/llm-interactive-proxy.git
  cd llm-interactive-proxy
  ```

- Create a virtual environment:

  ```bash
  python -m venv .venv
  ```

- Activate the virtual environment:
  - Windows: `.\.venv\Scripts\activate`
  - Unix: `source .venv/bin/activate`

- Install dependencies:

  ```bash
  pip install -e .[dev]
  ```

- Create a `.env` file with your API keys (see the Configuration Guide for details).
```bash
# Run with default settings
python -m src.core.cli

# Run with custom configuration
python -m src.core.cli --config path/to/config.yaml

# Run with different backends
python -m src.core.cli --default-backend openrouter
python -m src.core.cli --default-backend gemini
python -m src.core.cli --default-backend gemini-oauth-plan
python -m src.core.cli --default-backend gemini-oauth-free
python -m src.core.cli --default-backend anthropic
```

```bash
# Run all tests
python -m pytest

# Run specific test file
python -m pytest tests/unit/test_backend_service.py

# Run with coverage
python -m pytest --cov=src
```

To improve safety without breaking default behavior, several strict/diagnostic modes are available via environment variables. These are OFF by default and only change behavior when explicitly enabled:
- `STRICT_CONTROLLER_ERRORS` (also honors `STRICT_CONTROLLER_DI`): When enabled, controller dependency resolution raises `ServiceResolutionError` instead of returning HTTP 503/500 fallbacks.
- `STRICT_PERSISTENCE_ERRORS`: When enabled, persistence operations raise `ConfigurationError`/`ServiceResolutionError` for I/O/DI issues instead of only logging.
- `STRICT_SERVICES_ERRORS`: When enabled, selected services raise on internal failures that are otherwise logged and ignored (e.g., `AppSettings` state access).
- `DI_STRICT_DIAGNOSTICS`: When enabled, the DI layer emits diagnostic logs via the `llm.di` logger for missing registrations and provider builds.
Example (Windows PowerShell):
```powershell
$env:STRICT_CONTROLLER_ERRORS = "true"
$env:STRICT_PERSISTENCE_ERRORS = "true"
$env:STRICT_SERVICES_ERRORS = "true"
$env:DI_STRICT_DIAGNOSTICS = "true"
python -m pytest -q
```

Note: The default test suite runs with these flags disabled to preserve current behavior. Targeted tests may set flags to verify strict-mode surfaces.
```bash
# Run ruff
python -m ruff check src

# Run black
python -m black src

# Run mypy
python -m mypy src
```

The proxy centralizes exception handling so transports remain thin and domain-centric:
- `DomainExceptionMiddleware` translates `LLMProxyError` subclasses to HTTP JSON: `{ "error": { "message": str, "type": str, "code?": str, "details?": any } }` with the exception's `status_code`.
- FastAPI exception handlers map common third-party errors:
  - Upstream connectivity (httpx) -> `503 Service Unavailable`.
  - Malformed JSON -> `400 Bad Request`.
  - Pydantic validation -> `422 Unprocessable Entity` with `details`.
- Registration is done in `src/core/app/middleware_config.py`.
- The DI wiring in `src/core/di/services.py` can enable a strategy-based failover plan when the application state flag is set:
  - Flag: `IApplicationState.get_use_failover_strategy()` (e.g., via `PROXY_USE_FAILOVER_STRATEGY=true`).
  - Default: false (uses coordinator-provided attempts).
  - When true and a coordinator is available, a `DefaultFailoverStrategy` is injected to compute the plan.
- Constants in `src/core/constants/` are not considered public API unless called out in user documentation or tests.
- We actively trim unused constants to reduce the public surface and avoid accidental coupling. Prefer domain models or enums over string constants.
- If you introduce a new constant intended for external use, document it in README and reference it from tests.
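As a quick illustration of preferring enums over string constants, a typed enum catches typos at the boundary where a bare string would pass silently (`BackendType` here is hypothetical, not a class in the codebase):

```python
from enum import Enum


class BackendType(str, Enum):
    """Hypothetical enum replacing loose string constants."""
    OPENROUTER = "openrouter"
    GEMINI = "gemini"
    ANTHROPIC = "anthropic"


assert BackendType("gemini") is BackendType.GEMINI  # valid values round-trip

caught = False
try:
    BackendType("gemni")  # the typo fails loudly at the boundary
except ValueError:
    caught = True
assert caught
```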
The project includes a comprehensive DI container usage scanner that analyzes the codebase for violations of dependency injection principles.
```bash
# Run the full DI violation test suite (shows concise warnings by default)
python -m pytest tests/unit/test_di_container_usage.py -v

# Run just the violation detection (shows concise warning + detailed report)
python -m pytest tests/unit/test_di_container_usage.py::TestDIContainerUsage::test_di_container_violations_are_detected -v -s

# Run with coverage to see scanner effectiveness
python -m pytest tests/unit/test_di_container_usage.py --cov=src --cov-report=term-missing
```

The scanner identifies violations where services are manually instantiated instead of using the DI container:
- Manual Service Instantiation: Direct instantiation of service classes (e.g., `BackendService()`, `CommandProcessor()`)
- Controller Violations: Controllers creating service instances directly
- Factory Function Issues: Factory functions that don't use the DI container properly
- Business Logic Violations: Business logic manually creating dependencies
Concise Summary (Default - Always Visible):
```
[!] DI CONTAINER VIOLATIONS DETECTED: 61 violations in 14 files.
Most affected: core\di\services.py: 15, core\app\controllers\chat_controller.py: 8, core\app\controllers\anthropic_controller.py: 6.
Use -s flag for detailed report | Fix with IServiceProvider.get_required_service()
```
Detailed Report (With -s Flag):
```
🎯 DI Container Scanner Results:
📊 Total violations found: 61
[FOLDER] Files with violations: 14
[CLIPBOARD] Violation types:
   - manual_service_instantiation: 61
[FOLDER] Top affected files:
   - core\di\services.py: 15 violations
   - core\app\controllers\chat_controller.py: 8 violations
```
[X] Bad (Violation):

```python
def handle_request(self, request):
    processor = CommandProcessor(self.config)  # VIOLATION!
    return processor.process(request)
```

[OK] Good (Fixed):

```python
def __init__(self, command_processor: ICommandProcessor):
    self.command_processor = command_processor

def handle_request(self, request):
    return self.command_processor.process(request)  # CORRECT
```

- Run the DI scanner regularly during development
- Address violations as part of code reviews
- Use the scanner output to identify areas needing DI improvements
- Focus on high-impact violations first (controllers, business logic)
- Use `IServiceProvider.get_required_service()` for runtime resolution when needed
The LLM Interactive Proxy follows a clean architecture approach based on SOLID principles:
- Single Responsibility Principle: Each class has one responsibility.
- Open/Closed Principle: Open for extension, closed for modification.
- Liskov Substitution Principle: Subtypes must be substitutable for their base types.
- Interface Segregation Principle: Clients shouldn't depend on methods they don't use.
- Dependency Inversion Principle: High-level modules depend on abstractions, not concrete implementations.
- Interface Layer (`src/core/interfaces/`): Defines contracts (abstract base classes) for services.
- Domain Layer (`src/core/domain/`): Contains business entities and value objects; implements domain logic using immutable models.
- Application Layer (`src/core/app/`): Orchestrates application flow, connects domain to infrastructure, contains controllers and middleware.
- Service Layer (`src/core/services/`): Implements business use cases, orchestrates domain objects, depends on interfaces.
- Infrastructure Layer (`src/core/repositories/`, `src/connectors/`): Implements interfaces, handles data storage and external services, provides adapters.
Define interfaces before implementations. Services interact through interfaces, enabling dependency inversion and clean testing.
Use a DI container to manage service dependencies, promoting loose coupling and easier testing.
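A minimal sketch of interface-first design with constructor injection, using illustrative names rather than the proxy's actual interfaces:

```python
from abc import ABC, abstractmethod


class IGreeter(ABC):
    """Contract defined before any implementation (interface-first)."""

    @abstractmethod
    def greet(self, name: str) -> str: ...


class Greeter(IGreeter):
    def greet(self, name: str) -> str:
        return f"Hello, {name}!"


class Controller:
    """Depends on the interface, so tests can inject a stub implementation."""

    def __init__(self, greeter: IGreeter) -> None:
        self._greeter = greeter

    def handle(self, name: str) -> str:
        return self._greeter.greet(name)


assert Controller(Greeter()).handle("world") == "Hello, world!"
```

In the real codebase the DI container performs the `Controller(Greeter())` wiring, so no call site instantiates concrete services directly.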
Use immutable Pydantic models for core business entities to ensure data integrity and prevent accidental modifications. Use `.model_copy()` for modifications.
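The pattern looks like this with a hypothetical frozen model (`SessionState` is illustrative; Pydantic v2's `ConfigDict(frozen=True)` and `model_copy` do the work):

```python
from pydantic import BaseModel, ConfigDict


class SessionState(BaseModel):
    """Hypothetical frozen (immutable) domain model."""
    model_config = ConfigDict(frozen=True)

    backend: str
    temperature: float = 0.7


state = SessionState(backend="openrouter")
# Attribute assignment raises a ValidationError because the model is frozen;
# derive an updated copy instead of mutating in place:
updated = state.model_copy(update={"backend": "gemini"})
assert state.backend == "openrouter"   # original untouched
assert updated.backend == "gemini"
assert updated.temperature == 0.7      # unchanged fields are carried over
```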
Use command handlers for processing interactive commands.
Use middleware for cross-cutting concerns like response processing.
Use repositories for data access operations.
Use factories for creating complex objects, such as backend instances.
The Tool Call Reactor system provides an event-driven architecture for reacting to tool calls from remote LLMs. This section guides you through implementing custom event handlers.
The Tool Call Reactor allows you to:
- Monitor tool calls from LLMs in real-time
- Steer LLM behavior by providing custom responses
- Apply rate limiting to prevent excessive steering
- Maintain session context across multiple requests
- Passive Event Receivers: Monitor tool calls without modifying responses
- Active Handlers: Can swallow tool calls and provide custom steering responses
Place your custom handlers in `src/core/services/tool_call_handlers/`.
Example directory structure:
```
src/core/services/tool_call_handlers/
├── __init__.py
├── apply_diff_handler.py     # Built-in example
└── your_custom_handler.py    # Your new handler
```
```python
from src.core.interfaces.tool_call_reactor_interface import (
    IToolCallHandler,
    ToolCallContext,
    ToolCallReactionResult,
)


class YourCustomHandler(IToolCallHandler):
    """Custom handler for specific tool call scenarios."""

    @property
    def name(self) -> str:
        return "your_custom_handler"

    @property
    def priority(self) -> int:
        return 100  # Higher priority = processed first

    async def can_handle(self, context: ToolCallContext) -> bool:
        """Check if this handler should process the tool call."""
        # Your logic to determine if this handler applies
        return context.tool_name == "your_target_tool"

    async def handle(self, context: ToolCallContext) -> ToolCallReactionResult:
        """Process the tool call and return a reaction."""
        # Placeholder decision; replace with your custom logic
        should_swallow = context.tool_name == "your_target_tool"
        if should_swallow:
            return ToolCallReactionResult(
                should_swallow=True,
                replacement_response="Your custom steering message",
                metadata={"handler": self.name, "action": "steered"},
            )
        return ToolCallReactionResult(
            should_swallow=False,
            replacement_response=None,
            metadata={"handler": self.name, "action": "monitored"},
        )
```

Add your handler to the DI container in `src/core/di/services.py`:
```python
# Add import at the top
from src.core.services.tool_call_handlers.your_custom_handler import YourCustomHandler


# In the services registration section (async, since handler registration is awaited):
async def _tool_call_reactor_factory(provider: IServiceProvider) -> ToolCallReactorService:
    """Factory for creating the tool call reactor service."""
    history_tracker = provider.get_required_service(InMemoryToolCallHistoryTracker)
    reactor = ToolCallReactorService(history_tracker)

    # Register built-in handlers
    app_config: AppConfig = provider.get_required_service(AppConfig)
    reactor_config = app_config.session.tool_call_reactor
    if reactor_config.enabled and reactor_config.apply_diff_steering_enabled:
        apply_diff_handler = ApplyDiffHandler(
            history_tracker=history_tracker,
            rate_limit_window_seconds=reactor_config.apply_diff_steering_rate_limit_seconds,
            steering_message=reactor_config.apply_diff_steering_message,
        )
        await reactor.register_handler(apply_diff_handler)

    # Register your custom handler
    if reactor_config.enabled and reactor_config.your_custom_handler_enabled:
        your_handler = YourCustomHandler(
            # Pass any dependencies your handler needs
            history_tracker=history_tracker,
        )
        await reactor.register_handler(your_handler)

    return reactor
```

Extend the configuration in `src/core/config/app_config.py`:
```python
class ToolCallReactorConfig(DomainModel):
    """Configuration for the Tool Call Reactor system."""

    enabled: bool = True
    apply_diff_steering_enabled: bool = True
    apply_diff_steering_rate_limit_seconds: int = 60
    apply_diff_steering_message: str | None = None

    # Add your custom handler configuration
    your_custom_handler_enabled: bool = True
    your_custom_handler_rate_limit_seconds: int = 30
    your_custom_handler_message: str | None = None
```

Update `config/sample.env` with your handler's configuration:
```env
# Your Custom Handler Settings
YOUR_CUSTOM_HANDLER_ENABLED=true
YOUR_CUSTOM_HANDLER_RATE_LIMIT_SECONDS=30
```

The built-in ApplyDiffHandler provides an excellent example of a steering handler:
Location: `src/core/services/tool_call_handlers/apply_diff_handler.py`

Key Features:
- Monitors for `apply_diff` tool calls
- Provides a steering message recommending `patch_file` instead
- Implements per-session rate limiting (default: once per 60 seconds)
- Configurable steering message via environment variables
Usage Example:
```
# The handler automatically steers LLMs from:
tool_call: apply_diff(...)

# To a custom response:
"You tried to use apply_diff tool. Please prefer to use patch_file tool instead,
as it is superior to apply_diff and provides automated Python QA checks."
```

Handlers are automatically registered when:
- `TOOL_CALL_REACTOR_ENABLED=true` (environment variable)
- Your specific handler's enabled flag is `true`
- The DI container initializes the reactor service
For testing or manual control:
```python
from src.core.di.services import get_service_provider
from src.core.services.tool_call_handlers.your_custom_handler import YourCustomHandler

provider = get_service_provider()
reactor = provider.get_required_service(ToolCallReactorService)

handler = YourCustomHandler()
await reactor.register_handler(handler)
```

Check if your handler is active:
```python
# Get registered handlers
handlers = reactor.get_registered_handlers()
print(f"Active handlers: {handlers}")
# Should include: ['apply_diff_steering_handler', 'your_custom_handler']
```

- Single Responsibility: Each handler should handle one specific tool or scenario
- Idempotent: Handlers should be safe to run multiple times
- Fast Execution: Avoid blocking operations in handlers
- Error Handling: Always handle exceptions gracefully
- Consider Session Context: Rate limits should be per-session, not global
- Reasonable Limits: Don't overwhelm users with too many steering messages
- Configurable: Allow users to adjust rate limits via environment variables
- Unit Tests: Test handler logic in isolation
- Integration Tests: Test full request/response flow
- Mock Dependencies: Use DI to inject mock services for testing
- Environment Variables: Use clear, descriptive names
- Sensible Defaults: Provide reasonable default values
- Documentation: Document all configuration options
- Tool Steering: Guide LLMs toward preferred tools
- Safety Monitoring: Block or warn about problematic tool usage
- Usage Analytics: Track tool call patterns and statistics
- Custom Workflows: Implement domain-specific tool call handling
- Quality Assurance: Enforce coding standards or best practices
- Check `TOOL_CALL_REACTOR_ENABLED=true`
- Verify your handler's enabled flag is `true`
- Confirm the handler is registered: `reactor.get_registered_handlers()`
- Check logs for registration errors
- Verify `can_handle()` returns `True` for your target tool calls
- Check the tool call format in the `ToolCallContext`
- Ensure proper priority ordering if multiple handlers apply
- Review rate limiting - handlers may be temporarily disabled
- Verify environment variables are set correctly
- Check configuration loading in `AppConfig`
- Ensure the DI container is properly wired
Test individual components in isolation, using mock dependencies where necessary.
Test how components work together, focusing on request-to-response flows.
Test complete request flows to ensure overall system functionality.
- Integration Tests: Use `setup_test_command_registry()` from `tests/conftest.py` to set up the DI command registry with mock dependencies.
- Unit Tests: Create mock dependencies and instantiate commands directly. For `CommandParser` tests, use mock commands from `tests/unit/mock_commands.py`.
- Stateful Commands: Create mock dependencies for `ISecureStateAccess` and `ISecureStateModification` and pass them to the command constructor.
- Skipped Tests: Update previously skipped tests to use the new DI-based commands.
OAuth backends like `gemini-oauth-plan` and `gemini-oauth-free` have specific testing considerations:
- Credential Mocking: Use `pathlib.Path.home` patches to mock the `~/.gemini/oauth_creds.json` location
- Token Refresh: Mock `_refresh_token_if_needed()` to test refresh behavior
- Health Checks: Test both successful and failed health check scenarios
- File Operations: Mock file I/O operations for credential loading/saving
- Error Scenarios: Test authentication errors, connectivity issues, and token expiration
Example OAuth backend test pattern:
```python
@patch('pathlib.Path.home')
@patch.object(OAuthConnector, '_refresh_token_if_needed', new_callable=AsyncMock)
async def test_oauth_backend_health_check(self, mock_refresh, mock_home):
    # Setup mock credentials file
    mock_home.return_value = Path("/tmp")
    # ... test implementation
```

- Code Style: Follow PEP 8 with type hints, use Ruff for linting, and Black for formatting.
- Never log secrets: Do not print raw API keys, tokens, or credentials. Rely on the global logging redaction filter which sanitizes messages automatically.
- Request redaction is mandatory: Outbound prompts/messages are sanitized by the request redaction middleware. Do not re-introduce connector-specific redaction; keep redaction centralized and backend-agnostic.
- Configuration:
  - Prompt redaction is controlled by `auth.redact_api_keys_in_prompts` (default: true). The CLI flag `--disable-redact-api-keys-in-prompts` disables it.
  - API keys are discovered from config (`auth.api_keys`, `backends.<name>.api_key`) and environment variables.
- When modifying the request pipeline: If you change `RequestProcessor`, `BackendRequestManager`, or middleware wiring, ensure the redaction step remains in the active path and add/update tests.
- Tests:
- Unit tests exist for the middleware and processor redaction behavior.
- Integration tests verify redaction for both streaming and non-streaming flows.
- Run the full test suite after changes to avoid regressions.
- SOLID Principles: Adhere to SRP, OCP, LSP, ISP, and DIP.
- DRY: Avoid code duplication.
- Test-Driven Development (TDD): Write tests first.
- Error Handling: Use specific exceptions and meaningful error messages.
To prevent accidental key leaks, the repository uses a mandatory pre-commit hook that runs a secret scan before allowing commits. The scan detects common API tokens and ZAI-style keys (32 hex chars + dot + 16+ alphanum) and blocks the commit if any are found.
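The ZAI-style shape described above (32 hex chars, a dot, then 16+ alphanumerics) can be expressed as a regex; this pattern is illustrative and not necessarily the exact one used by `dev/scripts/pre_commit_api_key_check.py`:

```python
import re

# Illustrative ZAI-style token pattern: 32 hex chars + '.' + 16+ alphanumerics
ZAI_TOKEN_RE = re.compile(r"\b[0-9a-f]{32}\.[A-Za-z0-9]{16,}\b")

sample = "key = 0123456789abcdef0123456789abcdef.AbCd1234EfGh5678Ij"
assert ZAI_TOKEN_RE.search(sample) is not None
assert ZAI_TOKEN_RE.search("no secrets here") is None
```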
- Install hooks (Windows virtualenv): `./.venv/Scripts/python.exe dev/scripts/install-hooks.py`
- What runs on every commit:
  - Secret scan: `dev/scripts/pre_commit_api_key_check.py` (includes ZAI token pattern)
  - Architectural checks: enhanced architectural linter on staged Python files
- Run the secret scanner manually: `./.venv/Scripts/python.exe dev/scripts/pre_commit_api_key_check.py`
- False positives: If the scanner flags fixtures or generated files, remove the secret-like content or avoid staging those files.
- Emergency bypass: Hooks installed as mandatory cannot be bypassed with `--no-verify`. If you must proceed locally, temporarily remove `.git/hooks/pre-commit`, then re-run the installer after fixing the issue.
Security best practices:
- Do not place real API keys in config files or test data. Use environment variables and placeholders only.
- Keep `.env` files untracked and never commit them.
- If a leak is suspected, rotate the affected key immediately and audit CI logs/artifacts.
- Create a feature branch: `git checkout -b feature/your-feature`
- Write tests for new functionality.
- Ensure all tests pass: `pytest`
- Update documentation as needed.
- Submit a Pull Request with a clear description following the Conventional Commits format (`type(scope): subject`).
- Address review comments.
- Merge after approval.
- CHANGELOG.md: Project Changelog.
- User Guide: Feature documentation and configuration.
- Development Guide: Architecture, building, and testing.
- Configuration Guide: Complete configuration options.
- Architecture Guide: System architecture and design patterns.
- JSON repair is applied both in streaming (processor) and non-streaming (middleware) paths.
- Strict mode (non-streaming) is enforced when:
  - `session.json_repair_strict_mode` is true, or
  - Content-Type is `application/json`, or
  - `expected_json=True` is present in middleware context/metadata, or
  - A `session.json_repair_schema` is configured.
- Convenience helpers (available for controllers/adapters):
  - `src/core/utils/json_intent.py#set_expected_json(metadata, True)`
  - `src/core/utils/json_intent.py#set_json_response_metadata(metadata, content_type='application/json; charset=utf-8')`
  - `#infer_expected_json(metadata, content)`
- The ResponseProcessor auto-infers `expected_json` if not provided; you can override it via the helper.
The streaming pipeline runs processors in this order by default:
- JSON repair
- Text loop detection
- Tool-call repair
- Middleware
- Accumulation
This ordering ensures loop detection operates on human-visible text, tool-call repair uses normalized content, and downstream middleware sees consistent data.
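The ordering above amounts to composing stream processors into a chain; here is a toy sketch with stand-in stages (the proxy's actual processor classes are not shown):

```python
from typing import Callable

Chunk = str
Processor = Callable[[Chunk], Chunk]


def build_pipeline(processors: list[Processor]) -> Processor:
    """Compose stream processors so each chunk flows through them in order."""
    def run(chunk: Chunk) -> Chunk:
        for process in processors:
            chunk = process(chunk)
        return chunk
    return run


# Illustrative stages applied in the documented order; real processors are
# stateful classes, these lambdas only demonstrate the composition:
pipeline = build_pipeline([
    lambda c: c.replace("}{", "},{"),  # stand-in for JSON repair
    lambda c: c,                       # stand-in for text loop detection
    lambda c: c,                       # stand-in for tool-call repair
])
assert pipeline('{"a":1}{"b":2}') == '{"a":1},{"b":2}'
```

Because composition is order-sensitive, reordering the list changes what downstream stages observe, which is exactly why the default order is fixed.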
- In-memory metrics in `src/core/services/metrics_service.py` record JSON repair outcomes for both streaming and non-streaming.
- Use `metrics.snapshot()` for ad-hoc debugging in tests.