A high-performance control plane for Ollama-based local LLM sessions and background memory consolidation. Optimized for RTX 4090 environments and large-context model orchestration.
If you need a reliable way to manage local LLM sessions, monitor VRAM usage, and maintain structured memory between tasks without manual overhead, this is for you.
- Background Memory (KAIROS): Automatically consolidates session data into a persistent `MEMORY.md` using asynchronous background cycles.
- Ollama Control Plane: Comprehensive toolset for model management, health checks, and high-speed local inference.
- Hybrid Planning (ULTRAPLAN): Optional handoff to Claude Opus for complex reasoning, while maintaining 100% local tool execution.
- Fleet Observability: Integrated React dashboard for real-time monitoring of sessions, models, and memory status.
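
To make the idea of an asynchronous background consolidation cycle concrete, here is a minimal sketch in Python. It is not the KAIROS implementation: the function names, the in-memory `pending` buffer, the timestamped bullet format, and the interval are all assumptions chosen for illustration. Only the `MEMORY.md` target comes from the feature list above.

```python
import asyncio
from datetime import datetime, timezone
from pathlib import Path


def _append_lines(memory_file: Path, lines: list[str]) -> None:
    # Plain blocking append; called from a worker thread below.
    with memory_file.open("a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")


async def consolidate_once(pending: list[str], memory_file: Path) -> int:
    """Append buffered notes to the memory file and clear the buffer.

    Hypothetical helper: returns the number of notes written.
    """
    if not pending:
        return 0
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    lines = [f"- [{stamp}] {note}" for note in pending]
    pending.clear()
    # Run the file write in a thread so the event loop stays responsive.
    await asyncio.to_thread(_append_lines, memory_file, lines)
    return len(lines)


async def consolidation_loop(
    pending: list[str],
    memory_file: Path,
    interval_s: float,
    stop: asyncio.Event,
) -> None:
    """Run one cycle every `interval_s` seconds until `stop` is set."""
    while not stop.is_set():
        await consolidate_once(pending, memory_file)
        try:
            # Sleep for the interval, but wake up early on shutdown.
            await asyncio.wait_for(stop.wait(), timeout=interval_s)
        except asyncio.TimeoutError:
            pass
    # Final flush so notes added just before shutdown are not lost.
    await consolidate_once(pending, memory_file)
```

In this sketch, session handlers would only append strings to `pending`, and the loop would run as a single `asyncio` task next to them, so no handler waits on disk I/O.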

```powershell
# Install dependencies
.\setup.ps1
# Start the MCP server + webapp
.\start.ps1
```

- SSE Endpoint: `http://localhost:10932/sse`
- Fleet Dashboard: `http://localhost:10933`
- Health API: `http://localhost:10932/api/health`
| Tool | Action |
|---|---|
| `start_session` | Initialize a new Ollama session |
| `send_prompt` | Execute a prompt in an active session |
| `kairos_enable` | Activate background memory consolidation |
| `kairos_disable` | Halt background memory consolidation |
| `list_models` | Inventory of available Ollama models |
| `model_status` | Check VRAM and load status |
| `ultraplan` | Hybrid cloud/local planning cycle |
| `fleet_status` | Global health of the instance |
OpenClaude operates as a unified REST bridge for FastMCP, providing a robust integration layer between local LLM runtimes and agentic workflows. It is hardened for Windows environments with UV dependency management and subprocess isolation.
- Memory Management — KAIROS implementation details
- Hybrid Planning — ULTRAPLAN handoff logic
- Reliability & Hardening — Process management and stability
- Ollama (running locally)
- Node.js (v20+)
- Python 3.13+ with `uv`
- Local model pulled (e.g., `gemma2`, `llama3.3`)
This project adheres to SOTA 14.1 industrial standards for high-fidelity agentic orchestration:
- Python (Core): Ruff for linting and formatting. Zero-tolerance for `print` statements in core handlers (T201).
- Webapp (UI): Biome for sub-millisecond linting. Strict `noConsoleLog` enforcement.
- Protocol Compliance: Hardened `stdout`/`stderr` isolation to ensure crash-resistant JSON-RPC communication.
- Automation: Justfile recipes for all fleet operations (`just lint`, `just fix`, `just dev`).
- Security: Automated audits via `bandit` and `safety`.
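
The `print` ban and the `stdout`/`stderr` isolation are related: when JSON-RPC messages travel over `stdout`, any stray diagnostic text on that stream can corrupt the protocol. The sketch below shows one common way to keep the two apart in Python, with all logs routed to `stderr` and only serialized messages written to `stdout`. It illustrates the general pattern, not this project's actual handlers, and the logger name and helper names are assumptions.

```python
import json
import logging
import sys


def configure_logging(level: int = logging.INFO) -> logging.Logger:
    """Send all diagnostics to stderr so stdout stays protocol-only."""
    handler = logging.StreamHandler(sys.stderr)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger = logging.getLogger("openclaude.sketch")  # hypothetical logger name
    logger.handlers.clear()      # avoid duplicate handlers on reconfiguration
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False     # keep records away from root handlers
    return logger


def write_message(message: dict, stream=None) -> None:
    """Write one JSON-RPC message as a single line and flush immediately."""
    if stream is None:
        stream = sys.stdout
    stream.write(json.dumps(message, separators=(",", ":")) + "\n")
    stream.flush()


if __name__ == "__main__":
    log = configure_logging()
    log.info("handling request")                              # goes to stderr
    write_message({"jsonrpc": "2.0", "id": 1, "result": {}})  # goes to stdout
```

With this split, a handler that needs to report progress calls the logger instead of `print`, which is also what Ruff's T201 rule enforces.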
MIT