Super Ollama Load Balancer & Orchestration Layer
Intelligent routing, observability, and distributed inference for Ollama clusters.
SOLLOL sits between your applications and a collection of Ollama nodes. It discovers them, monitors their health, scores them by GPU/CPU capacity and current load, then routes each request to the best available node. If a node dies, it fails over automatically.
Think of it as a drop-in replacement for talking to Ollama directly — same API, but with intelligent routing, observability, and cluster management layered on top.
```bash
pip install sollol
```

```python
from sollol import OllamaPool

# Auto-discover all Ollama nodes on the network
pool = OllamaPool.auto_configure()

# Make a request — SOLLOL routes it to the best node
response = pool.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}]
)

print(response['message']['content'])
```

That's it. No async/await, no config files. It finds nodes, picks the best one, routes the request.
Run SOLLOL as a gateway that replaces Ollama on port 11434:
```bash
sollol up
```

Applications talking to `localhost:11434` now go through SOLLOL's routing engine instead of hitting a single node directly.
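Because the gateway speaks the same API on the same port, existing clients need no code changes. As a quick sketch, the official `ollama` Python client (which defaults to `http://localhost:11434`) should work unmodified, assuming the gateway proxies Ollama's standard endpoints as described above:

```python
import ollama

# This request now goes through SOLLOL's routing engine, not a single node
response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response["message"]["content"])
```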
SOLLOL scores every node on multiple factors before routing:
| Factor | What It Checks |
|---|---|
| Success rate | Historical reliability of each node |
| Latency | Response time from recent requests |
| GPU availability | Whether the node has a GPU (via gpustat + Redis) |
| Current load | Queue depth and active requests |
| Task type | Generation, embedding, or classification workloads |
| Priority | Request priority (CRITICAL → BATCH) |
| Specialization | Nodes that historically perform well for specific models |
Scoring formula:
```
Score = 100 (baseline)
        × success_rate
        ÷ (1 + latency_penalty)
        × gpu_bonus (1.5x if GPU available & needed)
        ÷ (1 + load_penalty)
        × priority_alignment
        × task_specialization
```
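Read as code, the formula looks roughly like the sketch below. The parameter names (`latency_penalty`, `load_penalty`, and so on) are illustrative stand-ins, not SOLLOL's internal API:

```python
def score_node(success_rate, latency_penalty, has_gpu, needs_gpu,
               load_penalty, priority_alignment, task_specialization):
    score = 100.0                     # baseline
    score *= success_rate             # historical reliability
    score /= 1 + latency_penalty      # recent response times
    if has_gpu and needs_gpu:
        score *= 1.5                  # gpu_bonus
    score /= 1 + load_penalty         # queue depth / active requests
    score *= priority_alignment
    score *= task_specialization
    return score
```

The router then picks the healthy node with the highest score.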
Scans the local network for Ollama instances. No need to configure node addresses manually.
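The idea in sketch form: probe each host on the local subnet for an open Ollama port. This is illustrative only; the subnet, timeout, and verification logic here are assumptions, not SOLLOL's actual scanner:

```python
import socket

def find_ollama_nodes(prefix="192.168.1", port=11434, timeout=0.2):
    """Return host:port strings that accept TCP connections on `port`."""
    nodes = []
    for i in range(1, 255):
        host = f"{prefix}.{i}"
        try:
            with socket.create_connection((host, port), timeout=timeout):
                nodes.append(f"{host}:{port}")
        except OSError:
            pass  # closed port, unreachable host, or timeout
    return nodes
```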
Install the GPU reporter on each node:
```bash
sollol install-gpu-reporter --redis-host 192.168.1.10
```

Publishes real-time VRAM stats to Redis every 5 seconds. SOLLOL uses this to avoid routing heavy models to nodes that don't have the memory.
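For a sense of the consuming side, here is a sketch using `redis-py`. The key layout (`sollol:gpu:<host>`) and JSON fields are assumptions for illustration, not SOLLOL's documented schema:

```python
import json
import redis

r = redis.Redis(host="192.168.1.10", port=6379)

def free_vram_mb(host):
    """Return free VRAM for a node, or None if no reporter is publishing."""
    raw = r.get(f"sollol:gpu:{host}")  # hypothetical key name
    if raw is None:
        return None
    return json.loads(raw).get("vram_free_mb")  # hypothetical field
```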
```bash
python3 -m sollol.dashboard_service &
```

Web UI at `http://localhost:8080` showing:
- Node health and status
- P50/P95/P99 latency metrics
- Active applications using the cluster
- GPU memory usage (if reporter installed)
- Live request/activity logs
Ray — Parallel request execution via Ray actors for multi-agent workloads.
Dask — Batch processing for embeddings and bulk inference with work stealing.
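To make the multi-agent pattern concrete, here is a rough Ray sketch: one actor per agent, all issuing chat requests through the gateway in parallel. This is a conceptual illustration, not SOLLOL's internal actor code; the model name and URL are placeholders:

```python
import ray
import requests

ray.init(ignore_reinit_error=True)

@ray.remote
class Agent:
    """One actor per agent; each sends its requests through the gateway."""

    def __init__(self, gateway_url):
        self.gateway_url = gateway_url

    def chat(self, prompt):
        resp = requests.post(
            f"{self.gateway_url}/api/chat",
            json={
                "model": "llama3.2",
                "messages": [{"role": "user", "content": prompt}],
                "stream": False,
            },
            timeout=60,
        )
        resp.raise_for_status()
        return resp.json()["message"]["content"]

# Four agents run in parallel; SOLLOL spreads their requests across nodes
agents = [Agent.remote("http://localhost:11434") for _ in range(4)]
print(ray.get([a.chat.remote(f"Summarize topic {i}") for i, a in enumerate(agents)]))
```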
No async/await needed:
```python
from sollol.sync_wrapper import OllamaPool
from sollol.priority_helpers import Priority

pool = OllamaPool.auto_configure()

response = pool.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Hello!"}],
    priority=Priority.HIGH,
    timeout=60
)
```

Duplicate slow requests to a second node and take whichever response comes back first. The slowness threshold is configurable.
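This is the classic "hedged request" pattern. A minimal sketch, assuming a fixed delay threshold (function and parameter names are illustrative, not SOLLOL's API):

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def hedged_request(send, primary, backup, hedge_after=2.0):
    """Call send(node); if the primary is slow, race a backup against it."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = {pool.submit(send, primary)}
        done, _ = wait(futures, timeout=hedge_after, return_when=FIRST_COMPLETED)
        if not done:  # primary exceeded the threshold: hedge to the backup
            futures.add(pool.submit(send, backup))
            done, _ = wait(futures, return_when=FIRST_COMPLETED)
        # Note: the executor still waits for the straggler on exit;
        # fine for a sketch, a real router would cancel or detach it.
        return done.pop().result()
```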
Nodes that start failing get temporarily removed from the pool until they recover.
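A minimal sketch of the circuit-breaker idea, assuming a consecutive-failure threshold with a cooldown; SOLLOL's actual policy and thresholds may differ:

```python
import time

class CircuitBreaker:
    """Trip after max_failures consecutive errors; retry after cooldown."""

    def __init__(self, max_failures=5, cooldown=30.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def available(self):
        if self.opened_at is None:
            return True
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None  # half-open: let one request try again
            self.failures = 0
            return True
        return False

    def record(self, success):
        if success:
            self.failures = 0
            return
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()
```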
```
┌─────────────────────┐
│  Your Application   │
└──────────┬──────────┘
           │
           ▼
┌──────────────────────────────────┐
│      SOLLOL Gateway (:11434)     │
│  ┌────────────────────────────┐  │
│  │ Intelligent Routing Engine │  │
│  │ Scores all nodes, picks    │  │
│  │ the best, routes request   │  │
│  └────────────┬───────────────┘  │
│  ┌────────────┴───────────────┐  │
│  │ Priority Queue + Hedging   │  │
│  │ + Circuit Breakers         │  │
│  └────────────┬───────────────┘  │
└───────────────┼──────────────────┘
                │
    ┌───────────┼───────────┐
    ▼           ▼           ▼
┌────────┐  ┌────────┐  ┌────────┐
│ Node 1 │  │ Node 2 │  │ Node 3 │
│ GPU 24 │  │ GPU 16 │  │ CPU    │
└────────┘  └────────┘  └────────┘
```
```bash
# From PyPI
pip install sollol

# From source
git clone https://github.com/B-A-M-N/SOLLOL.git
cd SOLLOL
pip install -e .
```

| Command | Description |
|---|---|
| `sollol up` | Start the SOLLOL gateway on port 11434 |
| `sollol install-gpu-reporter --redis-host <ip>` | Set up GPU monitoring on a node |
All settings work via environment variables:
| Variable | Default | Description |
|---|---|---|
| `SOLLOL_PORT` | `11434` | Gateway port |
| `SOLLOL_RAY_WORKERS` | `4` | Ray actor count |
| `SOLLOL_DASK_WORKERS` | `2` | Dask worker count |
| `OLLAMA_NODES` | auto-discover | Comma-separated node addresses |
| `RPC_BACKENDS` | none | Comma-separated llama.cpp RPC backends |
| `SOLLOL_BATCH_PROCESSING` | `true` | Enable Dask batch mode |
Route to the right node, fail over fast, and never lose a request to a dead endpoint.
- Dallan Loomis — for the interactions and guidance that kept this project on track
- My parents — for the support that made all of this possible
- My son — the reason I build