Screencap can classify events using a local model exposed via an OpenAI-compatible HTTP API.
This integration is currently text-only:
- Inputs: context metadata + OCR text
- Not used: screenshot pixels (no vision / multimodal payloads yet)
The classification router tries, in order:
1. Cache reuse by `(stableHash, contextKey)` (no OCR/LLM)
2. Local retrieval from your history (no LLM)
3. Local HTTP LLM (this doc)
4. Cloud text (OpenRouter)
5. Cloud vision (enabled by default; can be disabled)
6. Fallback baseline
For steps 3–4, Screencap always provides:
- Context: `appBundleId`, `appName`, `windowTitle`, `urlHost`, `contentKind`, `contentTitle`
- OCR: `ocr_text`, extracted locally using macOS Vision
You need:
- A local server that implements `POST /v1/chat/completions` (OpenAI-compatible)
- A model that can reliably output JSON when instructed
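To sanity-check an endpoint, you can send it a request shaped like the sketch below. The prompt, field names, and model name are only illustrative; this is not Screencap's exact payload.

```sh
# Minimal smoke test of an OpenAI-compatible chat endpoint (Ollama port shown).
# The prompt is illustrative only; Screencap's real prompt differs.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [
      {"role": "system", "content": "Classify the activity. Reply with JSON only."},
      {"role": "user", "content": "appName: Safari\nwindowTitle: Orders\nurlHost: example.com\nocr_text: ..."}
    ],
    "temperature": 0
  }'
```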
In Settings → AI → Local LLM (Ollama / LM Studio):
- Enable Local LLM
- Set Base URL
  - Ollama: `http://localhost:11434/v1`
  - LM Studio: `http://localhost:1234/v1` (depends on your server settings)
- Set Model
  - Must match what your server expects (often visible in `/v1/models`)
- Use Test
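To see exactly which model names your server exposes (and will accept in the Model field), you can list them from the command line. This assumes `jq` is installed and uses the Ollama port; adjust for your setup.

```sh
# Print the model ids exposed by the OpenAI-compatible endpoint.
curl -s http://localhost:11434/v1/models | jq -r '.data[].id'
```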
For Ollama:
- Install and run Ollama.
- Pull a model and ensure the OpenAI-compatible API is available (see the sketch after this list).
- Set in Screencap:
  - Base URL: `http://localhost:11434/v1`
  - Model: a model name returned by `GET /v1/models`
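A typical pull-and-verify sequence looks like this; `llama3.1` is only a placeholder model name, use any instruction-tuned model you prefer.

```sh
# Download a model and confirm Ollama can serve it locally.
ollama pull llama3.1   # model name is an example
ollama list            # confirm the model is installed
ollama serve           # only needed if the Ollama app is not already running
```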
Quick check:

```sh
curl -s http://localhost:11434/v1/models | head
```

For LM Studio:
- Start the local server in LM Studio (OpenAI-compatible mode).
- Set in Screencap:
  - Base URL: whatever LM Studio shows (commonly `http://localhost:1234/v1`)
  - Model: the served model id/name
Quick check:

```sh
curl -s http://localhost:1234/v1/models | head
```

Troubleshooting:

If local classification is not being used at all, enable it in Settings → AI.
If Test fails:
- The local server is not running, or the port is wrong.
- Verify the base URL and that `/v1/models` responds.
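A quick reachability check from the command line (Ollama port shown; substitute your base URL):

```sh
# Exits silently on success; prints a hint if the endpoint is unreachable.
curl -sf http://localhost:11434/v1/models > /dev/null \
  || echo "Local server not reachable: check that it is running and the port matches."
```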
If local classification keeps falling back or failing, your local model may be returning non-JSON text. Use a stronger instruction-tuned model, or a model that is known to follow structured-output prompts.
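You can test a model's JSON-following ability directly: ask it for JSON and pipe the reply through a validator. The model name and prompt are only examples.

```sh
# If this fails to parse, the model is unlikely to produce clean JSON for Screencap either.
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.1",
    "messages": [{"role": "user", "content": "Reply with a JSON object with key ok set to true. JSON only, no prose."}],
    "temperature": 0
  }' | jq -r '.choices[0].message.content' | python3 -m json.tool
```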
Local classification is driven mostly by OCR + context. Accuracy depends heavily on:
- OCR quality (fonts, contrast, language)
- Window titles / URL hosts / content titles
- Model capability
If you need higher accuracy and accept uploading images, keep Allow vision uploads enabled (Settings → AI).
Some local models are multimodal, but Screencap does not send images to the local endpoint yet. If you want local vision support, it requires a dedicated provider that sends OpenAI-style image_url parts and a runtime that supports them.
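For reference, "OpenAI-style image_url parts" means a chat message whose content mixes text and image entries, roughly as below. Screencap does not send this today; the model name and truncated data URL are placeholders.

```sh
# Shape of a multimodal chat request using an image_url content part (reference only).
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llava",
    "messages": [{
      "role": "user",
      "content": [
        {"type": "text", "text": "Classify this screenshot."},
        {"type": "image_url", "image_url": {"url": "data:image/png;base64,iVBORw0..."}}
      ]
    }]
  }'
```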