From 99b46a833a9c6edcaf4e14b8a0de1d1ff848afc0 Mon Sep 17 00:00:00 2001 From: Simon Iribarren Date: Tue, 2 Jun 2026 15:31:11 +0200 Subject: [PATCH 1/4] docs: add QVAC local provider section --- packages/web/src/content/docs/providers.mdx | 95 +++++++++++++++++++++ 1 file changed, 95 insertions(+) diff --git a/packages/web/src/content/docs/providers.mdx b/packages/web/src/content/docs/providers.mdx index f5c2160daaf4..f2a4bcfe86e4 100644 --- a/packages/web/src/content/docs/providers.mdx +++ b/packages/web/src/content/docs/providers.mdx @@ -1796,6 +1796,101 @@ OpenCode Zen is a list of tested and verified models provided by the OpenCode te --- +### QVAC + +You can configure opencode to use local models through [QVAC](https://qvac.com), an open-source runtime for local-first, peer-to-peer AI. QVAC exposes an OpenAI-compatible HTTP server via `qvac serve openai`, and the [`@qvac/ai-sdk-provider`](https://www.npmjs.com/package/@qvac/ai-sdk-provider) package wraps it as a branded provider. + +:::tip +Once QVAC lands in [models.dev](https://models.dev), it appears automatically in the `/connect` list — the manual config below is the explicit equivalent and is useful for pinning models and context limits. +::: + +First, install [`@qvac/cli`](https://www.npmjs.com/package/@qvac/cli) and start a server. 
Coding agents fire concurrent requests (a main chat completion plus a "title generation" call), and QVAC serializes inference per model file — so preload **two different model files** under aliases that match the model IDs you reference in `opencode.json`: + +```json title="qvac.config.json" +{ + "serve": { + "models": { + "qwen3-4b": { + "model": "QWEN3_4B_INST_Q4_K_M", + "preload": true, + "config": { + "ctx_size": 16384, + "reasoning_budget": 0 + } + }, + "qwen3-1.7b": { + "model": "QWEN3_1_7B_INST_Q4", + "preload": true, + "config": { + "ctx_size": 4096, + "reasoning_budget": 0 + } + } + } + } +} +``` + +```bash +npm i -g @qvac/cli +qvac serve openai +``` + +Then point opencode at it: + +```json title="opencode.json" "qvac" {5, 6, 8, 11-12, 18, 25-26} +{ + "$schema": "https://opencode.ai/config.json", + "provider": { + "qvac": { + "npm": "@qvac/ai-sdk-provider", + "name": "QVAC (local)", + "options": { + "baseURL": "http://127.0.0.1:11434/v1", + "apiKey": "qvac" + }, + "models": { + "qwen3-4b": { + "name": "Qwen3 4B (local)", + "limit": { + "context": 16384, + "output": 8192 + } + }, + "qwen3-1.7b": { + "name": "Qwen3 1.7B (local)", + "limit": { + "context": 4096, + "output": 2048 + } + } + } + } + }, + "model": "qvac/qwen3-4b", + "small_model": "qvac/qwen3-1.7b" +} +``` + +In this example: + +- `qvac` is the custom provider ID. This can be any string you want. +- `npm` specifies the package to use for this provider — `@qvac/ai-sdk-provider`, the branded QVAC wrapper around `@ai-sdk/openai-compatible`. +- `options.baseURL` is the endpoint for your local `qvac serve` (set this to match your serve port). +- `options.apiKey` can be any non-empty string — `qvac serve` does not validate it, but some clients refuse to send a request without an `Authorization` header. +- The model IDs (`qwen3-4b`, `qwen3-1.7b`) must match the **serve aliases** in `qvac.config.json` — those aliases are what the provider sends as the OpenAI `model` field. 
+- `model` is the main chat model; `small_model` is the lighter model opencode uses for titles and other utility calls. Pointing them at two different files lets QVAC run both concurrently. + +:::note +QVAC's LLM `ctx_size` defaults to 1024 tokens — too small for an agent's tool definitions and system prompt. Set it explicitly (16k+ for chat) in `qvac.config.json`, and set `reasoning_budget: 0` to suppress `<think>` blocks that opencode would otherwise render verbatim. +::: + +:::tip +Local tool-calling quality is bounded by the model. Q4-quantized 4B/8B Instruct models can hold a conversation but won't reliably invoke tools; reliable local agent use generally needs ≥14B parameters with coder/agent post-training. See the [QVAC provider README](https://www.npmjs.com/package/@qvac/ai-sdk-provider) for the full agent setup notes. +::: + +--- + ### SAP AI Core SAP AI Core provides access to 40+ models from OpenAI, Anthropic, Google, Amazon, Meta, Mistral, and AI21 through a unified platform. From cb440e296dac4d4a48a8ede74f0467c35a516b19 Mon Sep 17 00:00:00 2001 From: Simon Iribarren Date: Tue, 2 Jun 2026 15:37:38 +0200 Subject: [PATCH 2/4] docs: recommend gpt-oss-20b for tool use in QVAC section --- packages/web/src/content/docs/providers.mdx | 19 ++++++++++++++++++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/packages/web/src/content/docs/providers.mdx b/packages/web/src/content/docs/providers.mdx index f2a4bcfe86e4..4deafb3ffcce 100644 --- a/packages/web/src/content/docs/providers.mdx +++ b/packages/web/src/content/docs/providers.mdx @@ -1886,7 +1886,24 @@ QVAC's LLM `ctx_size` defaults to 1024 tokens — too small for an agent's tool ::: :::tip -Local tool-calling quality is bounded by the model. Q4-quantized 4B/8B Instruct models can hold a conversation but won't reliably invoke tools; reliable local agent use generally needs ≥14B parameters with coder/agent post-training. 
See the [QVAC provider README](https://www.npmjs.com/package/@qvac/ai-sdk-provider) for the full agent setup notes. +The Qwen3 models above are a fast, low-footprint starting point, but small Instruct models won't reliably **invoke tools**. For real agent workflows use a larger, agent-tuned model — `gpt-oss-20b` (OpenAI's open-weight model, ~12 GB download, ~16 GB RAM) is the recommended local backend, and QVAC parses its tool calls natively. Add it as a third model and point `model` at it: + +```json title="qvac.config.json" +"gpt-oss-20b": { + "model": "GPT_OSS_20B_INST_Q4_K_M", + "preload": true, + "config": { "ctx_size": 32768 } +} +``` + +```json title="opencode.json" +"gpt-oss-20b": { + "name": "GPT-OSS 20B (local)", + "limit": { "context": 32768, "output": 8192 } +} +``` + +Then set `"model": "qvac/gpt-oss-20b"` and keep a small Qwen3 model as `small_model` for fast title generation. See the [QVAC provider README](https://www.npmjs.com/package/@qvac/ai-sdk-provider) for the full agent setup notes. 
::: --- From c811dd3c10beb57a311461908b5f266ee7f070f3 Mon Sep 17 00:00:00 2001 From: Simon Iribarren Date: Tue, 2 Jun 2026 16:19:56 +0200 Subject: [PATCH 3/4] docs: enlarge QVAC small_model context and note Ollama port collision - bump small_model (qwen3-1.7b) to 8k context so it survives opencode's summarization/compaction passes, not just title generation - warn that qvac serve defaults to port 11434, same as Ollama --- packages/web/src/content/docs/providers.mdx | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/packages/web/src/content/docs/providers.mdx b/packages/web/src/content/docs/providers.mdx index 4deafb3ffcce..619da01f7f29 100644 --- a/packages/web/src/content/docs/providers.mdx +++ b/packages/web/src/content/docs/providers.mdx @@ -1822,7 +1822,7 @@ First, install [`@qvac/cli`](https://www.npmjs.com/package/@qvac/cli) and start "model": "QWEN3_1_7B_INST_Q4", "preload": true, "config": { - "ctx_size": 4096, + "ctx_size": 8192, "reasoning_budget": 0 } } @@ -1860,8 +1860,8 @@ Then point opencode at it: "qwen3-1.7b": { "name": "Qwen3 1.7B (local)", "limit": { - "context": 4096, - "output": 2048 + "context": 8192, + "output": 4096 } } } @@ -1876,10 +1876,10 @@ In this example: - `qvac` is the custom provider ID. This can be any string you want. - `npm` specifies the package to use for this provider — `@qvac/ai-sdk-provider`, the branded QVAC wrapper around `@ai-sdk/openai-compatible`. -- `options.baseURL` is the endpoint for your local `qvac serve` (set this to match your serve port). +- `options.baseURL` is the endpoint for your local `qvac serve` (set this to match your serve port). Note that `qvac serve openai` defaults to port `11434` — the same default as Ollama. If you run both, pass `--port` to one of them and update `baseURL` to match. - `options.apiKey` can be any non-empty string — `qvac serve` does not validate it, but some clients refuse to send a request without an `Authorization` header. 
- The model IDs (`qwen3-4b`, `qwen3-1.7b`) must match the **serve aliases** in `qvac.config.json` — those aliases are what the provider sends as the OpenAI `model` field. -- `model` is the main chat model; `small_model` is the lighter model opencode uses for titles and other utility calls. Pointing them at two different files lets QVAC run both concurrently. +- `model` is the main chat model; `small_model` is the lighter model opencode uses for titles, summarization, and compaction. Give it enough context (8k+) for those tasks, and point it at a different file from `model` so QVAC can run both concurrently. :::note QVAC's LLM `ctx_size` defaults to 1024 tokens — too small for an agent's tool definitions and system prompt. Set it explicitly (16k+ for chat) in `qvac.config.json`, and set `reasoning_budget: 0` to suppress `<think>` blocks that opencode would otherwise render verbatim. From 0f2980276bd0b33cfecc0642b9e7da19af4776ef Mon Sep 17 00:00:00 2001 From: Simon Iribarren Date: Fri, 5 Jun 2026 12:36:57 +0200 Subject: [PATCH 4/4] docs: simplify QVAC opencode recipe --- packages/web/src/content/docs/providers.mdx | 59 +++++---------------- 1 file changed, 13 insertions(+), 46 deletions(-) diff --git a/packages/web/src/content/docs/providers.mdx b/packages/web/src/content/docs/providers.mdx index 619da01f7f29..c85af3f9246b 100644 --- a/packages/web/src/content/docs/providers.mdx +++ b/packages/web/src/content/docs/providers.mdx @@ -1804,26 +1804,17 @@ You can configure opencode to use local models through [QVAC](https://qvac.com), Once QVAC lands in [models.dev](https://models.dev), it appears automatically in the `/connect` list — the manual config below is the explicit equivalent and is useful for pinning models and context limits. ::: -First, install [`@qvac/cli`](https://www.npmjs.com/package/@qvac/cli) and start a server. 
Coding agents fire concurrent requests (a main chat completion plus a "title generation" call), and QVAC serializes inference per model file — so preload **two different model files** under aliases that match the model IDs you reference in `opencode.json`: +First, install [`@qvac/cli`](https://www.npmjs.com/package/@qvac/cli) and start a server. Preload the model alias you reference in `opencode.json`; QVAC queues same-model completion requests, so opencode's background title, summary, and compaction calls can share the same local model instead of requiring a second model file. ```json title="qvac.config.json" { "serve": { "models": { - "qwen3-4b": { - "model": "QWEN3_4B_INST_Q4_K_M", + "gpt-oss-20b": { + "model": "GPT_OSS_20B_INST_Q4_K_M", "preload": true, "config": { - "ctx_size": 16384, - "reasoning_budget": 0 - } - }, - "qwen3-1.7b": { - "model": "QWEN3_1_7B_INST_Q4", - "preload": true, - "config": { - "ctx_size": 8192, - "reasoning_budget": 0 + "ctx_size": 32768 } } } @@ -1850,25 +1841,18 @@ Then point opencode at it: "apiKey": "qvac" }, "models": { - "qwen3-4b": { - "name": "Qwen3 4B (local)", + "gpt-oss-20b": { + "name": "GPT-OSS 20B (local)", "limit": { - "context": 16384, + "context": 32768, "output": 8192 } - }, - "qwen3-1.7b": { - "name": "Qwen3 1.7B (local)", - "limit": { - "context": 8192, - "output": 4096 - } } } } }, - "model": "qvac/qwen3-4b", - "small_model": "qvac/qwen3-1.7b" + "model": "qvac/gpt-oss-20b", + "small_model": "qvac/gpt-oss-20b" } ``` @@ -1878,32 +1862,15 @@ In this example: - `npm` specifies the package to use for this provider — `@qvac/ai-sdk-provider`, the branded QVAC wrapper around `@ai-sdk/openai-compatible`. - `options.baseURL` is the endpoint for your local `qvac serve` (set this to match your serve port). Note that `qvac serve openai` defaults to port `11434` — the same default as Ollama. If you run both, pass `--port` to one of them and update `baseURL` to match. 
- `options.apiKey` can be any non-empty string — `qvac serve` does not validate it, but some clients refuse to send a request without an `Authorization` header. -- The model IDs (`qwen3-4b`, `qwen3-1.7b`) must match the **serve aliases** in `qvac.config.json` — those aliases are what the provider sends as the OpenAI `model` field. -- `model` is the main chat model; `small_model` is the lighter model opencode uses for titles, summarization, and compaction. Give it enough context (8k+) for those tasks, and point it at a different file from `model` so QVAC can run both concurrently. +- The model ID (`gpt-oss-20b`) must match the **serve alias** in `qvac.config.json` — that alias is what the provider sends as the OpenAI `model` field. +- `model` is the main chat model; `small_model` is the model opencode uses for titles, summarization, and compaction. Pointing both at the same QVAC alias is supported; if you want those utility calls to avoid waiting behind a long chat response, add a second lighter model alias and point `small_model` at it. :::note -QVAC's LLM `ctx_size` defaults to 1024 tokens — too small for an agent's tool definitions and system prompt. Set it explicitly (16k+ for chat) in `qvac.config.json`, and set `reasoning_budget: 0` to suppress `<think>` blocks that opencode would otherwise render verbatim. +QVAC's LLM `ctx_size` defaults to 1024 tokens — too small for an agent's tool definitions and system prompt. Set it explicitly in `qvac.config.json`, and set `reasoning_budget: 0` for reasoning-tuned models such as Qwen3 if you want to suppress `<think>` blocks that opencode would otherwise render verbatim. ::: :::tip -The Qwen3 models above are a fast, low-footprint starting point, but small Instruct models won't reliably **invoke tools**. For real agent workflows use a larger, agent-tuned model — `gpt-oss-20b` (OpenAI's open-weight model, ~12 GB download, ~16 GB RAM) is the recommended local backend, and QVAC parses its tool calls natively. 
Add it as a third model and point `model` at it: - -```json title="qvac.config.json" -"gpt-oss-20b": { - "model": "GPT_OSS_20B_INST_Q4_K_M", - "preload": true, - "config": { "ctx_size": 32768 } -} -``` - -```json title="opencode.json" -"gpt-oss-20b": { - "name": "GPT-OSS 20B (local)", - "limit": { "context": 32768, "output": 8192 } -} -``` - -Then set `"model": "qvac/gpt-oss-20b"` and keep a small Qwen3 model as `small_model` for fast title generation. See the [QVAC provider README](https://www.npmjs.com/package/@qvac/ai-sdk-provider) for the full agent setup notes. +Local tool-calling quality is bounded by the model. Small Qwen3 Instruct models are a fast, low-footprint starting point but won't reliably **invoke tools**; for real agent workflows, use a larger agent-tuned model such as `gpt-oss-20b`. ::: ---
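
The final recipe depends on two files agreeing: each model ID in `opencode.json` must match a serve alias in `qvac.config.json`, and `model` / `small_model` must name a configured model. The following is a minimal sketch of that consistency check, not part of QVAC or opencode. The helper name `check_qvac_config` is hypothetical, and the rule that `limit.context` should not exceed the server's `ctx_size` is an assumption made for illustration; the 1024-token fallback comes from the note in the patch.

```python
# Hypothetical helper: cross-checks the two configs from the PATCH 4/4 recipe.
# Not an official QVAC or opencode tool.

QVAC_CONFIG = {
    "serve": {
        "models": {
            "gpt-oss-20b": {
                "model": "GPT_OSS_20B_INST_Q4_K_M",
                "preload": True,
                "config": {"ctx_size": 32768},
            }
        }
    }
}

OPENCODE_CONFIG = {
    "provider": {
        "qvac": {
            "npm": "@qvac/ai-sdk-provider",
            "models": {
                "gpt-oss-20b": {
                    "name": "GPT-OSS 20B (local)",
                    "limit": {"context": 32768, "output": 8192},
                }
            },
        }
    },
    "model": "qvac/gpt-oss-20b",
    "small_model": "qvac/gpt-oss-20b",
}


def check_qvac_config(qvac_cfg, opencode_cfg, provider_id="qvac"):
    """Return a list of human-readable mismatches between the two configs."""
    problems = []
    aliases = qvac_cfg.get("serve", {}).get("models", {})
    models = opencode_cfg.get("provider", {}).get(provider_id, {}).get("models", {})

    for model_id, entry in models.items():
        # The model ID is what the provider sends as the OpenAI `model` field,
        # so it has to exist as a serve alias.
        if model_id not in aliases:
            problems.append(f"model ID {model_id!r} has no matching serve alias")
            continue
        # ctx_size falls back to 1024 when unset (per the note in the docs).
        ctx_size = aliases[model_id].get("config", {}).get("ctx_size", 1024)
        context = entry.get("limit", {}).get("context")
        # Assumption: a client-side limit above the server's ctx_size is a mistake.
        if context is not None and context > ctx_size:
            problems.append(
                f"{model_id!r}: limit.context {context} exceeds ctx_size {ctx_size}"
            )

    for key in ("model", "small_model"):
        ref = opencode_cfg.get(key)
        if ref is None:
            continue
        prefix, _, model_id = ref.partition("/")
        if prefix == provider_id and model_id not in models:
            problems.append(f"{key} refers to unknown model {ref!r}")

    return problems


print(check_qvac_config(QVAC_CONFIG, OPENCODE_CONFIG))  # prints []
```

Run against the single-model recipe, the check reports nothing. Pointing `small_model` at an alias that was removed (for example `qvac/qwen3-1.7b` after this patch) would be reported as an unknown model.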