Commit 74ea0ac

final
Signed-off-by: Adrian Cole <adrian@tetrate.io>
1 parent 4f8df81

3 files changed: 3 additions & 10 deletions


inference-platforms/chat.py

Lines changed: 2 additions & 4 deletions
@@ -39,10 +39,8 @@ def main():
 
     # vllm-specific switch to disable thinking, ignored by other inference platforms.
     # See https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
-    if "qwen3" in model.lower():
-        extra_body = {"chat_template_kwargs": {"enable_thinking": False}}
-    else:
-        extra_body = {}
+    extra_body = {"chat_template_kwargs": {"enable_thinking": False}} if model.startswith("Qwen/Qwen3") else None
+
     if args.use_responses_api:
         response = client.responses.create(
             model=model, input=messages[0]["content"], temperature=0, extra_body=extra_body
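As a sketch of what this hunk changes (the helper name `thinking_extra_body` is ours, not in chat.py): the new one-liner only matches fully-qualified `Qwen/Qwen3*` model IDs, whereas the old check matched any model name containing `qwen3`, and it now falls back to `None` rather than an empty dict.

```python
def thinking_extra_body(model: str):
    # vLLM-specific switch to disable thinking; other inference platforms ignore it.
    # See https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes
    return (
        {"chat_template_kwargs": {"enable_thinking": False}}
        if model.startswith("Qwen/Qwen3")
        else None
    )

print(thinking_extra_body("Qwen/Qwen3-0.6B"))  # the chat_template_kwargs dict
print(thinking_extra_body("qwen3:0.6b"))       # None: Ollama-style tags no longer match
```

Passing `extra_body=None` lets the client omit the field entirely, instead of sending an empty object to platforms that don't understand it.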

inference-platforms/llama-stack/README.md

Lines changed: 0 additions & 4 deletions
@@ -43,15 +43,11 @@ uv run --exact -q --env-file env.local ../agent.py --use-responses-api
 
 * Llama Stack's Responses API connects to MCP servers server-side (unlike aigw
   which proxies MCP). The agent passes MCP configuration via `HostedMCPTool`.
-
 * Uses the `starter` distribution with its built-in `remote::openai` provider,
   pointing to Ollama via `OPENAI_BASE_URL` environment variable.
 * Models require `provider_id/` prefix (e.g., `openai/qwen3:0.6b`)
-* Until [this issue][docker] resolves, running docker on Apple Silicon
-  requires emulation.
 
 ---
-[docker]: https://github.com/llamastack/llama-stack/issues/406
 [docs]: https://llama-stack.readthedocs.io/en/latest/index.html
 [otel-sink]: https://llama-stack.readthedocs.io/en/latest/building_applications/telemetry.html#configuration
 [uv]: https://docs.astral.sh/uv/getting-started/installation/

inference-platforms/llama-stack/docker-compose.yml

Lines changed: 1 addition & 2 deletions
@@ -15,8 +15,7 @@ services:
     depends_on:
       ollama-pull:
         condition: service_completed_successfully
-    image: llamastack/distribution-starter:0.4.1
-    platform: linux/amd64 # ARM64 not published: https://github.com/llamastack/llama-stack/issues/406
+    image: llamastack/distribution-starter:0.5.0
     container_name: llama-stack
     tty: true
     env_file:
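Reconstructed from the hunk, the resulting service block would read roughly as below. The service key name and the `env_file` value are assumptions taken from surrounding context (`container_name` and the README's `env.local`), not shown in the diff. Presumably 0.5.0 images are published for ARM64, so the `platform: linux/amd64` emulation override is dropped.

```yaml
services:
  llama-stack:           # service key assumed from container_name
    depends_on:
      ollama-pull:
        condition: service_completed_successfully
    image: llamastack/distribution-starter:0.5.0   # was 0.4.1 + platform: linux/amd64
    container_name: llama-stack
    tty: true
    env_file:
      - env.local        # assumed; the actual value is outside the hunk
```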
