It is currently difficult to verify the runtime configuration of a running model. Configuration can apparently be set either through `docker model configure` or via a `.../_configure/` endpoint, but as far as I can tell there is no way to see what the runtime parameters actually are.
I was able to view them with something like the following, but it would be more ideal to retrieve this information via the CLI or the API:
```
~ ❯ tr '\0' ' ' < /proc/3352229/cmdline ; echo
/app/bin/com.docker.llama-server -ngl 999 --metrics --model /models/bundles/sha256/8d4c5d3f8f32429577d8e9403454a03bf8784f29f600fd09427240a2c4f78c3c/model/model.gguf --host inference-runner-0.sock --ctx-size 4096 --jinja
```
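For anyone else who wants to reproduce this without hunting down the PID by hand, here is a minimal sketch. It assumes the backend process is named `com.docker.llama-server` as in the output above; other backends or future versions may use a different binary name, and `/proc` obviously requires being inside the Linux host/VM where the runner executes.

```sh
# Print the full command line of every running llama-server backend.
# Assumption: the binary is named com.docker.llama-server, as observed above.
for pid in $(pgrep -f com.docker.llama-server); do
  tr '\0' ' ' < "/proc/$pid/cmdline"
  echo
done
```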
Given that settings like MoE layer placement, flash attention, and KV cache quantization often need to be tuned for a given set of hardware, it would be nice if this were better exposed.
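For illustration only, something along these lines is what I have in mind. The flag and the output shape below are entirely hypothetical, not an existing part of the `docker model` CLI:

```sh
# Hypothetical sketch: neither --runtime-flags nor this output format exists today.
$ docker model inspect --runtime-flags ai/example-model
{
  "command": "/app/bin/com.docker.llama-server",
  "args": ["-ngl", "999", "--metrics", "--ctx-size", "4096", "--jinja"]
}
```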