fix(model): deterministic, type-filtered backend auto-detection (#9287) #10286

Open
localai-bot wants to merge 1 commit into master from fix/9287-backend-autodetect

@localai-bot

Collaborator

Closes #9287

Problem

When a model config has no explicit backend:, (*ModelLoader).Load built the auto-detect candidate list by ranging an unordered Go map of installed backends with no filtering, then loaded the first one whose gRPC LoadModel succeeded. Every installed backend is registered there - including non-LLM ones like the opus audio codec - so after installing such a backend it could win a GGUF/LLM load, sending the model to the wrong backend.

Fix

New pure, unit-tested SelectAutoLoadBackends(available, modelFile):

  • deterministic sort (no more map-iteration randomness),
  • for .gguf files, filters to LLM-capable backends (chat/completion/edit/embeddings usecases via core/config.BackendCapabilities) with llama-cpp first,
  • zero-candidate fallback returns the full sorted set, so nothing previously loadable becomes unloadable.

Load() now calls this instead of ranging the map directly. (Verified pkg/model -> core/config introduces no import cycle.)

Test plan

  • New Ginkgo specs in pkg/model/autoload_test.go (red -> green): given {opus, llama-cpp} + a .gguf, opus is excluded and llama-cpp is first; deterministic order; zero-candidate fallback returns the original set.
  • go test ./pkg/model/... ./core/config/... green; scoped golangci-lint --new-from-merge-base clean.

Follow-up (noted, not done)

Did not force cfg.Backend = "llama-cpp" in the empty-backend GGUF hook (more blast radius on non-llama GGUFs); the candidate filter alone fixes the bug. A metadata-based GGUF architecture check is a possible refinement.

Assisted-by: claude:claude-opus-4-8 [Claude Code]

When a model config declares no explicit `backend:`, Load() fell into a
trial loop built by ranging the external-backends Go map (random order)
with no filtering, returning the first backend whose gRPC LoadModel
succeeded. An unrelated installed backend - e.g. the "opus" audio codec -
could therefore win a GGUF/LLM model load, so a model that should run on
llama.cpp wrongly tried to use opus.

Extract the candidate selection into a pure, testable function
SelectAutoLoadBackends that:

  - sorts the candidate list deterministically (no more map-order
    nondeterminism), and
  - for a `.gguf` model, filters to LLM-capable backends (via
    core/config.BackendCapabilities) and puts llama-cpp first, so an
    incompatible audio/codec/image backend can never win the trial loop.

If filtering would leave zero candidates, the full sorted set is returned
unchanged, so a previously-loadable model is never made unloadable.

Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
Assisted-by: claude:claude-opus-4-8 [Claude Code]

Development

Successfully merging this pull request may close these issues.

If you add opus as a backend via api, then add a model via the api that should use llama.cpp or vllm, it then tries to use opus