Skip to content

Self-hosted ask_holmes fails: 404 on /api/investigate (bundled Holmes 0.32/0.33 only serves /api/chat) #2102

@sraj0210

Description

@sraj0210

Summary

On a self-hosted install (no Robusta UI/account), a customPlaybooks ask_holmes action fails because robusta-runner POSTs to /api/investigate,
but the bundled Holmes server only exposes /api/chat. Automated AI analysis of alerts never runs.

Environment

  • Robusta Helm chart: 0.41.0 (also confirmed present in 0.42.0)
  • robusta-runner image: robustadev/robusta-runner:0.41.0
  • holmes image: robustadev/holmes:0.32.0 (0.42.0 ships 0.33.0)
  • Self-hosted, Azure OpenAI (MODEL=azure/...), enableHolmesGPT: true, no Robusta UI sink / account
  • Sink: MS Teams

What happens

A customPlaybook ask_holmes action on on_pod_crash_loop fails in robusta-runner:

ERROR Holmes responded 404 on POST http://robusta-holmes.robusta.svc.cluster.local:80/api/investigate: {"detail":"Not Found"}
robusta.utils.error_codes.ActionException: Holmes internal configuration error.
  File ".../core/playbooks/internal/ai_integration.py", line 136, in ask_holmes

Root cause

  • robusta-runner ai_integration.py (0.41 and current master) POSTs to /api/investigate and /api/stream/investigate.
  • The bundled Holmes server (0.32.0, and master) only registers: /api/chat, /api/model, /api/oauth/callback, /healthz, /readyz — confirmed
    via the pod's own /openapi.json.
  • /api/investigate existed in Holmes ≤ 0.13.0 and was removed in the server rewrite; the runner was never migrated to /api/chat.

So the shipped chart pairs a runner and a Holmes that disagree on the endpoint, and self-hosted automated investigation is broken out of the box.

Expected

Self-hosted automated alert investigation (documented as supported with OpenAI/Azure/Bedrock) should work, or robusta-runner should call /api/chat.

Reproduce

  1. Install robusta chart 0.41/0.42 self-hosted, enableHolmesGPT: true, Azure model configured.
  2. Add a customPlaybook with an ask_holmes action (e.g. on on_pod_crash_loop).
  3. Trigger a CrashLoopBackOff pod → runner logs the 404 above; the Teams/Slack alert arrives with no AI section.

Notes (other gotchas found while debugging, in case docs need updating)

  • Model env must be under holmes.additionalEnvVars (a holmes.env: key is silently ignored → Holmes pod has no model).
  • ask_holmes also requires investigation_type (e.g. issue) and a non-null context (context.issue_type), or it raises AttributeError: 'NoneType' object has no attribute 'get' in build_investigation_title.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions