Skip to content

[BUG] Agent pod CrashLoops with pydantic ValidationError when RemoteMCPServer ref omits toolNames (http_tools[*].tools=null) #1797

@reilly3000

Description

@reilly3000

📋 Prerequisites

  • I have searched the existing issues to avoid creating a duplicate
  • By submitting this issue, you agree to follow our Code of Conduct
  • I am using the latest version of the software
  • I can consistently reproduce this issue

🎯 Affected Service(s)

Multiple services / System-wide issue

🚦 Impact/Severity

Blocker

🐛 Bug Description

When an Agent (kagent.dev/v1alpha2) lists a tool of type: McpServer
referencing a RemoteMCPServer without an explicit toolNames: filter,
the controller materializes the agent's config.json Secret with
http_tools[*].tools = null (and sse_tools[*].tools = null for the SSE
protocol). The Python ADK runtime image
(cr.kagent.dev/kagent-dev/kagent/app:0.9.1) then crashes on startup with a
pydantic ValidationError, leaving the agent pod in CrashLoopBackOff.

Whether toolNames is intended to be required or optional, the current
behavior is a bug because omitting it silently produces a config that the
runtime cannot load, with no controller-side validation or admission error.

The mismatch is between two sides of the same wire format:

The CRD schema for McpServerTool.toolNames does not appear to mark the
field as required, so users (and tooling) have no signal that omitting it
is unsafe.

🔄 Steps To Reproduce

  1. Stand up kagent v0.9.1 against any working RemoteMCPServer
    (e.g. cr.kagent.dev/kagent-dev/kagent/tools).

  2. Apply the following Agent (no toolNames on the tool ref):

    apiVersion: kagent.dev/v1alpha2
    kind: Agent
    metadata:
      name: repro
      namespace: kagent
    spec:
      type: Declarative
      description: repro
      declarative:
        modelConfig: <existing>
        systemMessage: "test"
        tools:
          - type: McpServer
            mcpServer:
              apiGroup: kagent.dev
              kind: RemoteMCPServer
              name: <existing-rms>
              # NOTE: no toolNames
  3. Observe the agent pod enter CrashLoopBackOff.

  4. kubectl logs shows the pydantic validation error below.

  5. Inspect the rendered config:
    kubectl get secret repro-config -o jsonpath='{.data.config\.json}' | base64 -d | jq '.http_tools[0].tools'
    prints null.

Adding any non-empty toolNames: [<one>] to the tool ref makes the pod
start successfully — that's a viable workaround.

🤔 Expected Behavior

The agent should not CrashLoop on a config the controller itself produced.
Possible resolutions (deferring to maintainers on the intended semantics):

  1. If toolNames is meant to be required, validate it in the CRD
    schema or in a controller-side admission check so users get a clear
    error at apply-time rather than a pydantic crash at pod-start.
  2. If toolNames is meant to be optional (with omission meaning "all
    discovered tools"), the controller should emit an explicit list (or
    the runtime should tolerate None) rather than handing the runtime a
    null it can't parse.

📱 Actual Behavior

Agent pod CrashLoops on startup with:

ValidationError: 1 validation error for AgentConfig
http_tools.0.tools
  Input should be a valid list [type=list_type, input_value=None,
  input_type=NoneType]

The rejected blob is:

{
  "http_tools": [
    {
      "params": {"url": "...", "headers": {}, "terminate_on_close": true},
      "tools": null
    }
  ]
}

💻 Environment

  • Kubernetes: Talos Linux, k8s 1.34
  • kagent: v0.9.1 (cr.kagent.dev/kagent-dev/kagent/app:0.9.1 runtime,
    controller image cr.kagent.dev/kagent-dev/kagent/controller:0.9.1)
  • Helm chart: kagent 0.9.1
  • Topology: agents reference a single RemoteMCPServer named
    agentgateway-mcp that fronts several upstream MCP servers via
    agentgateway. The RMS reports ~50
    discoveredTools and is Healthy.

🔍 Additional Context

  • Same field exists on sse_tools[*].tools and the same nil-slice → JSON
    null defect applies there.
  • Workaround: pin explicit toolNames per agent.
  • I'll open a PR shortly that addresses both sides defensively (controller
    expands discovered tools when toolNames is empty; runtime coerces
    None[] so older controllers don't crash newer runtimes). Happy
    to rework toward whichever direction maintainers prefer if the intended
    semantics are different.

📋 Logs

$ kubectl logs -n kagent repro-<hash>
Traceback (most recent call last):
  ...
  File ".../kagent/adk/types.py", line ..., in to_agent
    config = AgentConfig.model_validate(...)
pydantic_core._pydantic_core.ValidationError: 1 validation error for AgentConfig
http_tools.0.tools
  Input should be a valid list [type=list_type, input_value=None, input_type=NoneType]
    For further information visit https://errors.pydantic.dev/2.x/v/list_type

🙋 Are you willing to contribute?

  • I am willing to submit a PR to fix this issue

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    Backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions