[BUG] skills-init image missing openssh-client/ssh-keyscan - SSH git auth crashes init container #1770

@Aveek-Saha

📋 Prerequisites

  • I have searched the existing issues to avoid creating a duplicate
  • By submitting this issue, you agree to follow our Code of Conduct
  • I am using the latest version of the software
  • I have tried to clear cache/cookies or used incognito mode (if ui-related)
  • I can consistently reproduce this issue

🎯 Affected Service(s)

Controller Service

🚦 Impact/Severity

Blocker

🐛 Bug Description

When an Agent is configured with SSH-based git skills (skills.gitAuthSecretRef pointing to a kubernetes.io/ssh-auth Secret), the skills-init init container crashes immediately and the Agent pod never becomes ready. SSH git authentication is completely non-functional.

🔄 Steps To Reproduce

Create a kubernetes.io/ssh-auth Secret with a valid SSH private key:

apiVersion: v1
kind: Secret
type: kubernetes.io/ssh-auth
metadata:
  name: my-agent-git-auth
  namespace: kagent
data:
  ssh-privatekey: <base64-encoded private key>

Configure an Agent with SSH gitRefs and gitAuthSecretRef:

spec:
  skills:
    gitAuthSecretRef:
      name: my-agent-git-auth
    gitRefs:
      - url: git@github.com:my-org/my-repo.git
        ref: main
        path: .agents/skills/my-skill
        name: my-skill

Observe the skills-init init container status; it enters CrashLoopBackOff.
Check the init container logs: kubectl -n kagent logs <agent-pod-name> -c skills-init

🤔 Expected Behavior

The skills-init container adds the git host to ~/.ssh/known_hosts via ssh-keyscan, then clones the repository successfully using the provided SSH key.

📱 Actual Behavior

The init container exits immediately with:

/skills-init.sh: ssh-keyscan: not found

The pod enters CrashLoopBackOff. The Agent is never reconciled.

💻 Environment

  • OS and version: Linux (Kubernetes node)
  • Kubernetes version: 1.32
  • Kubernetes provider: RKE2
  • Application version: v0.9.0
  • skills-init image: cr.kagent.dev/kagent-dev/kagent/skills-init:0.9.0

🔧 CLI Bug Report

N/A

🔍 Additional Context

Root cause: The generated skills-init.sh (from skills-init.sh.tmpl) runs under set -e. The SSH key branch calls ssh-keyscan as a standalone binary to populate ~/.ssh/known_hosts. The skills-init image does not have openssh-client installed, so ssh-keyscan is not present. The set -e flag causes immediate exit.
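The failure mode can be reproduced outside the cluster. The sketch below is not the generated skills-init.sh; `run_under_set_e` is a hypothetical helper that runs one command in a child shell with set -e enabled and then tries to continue, which is the same pattern the root cause describes.

```shell
#!/bin/sh
# Hypothetical helper (not part of kagent): run one command in a child shell
# that has `set -e` enabled, then try to continue to a following step.
# The command name is passed as "$1" of the child shell.
run_under_set_e() {
    sh -c 'set -e; "$1"; echo "reached next step"' sh "$1"
}

# Usage:
#   run_under_set_e true                 # prints "reached next step"
#   run_under_set_e no-such-binary-xyz   # "not found" on stderr, exit status 127
```

A missing binary makes the shell report "not found" with exit status 127, and set -e turns that into an immediate exit, so nothing after the ssh-keyscan call ever runs.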

PR #1529 changed the template to derive SSH hosts dynamically from gitRefs URLs instead of hardcoding github.com gitlab.com bitbucket.org, but made no changes to the skills-init Dockerfile. The binary was never in the image; this bug predates and survives that PR.
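For illustration, deriving a host from a gitRefs URL could look roughly like the sketch below. This is an assumption about the general approach, not the code from PR #1529 or skills-init.sh.tmpl; `git_ssh_host` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper (not taken from skills-init.sh.tmpl): print the host part
# of an SSH git URL. Handles scp-like syntax (git@host:org/repo.git) and
# ssh:// URLs. Returns 1 for anything else, such as https:// URLs.
git_ssh_host() {
    url=$1
    case "$url" in
        ssh://*)
            rest=${url#ssh://}            # drop the scheme
            rest=${rest%%/*}              # keep user@host[:port]
            rest=${rest#*@}               # drop the optional user@
            printf '%s\n' "${rest%%:*}"   # drop the optional :port
            ;;
        *@*:*)
            rest=${url#*@}                # drop user@
            printf '%s\n' "${rest%%:*}"   # keep the text before the first ':'
            ;;
        *)
            return 1                      # not an SSH URL
            ;;
    esac
}

# Usage:
#   git_ssh_host git@github.com:my-org/my-repo.git
```

Whatever the derived host is, it is still passed to ssh-keyscan, so the step fails in the same way while the binary is absent from the image.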

Note: The HTTPS token path (elif [ -f "${_auth_mount}/token" ]) never calls ssh-keyscan and works correctly. Only SSH auth is broken.

Suggested fix: Add openssh-client to the skills-init Dockerfile:

RUN apk add --no-cache git openssh-client
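As an optional complement to the Dockerfile change, a preflight check could make a missing tool obvious before any git work starts. This is only a sketch; `missing_tools` is a hypothetical helper and is not part of the current template.

```shell
#!/bin/sh
# Hypothetical preflight helper: print each named command that is not on PATH.
# Returns 0 when every command is present, 1 when at least one is missing.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            status=1
        fi
    done
    return "$status"
}

# Usage inside an init script or an image smoke test:
#   missing_tools git ssh ssh-keyscan || { echo "required tools missing" >&2; exit 1; }
```

Running the same check against a rebuilt image is one way to confirm that openssh-client actually provides ssh-keyscan there.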

📋 Logs

/skills-init.sh: ssh-keyscan: not found

📷 Screenshots

No response

🙋 Are you willing to contribute?

  • I am willing to submit a PR to fix this issue
