
☸️ Deploying CodeRAG on Kubernetes

A Helm chart that self-hosts the CodeRAG HTTP/REST API (and, optionally, the web UI) on any Kubernetes cluster, with a persistent index, a git-sourced workspace, scheduled re-indexing, and sensible security defaults.

Don't use Helm? Every example below works with plain kubectl too — just pipe helm template … into kubectl apply -f - (see Without Helm).

⚠️ SECURITY — the API is UNAUTHENTICATED by default. With no API key set, anyone who can reach the Service (in-cluster, via port-forward, or through an Ingress) can read your indexed source and trigger reindexing. Before exposing CodeRAG beyond a trusted namespace you must:

  1. Set an API key so every request is authenticated — --set secrets.apiKey=… (demo) or, preferred, supply CODERAG_API_KEY via secrets.existingSecret (see Require authentication).
  2. Terminate TLS and add auth at any Ingress — never publish the plain HTTP API.
  3. Turn on the NetworkPolicy (--set networkPolicy.enabled=true) so only known clients can reach the server pod (see Lock down network access).

How it's designed (read this first)

CodeRAG keeps its index in a single embedded LanceDB store, and the engine is a single writer — the store is written non-atomically, so two processes writing one index would corrupt it. The chart is built around that fact:

  • One replica, Recreate strategy, ReadWriteOnce PVC. Never scale the writer horizontally; it is not safe and the chart intentionally pins replicas: 1.
  • Indexing is driven over HTTP, not by a second pod mounting the volume. An initial Job (and an optional CronJob) call POST /index on the running server, so exactly one process ever touches the index files.
  • The embedding model is downloaded once (≈130 MB) and cached on the data volume (CODERAG_CACHE_DIR=/data/.model-cache), so restarts don't re-download it. A generous startup probe covers that first download before liveness kicks in.
  • Standalone by default. helm install with no arguments boots a healthy server on your cluster's default storage (empty index); point it at your code with one setting.
  • The codebase is mounted read-only into the app and refreshed by a git init container (and optional git-sync sidecar), never written by the engine.
  • Hardened by default: non-root (uid 10001), read-only root filesystem, dropped capabilities, RuntimeDefault seccomp, and the service-account token is not mounted.
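The "index over HTTP" idea in the list above can be pictured as a small client loop: wait until the server answers, then ask it to index, so the caller never touches the index files itself. The sketch below is illustrative only and is not the chart's actual Job. `probe` and `trigger` are hypothetical callables standing in for "GET a health/status endpoint" and "POST /index", and the retry counts are arbitrary.

```python
import time
from typing import Callable


def wait_then_index(
    probe: Callable[[], bool],
    trigger: Callable[[], str],
    attempts: int = 30,
    delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `probe` until it reports ready, then call `trigger` exactly once.

    Both callables are placeholders (assumptions): in a real client, `probe`
    would issue an HTTP GET against the server and `trigger` would POST /index.
    """
    for _ in range(attempts):
        if probe():
            # The server is up: hand the work to the single writer process.
            return trigger()
        sleep(delay)
    raise TimeoutError(f"server not ready after {attempts} attempts")
```

Because the writer is the only process that opens the store, a client like this can be retried or run on a schedule without risking concurrent writes from a second pod.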

The server is the primary, recommended surface. The UI is optional and runs in one of two topologies:

  • Shared (recommended for a read-only/demo UI): ui.useServerIndex=true mounts the server's index volume read-only, so the UI serves exactly what the index/reindex Jobs built — no separate volume, always in sync, and it can never corrupt the writer's store. Reindexing stays a server-side Job.
  • Independent (default): the UI gets its own data volume and bundles the engine. Nothing populates that volume automatically (the index Jobs drive the server's volume), so you build it with the in-app Reindex button — which is disabled in demo mode, so a demo UI left on the default shows an empty index (0 files / 0 chunks).

Prerequisites

  • A Kubernetes cluster (v1.25+) and kubectl configured for it.
  • A default StorageClass (or set persistence.storageClass) that can provision ReadWriteOnce volumes.
  • Helm 3 (only for the Helm workflow).

Quick start

Standalone (zero config). Installs and runs anywhere with a default StorageClass — no required flags:

helm install coderag ./deploy/helm/coderag --namespace coderag --create-namespace

The server comes up healthy on a freshly provisioned 10Gi volume with an empty index. Now point it at your code — clone a git repo into the pod:

helm upgrade coderag ./deploy/helm/coderag -n coderag --reuse-values \
  --set workspace.source=git \
  --set workspace.git.repository=https://github.com/Neverdecel/CodeRAG.git

That provisions the index volume, clones the repo into the pod, and runs a one-shot Job that builds the index once the server is ready. (You can pass both --sets on the first install too, to do it in one step.) Watch it come up:

kubectl -n coderag get pods -w
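# Initial indexing progress. The Job name ends in the Helm revision, so it is
# coderag-index-2 after the install + upgrade above (coderag-index-1 for a one-step install):
kubectl -n coderag logs -f job/coderag-index-2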

Query it:

kubectl -n coderag port-forward svc/coderag-server 8000:8000
curl "http://127.0.0.1:8000/status"
curl "http://127.0.0.1:8000/search?q=where%20is%20retry%20handled&k=5"
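
The same two calls can be scripted. The sketch below is a minimal standard-library Python client; it assumes the port-forward above is running, and the names `search_url` and `fetch` are illustrative rather than part of CodeRAG. Responses are returned as raw text because the response schema is not described here.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # assumes the port-forward above is active


def search_url(base: str, query: str, k: int = 5) -> str:
    """Build a /search URL, percent-encoding the query text (spaces become %20)."""
    params = urllib.parse.urlencode({"q": query, "k": k}, quote_via=urllib.parse.quote)
    return f"{base}/search?{params}"


def fetch(url: str, timeout: float = 10.0) -> str:
    """GET a URL and return the body as text, with no assumption about its schema."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Example usage (needs a reachable server, so it is left commented out):
# print(fetch(f"{BASE_URL}/status"))
# print(fetch(search_url(BASE_URL, "where is retry handled")))
```

If an API key is configured, these unauthenticated calls will be rejected; see Require authentication on the API.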

Without Helm

The chart needs no Tiller/cluster-side component, so you can render it locally and apply the plain manifests:

helm template coderag ./deploy/helm/coderag \
  --namespace coderag \
  --set workspace.source=git \
  --set workspace.git.repository=https://github.com/Neverdecel/CodeRAG.git \
  > coderag.yaml

kubectl create namespace coderag
kubectl -n coderag apply -f coderag.yaml

Re-render and re-apply to upgrade. (You lose Helm's release tracking and the automatic revision-suffixed index Job, but the manifests are otherwise identical.)


Configuration reference

Full list with comments: values.yaml. The most-used knobs:

| Value | Default | Purpose |
| --- | --- | --- |
| image.tag | beta | Image tag. Pin to sha-<commit> for reproducibility. |
| workspace.source | emptyDir | emptyDir (standalone) · git · existingClaim. |
| workspace.git.repository | | Required for source=git. Repo to index. |
| workspace.git.ref | "" | Branch/tag (empty = default branch). |
| workspace.git.sync.enabled | false | Sidecar that git pulls on an interval. |
| persistence.enabled | true | Persist the index to a PVC (false = ephemeral). |
| persistence.size | 10Gi | Index volume size. |
| persistence.storageClass | "" | "" default class · <name> · "-" static. |
| persistence.volumeName / persistence.selector | | Bind a pre-provisioned PV (static). |
| config.provider | fastembed | fastembed (local, no key) · openai · fake. |
| config.openaiBaseUrl | "" | Self-hosted OpenAI-compatible endpoint. |
| secrets.existingSecret | "" | Preferred: pre-created Secret with OPENAI_API_KEY / ANTHROPIC_API_KEY / CODERAG_API_KEY. |
| secrets.openaiApiKey / secrets.anthropicApiKey | "" | Inline keys (demo only; stored in the release, so prefer existingSecret). |
| secrets.apiKey | "" | Inline CODERAG_API_KEY; setting it turns API auth ON (demo only; prefer existingSecret). |
| networkPolicy.enabled | false | Recommended: default-deny ingress to the server, allow only UI/jobs/(ingress). |
| server.service.type | ClusterIP | ClusterIP · NodePort · LoadBalancer. |
| index.initJob.enabled | true | Build the index automatically on install/upgrade. |
| index.cronjob.enabled | false | Recurring reindex (index.cronjob.schedule). |
| ui.enabled | false | Also deploy the web UI. |
| ui.useServerIndex | false | UI serves the server's index (read-only) instead of its own empty volume. |
| ui.coLocateWithServer | false | Pin the UI onto the server's node; required with useServerIndex on RWO storage. |
| ingress.enabled | false | Expose via an Ingress (add TLS + auth; the API has none by default). |
| resources (server.*, ui.*) | see values | CPU/memory requests & limits. |

Storage

The index needs one ReadWriteOnce volume per writer. The chart works with whatever your cluster already provides — you rarely need to configure anything.

Use the cluster default StorageClass (recommended). Leave persistence.storageClass: "" and the PVC binds to your default class. That covers virtually every managed and self-managed cluster out of the box:

| Environment | Typical default class |
| --- | --- |
| Amazon EKS | gp3 / gp2 (EBS CSI) |
| Google GKE | standard-rwo (PD CSI) |
| Azure AKS | managed-csi / default (Disk CSI) |
| k3s / Rancher | local-path |
| Minikube / kind | standard |
| DigitalOcean / Civo / … | provider block-storage class |

Pick a specific class when you run your own provisioner:

--set persistence.storageClass=longhorn        # Longhorn
--set persistence.storageClass=nfs-client       # NFS subdir provisioner
--set persistence.storageClass=openebs-hostpath # OpenEBS LocalPV

Bind a pre-provisioned PersistentVolume (static). Common on-prem when there's no dynamic provisioner — e.g. a hand-made NFS, hostPath, or local PV. Disable provisioning with storageClass: "-" and point at the PV by name (or label):

persistence:
  storageClass: "-"            # storageClassName: "" — no dynamic provisioning
  volumeName: coderag-data-pv  # bind this specific PV
  # or match by labels instead of by name:
  # selector:
  #   matchLabels: { app: coderag }

# Example PV backed by an NFS export (a separate manifest; apply once, cluster-wide):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: coderag-data-pv
spec:
  capacity: { storage: 10Gi }
  accessModes: [ReadWriteOnce]
  storageClassName: ""
  nfs: { server: nfs.internal, path: /export/coderag }

Bring your own PVC. If you already manage the claim, reference it directly and the chart won't create one: --set persistence.existingClaim=my-index-pvc.

The index is single-writer, so ReadWriteOnce is the right access mode. ReadWriteMany (NFS, CephFS) also works if that's all you have, but it buys you nothing here.


Common scenarios

Use OpenAI or Anthropic for answers/embeddings

helm install coderag ./deploy/helm/coderag -n coderag --create-namespace \
  --set workspace.source=git \
  --set workspace.git.repository=https://github.com/org/repo.git \
  --set secrets.openaiApiKey=sk-... \
  --set config.provider=openai            # optional: OpenAI embeddings too

Prefer a pre-created Secret (so keys never sit in your values/CI):

kubectl -n coderag create secret generic coderag-keys \
  --from-literal=OPENAI_API_KEY=sk-... \
  --from-literal=ANTHROPIC_API_KEY=sk-ant-...
helm install coderag ./deploy/helm/coderag -n coderag \
  --set workspace.source=git \
  --set workspace.git.repository=https://github.com/org/repo.git \
  --set secrets.existingSecret=coderag-keys

Require authentication on the API

The CodeRAG HTTP API is unauthenticated unless CODERAG_API_KEY is set. When set, the server, the UI, and the in-cluster index/reindex Jobs all use it, and every request must present it (as a Bearer token: Authorization: Bearer <key>). Always enable this before exposing CodeRAG outside a trusted namespace.

Preferred — supply the key via a pre-created Secret (no credential in your values/CI):

kubectl -n coderag create secret generic coderag-keys \
  --from-literal=CODERAG_API_KEY="$(openssl rand -hex 32)"
helm install coderag ./deploy/helm/coderag -n coderag \
  --set secrets.existingSecret=coderag-keys
# (the same Secret can also hold OPENAI_API_KEY / ANTHROPIC_API_KEY)

# Then call the API with the key:
curl -H "Authorization: Bearer $(kubectl -n coderag get secret coderag-keys \
  -o jsonpath='{.data.CODERAG_API_KEY}' | base64 -d)" \
  http://127.0.0.1:8000/status
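
For scripted clients, the same Bearer header can be attached in Python. This is a minimal sketch using only the standard library; `authed_request` and `status` are illustrative names, and reading the key from a local CODERAG_API_KEY environment variable is an assumption about how you choose to pass it to the client.

```python
import os
import urllib.request


def authed_request(url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request that presents the key as a Bearer token."""
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def status(base: str = "http://127.0.0.1:8000") -> str:
    """Call /status through a port-forward and return the raw response body."""
    # Assumption: the key has been exported in the calling shell, e.g. from the
    # Secret created above. It is never written into the URL.
    key = os.environ["CODERAG_API_KEY"]
    with urllib.request.urlopen(authed_request(f"{base}/status", key), timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Keeping the key in a header rather than in the URL keeps it out of access logs and shell history entries that record URLs.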

Demo only — inline key (lands in the stored Helm release in plaintext; fine for a throwaway cluster, not for production):

--set secrets.apiKey="$(openssl rand -hex 32)"

Lock down network access with a NetworkPolicy

By default any pod in the cluster can reach the server's Service. Enable the bundled NetworkPolicy to default-deny ingress to the server pod and allow only known clients — the UI pods, the index/reindex Jobs, and (optionally) your ingress controller — on the API port (8000). Egress is restricted to DNS and HTTPS (for git/model/provider access).

--set networkPolicy.enabled=true

Your CNI must enforce NetworkPolicy (Calico, Cilium, Antrea, Weave, AKS Azure-CNI, GKE Dataplane V2, …). To also let an ingress controller through, point the policy at its pods:

networkPolicy:
  enabled: true
  ingressController:
    namespaceSelector:
      matchLabels: { kubernetes.io/metadata.name: ingress-nginx }
    podSelector:
      matchLabels: { app.kubernetes.io/name: ingress-nginx }
  # If your provider/model egress needs a non-443 port, either add it here…
  # extraEgress:
  #   - ports: [{ port: 11434, protocol: TCP }]   # e.g. an in-cluster Ollama
  # …or disable egress restrictions entirely:
  # egress: { enabled: false }

The policy already permits DNS (53) and HTTPS (443) egress. A self-hosted OpenAI-compatible endpoint on a custom port (e.g. Ollama on 11434) needs an extraEgress rule or egress.enabled=false.

Point at a self-hosted / local model (Ollama, vLLM, …)

--set config.openaiBaseUrl=http://ollama.ai-system.svc:11434/v1 \
--set config.llmProvider=openai \
--set config.chatModel=llama3.1

Keep the index fresh automatically

Pair a git-sync sidecar (pulls new commits) with a reindex CronJob (re-embeds changes):

--set workspace.git.sync.enabled=true \
--set workspace.git.sync.periodSeconds=300 \
--set index.cronjob.enabled=true \
--set index.cronjob.schedule="*/30 * * * *"

Index a private git repository

Option A — pre-populated volume (no in-cluster git auth). Put your code on a PVC (e.g. via a CI job or kubectl cp) and mount it:

--set workspace.source=existingClaim \
--set workspace.existingClaim=my-code-pvc

Option B — your own clone init container + a Secret (credential helper, no token in the URL). Skip the built-in clone (source=emptyDir) and supply credentials from a Secret. Do not put the token in the clone URL: it leaks into process listings, shell history, git remote -v, and the repo's .git/config on the workspace volume. Instead hand it to git out-of-band via GIT_ASKPASS: write a tiny helper script to /tmp that only its owner can read and execute (mode 0700, because git has to run it), and clone a clean https://github.com/... URL.

# private-repo.yaml
workspace:
  source: emptyDir            # disables the built-in (public) git clone
extraInitContainers:
  - name: git-clone
    image: alpine/git:2.45.2@sha256:16ad8e788e1d3b0c30f18da8dde5c0ace3b187445a62d8af893b003ca1e70592
    securityContext: { allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, capabilities: { drop: [ALL] } }
    env:
      - name: HOME
        value: /tmp
      - name: GIT_TERMINAL_PROMPT     # fail fast instead of hanging on a credential prompt
        value: "0"
      - name: GIT_ALLOW_PROTOCOL      # restrict to safe transports
        value: "https:git"
      - name: GIT_USERNAME
        value: x-access-token         # GitHub PAT/installation token username
      - name: GIT_TOKEN
        valueFrom: { secretKeyRef: { name: git-creds, key: token } }
    command: ["/bin/sh","-c"]
    args:
      - |
        set -eu
        # GIT_ASKPASS helper: git calls it for "Username"/"Password" prompts.
        # The token never appears in the URL, argv, or .git/config.
        ASKPASS="$(mktemp /tmp/askpass.XXXXXX)"
        chmod 0600 "$ASKPASS"
        cat > "$ASKPASS" <<'EOF'
        #!/bin/sh
        case "$1" in
          Username*) printf '%s' "$GIT_USERNAME" ;;
          Password*) printf '%s' "$GIT_TOKEN" ;;
        esac
        EOF
        chmod 0700 "$ASKPASS"
        export GIT_ASKPASS="$ASKPASS"
        git clone --depth=1 -- "https://github.com/org/private-repo.git" /workspace
        rm -f "$ASKPASS"
    volumeMounts:
      - { name: workspace, mountPath: /workspace }
      - { name: tmp, mountPath: /tmp }

# Then create the Secret and install with the values file above:
kubectl -n coderag create secret generic git-creds --from-literal=token=ghp_...
helm install coderag ./deploy/helm/coderag -n coderag -f private-repo.yaml

The tmp volume is an emptyDir, so the helper script exists only on ephemeral storage (node disk by default, memory if the emptyDir sets medium: Memory) for the life of the init container, and the script deletes it after the clone. The token reaches the container solely as an env var sourced from the git-creds Secret.

Expose it with an Ingress

⚠️ The API is unauthenticated by default. Any Ingress you create must terminate TLS and add an authentication layer in front of it. At minimum set secrets.apiKey or secrets.existingSecret (see Require authentication on the API) so the app itself rejects unauthenticated requests, and add an auth annotation/middleware at the controller (e.g. ingress-nginx nginx.ingress.kubernetes.io/auth-*, or an OAuth2 proxy). Always include a tls: block.

ingress:
  enabled: true
  className: nginx
  annotations:
    # Example: require Basic-auth at the edge (in addition to the app's API key).
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: coderag-basic-auth
  hosts:
    - host: coderag.example.com
      paths:
        - { path: /, pathType: Prefix, service: server }
  tls:
    - { secretName: coderag-tls, hosts: [coderag.example.com] }   # TLS is mandatory

Also run the web UI

Recommended — UI serves the server's index (read-only):

--set ui.enabled=true \
--set ui.useServerIndex=true
# On ReadWriteOnce storage (the default), also pin the UI onto the server's node:
--set ui.coLocateWithServer=true
# Omit coLocateWithServer if persistence uses a ReadWriteMany storageClass.

The UI mounts the server's index volume read-only, so it shows whatever the init/reindex Jobs built — nothing to reindex from the UI, and it stays in sync with the server. This is the right choice for a public/demo UI, where the in-app Reindex button is disabled. Open it via port-forward (svc/coderag-ui:8501) or an Ingress path with service: ui.

Independent UI (default, ui.useServerIndex=false): the UI gets its own data volume and clones the same repo, but nothing populates that volume — you must click Reindex in the sidebar to build it (impossible in demo mode). If your UI shows 0 files / 0 chunks, this is almost always why: the index Jobs filled the server's volume, not the UI's. Switch to ui.useServerIndex=true.

Pin to an immutable image (reproducible / air-gapped)

No versioned tags are published yet; the default is the rolling :beta. Pin to a commit:

--set image.tag=sha-<commit>          # API → ghcr.io/.../coderag:sha-<commit>
                                      # UI  → :sha-<commit>-ui (image.uiSuffix)

For private registries, set image.pullSecrets: [{ name: my-regcred }].


Operations

# Trigger a reindex by hand (incremental):
kubectl -n coderag exec deploy/coderag-server -c server -- \
  python -c "import urllib.request as u; print(u.urlopen(u.Request('http://127.0.0.1:8000/index', data=b'{\"full\":false}', headers={'content-type':'application/json'})).read().decode())"

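# Or from your laptop after a port-forward (this one requests a full rebuild):
curl -X POST localhost:8000/index -H 'content-type: application/json' -d '{"full": true}'
# If CODERAG_API_KEY is set, add: -H "Authorization: Bearer <key>"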

# Status, logs:
curl localhost:8000/status
kubectl -n coderag logs deploy/coderag-server -c server
kubectl -n coderag logs deploy/coderag-server -c git-sync   # if sync enabled
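
The inline `python -c` one-liner above is hard to read and does not send an API key. A more readable equivalent might look like the sketch below. It is a standard-library-only illustration: `reindex_request` and `reindex` are hypothetical helper names, the 600-second timeout is an arbitrary guess, and whether POST /index returns before or after indexing finishes is not specified here.

```python
import json
import urllib.request
from typing import Optional


def reindex_request(
    base: str, full: bool = False, api_key: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST /index request; `full=False` asks for an incremental run."""
    body = json.dumps({"full": full}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when CODERAG_API_KEY is configured on the server.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(f"{base}/index", data=body, headers=headers, method="POST")


def reindex(
    base: str = "http://127.0.0.1:8000", full: bool = False, api_key: Optional[str] = None
) -> str:
    """Send the request and return the raw response body."""
    with urllib.request.urlopen(reindex_request(base, full, api_key), timeout=600) as resp:
        return resp.read().decode("utf-8")
```

With a port-forward running, `reindex()` would trigger an incremental run and `reindex(full=True)` a full rebuild.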

Upgrade

helm upgrade coderag ./deploy/helm/coderag -n coderag --reuse-values

Each upgrade runs a fresh …-index-<revision> Job to refresh the index. A ConfigMap checksum annotation rolls the pod automatically when configuration changes.

Uninstall (and reclaiming storage)

The index PVCs are annotated helm.sh/resource-policy: keep, so your index survives an uninstall. Remove the volumes explicitly when you're done:

helm uninstall coderag -n coderag
kubectl -n coderag delete pvc -l app.kubernetes.io/instance=coderag

Validate changes to the chart

The same checks run in CI (helm.yml):

helm lint deploy/helm/coderag -f deploy/helm/coderag/ci/default-values.yaml
helm template coderag deploy/helm/coderag -f deploy/helm/coderag/ci/full-values.yaml \
  | kubeconform -strict -summary -kubernetes-version 1.29.0

Troubleshooting

  • Pod stuck ContainerCreating / Pending — usually the PVC can't be provisioned. Check kubectl -n coderag describe pvc and set persistence.storageClass to a class that supports ReadWriteOnce.
  • First start is slow / startup probe restarts — the embedding model (~130 MB) is downloading. It's cached on the data volume afterwards. Raise server.startupProbe.failureThreshold on very slow networks.
  • A read-only-filesystem write error (rare; some model backend writing outside the mounted caches) — the pod runs with readOnlyRootFilesystem: true and writable /tmp, /data, and /home/coderag. If a backend insists on another path, mount it via extraVolumes/extraVolumeMounts, or relax the hardening: --set securityContext.readOnlyRootFilesystem=false.
  • UI shows 0 files / 0 chunks / 0 vectors — the UI is on its own (empty) data volume while the index Jobs populated the server's volume. They are different PVCs (…-ui-data vs …-server-data). Set ui.useServerIndex=true so the UI serves the server's index read-only (add ui.coLocateWithServer=true on ReadWriteOnce storage). The independent UI only fills its own volume via the in-app Reindex button, which is disabled in demo mode.

Limitations

  • Single writer by design — do not raise replicas. For higher search throughput, put a cache/load balancer in front of the read endpoints; the index itself stays single-writer.
  • ReadWriteOnce ties the index to one node at a time; that's expected for the embedded store.
  • The UI, when enabled, defaults to a separate index from the server. For a single shared index, set ui.useServerIndex=true (the UI reads the server's volume read-only), or run the server alone and point browsers/tools at its REST API.