feat(chart): zero-downtime UI rollout option + 409-tolerant reindex by Neverdecel · Pull Request #32 · Neverdecel/CodeRAG

Neverdecel · 2026-06-16T19:02:42Z

Why

coderag-ui.neverdecel.com intermittently served Traefik's "no available server" page. Root cause: the UI Deployment uses strategy: Recreate with a single replica (correct for the single-writer RWO index), so every image change kills the old pod before the new one is Ready. The dev deployment auto-rolls on each new beta-* image, so on a busy build day the public demo drops for the ~40-90s pull+boot window on every build.

What

ui.strategy is now configurable, default unchanged (Recreate), so there is no behavior change for existing installs. A deployment whose volume tolerates same-node multi-attach (k3s local-path, or RWX) and whose UI is read-only during the overlap (demo mode, Reindex hidden) can opt into a surge-first RollingUpdate (maxUnavailable: 0, maxSurge: 1), so the new pod goes Ready before the old one is removed and the ingress never loses its backend. Worst case, if the volume can't double-mount, the new pod never becomes Ready and the old pod keeps serving (maxUnavailable: 0): a stalled-but-still-up rollout, never an outage.
Reindex CronJob tolerates HTTP 409. A periodic full=false refresh that overlapped the per-upgrade full=true init Job got 409 ("already indexing"); curl -fsS treated that as a failure, which triggered backoff retries and left a pile of Error pods. 409 is now a benign no-op (exit 0); any other non-2xx response still fails.
Chart 0.1.1 → 0.1.2.
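The 409 handling described above can be sketched roughly as follows. This is a minimal illustration, not the chart's actual script: the function names, the use of POST, and the URL passed as an argument are all assumptions. The idea is to drop curl's -f flag (which exits 22 on any response of 400 or above), capture the status code with -w, and decide the exit status explicitly.

```shell
#!/bin/sh
# Hypothetical sketch of a 409-tolerant reindex trigger.
# Function names, the POST method and the URL are assumptions for illustration.

# Map an HTTP status code to an exit status:
#   2xx           -> 0 (refresh accepted)
#   409           -> 0 (a build is already running; nothing to do)
#   anything else -> 1 (real failure; let the Job's backoff retry)
reindex_exit_code() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    409) echo "reindex already in progress, skipping" >&2; return 0 ;;
    *) echo "reindex failed with HTTP $1" >&2; return 1 ;;
  esac
}

# Send the request and keep only the status code.
# No -f here: with -f curl would exit non-zero on the 409 itself.
# A transport error (DNS, connection refused) still fails via "|| return 1".
trigger_reindex() {
  status=$(curl -sS -o /dev/null -w '%{http_code}' -X POST "$1") || return 1
  reindex_exit_code "$status"
}
```

A CronJob container would then call something like trigger_reindex "$REINDEX_URL" as its last command, where REINDEX_URL is a placeholder for the real refresh endpoint. Keeping the status mapping in its own function makes the 409 branch easy to check without a running server.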

Verified with helm lint + helm template: default renders Recreate, override renders RollingUpdate{maxUnavailable:0,maxSurge:1}, reindex script renders the 409 branch.

Follow-up

starnode-core flips the dev UI to RollingUpdate{maxUnavailable:0,maxSurge:1} once this is on master (the chart GitRepository tracks master).

The UI Deployment hardcoded strategy: Recreate. With a single replica that is correct for the single-writer ReadWriteOnce index, but it means every image change tears the old pod down before the new one is Ready, leaving the ingress with no backend for the pull+boot window (surfaces as a 502 / "no available server"). On a deployment that auto-updates on each beta image, the public UI flaps on every build.

- Make ui.strategy configurable (default unchanged: Recreate). Operators whose volume tolerates same-node multi-attach (k3s local-path, RWX) AND whose UI is read-only during the overlap (demo mode) can opt into a surge-first RollingUpdate (maxUnavailable: 0, maxSurge: 1) for seamless rollouts. Worst case on a misjudged volume is a stalled-but-up rollout, never an outage.
- reindex CronJob: treat HTTP 409 (a build already in progress) as a benign no-op instead of a hard curl failure, so a periodic refresh overlapping the per-upgrade init Job no longer fails the Job and piles up Error pods via backoff retries. Any other non-2xx response still fails.
- Bump chart 0.1.1 -> 0.1.2.

Neverdecel merged commit d9d50f5 into master Jun 16, 2026
12 checks passed

Neverdecel deleted the chore/zero-downtime-ui-rollout branch June 18, 2026 08:01

Neverdecel merged 1 commit into master from chore/zero-downtime-ui-rollout
