fix(deploy): isolate per-app failures so one bad app doesn't abort the batch by BradMclain · Pull Request #160 · uptick/gitops

BradMclain · 2026-06-22T06:30:34Z

Problem

App upgrades are deployed concurrently in Deployer.deploy() via asyncio.gather(...) without return_exceptions=True. The helm upgrade step itself is guarded with suppress_errors=True, but the steps before it are not:

cloning a custom git chart (temp_repo → git clone --branch ...)
helm dependency build

So a custom chart pointing at a git branch that doesn't exist raised an exception that escaped the per-app coroutine, propagated into the bare gather(), and cancelled every other app's deployment in the batch. The whole batch of upgrades got skipped because of one bad app.
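The failure mode can be reproduced in isolation. The sketch below is illustrative only: `deploy_app` and `deploy_batch` are hypothetical stand-ins, not the real `Deployer` code. Note that `asyncio.gather()` does not itself cancel sibling awaitables when one raises. It propagates the first exception to the caller, and in this sketch the remaining work is cancelled when `asyncio.run()` tears down the event loop. How the real deployer's tasks end up cancelled is not shown in this PR.

```python
import asyncio


async def deploy_app(name: str, completed: list) -> None:
    # Hypothetical stand-in for one app's upgrade coroutine.
    if name == "bad-app":
        # Simulates a pre-upgrade step failing, e.g. cloning a missing branch.
        raise RuntimeError(f"{name}: git branch not found")
    await asyncio.sleep(0.05)  # simulates the remaining deployment work
    completed.append(name)


async def deploy_batch(names: list, completed: list) -> None:
    # Bare gather(): the first exception propagates to the awaiting caller.
    await asyncio.gather(*(deploy_app(n, completed) for n in names))


completed = []
batch_error = None
try:
    asyncio.run(deploy_batch(["bad-app", "good-app"], completed))
except RuntimeError as exc:
    batch_error = exc

print("batch error:", batch_error)
print("apps that finished:", completed)
```

Here `good-app` never finishes: the error from `bad-app` ends the batch while `good-app` is still mid-flight.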

Fix

Wrap _update_app_deployment in update_app_deployment with a try/except that converts any per-app exception into a failed UpdateAppResult, routed through the existing post_result path (Slack alert, GitHub status, results summary). The broken app is now reported as failed while the rest of the batch deploys normally.
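A minimal sketch of the wrapper pattern, assuming simplified signatures. The function names `update_app_deployment`, `_update_app_deployment` and `post_result` come from this PR, but their bodies, their parameters and the fields of `UpdateAppResult` (other than `exit_code`) are assumptions made for illustration.

```python
import asyncio
from typing import TypedDict


class UpdateAppResult(TypedDict):
    # Only exit_code is confirmed by the PR text; the other fields are assumed.
    app_name: str
    exit_code: int
    output: str


posted = []  # stands in for Slack alerts, GitHub statuses and the summary


async def post_result(result: UpdateAppResult) -> None:
    posted.append(result)


async def _update_app_deployment(app_name: str) -> UpdateAppResult:
    if app_name == "bad-app":
        # Simulates the chart fetch failing before helm upgrade is reached.
        raise RuntimeError("git branch not found")
    result = UpdateAppResult(app_name=app_name, exit_code=0, output="ok")
    await post_result(result)
    return result


async def update_app_deployment(app_name: str) -> UpdateAppResult:
    try:
        return await _update_app_deployment(app_name)
    except Exception as exc:
        # Convert the exception into a failed result and report it
        # through the same path a normal failure would take.
        result = UpdateAppResult(
            app_name=app_name, exit_code=1, output=f"{type(exc).__name__}: {exc}"
        )
        await post_result(result)
        return result


async def deploy(app_names: list) -> list:
    return await asyncio.gather(*(update_app_deployment(n) for n in app_names))


results = asyncio.run(deploy(["bad-app", "good-app"]))
print([(r["app_name"], r["exit_code"]) for r in results])
```

Because no exception leaves the per-app coroutine, `gather()` returns one result dict per app, and downstream code that reads `r["exit_code"]` keeps working unchanged.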

I chose this over gather(return_exceptions=True) deliberately. The latter would push raw exception objects into update_results, which neither post_result_summary (which reads r["exit_code"]) nor the is not None filter handles, so it would need extra unwrapping anyway.
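To illustrate the trade-off, here is a small sketch of what `return_exceptions=True` hands back. The coroutines `ok` and `boom` are hypothetical, and the list comprehensions only mimic the `is not None` filter and the `r["exit_code"]` lookup described above.

```python
import asyncio


async def ok() -> dict:
    return {"exit_code": 0}


async def boom() -> dict:
    raise RuntimeError("chart fetch failed")


async def main() -> list:
    # Exceptions are returned in place of results instead of being raised.
    return await asyncio.gather(ok(), boom(), return_exceptions=True)


results = asyncio.run(main())

# An "is not None" filter lets the exception object through.
kept = [r for r in results if r is not None]

summary_error = None
try:
    exit_codes = [r["exit_code"] for r in kept]
except TypeError as exc:
    # Exception instances are not subscriptable.
    summary_error = exc

print(type(results[1]).__name__, "|", summary_error)
```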

Test

Added test_one_app_failing_does_not_abort_the_batch: forces the first app's chart fetch to raise (as a missing branch would), asserts deploy() doesn't raise, the surviving app still runs its full helm flow, and both apps get a post_result (one exit_code 1, one 0). Verified it goes red with the fix reverted and green with it in place.

Also included

uv.lock had drifted to 1.6.0 while pyproject.toml was already 1.6.1 — release-please's python release-type bumps version strings but doesn't regenerate uv.lock. Resynced the lock and added a uv.lock extra-files entry to release-please-config.json so future releases bump it automatically (per googleapis/release-please#2561).

🤖 Generated with Claude Code

…e batch

App upgrades run concurrently via asyncio.gather() with no exception isolation. The helm upgrade itself is guarded with suppress_errors=True, but the steps before it (cloning a custom git chart, helm dependency build) raise on failure. A custom chart pointing at a git branch that doesn't exist therefore raised an exception that escaped the per-app coroutine, propagated into the bare gather(), and cancelled every other app's deployment in the batch.

Wrap _update_app_deployment in update_app_deployment with a try/except that converts any per-app exception into a failed UpdateAppResult routed through the existing post_result path (Slack alert, GitHub status, results summary). The broken app is now reported as failed while the rest of the batch deploys normally.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

release-please's python release-type bumps version strings in pyproject.toml, __init__.py and version.py, but does not regenerate uv.lock. The lockfile embeds the project's own version in its [[package]] entry, so it drifted behind pyproject.toml after the 1.6.1 release (lock still recorded 1.6.0).

Add uv.lock to the root package's extra-files with the toml updater targeting the gitops package version, so future releases bump it automatically (per googleapis/release-please#2561). Also resync the lockfile to the current 1.6.1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

github-actions · 2026-06-22T06:34:10Z

Docker Images

Commit: 83dbc8e217e358ea51168eb955007ac76bceb9aa

Tag
`610829907584.dkr.ecr.ap-southeast-2.amazonaws.com/gitops:test-83dbc8e`

uptickmetachu

Good find!

I think we might want one more try/except block within post_result, as it's now another point of failure within the gather block.

…ort the batch

Addresses PR review: post_result makes Slack/GitHub network calls and runs inside the asyncio.gather() in deploy(). It is invoked from call sites not covered by update_app_deployment's guard (its own except handler and uninstall_app), so a reporting failure could still cancel every other app's deployment.

Wrap post_result's body in try/except that logs and swallows. The success/failure bookkeeping (successful_apps/failed_apps) is recorded before the network call, so it survives a notification failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
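The log-and-swallow guard from this commit might look roughly like the sketch below. `ResultReporter`, `notify` and `flaky_notify` are hypothetical names, and the signature of `post_result` is assumed. Only the ordering (bookkeeping first, network call second, failures logged rather than raised) is taken from the commit message.

```python
import asyncio
import logging

logger = logging.getLogger("deployer_sketch")


class ResultReporter:
    def __init__(self, notify):
        # notify: hypothetical async callable standing in for Slack/GitHub calls.
        self.notify = notify
        self.successful_apps = set()
        self.failed_apps = set()

    async def post_result(self, app_name: str, exit_code: int) -> None:
        # Record the outcome before any network call, so it survives
        # a notification failure.
        if exit_code == 0:
            self.successful_apps.add(app_name)
        else:
            self.failed_apps.add(app_name)
        try:
            await self.notify(app_name, exit_code)
        except Exception:
            # Log and swallow: a reporting failure must not escape into gather().
            logger.exception("Failed to post result for %s", app_name)


async def flaky_notify(app_name: str, exit_code: int) -> None:
    raise ConnectionError("notification endpoint unreachable")


async def main() -> ResultReporter:
    reporter = ResultReporter(flaky_notify)
    await asyncio.gather(
        reporter.post_result("app-a", 0),
        reporter.post_result("app-b", 1),
    )
    return reporter


reporter = asyncio.run(main())
print(sorted(reporter.successful_apps), sorted(reporter.failed_apps))
```

Both notifications fail here, yet `gather()` completes and the success/failure sets are still populated.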

BradMclain and others added 2 commits June 22, 2026 16:16

BradMclain requested review from KhueDuong and uptickmetachu and removed request for uptickmetachu June 22, 2026 06:31

BradMclain self-assigned this Jun 22, 2026

BradMclain added the AI_GENERATED label Jun 22, 2026

BradMclain requested a review from uptickmetachu June 22, 2026 06:31

uptickmetachu previously approved these changes Jun 22, 2026

Comment thread gitops_server/workers/deployer/deploy.py

BradMclain dismissed uptickmetachu’s stale review via 83dbc8e June 23, 2026 00:41

BradMclain requested a review from uptickmetachu June 23, 2026 01:14

uptickmetachu approved these changes Jun 23, 2026

BradMclain merged commit 29ff9c3 into develop Jun 23, 2026
3 checks passed

BradMclain deleted the no-000/graceful-batch-fail branch June 23, 2026 01:59

BradMclain merged 3 commits into develop from no-000/graceful-batch-fail

2 participants
