Skip to content

fix(github-status-comments): merge entries across concurrent workflows#1108

Open
gregmagolan wants to merge 4 commits into
mainfrom
fix-ghsc-cross-workflow-merge
Open

fix(github-status-comments): merge entries across concurrent workflows#1108
gregmagolan wants to merge 4 commits into
mainfrom
fix-ghsc-cross-workflow-merge

Conversation

@gregmagolan
Copy link
Copy Markdown
Member

@gregmagolan gregmagolan commented May 21, 2026

Summary

When two GitHub Actions workflows run in parallel on the same PR (e.g. ci-github-runners.yaml + ci-workflows-runners.yaml), each workflow's status-comment artifact swarm only sees its own GITHUB_RUN_ID's artifacts (the list_run_artifacts endpoint is workflow-scoped). Both swarms PATCH the single PR-marker comment using only their own task entries, so each upsert clobbers the other workflow's rows.

The PR comment is now the canonical cross-workflow source of truth. A hidden state block at the bottom of the comment carries:

Field Purpose
head_sha Mismatch wipes everything — each push starts the comment fresh.
last_upsert_epoch_s Cross-workflow lease — every workflow sees when any other workflow last published.
workflow_runs [{run_id, workflow_name}, ...] — one slot per workflow active on this PR.
entries Full per-task snapshots, used as a TTL/race fallback.

Each publish cycle does:

  1. GET the existing comment + parse the state block.
  2. list_run_artifacts(own_run_id) + download → fresh OWN entries.
  3. For each OTHER run_id in the state block: list_run_artifacts(other_run_id) + download — fresh sibling-workflow data. On API failure or empty listing (e.g. artifact TTL expiry), fall back to the comment's snapshot for that workflow.
  4. Build the new state by overlaying our own slot onto sibling slots; never modify sibling slots.
  5. Render the body + new state block; PATCH.

Why this eliminates the race: each writer only writes its own slot (its row in workflow_runs + its own entries). Sibling slots are passed through. Two concurrent writers in different workflows touch disjoint slots → no cross-workflow data race. Intra-workflow writers see the same list_run_artifacts(own) output → they all write identical data, so an intra-workflow race is benign.

TTL resilience: the entries snapshot in the state block survives past artifact retention (default 30min), so a workflow that finished hours ago still renders correctly.

Backward compat: writers on older aspect-cli versions don't emit the state block. New-version writers see parsed == None and fall back to swarm-only rendering for that cycle. Once both workflows ship the new version, the merge path kicks in. The visible markdown layout is unchanged — only the hidden HTML-comment block is added.

API-cost optimizations:

  • Swarm-local fast-path: when the swarm-local lease already holds, the combined lease (max(state, swarm-local)) also holds, so we'd skip the PATCH either way. Short-circuit before the GET and the per-sibling-workflow list_run_artifacts refresh. Only the leaseholder per swarm pays the cross-workflow GET cost per cycle. final=True always runs the full path so terminal verdicts pick up the freshest cross-workflow state.
  • Cross-workflow epoch propagation: after the leaseholder GETs and parses the state block, bump our own _last_pr_comment_upsert_epoch_s to max(state.last_upsert_epoch_s, our prior stamp). The next _publish_self carries the higher value on our artifact, so sibling swarm members' _max_recent_upsert_epoch then sees it and their fast-path skips the GET too — propagating the cross-workflow lease across the swarm without each member needing its own GET.

A new github.issues.get_comment API reads a single comment by id to avoid re-listing the PR's comments every cycle.

This PR also wires up a new ci-github-runners.yaml workflow that exercises the multi-workflow path against GitHub-hosted ephemeral runners. Those runners don't have remote-cache access, so the pre-build self-hosted job uploads the just-built aspect-cli binary as an artifact and a shared install-prebuilt-aspect-cli composite action (marked internal-only in its description) downloads it into $HOME/.local/bin for each ephemeral job.


Changes are visible to end-users: yes

  • Searched for relevant documentation and updated as needed: no
  • Breaking change (forces users to change their own code or config): no
  • Suggested release notes appear below: yes
* Bug fix: when two GitHub Actions workflows run in parallel on the same PR (e.g. one for GHA runners and one for Aspect Workflows runners), the PR status comment used to flap between each workflow's task rows. The comment is now an authoritative cross-workflow source of truth: each writer refreshes other workflows' data from their artifacts before rendering, so writes are slot-independent and no longer clobber each other.

Test plan

  • aspect dev test-pr-comment-snapshots (20 scenarios + 17 unit checks) passes. Checks cover serialize/parse round-trip, malformed payload, missing-terminator payload, wrong-shape field normalization, non-dict row filtering, merge first-writer, two-workflows-with-fresh-refresh, snapshot-fallback when refresh missing, own-slot replacement, stale-push drop, orphan-snapshot-entry drop, workflow_runs sorted, entries sorted, render-body state-block presence/absence.
  • aspect tests axl passes — no regression elsewhere.
  • New ci-github-runners.yaml workflow exercises the multi-workflow path end-to-end against GitHub-hosted ephemeral runners.

@aspect-workflows
Copy link
Copy Markdown

aspect-workflows Bot commented May 21, 2026

✨ Aspect Workflows Tasks

📅 Fri May 22 00:22:41 UTC 2026

⚠️ 2 flagged tasks

  • ⚠️ delivery (delivery-gha) · ⏱ 22.4s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Delivery complete (2 warn · 1 skipped)
  • ⚠️ delivery (delivery-gha-debug) · ⏱ 19.3s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Delivery complete (2 warn · 1 skipped)

✅ 17 successful tasks

  • ✅ build (build-gha) · ⏱ 24.2s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (160 built)
  • ✅ build (build-gha-debug) · ⏱ 59.7s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel build complete (160 built)
  • ✅ format (buildifier-gha) · ⏱ 18.4s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (buildifier-gha-debug) · ⏱ 21.2s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (buildifier-gha-ephemeral) · ⏱ 1m 14s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (format-gha) · ⏱ 18.9s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (format-gha-debug) · ⏱ 31.2s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ format (format-gha-ephemeral) · ⏱ 2m 15s · 🐙 GitHub Actions · ☑️ Check
    💬 Format complete (clean)
  • ✅ gazelle (gazelle-from-source-gha) · ⏱ 19.3s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-from-source-gha-debug) · ⏱ 32.9s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-gha) · ⏱ 15.7s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-gha-debug) · ⏱ 14s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ gazelle (gazelle-gha-ephemeral) · ⏱ 1m 20s · 🐙 GitHub Actions · ☑️ Check
    💬 Gazelle complete (clean)
  • ✅ lint (lint-gha) · ⏱ 19.1s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Lint complete (clean)
  • ✅ lint (lint-gha-debug) · ⏱ 18.5s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Lint complete (clean)
  • ✅ test (test-gha) · ⏱ 25.9s · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (25/25 passed · 25 cached)
  • ✅ test (test-gha-debug) · ⏱ 1m · ✨ Aspect · 🐙 GitHub Actions · ☑️ Check
    💬 Bazel test complete (25/25 passed · 25 cached)

🔁 Reproduce

⚠️ delivery (delivery-gha · delivery-gha-debug)

aspect delivery --mode=always --track-state=false --dry-run=true

CI ran --mode=selective; --mode=always is the off-runner equivalent (selective change detection needs the Aspect Workflows delivery state backend).


⏱ Last updated Fri May 22 00:24:58 UTC 2026 · 📊 GitHub API quota 458/15,000 (3% used, resets in 58m)
🚀 Powered by Aspect CLI (v0.0.0-dev)  |  Aspect Build · X · LinkedIn · YouTube

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 92b8bb7dc9

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread crates/aspect-cli/src/builtins/aspect/feature/github_status_comments.axl Outdated
@gregmagolan gregmagolan force-pushed the fix-ghsc-cross-workflow-merge branch 8 times, most recently from 6062525 to 13d3f26 Compare May 22, 2026 00:16
@gregmagolan gregmagolan force-pushed the fix-ghsc-cross-workflow-merge branch from 13d3f26 to c755f86 Compare May 22, 2026 00:21
gregmagolan and others added 3 commits May 21, 2026 21:52
Add a fast-path before the comment GET in `_publish`: when the
swarm-local lease holds, the combined cross-workflow lease (which is
`max(state.last_upsert_epoch, swarm-local)`) also holds, so we wouldn't
PATCH this cycle either way. Short-circuit before the GET and the
per-sibling-workflow `list_run_artifacts` refresh.

Effect: N swarm members → 1 GET per cycle (the leaseholder's) instead
of N. Sole-swarm and multi-workflow PRs both benefit.

Correctness: skipping cross-workflow discovery for one cycle is
harmless — any sibling workflow's data we missed will land in the next
cycle when our swarm-local lease relaxes and we run the full path.
Terminal updates (`final=True`) always run the full path so the final
verdict picks up the freshest cross-workflow state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When the leaseholder GETs the comment and parses the state block, bump
our own `_last_pr_comment_upsert_epoch_s` to
`max(state.last_upsert_epoch_s, our prior stamp)`. The next
`_publish_self` carries the higher value on our artifact, so sibling
swarm members' `_collect_sibling_entries` sees it via
`_max_recent_upsert_epoch`.

Effect: their swarm-local fast-path can short-circuit when a
sibling-workflow recently published, even before our own swarm has
done its own first PATCH. Only the leaseholder per swarm ever pays the
cross-workflow GET cost.

Field semantics widened from "epoch of our last publish" to "high-water
mark of any PR-comment upsert we're aware of (ours or observed via the
state block)". The lease decision math is unchanged — both meanings
contribute to "someone published at T", which is what the lease cares
about.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trim three comment blocks in `_publish` (swarm lease + fast path,
combined lease check, two-renders) from ~45 lines to ~20. Update the
file docstring to mention cross-workflow refresh and the GET role of
the `prs.write` token. Replace the stale `_merge_entries` reference in
`_own_payload` with the actual field roles (`_merge_state` snapshot
bucketing + display-only `workflow_name`). Reword
`_collect_other_workflow_entries` docstring so the race-elimination
claim sits where it's accurate (the system as a whole, not this
function alone).

No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@gregmagolan gregmagolan force-pushed the fix-ghsc-cross-workflow-merge branch from c90edd4 to 551864f Compare May 22, 2026 05:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant