PR checks do not create fresh agent sessions on PR updates; stale task IDs re-post to new commits #12382

@smonett

Relevant environment info

  • OS: Darwin 25.3.0
  • Continue version: cn CLI 1.5.45 (/opt/homebrew/bin/cn)
  • IDE version: Cursor
  • Integration: Continue GitHub App installed at GitHub organization level
  • GitHub App permissions observed: statuses: write, checks: write, pull_requests: write, contents: write, actions: read
  • Check definitions: three repo check files under .continue/checks/:
    • anti-slop.md
    • test-coverage.md
    • accessibility.md
  • Repo / PR where observed: crosswindholdings/agent-stack PR #60 ("Improve telemetry opt out experience")
  • Branch protection note: these Continuous AI checks were advisory in this repo, not merge-required

No access tokens, user email, or local config secrets are included here.

Description

On a PR update (pull_request.synchronize), Continue appears to re-post the original PR-open check agent sessions as GitHub commit statuses on the new head commit instead of creating fresh agent sessions that evaluate the updated diff.

The documented behavior at https://docs.continue.dev/checks/running-in-ci says:

"When a pull request is opened, reopened, or updated: Each check file runs independently against the PR diff."

In this incident, the PR was opened and then received 5 follow-up commits attempting to address an Anti-Slop failure. The same three Continue task IDs were re-used on every investigated head commit:

  • Continuous AI: Anti-Slop: 35ebd5d9-a928-440c-954e-c37edac12691
  • Continuous AI: Test Coverage: b8aeaff7-4911-4a01-a4ef-24cac375a2df
  • Continuous AI: Accessibility: 4cf5f5da-5918-491d-a499-ddf65390b72e

The Continue agents API also showed only the original three PR-open agent sessions for this PR, all created between 2026-05-12T14:25:27Z and 14:25:28Z. No new sessions appeared for the five later commits.

This matters because the status looked like a live check result, but it was actually an old result being re-posted. In our workflow this consumed a lot of debugging time and created misleading CI feedback:

  • 6 PR head commits received Continuous AI statuses.
  • 5 follow-up commits were pushed after the original verdict.
  • 3 check task IDs were re-used across all checked commits.
  • 0 fresh check agent sessions were observed after PR open.
  • 2 close/reopen attempts did not produce a fresh evaluation.
  • 1 empty commit was pushed solely to try to trigger re-evaluation.
  • The visible failure stayed unchanged for roughly 6 hours while we tried normal remediation paths.
  • The failed Anti-Slop agent had suggestionStatus=null and diff=null, so the normal cn checks reject path could not dismiss or refresh it.

I am filing this with the assumption that this is an unintended propagation/cache behavior rather than expected product behavior. The report below separates observed evidence from the inferred server-side cause.

To reproduce

Minimal reproduction protocol for a repository with Continue GitHub checks enabled:

  1. Create or use a repo with multiple .continue/checks/*.md checks.
  2. Open a PR that intentionally trips one check, such as Anti-Slop.
  3. Record the task IDs from the first head commit:
     gh api repos/{owner}/{repo}/commits/{sha1}/statuses \
       --jq '.[] | select(.context | startswith("Continuous AI")) | {context, state, description, created_at, target_url}'
  4. Push a fix commit that removes the flagged content.
  5. Record the task IDs from the new head commit:
     gh api repos/{owner}/{repo}/commits/{sha2}/statuses \
       --jq '.[] | select(.context | startswith("Continuous AI")) | {context, state, description, created_at, target_url}'
  6. Compare the taskId query parameter in each target_url.
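The comparison in the last step can be scripted. The following is a minimal sketch, assuming the two status lists have already been fetched with the gh api commands above and parsed from JSON; the function names are hypothetical and are not part of Continue or the GitHub CLI.

```python
from urllib.parse import urlparse, parse_qs


def task_id_from_target_url(target_url):
    """Return the taskId query parameter from a status target_url, or None."""
    values = parse_qs(urlparse(target_url).query).get("taskId")
    return values[0] if values else None


def task_ids_by_context(statuses):
    """Map each 'Continuous AI' context to the task ID of its newest status."""
    result = {}
    # Process oldest-first so the newest status per context wins.
    # ISO 8601 UTC timestamps in the same format sort correctly as strings.
    for status in sorted(statuses, key=lambda s: s["created_at"]):
        if status["context"].startswith("Continuous AI"):
            result[status["context"]] = task_id_from_target_url(status["target_url"])
    return result


def reused_contexts(first_statuses, second_statuses):
    """Return the contexts whose task ID is identical on both commits."""
    first = task_ids_by_context(first_statuses)
    second = task_ids_by_context(second_statuses)
    return sorted(
        context
        for context, task_id in first.items()
        if task_id is not None and task_id == second.get(context)
    )
```

A non-empty result from reused_contexts means the newer commit's statuses point at the same sessions as the older commit, which is the symptom reported here.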

Expected:

  • The updated PR head gets fresh check sessions.
  • The new commit statuses point to new task IDs.
  • The check result reflects the updated diff.

Observed in PR #60:

  • The new commit statuses used the same task IDs as the original PR-open run.
  • Continue's agent API showed no new agent sessions after the original PR-open sessions.
  • The failed Anti-Slop status continued to describe the original issue rather than a fresh evaluation of the fixed diff.

If you have server-side access, a useful internal confirmation would be:

  • Count check agent sessions for the PR by triggerPullRequestUrl and creation time.
  • Compare that count to the number of PR open/synchronize events and check files.
  • For PR #60 ("Improve telemetry opt out experience"), six commit events and three checks should have produced fresh sessions for each event; only three sessions in total were observable externally, all from the initial PR-open event.
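The count comparison above is simple arithmetic. A minimal sketch follows, assuming session records shaped like the agent API output quoted in this report (only the triggerPullRequestUrl field is used); the function names are hypothetical.

```python
from collections import Counter


def sessions_per_pr(sessions):
    """Count agent sessions per triggering PR URL."""
    return Counter(s["triggerPullRequestUrl"] for s in sessions)


def session_shortfall(sessions, pr_url, pr_events, check_files):
    """Return (expected, observed, missing) session counts for one PR.

    Assumes every PR event should run every check file exactly once.
    """
    expected = pr_events * check_files
    observed = sessions_per_pr(sessions)[pr_url]
    return expected, observed, expected - observed
```

With the figures in this report (six commit events, three check files, three observed sessions) this returns (18, 3, 15). The 18 is derived from the documented per-event behavior and counts only the six commit events, not the two close/reopen attempts.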

Log output

Status history on the original PR commit included pending/running statuses, followed by these final results:

# Initial PR commit: 55326fc
Continuous AI: Anti-Slop
  state: failure
  description: Remove AI slop comments from pricing_normalizer.py
  target_url: https://hub.continue.dev/inbox/pr/crosswindholdings/agent-stack/60?taskId=35ebd5d9-a928-440c-954e-c37edac12691...

Continuous AI: Test Coverage
  state: success
  target_url: ...taskId=b8aeaff7-4911-4a01-a4ef-24cac375a2df...

Continuous AI: Accessibility
  state: success
  target_url: ...taskId=4cf5f5da-5918-491d-a499-ddf65390b72e...

The same task IDs were then re-posted on later PR head commits:

2e2a441  Anti-Slop taskId=35ebd5d9-a928-440c-954e-c37edac12691
b51a620  Anti-Slop taskId=35ebd5d9-a928-440c-954e-c37edac12691
ce2cf07  Anti-Slop taskId=35ebd5d9-a928-440c-954e-c37edac12691
9c291ec  Anti-Slop taskId=35ebd5d9-a928-440c-954e-c37edac12691
beca467  Anti-Slop taskId=35ebd5d9-a928-440c-954e-c37edac12691

The final head commit also had all three Continue statuses posted at the same timestamp:

# Final head: beca467
Continuous AI: Anti-Slop     created_at=2026-05-12T16:48:52Z taskId=35ebd5d9-a928-440c-954e-c37edac12691
Continuous AI: Test Coverage created_at=2026-05-12T16:48:52Z taskId=b8aeaff7-4911-4a01-a4ef-24cac375a2df
Continuous AI: Accessibility created_at=2026-05-12T16:48:52Z taskId=4cf5f5da-5918-491d-a499-ddf65390b72e

Continue agent API evidence, sanitized to the relevant fields:

{
  "id": "35ebd5d9-a928-440c-954e-c37edac12691",
  "createdAt": "2026-05-12T14:25:27.491Z",
  "updatedAt": "2026-05-12T14:28:30.756Z",
  "workflowName": "Anti-Slop",
  "triggerPullRequestUrl": "https://github.com/crosswindholdings/agent-stack/pull/60",
  "summary": "Suggestion submitted: Removed 6 AI slop comments from `pricing_normalizer.py`...",
  "suggestionStatus": null,
  "lastCommitHash": null,
  "lastCommitMessage": null
}

GET /agents/35ebd5d9-a928-440c-954e-c37edac12691/diff returned:

{"diff": null}

Related but separate CLI symptom:

/opt/homebrew/bin/cn --version
1.5.45

/opt/homebrew/bin/cn checks https://github.com/crosswindholdings/agent-stack/pull/60
# no output, exit 0

/opt/homebrew/bin/cn --org scotts-workspace-22 checks https://github.com/crosswindholdings/agent-stack/pull/60
# no output, exit 0

A direct call to the checks status API with the same authenticated local account returned:

{"error":"Failed to fetch check status"}

That CLI/API issue may be org-scope related and may deserve a separate issue. I included it because it removed the normal local path for seeing/rejecting the stale check state.
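Until the CLI distinguishes "no checks found" from "could not fetch checks", a caller-side wrapper can at least refuse to treat silent success as a pass. The sketch below is an illustration only; the function names and the three labels are hypothetical, and it makes no assumption about cn beyond the exit code and empty output shown above.

```python
import subprocess


def classify_checks_result(exit_code, stdout):
    """Classify a CLI run as 'error', 'no-data', or 'ok'.

    An exit code of 0 with empty output is treated as 'no-data',
    not as success, because nothing was actually reported.
    """
    if exit_code != 0:
        return "error"
    if not stdout.strip():
        return "no-data"
    return "ok"


def run_and_classify(argv):
    """Run a command and classify its result.

    Raises FileNotFoundError if the executable does not exist.
    """
    completed = subprocess.run(argv, capture_output=True, text=True)
    return classify_checks_result(completed.returncode, completed.stdout)
```

For the two cn checks invocations above (exit 0, no output), classify_checks_result(0, "") yields "no-data".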

Suggested fix direction

This is only a hypothesis from the outside, but the observable behavior suggests the PR check/status path is keyed too broadly, likely by PR and check name/session, rather than by current head SHA.

On pull_request.synchronize, Continue likely needs to:

  1. Create fresh check agent sessions keyed by at least (repo, PR number, head SHA, check file/check content hash).
  2. Post GitHub commit statuses whose target_url points to those fresh task IDs.
  3. Optionally mark older PR check sessions as superseded so they are not treated as actionable against the current head.
  4. Make cn checks fail loudly if it cannot retrieve org-scoped check status, rather than exiting 0 with no output.
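Points 1 and 3 of the list above can be sketched as follows. This is purely an illustration of keying sessions by head SHA, written from the outside: CheckKey, SessionRegistry, and content_hash are hypothetical names, and nothing here reflects Continue's actual server-side implementation.

```python
import hashlib
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CheckKey:
    """Hypothetical session key that includes the current head SHA."""
    repo: str
    pr_number: int
    head_sha: str
    check_file: str
    content_hash: str


def content_hash(check_text):
    """SHA-256 hex digest of a check file's contents."""
    return hashlib.sha256(check_text.encode("utf-8")).hexdigest()


@dataclass
class SessionRegistry:
    sessions: dict = field(default_factory=dict)  # CheckKey -> task ID
    superseded: set = field(default_factory=set)  # task IDs no longer current

    def session_for(self, key):
        """Return the task ID for key, creating a fresh session if needed."""
        if key in self.sessions:
            return self.sessions[key]
        # A new head SHA (or changed check content) supersedes older
        # sessions for the same repo, PR, and check file.
        for old_key, old_id in self.sessions.items():
            same_check = (old_key.repo, old_key.pr_number, old_key.check_file) == (
                key.repo, key.pr_number, key.check_file)
            if same_check:
                self.superseded.add(old_id)
        task_id = str(uuid.uuid4())
        self.sessions[key] = task_id
        return task_id
```

Under this scheme, a status posted on a new head commit would always link to a task ID created for that commit, and the earlier task ID would be recorded as superseded rather than re-posted.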

Thanks for looking at this. We are using these checks as a serious feedback signal, so stale status propagation is high-impact even when the checks are advisory rather than branch-protection-required.
