Merged
Changes from 13 commits
Commits
20 commits
1ae1b6c
Add babysit-pr skill
enyst Feb 24, 2026
9c03486
Add /babysit trigger for babysit-pr
openhands-agent Feb 24, 2026
0c0e183
Expand default review bot keywords
openhands-agent Feb 24, 2026
894f26d
Apply suggestion from @enyst
enyst Feb 24, 2026
72e0af9
babysit-pr: document optional re-review workflow
openhands-agent Feb 24, 2026
67a671e
fix: surface autospawned bot reviews in babysit-pr watcher
enyst Feb 24, 2026
af3f569
fix: simplify babysit-pr watcher state and add unit tests
enyst Feb 24, 2026
3306586
fix: normalize review bot keywords with optional [bot] suffix
enyst Feb 24, 2026
cefd92d
chore: prefer all-hands-bot keyword in babysit-pr watcher defaults
enyst Feb 24, 2026
b217518
chore: dedupe review bot normalization helpers
enyst Feb 24, 2026
8c500dd
fix: handle API rate limits and recover from transient watch errors
enyst Feb 24, 2026
b8fa5c9
perf: cache authenticated login and remove unused fresh_state arg
enyst Feb 24, 2026
835e97c
test: cover legacy state migration and refactor helper
enyst Feb 24, 2026
597789e
chore: restore babysit-pr skill description
enyst Feb 24, 2026
123fd90
chore: restore detailed monitoring loop and polling cadence
enyst Feb 24, 2026
dddc54c
Apply suggestion from @enyst
enyst Feb 24, 2026
758530f
Apply suggestion from @enyst
enyst Feb 24, 2026
e847e0e
Clarify PR comment guidelines
enyst Feb 24, 2026
21ee18d
Merge remote-tracking branch 'origin/main' into add-babysit-pr-skill
openhands-agent Mar 8, 2026
7ad2c04
Apply suggestion from @enyst
enyst Mar 9, 2026
14 changes: 14 additions & 0 deletions .plugin/marketplace.json
@@ -58,6 +58,20 @@
"pull-request"
]
},
{
"name": "babysit-pr",
"source": "./babysit-pr",
"description": "Babysit a GitHub pull request by monitoring CI checks, workflow runs, review comments, and mergeability until it is ready to merge.",
"category": "productivity",
"keywords": [
"github",
"pull-request",
"ci",
"actions",
"review",
"monitoring"
]
},
{
"name": "bitbucket",
"source": "./bitbucket",
19 changes: 19 additions & 0 deletions skills/babysit-pr/README.md
@@ -0,0 +1,19 @@
# babysit-pr

Babysit a GitHub pull request by monitoring CI checks/workflow runs, review comments, and mergeability until the PR is ready to merge (or merged/closed).

## Triggers

This skill is activated by:

- `/babysit-pr`
- `/babysit`
- The agent may also activate it on its own when it needs to “babysit PR”, “watch PR”, “monitor CI”, or “check GitHub Actions”.

## Details

- Requires the GitHub CLI (`gh`) to be available and authenticated.
- Uses `scripts/gh_pr_watch.py` to emit one-shot snapshots (`--once`) or a continuous JSONL stream (`--watch`).
- The watcher can surface review comments from approved review bots by matching keywords in the bot login.
- Defaults include: `openhands`, `openhands-agent`, `all-hands-bot`, `smolpaws`, `claude`, `codex`.
- Optional: set `BABYSIT_PR_REVIEW_BOT_KEYWORDS` (comma-separated) to allow additional bot keywords.
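
The keyword matching described above could look roughly like the following sketch. It is illustrative only: the real logic lives in `scripts/gh_pr_watch.py`, which is not shown here, and the helper names `normalize_bot_login`, `load_keywords`, and `is_approved_review_bot` are hypothetical. It assumes substring matching on the lower-cased login and treats a trailing `[bot]` suffix as optional.

```python
import os

# Defaults copied from the list above; every function name here is hypothetical.
DEFAULT_KEYWORDS = (
    "openhands",
    "openhands-agent",
    "all-hands-bot",
    "smolpaws",
    "claude",
    "codex",
)


def normalize_bot_login(login: str) -> str:
    """Lower-case a login and drop an optional trailing "[bot]" suffix."""
    login = login.strip().lower()
    return login[: -len("[bot]")] if login.endswith("[bot]") else login


def load_keywords(env=None) -> tuple:
    """Combine the defaults with BABYSIT_PR_REVIEW_BOT_KEYWORDS (comma-separated)."""
    env = os.environ if env is None else env
    raw = env.get("BABYSIT_PR_REVIEW_BOT_KEYWORDS", "")
    extras = tuple(normalize_bot_login(k) for k in raw.split(",") if k.strip())
    return DEFAULT_KEYWORDS + extras


def is_approved_review_bot(login: str, keywords) -> bool:
    """Assumed rule: a login matches when any keyword appears inside it."""
    normalized = normalize_bot_login(login)
    return any(keyword in normalized for keyword in keywords)
```

Under these assumptions, `is_approved_review_bot("claude[bot]", load_keywords())` would match, while a login such as `dependabot[bot]` would not unless a matching keyword is added through the environment variable.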
205 changes: 205 additions & 0 deletions skills/babysit-pr/SKILL.md
@@ -0,0 +1,205 @@
---
name: babysit-pr
description: Monitor a GitHub PR until ready to merge by handling CI and review feedback.
triggers:
- /babysit-pr
- /babysit
---

# PR Babysitter

## Objective
Babysit a PR persistently until one of these terminal outcomes occurs:

- The PR is merged or closed.
- CI is successful, there are no unaddressed review comments surfaced by the watcher, required review approval is not blocking merge, and there are no potential merge conflicts (PR is mergeable / not reporting conflict risk).
- A situation requires user help (for example CI infrastructure issues, repeated flaky failures after retry budget is exhausted, permission problems, or ambiguity that cannot be resolved safely).

Do not stop merely because a single snapshot returns `idle` while checks are still pending.

## Inputs
Accept any of the following:

- No PR argument: infer the PR from the current branch (`--pr auto`)
- PR number
- PR URL

## Core Workflow

1. When the user asks to "monitor"/"watch"/"babysit" a PR, start with the watcher's continuous mode (`--watch`) unless you are intentionally doing a one-shot diagnostic snapshot.
2. Run the watcher script to snapshot PR/CI/review state (or consume each streamed snapshot from `--watch`).
3. Inspect the `actions` list in the JSON response.
4. If `diagnose_ci_failure` is present, inspect failed run logs and classify the failure.
5. If the failure is likely caused by the current branch, patch code locally, commit, and push.
6. If `process_review_comment` is present, inspect surfaced review items and decide whether to address them.
7. If a review item is actionable and correct, patch code locally, commit, and push.
8. If the failure is likely flaky/unrelated and `retry_failed_checks` is present, rerun failed jobs with `--retry-failed-now`.
9. If both actionable review feedback and `retry_failed_checks` are present, prioritize review feedback first; a new commit will retrigger CI, so avoid rerunning flaky checks on the old SHA unless you intentionally defer the review change.
10. On every loop, verify mergeability / merge-conflict status (for example via `gh pr view`) in addition to CI and review state.
11. After any push or rerun action, immediately return to step 1 and continue polling on the updated SHA/state.
12. If you had been using `--watch` before pausing to patch/commit/push, relaunch `--watch` yourself in the same turn immediately after the push (do not wait for the user to re-invoke the skill).
13. Repeat polling until the PR is green + review-clean + mergeable, `stop_pr_closed` appears, or a user-help-required blocker is reached.
14. Maintain terminal/session ownership: while babysitting is active, keep consuming watcher output in the same turn; do not leave a detached `--watch` process running and then end the turn as if monitoring were complete.
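
The priority rules in the workflow above can be sketched as a small planner over one snapshot's `actions` list. This is a simplified illustration, not the watcher's own code: the function name `plan_steps` and the step labels are hypothetical, and it assumes every surfaced review item turns out to be actionable.

```python
def plan_steps(snapshot: dict) -> list:
    """Return the ordered steps for one watcher snapshot (labels are illustrative)."""
    actions = set(snapshot.get("actions", []))
    if "stop_pr_closed" in actions:
        return ["stop"]

    steps = []
    if "process_review_comment" in actions:
        steps.append("review")
    if "diagnose_ci_failure" in actions:
        steps.append("diagnose")
    # A review fix pushes a new commit, which retriggers CI, so a retry on the
    # old SHA is skipped whenever review work is pending (assumed actionable).
    if "retry_failed_checks" in actions and "review" not in steps:
        steps.append("retry")
    # Mergeability is verified on every loop, whatever else happened.
    steps.append("check_mergeability")
    return steps
```

A snapshot whose `actions` contains only `idle` yields just the mergeability check, which matches the rule that `idle` alone is not a reason to stop.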

## Commands

### One-shot snapshot

```bash
python3 <this-skill-path>/scripts/gh_pr_watch.py --pr auto --once
```

### Continuous watch (JSONL)

```bash
python3 <this-skill-path>/scripts/gh_pr_watch.py --pr auto --watch
```

### Trigger flaky retry cycle (only when watcher indicates)

```bash
python3 <this-skill-path>/scripts/gh_pr_watch.py --pr auto --retry-failed-now
```

### Explicit PR target

```bash
python3 <this-skill-path>/scripts/gh_pr_watch.py --pr <number-or-url> --once
```
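
Since `--watch` emits JSONL, a consumer needs to parse one JSON object per line. The sketch below shows one way to do that; the helper name `iter_snapshots` is hypothetical, and skipping non-JSON lines is an assumption about how tolerant a consumer should be, not documented watcher behavior.

```python
import json
from typing import Iterable, Iterator


def iter_snapshots(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per JSONL line, skipping blanks and non-object lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate stray non-JSON output on the stream
        if isinstance(event, dict):
            yield event
```

With a live watcher, `lines` could be the `stdout` of a `subprocess.Popen` started with `stdout=subprocess.PIPE` and `text=True`, so each snapshot is handled as soon as its line arrives.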

## CI Failure Classification
Use `gh` commands to inspect failed runs before deciding to rerun.

- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
- `gh run view <run-id> --log-failed`

Prefer treating failures as branch-related when logs point to changed code (compile/test/lint/typecheck/snapshots/static analysis in touched areas).

Prefer treating failures as flaky/unrelated when logs show transient infra/external issues (timeouts, runner provisioning failures, registry/network outages, GitHub Actions infra errors).

If classification is ambiguous, perform one manual diagnosis attempt before choosing rerun.

Read `references/heuristics.md` for a concise checklist.
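
As a rough illustration of the classification above, a first pass over the output of `gh run view <run-id> --log-failed` could be keyword-based. The patterns and the function name `classify_failure` are hypothetical examples, not part of the skill; a real diagnosis should still read the logs in context.

```python
import re

# Hypothetical patterns; adjust them to the CI systems actually in use.
FLAKY_PATTERNS = [
    r"timed? ?out",
    r"connection (reset|refused)",
    r"could not resolve host",
    r"rate limit",
]
BRANCH_PATTERNS = [
    r"syntaxerror",
    r"type ?error",
    r"assertionerror",
    r"compilation failed",
]


def classify_failure(log_text: str) -> str:
    """Return "branch", "flaky", or "ambiguous" for a failed-job log."""
    text = log_text.lower()
    flaky = any(re.search(pattern, text) for pattern in FLAKY_PATTERNS)
    branch = any(re.search(pattern, text) for pattern in BRANCH_PATTERNS)
    if branch and not flaky:
        return "branch"
    if flaky and not branch:
        return "flaky"
    # Mixed or missing signals call for one manual diagnosis attempt.
    return "ambiguous"
```

Returning `"ambiguous"` for mixed signals keeps the sketch aligned with the rule that an unclear failure gets one manual diagnosis before any rerun.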

## Review Comment Handling
The watcher surfaces review items from:

- PR issue comments
- Inline review comments
- Review submissions (COMMENT / APPROVED / CHANGES_REQUESTED)

It can also surface feedback from approved review bots (configured in `scripts/gh_pr_watch.py`) in addition to human reviewer feedback. Ignore unrelated bot noise.
For safety, the watcher only auto-surfaces trusted human review authors (OWNER/MEMBER/COLLABORATOR, plus the authenticated operator) and approved review bots.
On a fresh watcher state file, existing pending review feedback may be surfaced immediately (not only comments that arrive after monitoring starts). This is intentional so already-open review comments are not missed.

When you agree with a comment and it is actionable:

1. Patch code locally.
2. Commit with `chore: address PR review feedback (#<n>)`.
3. Push to the PR head branch.
4. Resume watching on the new SHA immediately (do not stop after reporting the push).
5. If monitoring was running in `--watch` mode, restart `--watch` immediately after the push in the same turn; do not wait for the user to ask again.

If you disagree or the comment is non-actionable/already addressed, record it as handled by continuing the watcher loop (the script de-duplicates surfaced items via state after surfacing them).
If a code review comment/thread is already marked as resolved in GitHub, treat it as non-actionable and safely ignore it unless new unresolved follow-up feedback appears.
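
The trust filter and de-duplication described in this section can be sketched as follows. This is an assumption-laden illustration rather than the watcher's implementation: `surface_new_items` is a hypothetical name, the items are assumed to be shaped like GitHub REST comment objects (`id`, `user.login`, `author_association`), and an in-memory set stands in for the watcher's state file.

```python
TRUSTED_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def surface_new_items(items, seen_ids, operator_login, bot_keywords):
    """Return trusted review items that have not been surfaced before.

    `seen_ids` is updated in place, so a second call with the same items
    returns nothing (a stand-in for the watcher's persisted state).
    """
    surfaced = []
    for item in items:
        login = (item.get("user") or {}).get("login", "").lower()
        trusted_human = item.get("author_association") in TRUSTED_ASSOCIATIONS
        is_operator = login == operator_login.lower()
        approved_bot = any(keyword in login for keyword in bot_keywords)
        if not (trusted_human or is_operator or approved_bot):
            continue  # untrusted authors are never auto-surfaced
        if item["id"] in seen_ids:
            continue  # already surfaced once; treated as handled
        seen_ids.add(item["id"])
        surfaced.append(item)
    return surfaced
```

Starting from an empty `seen_ids` set mirrors a fresh state file: existing feedback from trusted authors is surfaced on the first pass and suppressed afterwards.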

## Optional: Request (Re-)Review

If the PR is green/mergeable but blocked on approval (for example `reviewDecision` is `REVIEW_REQUIRED` or `CHANGES_REQUESTED`) and you believe you’ve addressed all surfaced feedback, you can request another look.

Rules:

- Only do this when the user explicitly asks you to request review / ping reviewers, or after confirming with the user (avoid spamming humans).
- Prefer requesting review only once per new head SHA.
- If permissions fail or it’s unclear who should review, stop and ask the user.

Suggested flow:

1. Leave a brief PR comment summarizing what changed and why re-review is needed.
```bash
gh pr comment <pr> --body "Addressed the requested changes in <sha>. Could you take another look? @reviewer"
```
2. (Optional) Re-request reviewers via the GitHub API.
```bash
gh api -X POST repos/<owner>/<repo>/pulls/<pr_number>/requested_reviewers \
  -f 'reviewers[]=reviewer1' \
  -f 'reviewers[]=reviewer2'
# For team reviewers (quoting keeps shells such as zsh from globbing the brackets):
#   -f 'team_reviewers[]=my-team'
```

If the API returns an error indicating reviewers are already requested, treat it as non-fatal and continue monitoring.


## Git Safety Rules

- Work only on the PR head branch.
- Avoid destructive git commands.
- Do not switch branches unless necessary to recover context.
- Before editing, check for unrelated uncommitted changes. If present, stop and ask the user.
- After each successful fix, commit and `git push`, then re-run the watcher.
- If you interrupted a live `--watch` session to make the fix, restart `--watch` immediately after the push in the same turn.
- Do not run multiple concurrent `--watch` processes for the same PR/state file; keep one watcher session active and reuse it until it stops or you intentionally restart it.
- A push is not a terminal outcome; continue the monitoring loop unless a strict stop condition is met.

Commit message defaults:

- `fix: CI failure on PR #<n>`
- `chore: address PR review feedback (#<n>)`

## Monitoring Loop Pattern

Core loop:

1. Snapshot PR state (`--once`, or consume a `--watch` event).
2. If terminal (merged/closed/ready): stop.
3. If review items exist: address them, commit/push, and resume monitoring on the new SHA.
4. If CI failures exist: diagnose; fix and push when branch-related, or rerun when likely flaky and within retry budget.
5. Sleep per the polling cadence and repeat until a strict stop condition is met.

Prefer `--watch` for ongoing babysitting; use `--once` for one-shot diagnostics.

## Polling Cadence
Use adaptive polling and continue monitoring even after CI turns green:

- Poll every 1 minute while CI is not all-green.
- After CI turns green, back off exponentially when nothing changes (1m→2m→4m→…→60m max). Reset to 1 minute on any change.
- If any poll shows the PR is merged or closed: stop immediately and report the terminal state.
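
The cadence above amounts to a small pure function. The sketch below is one possible reading of it; the name `next_poll_interval` and its parameters are hypothetical.

```python
def next_poll_interval(ci_green: bool, changed: bool, previous_minutes: int) -> int:
    """Return the number of minutes to wait before the next poll."""
    if not ci_green or changed:
        return 1  # poll every minute while CI is not green, or after any change
    # Green and unchanged: double the previous interval, capped at 60 minutes.
    return min(max(previous_minutes, 1) * 2, 60)
```

Feeding the result back in while CI stays green and nothing changes produces 2, 4, 8, 16, 32, 60, 60 minutes, and any change drops the interval back to 1.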

## Stop Conditions (Strict)
Stop only when one of the following is true:

- PR merged or closed (stop as soon as a poll/snapshot confirms this).
- PR is ready to merge: CI succeeded, no surfaced unaddressed review comments, not blocked on required review approval, and no merge conflict risk.
- User intervention is required and the agent cannot safely proceed alone.

Keep polling when:

- `actions` contains only `idle` but checks are still pending.
- CI is still running/queued.
- Review state is quiet but CI is not terminal.
- CI is green but mergeability is unknown/pending.
- CI is green and mergeable, but the PR is still open and you are waiting for possible new review comments or merge-conflict changes per the green-state cadence.
- The PR is green but blocked on review approval (`REVIEW_REQUIRED` / similar); continue polling on the green-state cadence and surface any new review comments without asking for confirmation to keep watching.
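
The strict stop conditions can be expressed as a single predicate. The sketch below uses a hypothetical `PrState` summary whose field names are illustrative and do not come from the watcher's output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PrState:
    """Hypothetical summary of one poll; field names are illustrative."""

    merged_or_closed: bool = False
    ci_succeeded: bool = False
    unaddressed_review_items: int = 0
    blocked_on_review: bool = False
    mergeable: bool = False  # False also stands for unknown/pending mergeability
    needs_user_help: bool = False


def stop_reason(state: PrState) -> Optional[str]:
    """Return why monitoring may stop, or None to keep polling."""
    if state.merged_or_closed:
        return "merged_or_closed"
    if state.needs_user_help:
        return "needs_user_help"
    ready = (
        state.ci_succeeded
        and state.unaddressed_review_items == 0
        and not state.blocked_on_review
        and state.mergeable
    )
    return "ready_to_merge" if ready else None
```

Every "keep polling" case listed above maps to `None` here: a green PR that is still blocked on review approval, or whose mergeability is unknown, never reaches `ready_to_merge`.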

## Output Expectations
Provide concise progress updates while monitoring, following these rules:

- During long unchanged monitoring periods, avoid emitting a full update on every poll; summarize only status changes plus occasional heartbeat updates.
- Treat push confirmations, intermediate CI snapshots, and review-action updates as progress updates only; do not emit the final summary or end the babysitting session unless a strict stop condition is met.
- A user request to "monitor" is not satisfied by a couple of sample polls; remain in the loop until a strict stop condition or an explicit user interruption.
- A review-fix commit + push is not a completion event; immediately resume live monitoring (`--watch`) in the same turn and continue reporting progress updates.
- When CI first transitions to all green for the current SHA, emit a one-time celebratory progress update (do not repeat it on every green poll). Preferred style: `🚀 CI is all green! 33/33 passed. Still on watch for review approval.`
- Do not send the final summary while a watcher terminal is still running unless the watcher has emitted/confirmed a strict stop condition; otherwise continue with progress updates.

Once a strict stop condition is met, provide a final summary that includes:

- Final PR SHA
- CI status summary
- Mergeability / conflict status
- Fixes pushed
- Flaky retry cycles used
- Remaining unresolved failures or review comments

## References

- Heuristics and decision tree: `references/heuristics.md`
- GitHub CLI/API details used by the watcher: `references/github-api-notes.md`
72 changes: 72 additions & 0 deletions skills/babysit-pr/references/github-api-notes.md
@@ -0,0 +1,72 @@
# GitHub CLI / API Notes For `babysit-pr`

## Primary commands used

### PR metadata

- `gh pr view --json number,url,state,mergedAt,closedAt,headRefName,headRefOid,headRepository,headRepositoryOwner`

Used to resolve PR number, URL, branch, head SHA, and closed/merged state.

### PR checks summary

- `gh pr checks --json name,state,bucket,link,workflow,event,startedAt,completedAt`

Used to compute pending/failed/passed counts and whether the current CI round is terminal.
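
A minimal sketch of that computation, assuming the parsed JSON array from `gh pr checks --json ...` is passed in as a list of dicts. The function name `summarize_checks` and the exact definition of "terminal" are assumptions, not the watcher's code.

```python
from collections import Counter


def summarize_checks(checks: list) -> dict:
    """Count checks per bucket and report whether the CI round is terminal."""
    # A missing bucket is conservatively treated as pending (assumption).
    counts = Counter(check.get("bucket", "pending") for check in checks)
    return {
        "passed": counts["pass"],
        "failed": counts["fail"],
        "pending": counts["pending"],
        # Assumption: terminal means at least one check exists and none is pending.
        "terminal": bool(checks) and counts["pending"] == 0,
    }
```

Treating an empty check list as non-terminal avoids declaring CI finished before any check has been registered for a new SHA.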

### Workflow runs for head SHA

- `gh api repos/{owner}/{repo}/actions/runs -X GET -f head_sha=<sha> -f per_page=100`

Used to discover failed workflow runs and rerunnable run IDs.

### Failed log inspection

- `gh run view <run-id> --json jobs,name,workflowName,conclusion,status,url,headSha`
- `gh run view <run-id> --log-failed`

Used by the agent to classify branch-related vs flaky/unrelated failures.

### Retry failed jobs only

- `gh run rerun <run-id> --failed`

Reruns only failed jobs (and dependencies) for a workflow run.

## Review-related endpoints

- Issue comments on PR:
  - `gh api repos/{owner}/{repo}/issues/<pr_number>/comments?per_page=100`
- Inline PR review comments:
  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/comments?per_page=100`
- Review submissions:
  - `gh api repos/{owner}/{repo}/pulls/<pr_number>/reviews?per_page=100`

## JSON fields consumed by the watcher

### `gh pr view`

- `number`
- `url`
- `state`
- `mergedAt`
- `closedAt`
- `headRefName`
- `headRefOid`

### `gh pr checks`

- `bucket` (`pass`, `fail`, `pending`, `skipping`, or `cancel`)
- `state`
- `name`
- `workflow`
- `link`

### Actions runs API (`workflow_runs[]`)

- `id`
- `name`
- `status`
- `conclusion`
- `html_url`
- `head_sha`
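
Given those fields, selecting rerunnable runs for the current head SHA could be sketched as below. The function name `failed_run_ids` and the choice of which `conclusion` values count as failed are assumptions for illustration.

```python
# Assumption: these conclusions are the ones worth rerunning.
FAILED_CONCLUSIONS = {"failure", "timed_out"}


def failed_run_ids(payload: dict, head_sha: str) -> list:
    """Return IDs of completed, failed workflow runs for the given head SHA."""
    return [
        run["id"]
        for run in payload.get("workflow_runs", [])
        if run.get("head_sha") == head_sha
        and run.get("status") == "completed"
        and run.get("conclusion") in FAILED_CONCLUSIONS
    ]
```

Each returned ID could then be passed to `gh run rerun <run-id> --failed`.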
58 changes: 58 additions & 0 deletions skills/babysit-pr/references/heuristics.md
@@ -0,0 +1,58 @@
# CI / Review Heuristics

## CI classification checklist

Treat as **branch-related** when logs clearly indicate a regression caused by the PR branch:

- Compile/typecheck/lint failures in files or modules touched by the branch
- Deterministic unit/integration test failures in changed areas
- Snapshot output changes caused by UI/text changes in the branch
- Static analysis violations introduced by the latest push
- Build script/config changes in the PR causing a deterministic failure

Treat as **likely flaky or unrelated** when evidence points to transient or external issues:

- DNS/network/registry timeout errors while fetching dependencies
- Runner image provisioning or startup failures
- GitHub Actions infrastructure/service outages
- Cloud/service rate limits or transient API outages
- Non-deterministic failures in unrelated integration tests with known flake patterns

If uncertain, inspect failed logs once before choosing rerun.

## Decision tree (fix vs rerun vs stop)

1. If PR is merged/closed: stop.
2. If there are failed checks:
- Diagnose first.
- If branch-related: fix locally, commit, push.
- If likely flaky/unrelated and all checks for the current SHA are terminal: rerun failed jobs.
- If checks are still pending: wait.
3. If flaky reruns for the same SHA reach the configured limit (default 3): stop and report persistent failure.
4. Independently, process any new review comments surfaced by the watcher (trusted human reviewers and approved review bots).
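
The CI part of the decision tree above can be sketched as one function. The name `decide`, its parameters, and the returned labels are hypothetical; `classification` is assumed to be either `"branch"` or `"flaky"` after diagnosis, and review handling is left out because it proceeds independently.

```python
def decide(pr_closed: bool, failed: int, pending: int,
           classification: str, retries_used: int, max_retries: int = 3) -> str:
    """Return the next CI action for the current head SHA (labels are illustrative)."""
    if pr_closed:
        return "stop"
    if failed > 0:
        if classification == "branch":
            return "fix_and_push"
        if pending > 0:
            return "wait"  # rerun only once all checks for the SHA are terminal
        if retries_used >= max_retries:
            return "stop_and_report"  # flaky retry budget exhausted
        return "rerun_failed"
    return "keep_polling"
```

For example, a flaky failure with two retries already used still gets a rerun, while a third used retry ends in `stop_and_report`.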

## Review comment agreement criteria

Address the comment when:

- The comment is technically correct.
- The change is actionable in the current branch.
- The requested change does not conflict with the user’s intent or recent guidance.
- The change can be made safely without unrelated refactors.

Do not auto-fix when:

- The comment is ambiguous and needs clarification.
- The request conflicts with explicit user instructions.
- The proposed change requires product/design decisions the user has not made.
- The codebase is in a dirty/unrelated state that makes safe editing uncertain.

## Stop-and-ask conditions

Stop and ask the user instead of continuing automatically when:

- The local worktree has unrelated uncommitted changes.
- `gh` auth/permissions fail.
- The PR branch cannot be pushed.
- CI failures persist after the flaky retry budget.
- Reviewer feedback requires a product decision or cross-team coordination.