githubnext · mrjf · Jun 27, 2026 · Jun 27, 2026
diff --git a/.github/workflows/evergreen.lock.yml b/.github/workflows/evergreen.lock.yml
diff --git a/.github/workflows/evergreen.md b/.github/workflows/evergreen.md
diff --git a/.github/workflows/shared/evergreen/repo-policy.md b/.github/workflows/shared/evergreen/repo-policy.md
@@ -0,0 +1,79 @@
+# Evergreen Repo Policy
+
+## Merge Gates
+
+Evergreen may report `evergreen-ready` only when the target PR currently has the `evergreen` label and all installed gates are satisfied.
+
+Required gates inferred for this repository:
+- PR is open and not draft.
+- PR is conflict-free and mergeable.
+- Current PR head SHA has passing CI gates:
+  - `Test & Lint`
+  - `Playground E2E (Playwright)`
+  - `Validate Python Examples`
+  - `Build`
+- `OpenEvolve benchmark` is required only for `autoloop/*-evolve` branches or when that check is present and non-skipped for the current head.
+- Requested-changes reviews, unresolved review threads, and explicit maintainer blocker comments prevent ready status.
+
+Repository settings observed during install:
+- Default branch: `main`.
+- No branch protection or rulesets were configured on `main`.
+- GitHub auto-merge is enabled. Evergreen must never directly merge PRs, but making a PR green can indirectly allow a PR with auto-merge enabled to merge.
+
+## Branch Updates
+
+Prefer merging `main` into the PR branch when the branch is behind and freshness is needed. Do not force-push, rebase, squash, or amend. If a fork or permission boundary prevents branch updates, comment and apply `evergreen-human-needed`.
+
+## CI/CD Activation
+
+Use `GITHUB_TOKEN` where possible. The repository already has `GH_AW_CI_TRIGGER_TOKEN`; use it only when token-authored pushes or workflow dispatch are needed to trigger CI. Scheduled observation alone must not rerun green checks.
+
+Prefer actions in this order:
+- Wait for current pending checks.
+- Dispatch `CI` when checks are missing or stale.
+- Push a verified repair commit when a failure is understood.
+- Use an empty trigger commit only as a last resort, and do not count it as a semantic repair attempt.
+
+## Repair Policy
+
+Allowed repair areas:
+- `src/**`
+- `tests/**`
+- `tests-e2e/**`
+- `playground/**`
+- `golden/**`
+- `scripts/**`
+- `benchmarks/**`
+- `docs/**`
+
+Forbidden or human-confirmation areas:
+- Never edit `README.md`.
+- Never edit `.autoloop/programs/**`.
+- Do not edit workflow, agentic workflow, or skill files at runtime unless the PR is explicitly about those files and a human has confirmed the workflow-change path.
+- Treat `package.json`, `bun.lock`, `tsconfig.json`, `biome.json`, and `bunfig.toml` as protected high-risk files.
+
+Strict TypeScript rules:
+- No `any`.
+- No `as` casts.
+- No `@ts-ignore` or equivalent escape hatches.
+- Keep zero core dependencies.
+
+## Review Policy
+
+Evergreen must not mark draft PRs ready for review, request reviewers, approve PRs, or resolve review threads. If a reviewer or maintainer decision is needed, label `evergreen-human-needed` and explain the decision needed.
+
+## Quotas
+
+Quota unit: one continuous application of the `evergreen` label.
+
+Defaults:
+- Maximum 10 Evergreen runs per PR label application.
+- Maximum 3 semantic repair attempts per failure signature.
+- Maximum 100K AI credits per PR label application.
+- Maximum 6 hours wall-clock per run.
+
+When quota is exhausted, remove `evergreen`, add `evergreen-exhausted`, and leave a concise blocker comment.
+
+## Discovered Repo Context
+
+This is a Bun/TypeScript package named `tsb`, a TypeScript port of pandas. CI runs typecheck, lint, unit tests with coverage, pandas golden snapshot validation, cross-validation tests, Playwright playground e2e, Python example validation, and browser build. Recent merged PRs consistently passed the main CI gates. Autoloop PRs commonly have duplicate `push` and `pull_request` check runs.
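The repair-area rules in `repo-policy.md` can be read as a small path classifier. The following is a minimal sketch, not part of the PR: the function name `classify_path`, the result strings, and the `.github/**` pattern for "workflow, agentic workflow, or skill files" are assumptions made for illustration, and `README.md` is assumed to mean the root-level file only.

```python
from fnmatch import fnmatchcase

# Patterns copied from the "Repair Policy" section of repo-policy.md.
ALLOWED = ["src/**", "tests/**", "tests-e2e/**", "playground/**",
           "golden/**", "scripts/**", "benchmarks/**", "docs/**"]
FORBIDDEN = ["README.md", ".autoloop/programs/**"]
PROTECTED = ["package.json", "bun.lock", "tsconfig.json",
             "biome.json", "bunfig.toml"]
# Assumption: workflow, agentic workflow, and skill files live under .github/.
WORKFLOW = [".github/**"]


def classify_path(path):
    """Return 'forbidden', 'human-confirmation', 'allowed', or 'unlisted'."""
    def matches(patterns):
        # fnmatch's "*" also matches "/", so "src/**" covers nested paths.
        return any(fnmatchcase(path, pattern) for pattern in patterns)

    if matches(FORBIDDEN):
        return "forbidden"
    if matches(PROTECTED) or matches(WORKFLOW):
        return "human-confirmation"
    if matches(ALLOWED):
        return "allowed"
    # Paths the policy does not mention are reported separately.
    return "unlisted"
```

Checking forbidden and protected patterns before the allowed list keeps a broad allow pattern from overriding a narrower restriction.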
diff --git a/.github/workflows/shared/evergreen/report-template.md b/.github/workflows/shared/evergreen/report-template.md
@@ -0,0 +1,19 @@
+# Evergreen Report Template
+
+Keep comments short and evidence-backed.
+
+Use this shape:
+
+```markdown
+### Evergreen status
+
+| Gate | State | Evidence |
+| --- | --- | --- |
+| Test & Lint | passing | current head SHA |
+
+**Result:** ready | blocked | human-needed | exhausted | continuing
+
+Next action: one sentence.
+```
+
+Include at most three workflow run links. Do not paste long logs; summarize the failure signature and link to the run.
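As an illustration of the report shape in `report-template.md`, the sketch below renders the gate table from plain tuples. The function name `render_report` and its argument layout are hypothetical; only the Markdown layout and the five result values come from the template.

```python
# Result values listed in the template's "**Result:**" line.
RESULTS = {"ready", "blocked", "human-needed", "exhausted", "continuing"}


def render_report(gates, result, next_action):
    """Render the status comment; `gates` is a list of (gate, state, evidence)."""
    if result not in RESULTS:
        raise ValueError(f"unknown result: {result}")
    lines = [
        "### Evergreen status",
        "",
        "| Gate | State | Evidence |",
        "| --- | --- | --- |",
    ]
    for name, state, evidence in gates:
        lines.append(f"| {name} | {state} | {evidence} |")
    lines += ["", f"**Result:** {result}", "", f"Next action: {next_action}"]
    return "\n".join(lines)
```

Rejecting unknown result values early keeps comments limited to the states the template names.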
diff --git a/.github/workflows/shared/evergreen/safe-output-policy.md b/.github/workflows/shared/evergreen/safe-output-policy.md
@@ -0,0 +1,18 @@
+# Evergreen Safe Output Policy
+
+Use safe outputs for every visible write or branch mutation.
+
+Allowed actions:
+- Add concise PR comments.
+- Add or remove only Evergreen state labels.
+- Dispatch the `CI` workflow when checks are missing, stale, or blocked.
+- Push repair commits to PR branches that still have the `evergreen` label.
+
+Disallowed actions:
+- Do not merge PRs.
+- Do not approve PRs.
+- Do not resolve review threads.
+- Do not request reviewers.
+- Do not use shell commands or GitHub tools for write operations that have configured safe outputs.
+
+Before reporting success, verify the side effect with current GitHub state.
diff --git a/.github/workflows/shared/evergreen/state-labels.md b/.github/workflows/shared/evergreen/state-labels.md
@@ -0,0 +1,11 @@
+# Evergreen State Labels
+
+Use labels as current state, not as a run log.
+
+- `evergreen`: persistent opt-in and permission for Evergreen to work on a PR.
+- `evergreen-ready`: all configured merge gates currently pass.
+- `evergreen-blocked`: a blocker exists, but Evergreen may keep monitoring.
+- `evergreen-human-needed`: a human decision, credential, review, or protected edit is required.
+- `evergreen-exhausted`: quota was exhausted; remove `evergreen` when applying this label.
+
+Remove stale state labels when the underlying state changes. For example, remove `evergreen-ready` after a new commit, stale check, failing gate, or newly discovered blocker.
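The "labels as current state" rule in `state-labels.md` can be sketched as a function that plans label changes for a new state. This is an illustration under two assumptions that the file does not state outright: that at most one Evergreen state label applies at a time, and that a `continuing` result carries no state label. The name `plan_label_changes` is hypothetical.

```python
# State labels from state-labels.md, keyed by the reporter's result value.
STATE_LABELS = {
    "ready": "evergreen-ready",
    "blocked": "evergreen-blocked",
    "human-needed": "evergreen-human-needed",
    "exhausted": "evergreen-exhausted",
}


def plan_label_changes(current, state):
    """Return (labels_to_add, labels_to_remove) for the new state."""
    target = STATE_LABELS.get(state)  # None for "continuing" (assumption)
    all_state_labels = set(STATE_LABELS.values())
    # Any other state label is stale once the state changes.
    to_remove = {label for label in current
                 if label in all_state_labels and label != target}
    to_add = {target} if target and target not in current else set()
    # Applying evergreen-exhausted also removes the opt-in label.
    if state == "exhausted" and "evergreen" in current:
        to_remove.add("evergreen")
    return to_add, to_remove
```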
diff --git a/.github/workflows/shared/skills/attempt-memory-writer.md b/.github/workflows/shared/skills/attempt-memory-writer.md
@@ -0,0 +1,11 @@
+# Skill: attempt-memory-writer
+
+Store semantic attempt state without turning memory into a run log.
+
+Write small structured entries under `/tmp/gh-aw/repo-memory/evergreen/`:
+- `ci-signatures.jsonl` for reusable failure signatures and outcomes.
+- `skill-outcomes.jsonl` for selected skills and whether they helped.
+- `review-patterns.jsonl` for repeated merge-blocking human feedback.
+- `velocity.jsonl` for label-to-action, label-to-green, and label-to-ready timing.
+
+Do not count empty CI trigger commits as semantic repair attempts. Preserve enough source identifiers to audit the memory later.
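A minimal sketch of how a small structured entry could be appended to one of the JSONL files named in `attempt-memory-writer.md`. The field names (`signature`, `outcome`, `head_sha`, `run_id`, `counts_as_attempt`) and both function names are assumptions for illustration; the skill file does not define an entry schema.

```python
import json
from pathlib import Path


def append_entry(memory_dir, filename, entry):
    """Append one JSON object as a single line to a JSONL memory file."""
    path = Path(memory_dir) / filename
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, sort_keys=True) + "\n")
    return path


def record_attempt(memory_dir, signature, outcome, head_sha, run_id,
                   empty_trigger=False):
    """Record a repair outcome in ci-signatures.jsonl (hypothetical schema)."""
    entry = {
        "signature": signature,
        "outcome": outcome,
        # Source identifiers kept so the memory can be audited later.
        "head_sha": head_sha,
        "run_id": run_id,
        # Empty CI trigger commits are not semantic repair attempts.
        "counts_as_attempt": not empty_trigger,
    }
    return append_entry(memory_dir, "ci-signatures.jsonl", entry)
```

In the workflow, `memory_dir` would be `/tmp/gh-aw/repo-memory/evergreen/`; passing it as a parameter keeps the sketch testable against a temporary directory.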
diff --git a/.github/workflows/shared/skills/autoloop-coordinator.md b/.github/workflows/shared/skills/autoloop-coordinator.md
@@ -0,0 +1,10 @@
+# Skill: autoloop-coordinator
+
+Use this for PRs from `autoloop/**` branches or PRs labeled `autoloop`.
+
+Coordinate with the installed Autoloop conventions:
+- Do not edit `.autoloop/programs/**`.
+- Treat duplicate push and pull_request CI runs as one logical gate.

+- For `autoloop/*-evolve` branches, include `OpenEvolve benchmark` when deciding readiness.
+- Prefer fixing merge blockers over continuing feature iteration.
+- Preserve evidence that Autoloop can use after Evergreen finishes.
diff --git a/.github/workflows/shared/skills/ci-gate-evaluator.md b/.github/workflows/shared/skills/ci-gate-evaluator.md
@@ -0,0 +1,15 @@
+# Skill: ci-gate-evaluator
+
+Evaluate CI/check gates for the current PR head SHA.
+
+Classify each gate as:
+- passing
+- failing
+- pending
+- stale
+- missing
+- skipped but expected
+- skipped and acceptable
+- blocked by permissions or environment
+
+Recommend the smallest next action: wait, rerun/dispatch, parse logs, repair deterministic failure, merge base, or ask a human.
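The gate classes in `ci-gate-evaluator.md` can be illustrated with a small classifier over a check-run record. This is a sketch under assumptions: the dictionary keys mirror the `head_sha`, `status`, and `conclusion` fields of GitHub check runs, the `skip_acceptable` flag is hypothetical, and mapping `action_required` to "blocked by permissions or environment" is a guess rather than something the skill specifies.

```python
def classify_gate(run, head_sha, skip_acceptable=False):
    """Map one check run (or None) to a gate class from the skill file."""
    if run is None:
        return "missing"
    if run["head_sha"] != head_sha:
        # Results for an older commit say nothing about the current head.
        return "stale"
    if run["status"] != "completed":
        return "pending"
    conclusion = run["conclusion"]
    if conclusion == "success":
        return "passing"
    if conclusion == "skipped":
        return "skipped and acceptable" if skip_acceptable else "skipped but expected"
    if conclusion == "stale":
        return "stale"
    if conclusion == "action_required":
        # Assumption: treated as a permissions or environment block.
        return "blocked by permissions or environment"
    # failure, cancelled, timed_out, and anything unrecognized count as failing.
    return "failing"
```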
diff --git a/.github/workflows/shared/skills/ci-log-parser.md b/.github/workflows/shared/skills/ci-log-parser.md
@@ -0,0 +1,12 @@
+# Skill: ci-log-parser
+
+Extract normalized failure signatures from failing CI logs.
+
+For each failing check, capture:
+- check name and workflow name
+- failing command or step
+- failure class: typecheck, lint, unit test, cross-validation, golden snapshot, e2e, build, dependency install, infrastructure, timeout, or unknown
+- relevant file, line, test name, or stack frame when present
+- whether the next move is deterministic repair, targeted reproduction, rerun, or human escalation
+
+Keep signatures compact enough to store in `ci-signatures.jsonl`.
diff --git a/.github/workflows/shared/skills/ci-run-deduper.md b/.github/workflows/shared/skills/ci-run-deduper.md
@@ -0,0 +1,11 @@
+# Skill: ci-run-deduper
+
+Collapse duplicate check runs into logical merge gates.
+
+Group check runs by:
+- PR head SHA
+- workflow name
+- job/check name
+- conclusion and status
+
+Treat duplicate `push` and `pull_request` runs for the same head as one logical gate unless their conclusions disagree. Ignore stale check runs from older SHAs. Return the logical gate list and the raw run IDs used as evidence.
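The grouping described in `ci-run-deduper.md` can be sketched as follows. The record keys (`id`, `head_sha`, `workflow`, `name`, `status`, `conclusion`) and the function name `dedupe_runs` are assumptions for illustration, not a GitHub API shape taken from the PR.

```python
from collections import defaultdict


def dedupe_runs(runs, head_sha):
    """Collapse check runs for the current head into logical gates."""
    groups = defaultdict(list)
    for run in runs:
        if run["head_sha"] != head_sha:
            continue  # ignore stale runs from older SHAs
        groups[(run["workflow"], run["name"])].append(run)

    gates = []
    for (workflow, name), members in groups.items():
        outcomes = {(r["status"], r["conclusion"]) for r in members}
        agree = len(outcomes) == 1
        gates.append({
            "workflow": workflow,
            "name": name,
            # Duplicates count as one gate only when they agree.
            "status": members[0]["status"] if agree else None,
            "conclusion": members[0]["conclusion"] if agree else None,
            "conflict": not agree,
            # Raw run IDs are kept as evidence.
            "run_ids": sorted(r["id"] for r in members),
        })
    return gates
```

A gate flagged with `conflict` is left unresolved on purpose, so the caller can inspect the disagreeing runs instead of trusting either one.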
diff --git a/.github/workflows/shared/skills/dependency-gate-repair.md b/.github/workflows/shared/skills/dependency-gate-repair.md
@@ -0,0 +1,5 @@
+# Skill: dependency-gate-repair
+
+Use this when dependency installation, lockfiles, package manifests, or supply-chain checks block mergeability.
+
+This repository has zero core dependencies. Treat changes to `package.json`, `bun.lock`, `tsconfig.json`, `biome.json`, and `bunfig.toml` as high-risk protected edits unless the PR is explicitly about dependency/tooling work. If a protected edit is required, label `evergreen-human-needed` and explain the smallest requested human action.
diff --git a/.github/workflows/shared/skills/deterministic-repair.md b/.github/workflows/shared/skills/deterministic-repair.md
@@ -0,0 +1,14 @@
+# Skill: deterministic-repair
+
+Prefer repo-native deterministic work before agentic edits.
+
+Use the installed repo policy and discovered scripts. For this repository, prefer:
+- `bun run typecheck`
+- `bun run lint`
+- `bun test ./tests/`
+- `bun run test:e2e`
+- `bun run build`
+- `python scripts/validate-python-examples.py playground/`
+- `python golden/generate.py` followed by `git diff --exit-code -- golden/snapshots`
+
+Run targeted commands first when a failure points to a specific area. Apply the smallest patch that can make the gate pass.
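One way to "run targeted commands first" is a lookup from a failure class to the repo-native commands listed in `deterministic-repair.md`. The pairing of classes to commands below is an assumption inferred from the names; neither skill file states the mapping, and `commands_for` is a hypothetical helper.

```python
# Commands copied from deterministic-repair.md; the class keys are the
# failure classes named in ci-log-parser.md. The pairing is an assumption.
REPAIR_COMMANDS = {
    "typecheck": [["bun", "run", "typecheck"]],
    "lint": [["bun", "run", "lint"]],
    "unit test": [["bun", "test", "./tests/"]],
    "e2e": [["bun", "run", "test:e2e"]],
    "build": [["bun", "run", "build"]],
    "golden snapshot": [
        ["python", "golden/generate.py"],
        ["git", "diff", "--exit-code", "--", "golden/snapshots"],
    ],
}


def commands_for(failure_class):
    """Return the argv lists to run for a failure class, or [] if unmapped."""
    return REPAIR_COMMANDS.get(failure_class, [])
```

Classes with no deterministic command (for example infrastructure or unknown) return an empty list, leaving the decision to rerun or escalate to another skill.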
diff --git a/.github/workflows/shared/skills/diff-risk-map.md b/.github/workflows/shared/skills/diff-risk-map.md
@@ -0,0 +1,20 @@
+# Skill: diff-risk-map
+
+Classify changed files so the orchestrator can choose focused repair skills.
+
+Risk classes:
+- tests only
+- docs or examples
+- playground/browser UI
+- public API or exported TypeScript
+- core data structure behavior
+- I/O or golden snapshot behavior
+- dependency or lockfile
+- CI, workflow, or agent configuration
+- Autoloop generated or program files
+- benchmark or performance behavior
+
+Report:
+- The primary risk class.
+- Conditional skills that should run and why.
+- Files that require human confirmation before edit under the installed policy.
diff --git a/.github/workflows/shared/skills/docs-release-gate-repair.md b/.github/workflows/shared/skills/docs-release-gate-repair.md
@@ -0,0 +1,8 @@
+# Skill: docs-release-gate-repair
+
+Use this when docs, examples, golden snapshots, or generated documentation block mergeability.
+
+Respect repository rules:
+- Do not edit `README.md` unless the PR explicitly requires it and a human confirms.
+- Keep docs changes tied to a failing gate or explicit blocker.
+- For playground examples, validate Python snippets with `python scripts/validate-python-examples.py playground/`.
diff --git a/.github/workflows/shared/skills/infra-ci-repair.md b/.github/workflows/shared/skills/infra-ci-repair.md
@@ -0,0 +1,5 @@
+# Skill: infra-ci-repair
+
+Use this when the blocker is in GitHub Actions, runner setup, package installation, workflow permissions, generated workflow locks, or CI activation.
+
+First determine whether the failure is caused by the PR code, workflow configuration, external infrastructure, or missing credentials. For workflow-source changes under `.github/workflows/*.md`, the repo requires `gh aw compile` and `apm compile`; if the runtime cannot safely push workflow files, label `evergreen-human-needed` and comment with exact next steps.
diff --git a/.github/workflows/shared/skills/lint-policy-review.md b/.github/workflows/shared/skills/lint-policy-review.md
@@ -0,0 +1,5 @@
+# Skill: lint-policy-review
+
+Use this when lint or formatting failures look like repository policy, not a local code mistake.
+
+Check whether the failure can be fixed with `bun run lint:fix`. If the fix would violate the strict TypeScript rules (by introducing `any`, using `as` casts, or adding ignore comments) or would change style configuration, stop and ask for human input with `evergreen-human-needed`.
diff --git a/.github/workflows/shared/skills/merge-blocker-comment-reader.md b/.github/workflows/shared/skills/merge-blocker-comment-reader.md
@@ -0,0 +1,12 @@
+# Skill: merge-blocker-comment-reader
+
+Read human comments only for merge-blocking signals.
+
+Look for:
+- requested-changes reviews
+- unresolved review threads
+- comments that explicitly block merge
+- comments asking for required tests, docs, screenshots, or policy decisions
+- maintainer instructions that change the repair plan
+
+Ignore non-blocking suggestions, thanks, progress chatter, and general discussion. Return a blocker map with source URLs.
diff --git a/.github/workflows/shared/skills/merge-gate-reporter.md b/.github/workflows/shared/skills/merge-gate-reporter.md
@@ -0,0 +1,14 @@
+# Skill: merge-gate-reporter
+
+Decide whether the PR is ready, blocked, human-needed, exhausted, or should continue.
+
+Evaluate:
+- draft state
+- conflict and mergeability state
+- current-head CI gates
+- review decision and unresolved threads
+- required and blocker labels
+- repo-specific policy gates
+- quota state
+
+Produce a concise gate table for comments. State only evidence-backed conclusions. Never directly merge a pull request.
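The readiness rule from `repo-policy.md` (the `evergreen` label plus all installed gates) can be sketched as a function that lists evidence-backed blockers, with an empty list meaning ready. The snapshot keys and the name `readiness_blockers` are hypothetical; gate states are assumed to be the class strings from `ci-gate-evaluator.md`, and quota handling is left out.

```python
from fnmatch import fnmatchcase

# Required CI gates listed in repo-policy.md.
REQUIRED_GATES = ["Test & Lint", "Playground E2E (Playwright)",
                  "Validate Python Examples", "Build"]
BENCHMARK = "OpenEvolve benchmark"


def readiness_blockers(pr):
    """Return reasons the PR is not ready; an empty list means ready."""
    blockers = []
    if "evergreen" not in pr["labels"]:
        blockers.append("missing `evergreen` label")
    if pr["state"] != "open":
        blockers.append("PR is not open")
    if pr["draft"]:
        blockers.append("PR is a draft")
    if not pr["mergeable"]:
        blockers.append("PR has conflicts or is not mergeable")

    required = list(REQUIRED_GATES)
    bench = pr["gates"].get(BENCHMARK)
    bench_present = bench is not None and not bench.startswith("skipped")
    # Benchmark applies to evolve branches or when present and non-skipped.
    if fnmatchcase(pr["head_branch"], "autoloop/*-evolve") or bench_present:
        required.append(BENCHMARK)
    for name in required:
        state = pr["gates"].get(name, "missing")
        if state != "passing":
            blockers.append(f"{name}: {state}")

    if pr["changes_requested"]:
        blockers.append("requested-changes review")
    if pr["unresolved_threads"] > 0:
        blockers.append("unresolved review threads")
    if pr["maintainer_blockers"]:
        blockers.append("explicit maintainer blocker comment")
    return blockers
```

Returning reasons instead of a boolean gives the report comment something concrete to cite for each gate.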
diff --git a/.github/workflows/shared/skills/playground-e2e-diagnoser.md b/.github/workflows/shared/skills/playground-e2e-diagnoser.md
@@ -0,0 +1,12 @@
+# Skill: playground-e2e-diagnoser
+
+Use this when Playwright, browser, playground, or visual behavior blocks mergeability.
+
+Collect:
+- failing test name and trace/artifact links
+- browser console errors
+- network failures
+- screenshots when available
+- whether the failure reproduces with `bun run test:e2e`
+
+Do not call a failure flaky without evidence from a rerun or a known memory signature.
diff --git a/.github/workflows/shared/skills/pr-intake.md b/.github/workflows/shared/skills/pr-intake.md
@@ -0,0 +1,13 @@
+# Skill: pr-intake
+
+Build a factual snapshot of the target pull request before choosing any repair.
+
+Collect:
+- PR number, title, author, draft state, base branch, head branch, and head SHA.
+- Current labels and whether the `evergreen` label is still present.
+- Changed files, grouped by directory and file type.
+- Current mergeability, behind/base freshness, review decision, and unresolved review threads when available.
+- Current check runs and workflow runs for the PR head SHA only.
+- Recent PR comments, reviews, and review comments that may affect mergeability.
+
+Return facts only. Do not recommend fixes from this skill.
diff --git a/.github/workflows/shared/skills/repo-memory-reader.md b/.github/workflows/shared/skills/repo-memory-reader.md
@@ -0,0 +1,13 @@
+# Skill: repo-memory-reader
+
+Load durable repository knowledge before making a repair plan.
+
+Read relevant files under `/tmp/gh-aw/repo-memory/evergreen/` if they exist:
+- `gates.json`
+- `labels.json`
+- `ci-signatures.jsonl`
+- `review-patterns.jsonl`
+- `skill-outcomes.jsonl`
+- `velocity.jsonl`
+
+Summarize only knowledge that applies to the current PR. Treat current GitHub state and the installed repo policy as more authoritative than memory when they disagree. Do not copy large memory files into comments.
diff --git a/.github/workflows/shared/skills/safe-output-verifier.md b/.github/workflows/shared/skills/safe-output-verifier.md
@@ -0,0 +1,11 @@
+# Skill: safe-output-verifier
+
+Verify intended GitHub side effects before reporting success.
+
+After requesting a safe output:
+- For labels, reload the issue/PR and confirm the expected labels changed.
+- For comments, confirm the comment exists and points at the intended PR.
+- For branch pushes, confirm the PR head SHA changed to the expected commit and only allowed files changed.
+- For workflow dispatch, confirm the dispatch request was accepted or explain what could not be verified.
+
+If verification fails, report the operation as blocked. Do not use completion language for unverified side effects.