From 6e8df2a1b7bf3d78d0e2b041a2d4b4c09a33afba Mon Sep 17 00:00:00 2001 From: mrjf Date: Wed, 11 Mar 2026 20:32:54 -0700 Subject: [PATCH 1/7] Add Autoloop workflow: iterative optimization agent Adds an Autoresearch-inspired workflow that runs on a schedule to autonomously improve a target artifact toward a measurable goal. Each iteration proposes a change, evaluates it against a metric, and keeps only improvements (ratchet pattern). Includes a template program.md with a setup guard sentinel that prevents the workflow from running until the user configures their optimization goal, target files, and evaluation command. Co-Authored-By: Claude Opus 4.6 --- README.md | 1 + docs/autoloop.md | 161 ++++++++++++++++++++ workflows/autoloop.md | 266 ++++++++++++++++++++++++++++++++++ workflows/autoloop/program.md | 125 ++++++++++++++++ 4 files changed, 553 insertions(+) create mode 100644 docs/autoloop.md create mode 100644 workflows/autoloop.md create mode 100644 workflows/autoloop/program.md diff --git a/README.md b/README.md index 45e9ba2..93bb873 100644 --- a/README.md +++ b/README.md @@ -22,6 +22,7 @@ A sample family of reusable [GitHub Agentic Workflows](https://github.github.com ### Research, Status & Planning Workflows +- [🔄 Autoloop](docs/autoloop.md) - Iterative optimization agent that proposes changes, evaluates against a metric, and keeps only improvements - [📚 Weekly Research](docs/weekly-research.md) - Collect research updates and industry trends - [📊 Weekly Issue Summary](docs/weekly-issue-summary.md) - Weekly issue activity report with trend charts and recommendations - [👥 Daily Repo Status](docs/daily-repo-status.md) - Assess repository activity and create status reports diff --git a/docs/autoloop.md b/docs/autoloop.md new file mode 100644 index 0000000..f4a36c6 --- /dev/null +++ b/docs/autoloop.md @@ -0,0 +1,161 @@ +# Autoloop + +> For an overview of all available workflows, see the [main README](../README.md). 
+ +**Iterative optimization agent inspired by [Autoresearch](https://github.com/karpathy/autoresearch) and Claude Code's `/loop`** + +The [Autoloop workflow](../workflows/autoloop.md?plain=1) runs on a schedule to autonomously improve a target artifact toward a measurable goal. Each iteration proposes a change, evaluates it against a metric, and keeps only improvements. + +## Installation + +```bash +# Install the 'gh aw' extension +gh extension install github/gh-aw + +# Add the workflow to your repository +gh aw add-wizard githubnext/agentics/autoloop +``` + +This walks you through adding the workflow to your repository. + +## How It Works + +```mermaid +graph LR + A[Scheduled Run] --> B[Read program.md] + B --> C[Review History] + C --> D[Propose Change] + D --> E[Implement on Branch] + E --> F[Run Evaluation] + F --> G{Metric Improved?} + G -->|Yes| H[Create Draft PR] + G -->|No| I[Record & Reject] + H --> J[Update Experiment Log] + I --> J +``` + +## Getting Started + +When you install Autoloop, a **template `program.md`** is added to your repo at `.github/autoloop/program.md`. This template has placeholder sections you must fill in — the workflow **will not run** until you do. + +### Setup flow + +```mermaid +graph LR + A[Install Workflow] --> B[Edit program.md] + B --> C[Define Goal, Targets, Evaluation] + C --> D[Remove UNCONFIGURED sentinel] + D --> E[Commit & Push] + E --> F[Loop Begins] +``` + +1. **Install** — `gh aw add-wizard githubnext/agentics/autoloop` +2. **Edit** — Open `.github/autoloop/program.md` and replace the placeholders with your project's goal, target files, and evaluation command. The template includes three complete examples (ML training, test coverage, build performance) for inspiration. +3. **Activate** — Remove the `` line at the top of the file. +4. **Compile & push** — `gh aw compile && git add . 
&& git commit -m "Configure autoloop" && git push` + +If you forget to edit the template, the first scheduled run will create a GitHub issue reminding you, with a direct link to edit the file. + +## Configuration + +The `program.md` file (at `.github/autoloop/program.md` or repo root) has three sections: + +### 1. Goal — What to optimize + +Describe the objective in natural language. Be specific about what "better" means. + +### 2. Target — What files can be changed + +List the files the agent is allowed to modify. Everything else is off-limits. + +### 3. Evaluation — How to measure success + +Provide a command to run and a metric to extract. Specify whether higher or lower is better. + +### Example program.md + +````markdown +# Autoloop Program + +## Goal + +Optimize the training script to minimize validation loss on CIFAR-10 +within a 5-minute training budget. + +## Target + +Only modify these files: +- `train.py` +- `config.yaml` + +## Evaluation + +```bash +python train.py --epochs 5 && python evaluate.py --output-json results.json +``` + +Metric: `validation_loss` from `results.json`. Lower is better. +```` + +### Customizing the Schedule + +Edit the workflow's `schedule` field. Examples: +- `every 6h` — 4 times a day (default) +- `every 1h` — hourly iterations +- `daily` — once a day +- `0 */2 * * *` — every 2 hours (cron syntax) + +After editing, run `gh aw compile` to update the workflow. + +## Usage + +### Automatic mode + +Once `program.md` exists and the workflow is installed, iterations run automatically on schedule. Each run: + +1. Reads the program definition and past history +2. Proposes a single targeted change +3. Runs the evaluation command +4. 
Accepts (creates draft PR) or rejects (logs the attempt) + +### Manual trigger + +```bash +# Run an iteration now +gh aw run autoloop + +# Give specific instructions +gh aw run autoloop -- "try using cosine annealing for the learning rate schedule" +``` + +### Slash command + +Comment on any issue or PR: +``` +/autoloop try batch size 64 instead of 32 +``` + +## Experiment Tracking + +All iterations are logged in a monthly GitHub issue titled `[Autoloop] Experiment Log {YYYY-MM}`. The issue tracks: + +- Current best metric value +- Full iteration history with accept/reject status +- Links to PRs for accepted changes +- Links to GitHub Actions runs + +## Human in the Loop + +- **Review draft PRs** — accepted improvements appear as draft PRs for human review +- **Merge or close** — you decide which optimizations to keep +- **Adjust the program** — update `program.md` to change the goal, targets, or evaluation +- **Steer via slash command** — use `/autoloop ` to direct specific experiments +- **Pause** — disable the workflow schedule to stop iterations + +## Security + +- Runs with read-only GitHub permissions +- Only modifies files listed in `program.md`'s Target section +- Never modifies evaluation scripts +- All changes go through draft PRs requiring human approval +- Uses "safe outputs" to constrain what the agent can create diff --git a/workflows/autoloop.md b/workflows/autoloop.md new file mode 100644 index 0000000..5c6a7b4 --- /dev/null +++ b/workflows/autoloop.md @@ -0,0 +1,266 @@ +--- +description: | + An iterative optimization loop inspired by Karpathy's Autoresearch and Claude Code's /loop. + Runs on a configurable schedule to autonomously improve a target artifact toward a measurable goal. + Each iteration: reads the program definition, proposes a change, evaluates against a metric, + and accepts or rejects the change. Tracks all iterations in a rolling GitHub issue. 
+ - User defines the optimization goal and evaluation criteria in a program.md file + - Accepts changes only when they improve the metric (ratchet pattern) + - Persists state between runs via repo memory + - Creates draft PRs for accepted improvements + - Maintains a living experiment log as a GitHub issue + +on: + schedule: every 6h + workflow_dispatch: + slash_command: + name: autoloop + +permissions: read-all + +timeout-minutes: 45 + +network: + allowed: + - defaults + - node + - python + - rust + - java + - dotnet + +safe-outputs: + add-comment: + max: 5 + target: "*" + hide-older-comments: false + create-pull-request: + draft: true + title-prefix: "[Autoloop] " + labels: [automation, autoloop] + protected-files: fallback-to-issue + max: 2 + push-to-pull-request-branch: + target: "*" + title-prefix: "[Autoloop] " + max: 2 + create-issue: + title-prefix: "[Autoloop] " + labels: [automation, autoloop] + max: 2 + update-issue: + target: "*" + title-prefix: "[Autoloop] " + max: 1 + +tools: + web-fetch: + github: + toolsets: [all] + bash: true + repo-memory: true + +imports: + - shared/reporting.md + +--- + +# Autoloop + +An iterative optimization agent that proposes changes, evaluates them against a metric, and keeps only improvements — running autonomously on a schedule. + +## Command Mode + +Take heed of **instructions**: "${{ steps.sanitized.outputs.text }}" + +If these are non-empty (not ""), then you have been triggered via `/autoloop `. The instructions may be: +- **A one-off directive**: e.g., `/autoloop try a different approach to the loss function`. Execute it as a single iteration using the program.md context, then report results. +- **A configuration change**: e.g., `/autoloop set metric to accuracy instead of loss`. Update the relevant program.md section and confirm. + +Then exit — do not run the normal loop after completing the instructions. 
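The dispatch decision above is mechanical and can be sketched as follows (illustrative Python; the function name and the way the sanitized text reaches the agent are assumptions, not part of gh-aw):

```python
def dispatch(instructions: str) -> str:
    """Decide the run mode from the sanitized slash-command text.

    Empty text means a scheduled (or plain manual) run, so the normal
    iteration loop applies; any other text means command mode, and the
    run exits after the instructions are handled.
    """
    return "command" if instructions.strip() else "loop"
```

Whitespace-only text is treated the same as empty text, since the sanitized output may contain stray spaces.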
+ +## Program Definition + +The user configures the optimization loop by editing **`.github/autoloop/program.md`** — a template file installed alongside this workflow. This file defines three things: + +1. **Goal**: What the agent is trying to optimize (natural language description) +2. **Target**: Which files the agent is allowed to modify +3. **Evaluation**: How to measure whether a change is an improvement + +### Setup Guard + +**The workflow will not run until the user has edited the template.** The installed template contains a sentinel line: + +``` + +``` + +At the start of every run, check for this sentinel. If it is present: + +1. **Do not run any iterations.** +2. Create a single GitHub issue (if one doesn't already exist) titled `[Autoloop] Action required: configure your program.md` with: + - A clear explanation that the workflow is installed but paused until `program.md` is configured. + - A direct link to the file: `${{ github.server_url }}/${{ github.repository }}/edit/${{ github.ref_name }}/.github/autoloop/program.md` + - A brief guide: "Open the file, replace the placeholder sections with your project's goal, target files, and evaluation command, then remove the `` line." + - Two or three example programs for inspiration (ML training, test coverage, build performance). +3. Exit. + +If the sentinel is absent, proceed with the iteration loop. + +### Reading the Program + +At the start of every run: + +1. Read `.github/autoloop/program.md` (or `program.md` in the repo root as a fallback). +2. Check for the `` sentinel — if present, run the **Setup Guard** flow above. +3. Parse the three sections: Goal, Target, Evaluation. +4. Validate that all three sections have non-placeholder content. If any section still contains `TODO` or `REPLACE` markers, treat it as unconfigured — create/update the setup issue and exit. +5. Read the current state of all target files. +6. Read repo memory for the current best metric value and iteration history. 
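Steps 3–4 above can be sketched roughly as follows (a minimal illustration assuming the three `##` section headings from the template; the helper names are hypothetical, and the sentinel check is omitted since its exact text lives in the installed template):

```python
import re

REQUIRED_SECTIONS = ("Goal", "Target", "Evaluation")

def parse_program(text: str) -> dict:
    """Split a program file body into {section name: section body}."""
    sections, current = {}, None
    for line in text.splitlines():
        heading = re.match(r"##\s+(.+)", line)
        if heading:
            current = heading.group(1).strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}

def is_configured(text: str) -> bool:
    """A program counts as configured only when all three sections
    are present, non-empty, and free of placeholder markers."""
    sections = parse_program(text)
    return all(
        bool(sections.get(name))
        and "TODO" not in sections[name]
        and "REPLACE" not in sections[name]
        for name in REQUIRED_SECTIONS
    )
```

A program failing `is_configured` is treated as unconfigured and routed through the Setup Guard flow above.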
+ +## Iteration Loop + +Each run executes **one iteration** of the optimization loop: + +### Step 1: Read State + +1. Read `program.md` to understand the goal, targets, and evaluation method. +2. Read repo memory to get: + - `best_metric`: The current best metric value (null if first run). + - `iteration_count`: How many iterations have been completed. + - `history`: Summary of recent iterations (last 20). + - `current_branch`: Any in-progress branch from a previous run. + - `rejected_approaches`: Approaches that were tried and failed (to avoid repeating). + +### Step 2: Analyze and Propose + +1. Read the target files and understand the current state. +2. Review the history of previous iterations — what worked, what didn't. +3. **Think carefully** about what change is most likely to improve the metric. Consider: + - What has been tried before and rejected (don't repeat failures). + - What the evaluation criteria reward. + - Small, targeted changes are more likely to succeed than large rewrites. + - If many small optimizations have been exhausted, consider a larger architectural change. +4. Describe the proposed change in your reasoning before implementing it. + +### Step 3: Implement + +1. Create a fresh branch: `autoloop/iteration--` from the default branch. +2. Make the proposed changes to the target files only. +3. **Respect the program constraints**: do not modify files outside the target list. + +### Step 4: Evaluate + +1. Run the evaluation command specified in `program.md`. +2. Parse the metric from the output. +3. Compare against `best_metric` from memory. + +### Step 5: Accept or Reject + +**If the metric improved** (or this is the first run establishing a baseline): +1. Record the new `best_metric` in repo memory. +2. Create a draft PR with: + - Title: `[Autoloop] Iteration : ` + - Body includes: what was changed, why, the old metric, the new metric, and the improvement delta. 
+ - AI disclosure: `🤖 *This change was proposed and validated by Autoloop.*` +3. Add an entry to the experiment log issue. +4. Update memory: add to `history`, increment `iteration_count`, clear `current_branch`. + +**If the metric did not improve** (or evaluation failed): +1. Do NOT create a PR. +2. Record the attempt in `rejected_approaches` in memory with: what was tried, the resulting metric, and why it likely didn't work. +3. Add a "rejected" entry to the experiment log issue. +4. Update memory: increment `iteration_count`, clear `current_branch`. + +**If evaluation could not run** (build failure, missing dependencies, etc.): +1. Do NOT create a PR. +2. Record the error in memory. +3. Add an "error" entry to the experiment log issue. +4. If this is a recurring error (3+ times), create an issue describing the problem and pause further iterations until resolved. + +## Experiment Log Issue + +Maintain a single open issue titled `[Autoloop] Experiment Log {YYYY}-{MM}` as a rolling record of all iterations. + +### Issue Body Format + +```markdown +🤖 *Autoloop — an iterative optimization agent for this repository.* + +## Program + +**Goal**: {one-line summary from program.md} +**Target files**: {list of target files} +**Metric**: {metric name} ({higher/lower} is better) +**Current best**: {best_metric} (established in iteration {N}) + +## Iteration History + +### Iteration {N} — {YYYY-MM-DD HH:MM UTC} — [Run]({run_url}) +- **Status**: ✅ Accepted / ❌ Rejected / ⚠️ Error +- **Change**: {one-line description} +- **Metric**: {value} (previous best: {previous_best}, delta: {delta}) +- **PR**: #{number} (if accepted) + +### Iteration {N-1} — {YYYY-MM-DD HH:MM UTC} — [Run]({run_url}) +- **Status**: ❌ Rejected +- **Change**: {one-line description} +- **Metric**: {value} (previous best: {previous_best}, delta: {delta}) +- **Reason**: {why it was rejected} +``` + +### Format Rules + +- Iterations in **reverse chronological order** (newest first). 
+- Each iteration heading links to its GitHub Actions run. +- Use `${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}` for the current run URL. +- Close the previous month's issue and create a new one at month boundaries. +- Maximum 50 iterations per issue; create a continuation issue if exceeded. + +## Memory Schema + +Store the following in repo memory: + +```json +{ + "best_metric": 0.0234, + "metric_name": "validation_loss", + "metric_direction": "lower", + "iteration_count": 17, + "current_branch": null, + "last_run": "2025-01-15T12:00:00Z", + "history": [ + { + "iteration": 17, + "status": "accepted", + "description": "Reduced learning rate warmup from 5 to 3 epochs", + "metric": 0.0234, + "previous_best": 0.0241, + "pr": 42 + } + ], + "rejected_approaches": [ + { + "iteration": 16, + "description": "Switched from Adam to SGD with momentum", + "metric": 0.0298, + "reason": "SGD converges slower within the 5-minute budget" + } + ], + "consecutive_errors": 0, + "paused": false, + "pause_reason": null +} +``` + +## Guidelines + +- **One change per iteration.** Keep changes small and targeted. A single hyperparameter tweak, a minor architectural modification, or a focused code optimization. This makes it clear what caused metric changes. +- **No breaking changes.** Target files must remain functional even if the iteration is rejected. +- **Respect the evaluation budget.** If the evaluation command has a time constraint (e.g., 5-minute training), respect it. Do not modify evaluation scripts or timeout settings. +- **Learn from history.** The rejected_approaches list exists to prevent repeating failures. Read it carefully before proposing changes. +- **Diminishing returns.** If the last 5 consecutive iterations were rejected, post a comment on the experiment log suggesting the user review the program definition — the optimization may have plateaued. +- **Transparency.** Every PR and comment must include AI disclosure with 🤖. 
+- **Safety.** Never modify files outside the target list. Never modify the evaluation script. Never modify program.md (except via `/autoloop` command mode). +- **Read AGENTS.md first**: before starting work, read the repository's `AGENTS.md` file (if present) to understand project-specific conventions. +- **Build and test**: run any build/test commands before creating PRs. If your changes break the build, reject the iteration. diff --git a/workflows/autoloop/program.md b/workflows/autoloop/program.md new file mode 100644 index 0000000..2161ee2 --- /dev/null +++ b/workflows/autoloop/program.md @@ -0,0 +1,125 @@ + + + + +# Autoloop Program + + + +## Goal + + + +REPLACE THIS with your optimization goal. For example: +- "Minimize validation loss on CIFAR-10 within a 5-minute training budget" +- "Maximize test coverage percentage across the project" +- "Reduce production Docker image size" + +## Target + + + +Only modify these files: +- `REPLACE_WITH_FILE_1` — (describe what this file does) +- `REPLACE_WITH_FILE_2` — (describe what this file does) + +Do NOT modify: +- (list files that should never be touched, e.g., evaluation scripts, data files) + +## Evaluation + + + +Run the following command to evaluate: + +```bash +REPLACE_WITH_YOUR_EVALUATION_COMMAND +``` + +The metric is `REPLACE_WITH_METRIC_NAME` from the output. **Lower/Higher is better.** (pick one) + +A change is accepted if the metric strictly improves over the previous best. +The first run establishes the baseline. 
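In illustrative Python, the ratchet rule reads roughly like this (the agent applies this comparison itself; the function is hypothetical and nothing here is executed):

```python
def improves(metric, best, direction="lower"):
    """True when a change should be accepted under the ratchet rule.

    The first run (no previous best) establishes the baseline; after
    that, only a strict improvement over the best is accepted — ties
    and regressions are rejected.
    """
    if best is None:  # first run: accept to establish the baseline
        return True
    return metric < best if direction == "lower" else metric > best
```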
+ +--- + + From e1b60611d33c20ec00ef7209d44de7c026c45601 Mon Sep 17 00:00:00 2001 From: mrjf Date: Wed, 11 Mar 2026 20:52:24 -0700 Subject: [PATCH 2/7] Support multiple independent loops per repository - Programs now live in .github/autoloop/programs/ (one .md file per loop) - Each program gets its own namespace: metrics, memory, branches, PRs, experiment log - Programs can specify per-program schedule and timeout via YAML frontmatter - Slash command supports program targeting: /autoloop training: try X - Template moved to programs/example.md with instructions to rename Co-Authored-By: Claude Opus 4.6 --- docs/autoloop.md | 77 +++++++++------ workflows/autoloop.md | 95 ++++++++++++++----- .../{program.md => programs/example.md} | 17 ++++ 3 files changed, 136 insertions(+), 53 deletions(-) rename workflows/autoloop/{program.md => programs/example.md} (89%) diff --git a/docs/autoloop.md b/docs/autoloop.md index f4a36c6..b5a0556 100644 --- a/docs/autoloop.md +++ b/docs/autoloop.md @@ -4,7 +4,7 @@ **Iterative optimization agent inspired by [Autoresearch](https://github.com/karpathy/autoresearch) and Claude Code's `/loop`** -The [Autoloop workflow](../workflows/autoloop.md?plain=1) runs on a schedule to autonomously improve a target artifact toward a measurable goal. Each iteration proposes a change, evaluates it against a metric, and keeps only improvements. +The [Autoloop workflow](../workflows/autoloop.md?plain=1) runs on a schedule to autonomously improve target artifacts toward measurable goals. Each iteration proposes a change, evaluates it against a metric, and keeps only improvements. Supports **multiple independent loops** in the same repository. ## Installation @@ -22,27 +22,28 @@ This walks you through adding the workflow to your repository. 
```mermaid graph LR - A[Scheduled Run] --> B[Read program.md] - B --> C[Review History] - C --> D[Propose Change] - D --> E[Implement on Branch] - E --> F[Run Evaluation] - F --> G{Metric Improved?} - G -->|Yes| H[Create Draft PR] - G -->|No| I[Record & Reject] - H --> J[Update Experiment Log] - I --> J + A[Scheduled Run] --> B[Discover Programs] + B --> C[For Each Program] + C --> D[Review History] + D --> E[Propose Change] + E --> F[Implement on Branch] + F --> G[Run Evaluation] + G --> H{Metric Improved?} + H -->|Yes| I[Create Draft PR] + H -->|No| J[Record & Reject] + I --> K[Update Experiment Log] + J --> K ``` ## Getting Started -When you install Autoloop, a **template `program.md`** is added to your repo at `.github/autoloop/program.md`. This template has placeholder sections you must fill in — the workflow **will not run** until you do. +When you install Autoloop, a **template program file** is added at `.github/autoloop/programs/example.md`. This template has placeholder sections you must fill in — the workflow **will not run** until you do. ### Setup flow ```mermaid graph LR - A[Install Workflow] --> B[Edit program.md] + A[Install Workflow] --> B[Rename & Edit Program] B --> C[Define Goal, Targets, Evaluation] C --> D[Remove UNCONFIGURED sentinel] D --> E[Commit & Push] @@ -50,15 +51,29 @@ graph LR ``` 1. **Install** — `gh aw add-wizard githubnext/agentics/autoloop` -2. **Edit** — Open `.github/autoloop/program.md` and replace the placeholders with your project's goal, target files, and evaluation command. The template includes three complete examples (ML training, test coverage, build performance) for inspiration. -3. **Activate** — Remove the `` line at the top of the file. -4. **Compile & push** — `gh aw compile && git add . && git commit -m "Configure autoloop" && git push` +2. **Rename** — Rename `.github/autoloop/programs/example.md` to something meaningful (e.g., `training.md`, `coverage.md`). The filename becomes the program name. +3. 
**Edit** — Replace the placeholders with your project's goal, target files, and evaluation command. The template includes three complete examples for inspiration. +4. **Activate** — Remove the `` line at the top. +5. **Compile & push** — `gh aw compile && git add . && git commit -m "Configure autoloop" && git push` If you forget to edit the template, the first scheduled run will create a GitHub issue reminding you, with a direct link to edit the file. +### Adding more loops + +To run multiple optimization loops in parallel, just add more `.md` files to `.github/autoloop/programs/`: + +``` +.github/autoloop/programs/ +├── training.md ← optimize model training loss +├── coverage.md ← maximize test coverage +└── build-perf.md ← minimize build time +``` + +Each program runs independently with its own metric tracking, experiment log issue, and PR namespace. Copy the template, fill it in, and push — the next scheduled run picks it up automatically. + ## Configuration -The `program.md` file (at `.github/autoloop/program.md` or repo root) has three sections: +Each program file in `.github/autoloop/programs/` has three sections: ### 1. Goal — What to optimize @@ -72,7 +87,7 @@ List the files the agent is allowed to modify. Everything else is off-limits. Provide a command to run and a metric to extract. Specify whether higher or lower is better. -### Example program.md +### Example program file ````markdown # Autoloop Program @@ -107,11 +122,13 @@ Edit the workflow's `schedule` field. Examples: After editing, run `gh aw compile` to update the workflow. +Note: The schedule applies to the workflow as a whole — all programs iterate on the same schedule. To run programs at different frequencies, you can install the workflow multiple times with different schedules, each pointing to a subset of programs. + ## Usage ### Automatic mode -Once `program.md` exists and the workflow is installed, iterations run automatically on schedule. 
Each run: +Once at least one configured program exists, iterations run automatically on schedule. Each run processes every configured program: 1. Reads the program definition and past history 2. Proposes a single targeted change @@ -121,23 +138,26 @@ Once `program.md` exists and the workflow is installed, iterations run automatic ### Manual trigger ```bash -# Run an iteration now +# Run all programs now gh aw run autoloop -# Give specific instructions -gh aw run autoloop -- "try using cosine annealing for the learning rate schedule" +# Target a specific program +gh aw run autoloop -- "training: try using cosine annealing" + +# If only one program exists, no prefix needed +gh aw run autoloop -- "try batch size 64 instead of 32" ``` ### Slash command Comment on any issue or PR: ``` -/autoloop try batch size 64 instead of 32 +/autoloop training: try batch size 64 instead of 32 ``` ## Experiment Tracking -All iterations are logged in a monthly GitHub issue titled `[Autoloop] Experiment Log {YYYY-MM}`. The issue tracks: +Each program gets its own monthly experiment log issue titled `[Autoloop: {program-name}] Experiment Log {YYYY-MM}`. 
The issue tracks: - Current best metric value - Full iteration history with accept/reject status @@ -148,14 +168,15 @@ All iterations are logged in a monthly GitHub issue titled `[Autoloop] Experimen - **Review draft PRs** — accepted improvements appear as draft PRs for human review - **Merge or close** — you decide which optimizations to keep -- **Adjust the program** — update `program.md` to change the goal, targets, or evaluation -- **Steer via slash command** — use `/autoloop ` to direct specific experiments -- **Pause** — disable the workflow schedule to stop iterations +- **Adjust programs** — edit any program file to change the goal, targets, or evaluation +- **Add/remove loops** — add or delete files in `.github/autoloop/programs/` +- **Steer via slash command** — use `/autoloop {program}: {instructions}` to direct experiments +- **Pause** — disable the workflow schedule to stop all loops, or add the sentinel back to a single program file to pause just that loop ## Security - Runs with read-only GitHub permissions -- Only modifies files listed in `program.md`'s Target section +- Only modifies files listed in each program's Target section - Never modifies evaluation scripts - All changes go through draft PRs requiring human approval - Uses "safe outputs" to constrain what the agent can create diff --git a/workflows/autoloop.md b/workflows/autoloop.md index 5c6a7b4..2dcb2b6 100644 --- a/workflows/autoloop.md +++ b/workflows/autoloop.md @@ -74,58 +74,102 @@ An iterative optimization agent that proposes changes, evaluates them against a Take heed of **instructions**: "${{ steps.sanitized.outputs.text }}" If these are non-empty (not ""), then you have been triggered via `/autoloop `. The instructions may be: -- **A one-off directive**: e.g., `/autoloop try a different approach to the loss function`. Execute it as a single iteration using the program.md context, then report results. 
-- **A configuration change**: e.g., `/autoloop set metric to accuracy instead of loss`. Update the relevant program.md section and confirm. +- **A one-off directive targeting a specific program**: e.g., `/autoloop training: try a different approach to the loss function`. The text before the colon is the program name (matching a file in `.github/autoloop/programs/`). Execute it as a single iteration for that program, then report results. +- **A general directive**: e.g., `/autoloop try cosine annealing`. If no program name prefix is given and only one program exists, use that one. If multiple exist, ask which program to target. +- **A configuration change**: e.g., `/autoloop training: set metric to accuracy instead of loss`. Update the relevant program file and confirm. Then exit — do not run the normal loop after completing the instructions. +## Multiple Programs + +Autoloop supports **multiple independent optimization loops** in the same repository. Each loop is defined by a separate markdown file in `.github/autoloop/programs/`. For example: + +``` +.github/autoloop/programs/ +├── training.md ← optimize model training +├── coverage.md ← maximize test coverage +└── build-perf.md ← minimize build time +``` + +Each program runs independently with its own: +- Goal, target files, and evaluation command +- Metric tracking and best-metric history +- Experiment log issue: `[Autoloop: {program-name}] Experiment Log {YYYY-MM}` +- Branch namespace: `autoloop/{program-name}/iteration--` +- PR title prefix: `[Autoloop: {program-name}]` +- Repo memory namespace: keyed by program name + +On each scheduled run, the workflow iterates through **all configured programs** and runs one iteration per program. Programs with the `` sentinel are skipped. 
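The discovery and namespacing conventions above can be sketched as (illustrative Python using the paths and formats from this section; the helper names are hypothetical):

```python
from pathlib import Path

def discover_programs(repo_root="."):
    """Map program name -> program file for every loop in the repo."""
    programs_dir = Path(repo_root) / ".github" / "autoloop" / "programs"
    if not programs_dir.is_dir():
        return {}
    # The program name is the filename without the .md extension.
    return {p.stem: p for p in sorted(programs_dir.glob("*.md"))}

def namespaces(program_name):
    """Per-program identifiers that keep the loops independent."""
    return {
        "branch_prefix": f"autoloop/{program_name}/iteration",
        "log_title_prefix": f"[Autoloop: {program_name}] Experiment Log",
        "pr_title_prefix": f"[Autoloop: {program_name}] ",
        "memory_key": f"autoloop/{program_name}",
    }
```

Keeping every derived name a pure function of the program name is what lets loops be added or removed just by adding or deleting files.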
+ +### Per-Program Schedule and Timeout + +Programs can optionally specify their own schedule and timeout in a YAML frontmatter block at the top of the file (after the sentinel, if present): + +```markdown +--- +schedule: every 1h +timeout-minutes: 30 +--- + +# Autoloop Program +... +``` + +- **`schedule`**: Controls how often this program runs. On each workflow trigger, check if the program is due based on its schedule and the `last_run` timestamp in memory. If the program's schedule hasn't elapsed since its last run, skip it. If omitted, the program runs on every workflow trigger. +- **`timeout-minutes`**: Maximum time for this program's iteration. If omitted, the program shares the workflow's overall timeout. + +This lets you run a fast coverage check every hour while running a slow training loop once a day — all from the same workflow. + ## Program Definition -The user configures the optimization loop by editing **`.github/autoloop/program.md`** — a template file installed alongside this workflow. This file defines three things: +Each program file in `.github/autoloop/programs/` defines three things: 1. **Goal**: What the agent is trying to optimize (natural language description) 2. **Target**: Which files the agent is allowed to modify 3. **Evaluation**: How to measure whether a change is an improvement +The **program name** is the filename without the `.md` extension (e.g., `training.md` → program name is `training`). + ### Setup Guard -**The workflow will not run until the user has edited the template.** The installed template contains a sentinel line: +A template program file is installed at `.github/autoloop/programs/example.md`. **Programs will not run until the user has edited them.** Each template contains a sentinel line: ``` ``` -At the start of every run, check for this sentinel. If it is present: +At the start of every run, check each program file for this sentinel. For any program where it is present: -1. **Do not run any iterations.** -2. 
Create a single GitHub issue (if one doesn't already exist) titled `[Autoloop] Action required: configure your program.md` with: - - A clear explanation that the workflow is installed but paused until `program.md` is configured. - - A direct link to the file: `${{ github.server_url }}/${{ github.repository }}/edit/${{ github.ref_name }}/.github/autoloop/program.md` +1. **Skip that program — do not run any iterations for it.** +2. If no setup issue exists for that program, create one titled `[Autoloop: {program-name}] Action required: configure your program` with: + - A clear explanation that this program is installed but paused until configured. + - A direct link to the file: `${{ github.server_url }}/${{ github.repository }}/edit/${{ github.ref_name }}/.github/autoloop/programs/{program-name}.md` - A brief guide: "Open the file, replace the placeholder sections with your project's goal, target files, and evaluation command, then remove the `` line." - Two or three example programs for inspiration (ML training, test coverage, build performance). -3. Exit. -If the sentinel is absent, proceed with the iteration loop. +If **all** programs are unconfigured, exit after creating the setup issues. Otherwise, proceed with the configured programs. -### Reading the Program +### Reading Programs At the start of every run: -1. Read `.github/autoloop/program.md` (or `program.md` in the repo root as a fallback). -2. Check for the `` sentinel — if present, run the **Setup Guard** flow above. -3. Parse the three sections: Goal, Target, Evaluation. -4. Validate that all three sections have non-placeholder content. If any section still contains `TODO` or `REPLACE` markers, treat it as unconfigured — create/update the setup issue and exit. -5. Read the current state of all target files. -6. Read repo memory for the current best metric value and iteration history. +1. List all `.md` files in `.github/autoloop/programs/`. +2. 
If the directory is empty or doesn't exist, also check for a single `.github/autoloop/program.md` or `program.md` in the repo root as a fallback (for single-program setups). +3. For each program file: + a. Check for the `` sentinel — if present, run the **Setup Guard** for that program and skip it. + b. Parse the three sections: Goal, Target, Evaluation. + c. Validate that all three sections have non-placeholder content. If any section still contains `TODO` or `REPLACE` markers, treat it as unconfigured — create/update the setup issue for that program and skip it. + d. Read the current state of all target files. + e. Read repo memory for that program's metric history (keyed by program name). ## Iteration Loop -Each run executes **one iteration** of the optimization loop: +Each run executes **one iteration per configured program**. For each program: ### Step 1: Read State -1. Read `program.md` to understand the goal, targets, and evaluation method. -2. Read repo memory to get: +1. Read the program file to understand the goal, targets, and evaluation method. +2. Read repo memory (keyed by program name) to get: - `best_metric`: The current best metric value (null if first run). - `iteration_count`: How many iterations have been completed. - `history`: Summary of recent iterations (last 20). @@ -145,7 +189,7 @@ Each run executes **one iteration** of the optimization loop: ### Step 3: Implement -1. Create a fresh branch: `autoloop/iteration--` from the default branch. +1. Create a fresh branch: `autoloop/{program-name}/iteration--` from the default branch. 2. Make the proposed changes to the target files only. 3. **Respect the program constraints**: do not modify files outside the target list. @@ -160,7 +204,7 @@ Each run executes **one iteration** of the optimization loop: **If the metric improved** (or this is the first run establishing a baseline): 1. Record the new `best_metric` in repo memory. 2. 
Create a draft PR with: - - Title: `[Autoloop] Iteration : ` + - Title: `[Autoloop: {program-name}] Iteration : ` - Body includes: what was changed, why, the old metric, the new metric, and the improvement delta. - AI disclosure: `🤖 *This change was proposed and validated by Autoloop.*` 3. Add an entry to the experiment log issue. @@ -180,7 +224,7 @@ Each run executes **one iteration** of the optimization loop: ## Experiment Log Issue -Maintain a single open issue titled `[Autoloop] Experiment Log {YYYY}-{MM}` as a rolling record of all iterations. +Maintain a single open issue **per program** titled `[Autoloop: {program-name}] Experiment Log {YYYY}-{MM}` as a rolling record of that program's iterations. ### Issue Body Format @@ -219,10 +263,11 @@ Maintain a single open issue titled `[Autoloop] Experiment Log {YYYY}-{MM}` as a ## Memory Schema -Store the following in repo memory: +Store state in repo memory **keyed by program name**. Each program gets its own memory namespace (e.g., `autoloop/training`, `autoloop/coverage`): ```json { + "program_name": "training", "best_metric": 0.0234, "metric_name": "validation_loss", "metric_direction": "lower", diff --git a/workflows/autoloop/program.md b/workflows/autoloop/programs/example.md similarity index 89% rename from workflows/autoloop/program.md rename to workflows/autoloop/programs/example.md index 2161ee2..7339bde 100644 --- a/workflows/autoloop/program.md +++ b/workflows/autoloop/programs/example.md @@ -2,6 +2,18 @@ + + # Autoloop Program From 67a741a47f467a657dd19736429e563bcd80ffe0 Mon Sep 17 00:00:00 2001 From: mrjf Date: Wed, 11 Mar 2026 21:09:10 -0700 Subject: [PATCH 3/7] Add lightweight pre-step to skip agent when no programs are due The pre-step runs in Python before the agent starts and checks: - Which programs are due based on per-program schedule + last_run - Which are unconfigured (sentinel/placeholders still present) - Which are paused or plateaued (5+ consecutive rejections) If no programs are due, the 
workflow exits with no agent invocation. This avoids burning agent compute on schedule ticks where nothing needs to happen. Co-Authored-By: Claude Opus 4.6 --- workflows/autoloop.md | 147 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 137 insertions(+), 10 deletions(-) diff --git a/workflows/autoloop.md b/workflows/autoloop.md index 2dcb2b6..df91b84 100644 --- a/workflows/autoloop.md +++ b/workflows/autoloop.md @@ -63,6 +63,131 @@ tools: imports: - shared/reporting.md +steps: + - name: Check which programs are due + run: | + python3 - << 'PYEOF' + import os, json, re, glob, sys + from datetime import datetime, timezone, timedelta + + programs_dir = ".github/autoloop/programs" + memory_dir = ".github/repo-memory/autoloop" + + # Find all program files + program_files = [] + if os.path.isdir(programs_dir): + program_files = sorted(glob.glob(os.path.join(programs_dir, "*.md"))) + if not program_files: + # Fallback to single-file locations + for path in [".github/autoloop/program.md", "program.md"]: + if os.path.isfile(path): + program_files = [path] + break + + if not program_files: + print("NO_PROGRAMS_FOUND") + with open("/tmp/gh-aw/autoloop.json", "w") as f: + json.dump({"due": [], "skipped": [], "unconfigured": [], "no_programs": True}, f) + sys.exit(0) + + now = datetime.now(timezone.utc) + due = [] + skipped = [] + unconfigured = [] + + # Schedule string to timedelta + def parse_schedule(s): + s = s.strip().lower() + m = re.match(r"every\s+(\d+)\s*h", s) + if m: + return timedelta(hours=int(m.group(1))) + m = re.match(r"every\s+(\d+)\s*m", s) + if m: + return timedelta(minutes=int(m.group(1))) + if s == "daily": + return timedelta(hours=24) + if s == "weekly": + return timedelta(days=7) + return None # No per-program schedule — always due + + for pf in program_files: + name = os.path.splitext(os.path.basename(pf))[0] + with open(pf) as f: + content = f.read() + + # Check sentinel + if "" in content: + unconfigured.append(name) + continue + + # Check 
for TODO/REPLACE placeholders + if re.search(r'\bTODO\b|\bREPLACE', content): + unconfigured.append(name) + continue + + # Parse optional YAML frontmatter for schedule + schedule_delta = None + fm_match = re.match(r"^---\s*\n(.*?)\n---\s*\n", content, re.DOTALL) + if fm_match: + for line in fm_match.group(1).split("\n"): + if line.strip().startswith("schedule:"): + schedule_str = line.split(":", 1)[1].strip() + schedule_delta = parse_schedule(schedule_str) + + # Check last_run from repo memory + mem_file = os.path.join(memory_dir, f"{name}.json") + last_run = None + if os.path.isfile(mem_file): + try: + with open(mem_file) as f: + mem = json.load(f) + lr = mem.get("last_run") + if lr: + last_run = datetime.fromisoformat(lr.replace("Z", "+00:00")) + except (json.JSONDecodeError, ValueError): + pass + + # Check if paused (e.g., plateau or recurring errors) + if os.path.isfile(mem_file): + try: + with open(mem_file) as f: + mem = json.load(f) + if mem.get("paused"): + skipped.append({"name": name, "reason": f"paused: {mem.get('pause_reason', 'unknown')}"}) + continue + # Auto-pause on plateau: 5+ consecutive rejections + recent = mem.get("history", [])[-5:] + if len(recent) >= 5 and all(h.get("status") == "rejected" for h in recent): + skipped.append({"name": name, "reason": "plateau: 5 consecutive rejections"}) + continue + except (json.JSONDecodeError, ValueError): + pass + + # Check if due based on per-program schedule + if schedule_delta and last_run: + if now - last_run < schedule_delta: + skipped.append({"name": name, "reason": "not due yet", + "next_due": (last_run + schedule_delta).isoformat()}) + continue + + due.append(name) + + result = {"due": due, "skipped": skipped, "unconfigured": unconfigured, "no_programs": False} + + os.makedirs("/tmp/gh-aw", exist_ok=True) + with open("/tmp/gh-aw/autoloop.json", "w") as f: + json.dump(result, f, indent=2) + + print("=== Autoloop Program Check ===") + print(f"Programs due: {due or '(none)'}") + print(f"Programs 
skipped: {[s['name'] for s in skipped] or '(none)'}") + print(f"Programs unconfigured: {unconfigured or '(none)'}") + + if not due and not unconfigured: + print("\nNo programs due this run. Exiting early.") + sys.exit(1) # Non-zero exit skips the agent step + PYEOF + --- # Autoloop @@ -99,7 +224,7 @@ Each program runs independently with its own: - PR title prefix: `[Autoloop: {program-name}]` - Repo memory namespace: keyed by program name -On each scheduled run, the workflow iterates through **all configured programs** and runs one iteration per program. Programs with the `` sentinel are skipped. +On each scheduled run, a lightweight pre-step checks which programs are due (based on per-program schedules and `last_run` timestamps). **If no programs are due, the workflow exits before the agent starts — zero agent cost.** Only due programs get iterated. ### Per-Program Schedule and Timeout @@ -151,16 +276,18 @@ If **all** programs are unconfigured, exit after creating the setup issues. Othe ### Reading Programs -At the start of every run: +The pre-step has already determined which programs are due, unconfigured, or skipped. Read `/tmp/gh-aw/autoloop.json` at the start of your run to get: + +- **`due`**: List of program names to run iterations for this run. +- **`unconfigured`**: Programs that still have the sentinel or placeholder content — run the **Setup Guard** for each of these (create setup issues). +- **`skipped`**: Programs not due yet based on their per-program schedule — ignore these entirely. +- **`no_programs`**: If `true`, no program files exist at all — create a single issue explaining how to add a program. -1. List all `.md` files in `.github/autoloop/programs/`. -2. If the directory is empty or doesn't exist, also check for a single `.github/autoloop/program.md` or `program.md` in the repo root as a fallback (for single-program setups). -3. For each program file: - a. 
Check for the `` sentinel — if present, run the **Setup Guard** for that program and skip it. - b. Parse the three sections: Goal, Target, Evaluation. - c. Validate that all three sections have non-placeholder content. If any section still contains `TODO` or `REPLACE` markers, treat it as unconfigured — create/update the setup issue for that program and skip it. - d. Read the current state of all target files. - e. Read repo memory for that program's metric history (keyed by program name). +For each program in `due`: +1. Read the program file from `.github/autoloop/programs/{name}.md`. +2. Parse the three sections: Goal, Target, Evaluation. +3. Read the current state of all target files. +4. Read repo memory for that program's metric history (keyed by program name). ## Iteration Loop From 38de50682e04d158dcf47a27b2816f22b7203c0e Mon Sep 17 00:00:00 2001 From: mrjf Date: Wed, 11 Mar 2026 21:13:01 -0700 Subject: [PATCH 4/7] Fix CI: remove unauthorized github.ref_name expression The gh-aw compiler doesn't allow github.ref_name. Replace with a generic instruction to use the default branch in the edit URL. Co-Authored-By: Claude Opus 4.6 --- workflows/autoloop.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/workflows/autoloop.md b/workflows/autoloop.md index df91b84..38b0bfe 100644 --- a/workflows/autoloop.md +++ b/workflows/autoloop.md @@ -268,7 +268,7 @@ At the start of every run, check each program file for this sentinel. For any pr 1. **Skip that program — do not run any iterations for it.** 2. If no setup issue exists for that program, create one titled `[Autoloop: {program-name}] Action required: configure your program` with: - A clear explanation that this program is installed but paused until configured. 
- - A direct link to the file: `${{ github.server_url }}/${{ github.repository }}/edit/${{ github.ref_name }}/.github/autoloop/programs/{program-name}.md` + - A direct link to edit the file on GitHub (use the repository's default branch in the URL). - A brief guide: "Open the file, replace the placeholder sections with your project's goal, target files, and evaluation command, then remove the `` line." - Two or three example programs for inspiration (ML training, test coverage, build performance). From ed4b671d110ebb0186e7f8f57b1fbea8eb6c6375 Mon Sep 17 00:00:00 2001 From: mrjf Date: Thu, 12 Mar 2026 13:16:00 -0700 Subject: [PATCH 5/7] Bootstrap program template on first run via pre-step Since add-wizard only installs the workflow .md file and can't copy companion files, the pre-step now creates .github/autoloop/programs/ and writes the template example.md on first run if the directory doesn't exist. Commits and pushes the template so the user can find and edit it in their repo. Co-Authored-By: Claude Opus 4.6 --- workflows/autoloop.md | 54 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 51 insertions(+), 3 deletions(-) diff --git a/workflows/autoloop.md b/workflows/autoloop.md index 38b0bfe..8928b35 100644 --- a/workflows/autoloop.md +++ b/workflows/autoloop.md @@ -72,11 +72,57 @@ steps: programs_dir = ".github/autoloop/programs" memory_dir = ".github/repo-memory/autoloop" + template_file = os.path.join(programs_dir, "example.md") + + # Bootstrap: create programs directory and template if missing + if not os.path.isdir(programs_dir): + os.makedirs(programs_dir, exist_ok=True) + with open(template_file, "w") as f: + f.write("""\ + + + + +# Autoloop Program + + + +## Goal + + + +REPLACE THIS with your optimization goal. 
+ +## Target + + + +Only modify these files: +- `REPLACE_WITH_FILE` — (describe what this file does) + +Do NOT modify: +- (list files that must not be touched) + +## Evaluation + + + +```bash +REPLACE_WITH_YOUR_EVALUATION_COMMAND +``` + +The metric is `REPLACE_WITH_METRIC_NAME`. **Lower/Higher is better.** (pick one) +""") + # Commit the template so the user can see and edit it + os.system(f'git add "{template_file}"') + os.system('git commit -m "[Autoloop] Bootstrap: add program template for configuration"') + os.system('git push') + print(f"BOOTSTRAPPED: created {template_file} and pushed to repo") # Find all program files - program_files = [] - if os.path.isdir(programs_dir): - program_files = sorted(glob.glob(os.path.join(programs_dir, "*.md"))) + program_files = sorted(glob.glob(os.path.join(programs_dir, "*.md"))) if not program_files: # Fallback to single-file locations for path in [".github/autoloop/program.md", "program.md"]: @@ -86,10 +132,12 @@ steps: if not program_files: print("NO_PROGRAMS_FOUND") + os.makedirs("/tmp/gh-aw", exist_ok=True) with open("/tmp/gh-aw/autoloop.json", "w") as f: json.dump({"due": [], "skipped": [], "unconfigured": [], "no_programs": True}, f) sys.exit(0) + os.makedirs("/tmp/gh-aw", exist_ok=True) now = datetime.now(timezone.utc) due = [] skipped = [] From e1324f368ddff5bb0fe571889c31fe5cbad8f6da Mon Sep 17 00:00:00 2001 From: mrjf Date: Thu, 12 Mar 2026 13:44:21 -0700 Subject: [PATCH 6/7] Use committed state.json instead of repo-memory for pre-step MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit repo-memory is only available inside the agent runtime, not during the bash pre-step. Split persistence into two layers: 1. .github/autoloop/state.json (committed to repo) — lightweight state the pre-step reads: last_run, best_metric, pause flags, recent_statuses for plateau detection 2. 
repo-memory (agent only) — full iteration history, rejected approaches, detailed notes The agent updates and commits state.json at the end of every iteration so the pre-step can make scheduling decisions cheaply. Co-Authored-By: Claude Opus 4.6 --- workflows/autoloop.md | 122 ++++++++++++++++++++++++++---------------- 1 file changed, 76 insertions(+), 46 deletions(-) diff --git a/workflows/autoloop.md b/workflows/autoloop.md index 8928b35..12e3286 100644 --- a/workflows/autoloop.md +++ b/workflows/autoloop.md @@ -71,7 +71,7 @@ steps: from datetime import datetime, timezone, timedelta programs_dir = ".github/autoloop/programs" - memory_dir = ".github/repo-memory/autoloop" + state_file = ".github/autoloop/state.json" template_file = os.path.join(programs_dir, "example.md") # Bootstrap: create programs directory and template if missing @@ -182,35 +182,36 @@ The metric is `REPLACE_WITH_METRIC_NAME`. **Lower/Higher is better.** (pick one) schedule_str = line.split(":", 1)[1].strip() schedule_delta = parse_schedule(schedule_str) - # Check last_run from repo memory - mem_file = os.path.join(memory_dir, f"{name}.json") - last_run = None - if os.path.isfile(mem_file): + # Read lightweight state file (committed to repo, not repo-memory) + # state.json tracks: last_run timestamps, pause flags, recent statuses + state = {} + if os.path.isfile(state_file): try: - with open(mem_file) as f: - mem = json.load(f) - lr = mem.get("last_run") - if lr: - last_run = datetime.fromisoformat(lr.replace("Z", "+00:00")) + with open(state_file) as f: + all_state = json.load(f) + state = all_state.get(name, {}) except (json.JSONDecodeError, ValueError): pass - # Check if paused (e.g., plateau or recurring errors) - if os.path.isfile(mem_file): + last_run = None + lr = state.get("last_run") + if lr: try: - with open(mem_file) as f: - mem = json.load(f) - if mem.get("paused"): - skipped.append({"name": name, "reason": f"paused: {mem.get('pause_reason', 'unknown')}"}) - continue - # Auto-pause on 
plateau: 5+ consecutive rejections - recent = mem.get("history", [])[-5:] - if len(recent) >= 5 and all(h.get("status") == "rejected" for h in recent): - skipped.append({"name": name, "reason": "plateau: 5 consecutive rejections"}) - continue - except (json.JSONDecodeError, ValueError): + last_run = datetime.fromisoformat(lr.replace("Z", "+00:00")) + except ValueError: pass + # Check if paused (e.g., plateau or recurring errors) + if state.get("paused"): + skipped.append({"name": name, "reason": f"paused: {state.get('pause_reason', 'unknown')}"}) + continue + + # Auto-pause on plateau: 5+ consecutive rejections + recent = state.get("recent_statuses", [])[-5:] + if len(recent) >= 5 and all(s == "rejected" for s in recent): + skipped.append({"name": name, "reason": "plateau: 5 consecutive rejections"}) + continue + # Check if due based on per-program schedule if schedule_delta and last_run: if now - last_run < schedule_delta: @@ -344,12 +345,11 @@ Each run executes **one iteration per configured program**. For each program: ### Step 1: Read State 1. Read the program file to understand the goal, targets, and evaluation method. -2. Read repo memory (keyed by program name) to get: - - `best_metric`: The current best metric value (null if first run). - - `iteration_count`: How many iterations have been completed. +2. Read `.github/autoloop/state.json` for this program's `best_metric` and `iteration_count`. +3. Read repo memory (keyed by program name) for detailed history: - `history`: Summary of recent iterations (last 20). - - `current_branch`: Any in-progress branch from a previous run. - `rejected_approaches`: Approaches that were tried and failed (to avoid repeating). + - `consecutive_errors`: Count of consecutive evaluation failures. ### Step 2: Analyze and Propose @@ -377,25 +377,26 @@ Each run executes **one iteration per configured program**. 
For each program: ### Step 5: Accept or Reject **If the metric improved** (or this is the first run establishing a baseline): -1. Record the new `best_metric` in repo memory. -2. Create a draft PR with: +1. Create a draft PR with: - Title: `[Autoloop: {program-name}] Iteration : ` - Body includes: what was changed, why, the old metric, the new metric, and the improvement delta. - AI disclosure: `🤖 *This change was proposed and validated by Autoloop.*` -3. Add an entry to the experiment log issue. -4. Update memory: add to `history`, increment `iteration_count`, clear `current_branch`. +2. Add an entry to the experiment log issue. +3. Update repo memory: add to `history`, reset `consecutive_errors` to 0. +4. Update `state.json`: set `best_metric`, increment `iteration_count`, set `last_run`, append `"accepted"` to `recent_statuses`. **Commit and push.** **If the metric did not improve** (or evaluation failed): 1. Do NOT create a PR. -2. Record the attempt in `rejected_approaches` in memory with: what was tried, the resulting metric, and why it likely didn't work. +2. Update repo memory: add to `rejected_approaches` with what was tried, the resulting metric, and why it likely didn't work. 3. Add a "rejected" entry to the experiment log issue. -4. Update memory: increment `iteration_count`, clear `current_branch`. +4. Update `state.json`: increment `iteration_count`, set `last_run`, append `"rejected"` to `recent_statuses`. **Commit and push.** **If evaluation could not run** (build failure, missing dependencies, etc.): 1. Do NOT create a PR. -2. Record the error in memory. +2. Update repo memory: increment `consecutive_errors`. 3. Add an "error" entry to the experiment log issue. -4. If this is a recurring error (3+ times), create an issue describing the problem and pause further iterations until resolved. +4. If `consecutive_errors` reaches 3+, set `paused: true` and `pause_reason` in `state.json`, and create an issue describing the problem. +5. 
Update `state.json`: increment `iteration_count`, set `last_run`, append `"error"` to `recent_statuses`. **Commit and push.** ## Experiment Log Issue @@ -436,19 +437,50 @@ Maintain a single open issue **per program** titled `[Autoloop: {program-name}] - Close the previous month's issue and create a new one at month boundaries. - Maximum 50 iterations per issue; create a continuation issue if exceeded. -## Memory Schema +## State and Memory + +Autoloop uses **two persistence layers**: + +### 1. State file (`.github/autoloop/state.json`) — lightweight, committed to repo + +This file is read by the **pre-step** (before the agent starts) to decide which programs are due. The agent **must update this file and commit it** at the end of every iteration. This is the only way the pre-step can check schedules, plateaus, and pause flags on future runs. + +```json +{ + "training": { + "last_run": "2025-01-15T12:00:00Z", + "best_metric": 0.0234, + "iteration_count": 17, + "paused": false, + "pause_reason": null, + "recent_statuses": ["accepted", "rejected", "rejected", "accepted", "accepted"] + }, + "coverage": { + "last_run": "2025-01-15T06:00:00Z", + "best_metric": 78.4, + "iteration_count": 5, + "paused": false, + "pause_reason": null, + "recent_statuses": ["accepted", "accepted", "rejected", "accepted", "accepted"] + } +} +``` + +**After every iteration** (accepted, rejected, or error), update this program's entry in `state.json`: +- Set `last_run` to the current UTC timestamp. +- Update `best_metric` if the iteration was accepted. +- Increment `iteration_count`. +- Append the status (`"accepted"`, `"rejected"`, or `"error"`) to `recent_statuses` (keep last 10). +- Set `paused`/`pause_reason` if needed. +- **Commit and push** the updated `state.json` to the default branch. + +### 2. Repo memory — full history for the agent -Store state in repo memory **keyed by program name**. 
Each program gets its own memory namespace (e.g., `autoloop/training`, `autoloop/coverage`): +Use repo-memory (keyed by program name, e.g., `autoloop/training`) for detailed state the agent needs but the pre-step doesn't: ```json { "program_name": "training", - "best_metric": 0.0234, - "metric_name": "validation_loss", - "metric_direction": "lower", - "iteration_count": 17, - "current_branch": null, - "last_run": "2025-01-15T12:00:00Z", "history": [ { "iteration": 17, @@ -467,9 +499,7 @@ Store state in repo memory **keyed by program name**. Each program gets its own "reason": "SGD converges slower within the 5-minute budget" } ], - "consecutive_errors": 0, - "paused": false, - "pause_reason": null + "consecutive_errors": 0 } ``` From 984d8182f7b41c45bbaf5d15c478191e1ad73305 Mon Sep 17 00:00:00 2001 From: mrjf Date: Thu, 12 Mar 2026 14:19:10 -0700 Subject: [PATCH 7/7] Fix compile: escape backticks in bootstrap template The gh-aw compiler treats backticks as reserved characters. Use chr(96) to construct them at runtime in the Python pre-step so they don't appear as literals in the workflow source. 
Co-Authored-By: Claude Opus 4.6 --- workflows/autoloop.md | 77 ++++++++++++++++++++++--------------------- 1 file changed, 40 insertions(+), 37 deletions(-) diff --git a/workflows/autoloop.md b/workflows/autoloop.md index 12e3286..e94e687 100644 --- a/workflows/autoloop.md +++ b/workflows/autoloop.md @@ -77,44 +77,47 @@ steps: # Bootstrap: create programs directory and template if missing if not os.path.isdir(programs_dir): os.makedirs(programs_dir, exist_ok=True) + bt = chr(96) # backtick — avoid literal backticks that break gh-aw compiler + template = "\n".join([ + "", + "", + "", + "", + "# Autoloop Program", + "", + "", + "", + "## Goal", + "", + "", + "", + "REPLACE THIS with your optimization goal.", + "", + "## Target", + "", + "", + "", + "Only modify these files:", + f"- {bt}REPLACE_WITH_FILE{bt} -- (describe what this file does)", + "", + "Do NOT modify:", + "- (list files that must not be touched)", + "", + "## Evaluation", + "", + "", + "", + f"{bt}{bt}{bt}bash", + "REPLACE_WITH_YOUR_EVALUATION_COMMAND", + f"{bt}{bt}{bt}", + "", + f"The metric is {bt}REPLACE_WITH_METRIC_NAME{bt}. **Lower/Higher is better.** (pick one)", + "", + ]) with open(template_file, "w") as f: - f.write("""\ - - - - -# Autoloop Program - - - -## Goal - - - -REPLACE THIS with your optimization goal. - -## Target - - - -Only modify these files: -- `REPLACE_WITH_FILE` — (describe what this file does) - -Do NOT modify: -- (list files that must not be touched) - -## Evaluation - - - -```bash -REPLACE_WITH_YOUR_EVALUATION_COMMAND -``` - -The metric is `REPLACE_WITH_METRIC_NAME`. **Lower/Higher is better.** (pick one) -""") + f.write(template) # Commit the template so the user can see and edit it os.system(f'git add "{template_file}"') os.system('git commit -m "[Autoloop] Bootstrap: add program template for configuration"')