diff --git a/README.md b/README.md
index 45e9ba2..93bb873 100644
--- a/README.md
+++ b/README.md
@@ -22,6 +22,7 @@ A sample family of reusable [GitHub Agentic Workflows](https://github.github.com
 ### Research, Status & Planning Workflows
+- [🔄 Autoloop](docs/autoloop.md) - Iterative optimization agent that proposes changes, evaluates against a metric, and keeps only improvements
 - [📚 Weekly Research](docs/weekly-research.md) - Collect research updates and industry trends
 - [📊 Weekly Issue Summary](docs/weekly-issue-summary.md) - Weekly issue activity report with trend charts and recommendations
 - [👥 Daily Repo Status](docs/daily-repo-status.md) - Assess repository activity and create status reports
diff --git a/docs/autoloop.md b/docs/autoloop.md
new file mode 100644
index 0000000..b5a0556
--- /dev/null
+++ b/docs/autoloop.md
@@ -0,0 +1,182 @@
+# Autoloop
+
+> For an overview of all available workflows, see the [main README](../README.md).
+
+**Iterative optimization agent inspired by [Autoresearch](https://github.com/karpathy/autoresearch) and Claude Code's `/loop`**
+
+The [Autoloop workflow](../workflows/autoloop.md?plain=1) runs on a schedule to autonomously improve target artifacts toward measurable goals. Each iteration proposes a change, evaluates it against a metric, and keeps only improvements. Supports **multiple independent loops** in the same repository.
+
+## Installation
+
+```bash
+# Install the 'gh aw' extension
+gh extension install github/gh-aw
+
+# Add the workflow to your repository
+gh aw add-wizard githubnext/agentics/autoloop
+```
+
+This walks you through adding the workflow to your repository.
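The accept-only-improvements behavior described above is a simple ratchet. A minimal sketch, with hypothetical helper names (the real workflow drives this loop through an agent, branches, and draft PRs rather than a single function):

```python
def ratchet(propose, evaluate, best, lower_is_better=True, iterations=3):
    """Run a few propose/evaluate rounds, keeping only strict improvements."""
    history = []
    for _ in range(iterations):
        change = propose(history)   # one small, targeted change
        metric = evaluate(change)   # run the evaluation command
        accepted = (metric < best) if lower_is_better else (metric > best)
        history.append({"change": change, "metric": metric, "accepted": accepted})
        if accepted:
            best = metric           # the ratchet only moves one way
    return best, history
```

Rejected proposals still land in `history`, which is what lets later iterations avoid repeating failed approaches.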
+
+## How It Works
+
+```mermaid
+graph LR
+    A[Scheduled Run] --> B[Discover Programs]
+    B --> C[For Each Program]
+    C --> D[Review History]
+    D --> E[Propose Change]
+    E --> F[Implement on Branch]
+    F --> G[Run Evaluation]
+    G --> H{Metric Improved?}
+    H -->|Yes| I[Create Draft PR]
+    H -->|No| J[Record & Reject]
+    I --> K[Update Experiment Log]
+    J --> K
+```
+
+## Getting Started
+
+When you install Autoloop, a **template program file** is added at `.github/autoloop/programs/example.md`. This template has placeholder sections you must fill in — the workflow **will not run** until you do.
+
+### Setup flow
+
+```mermaid
+graph LR
+    A[Install Workflow] --> B[Rename & Edit Program]
+    B --> C[Define Goal, Targets, Evaluation]
+    C --> D[Remove UNCONFIGURED sentinel]
+    D --> E[Commit & Push]
+    E --> F[Loop Begins]
+```
+
+1. **Install** — `gh aw add-wizard githubnext/agentics/autoloop`
+2. **Rename** — Rename `.github/autoloop/programs/example.md` to something meaningful (e.g., `training.md`, `coverage.md`). The filename becomes the program name.
+3. **Edit** — Replace the placeholders with your project's goal, target files, and evaluation command. The template includes three complete examples for inspiration.
+4. **Activate** — Remove the `<!-- UNCONFIGURED -->` sentinel line at the top.
+5. **Compile & push** — `gh aw compile && git add . && git commit -m "Configure autoloop" && git push`
+
+If you forget to edit the template, the first scheduled run will create a GitHub issue reminding you, with a direct link to edit the file.
+
+### Adding more loops
+
+To run multiple optimization loops in parallel, just add more `.md` files to `.github/autoloop/programs/`:
+
+```
+.github/autoloop/programs/
+├── training.md     ← optimize model training loss
+├── coverage.md     ← maximize test coverage
+└── build-perf.md   ← minimize build time
+```
+
+Each program runs independently with its own metric tracking, experiment log issue, and PR namespace.
+Copy the template, fill it in, and push — the next scheduled run picks it up automatically.
+
+## Configuration
+
+Each program file in `.github/autoloop/programs/` has three sections:
+
+### 1. Goal — What to optimize
+
+Describe the objective in natural language. Be specific about what "better" means.
+
+### 2. Target — What files can be changed
+
+List the files the agent is allowed to modify. Everything else is off-limits.
+
+### 3. Evaluation — How to measure success
+
+Provide a command to run and a metric to extract. Specify whether higher or lower is better.
+
+### Example program file
+
+````markdown
+# Autoloop Program
+
+## Goal
+
+Optimize the training script to minimize validation loss on CIFAR-10
+within a 5-minute training budget.
+
+## Target
+
+Only modify these files:
+- `train.py`
+- `config.yaml`
+
+## Evaluation
+
+```bash
+python train.py --epochs 5 && python evaluate.py --output-json results.json
+```
+
+Metric: `validation_loss` from `results.json`. Lower is better.
+````
+
+### Customizing the Schedule
+
+Edit the workflow's `schedule` field. Examples:
+- `every 6h` — 4 times a day (default)
+- `every 1h` — hourly iterations
+- `daily` — once a day
+- `0 */2 * * *` — every 2 hours (cron syntax)
+
+After editing, run `gh aw compile` to update the workflow.
+
+Note: The `schedule` field sets how often the workflow as a whole triggers. Individual programs can run less often by declaring their own `schedule:` in a YAML frontmatter block at the top of the program file — on each trigger, only programs whose per-program schedule has elapsed are iterated.
+
+## Usage
+
+### Automatic mode
+
+Once at least one configured program exists, iterations run automatically on schedule. Each run processes every configured program:
+
+1. Reads the program definition and past history
+2. Proposes a single targeted change
+3. Runs the evaluation command
+4. Accepts (creates draft PR) or rejects (logs the attempt)
+
+### Manual trigger
+
+```bash
+# Run all programs now
+gh aw run autoloop
+
+# Target a specific program
+gh aw run autoloop -- "training: try using cosine annealing"
+
+# If only one program exists, no prefix needed
+gh aw run autoloop -- "try batch size 64 instead of 32"
+```
+
+### Slash command
+
+Comment on any issue or PR:
+```
+/autoloop training: try batch size 64 instead of 32
+```
+
+## Experiment Tracking
+
+Each program gets its own monthly experiment log issue titled `[Autoloop: {program-name}] Experiment Log {YYYY-MM}`. The issue tracks:
+
+- Current best metric value
+- Full iteration history with accept/reject status
+- Links to PRs for accepted changes
+- Links to GitHub Actions runs
+
+## Human in the Loop
+
+- **Review draft PRs** — accepted improvements appear as draft PRs for human review
+- **Merge or close** — you decide which optimizations to keep
+- **Adjust programs** — edit any program file to change the goal, targets, or evaluation
+- **Add/remove loops** — add or delete files in `.github/autoloop/programs/`
+- **Steer via slash command** — use `/autoloop {program}: {instructions}` to direct experiments
+- **Pause** — disable the workflow schedule to stop all loops, or add the sentinel back to a single program file to pause just that loop
+
+## Security
+
+- Runs with read-only GitHub permissions
+- Only modifies files listed in each program's Target section
+- Never modifies evaluation scripts
+- All changes go through draft PRs requiring human approval
+- Uses "safe outputs" to constrain what the agent can create
diff --git a/workflows/autoloop.md b/workflows/autoloop.md
new file mode 100644
index 0000000..e94e687
--- /dev/null
+++ b/workflows/autoloop.md
@@ -0,0 +1,519 @@
+---
+description: |
+  An iterative optimization loop inspired by Karpathy's Autoresearch and Claude Code's /loop.
+  Runs on a configurable schedule to autonomously improve a target artifact toward a measurable goal.
+  Each iteration: reads the program definition, proposes a change, evaluates against a metric,
+  and accepts or rejects the change. Tracks all iterations in a rolling GitHub issue.
+  - User defines the optimization goal and evaluation criteria in a program.md file
+  - Accepts changes only when they improve the metric (ratchet pattern)
+  - Persists state between runs via repo memory
+  - Creates draft PRs for accepted improvements
+  - Maintains a living experiment log as a GitHub issue
+
+on:
+  schedule: every 6h
+  workflow_dispatch:
+  slash_command:
+    name: autoloop
+
+permissions: read-all
+
+timeout-minutes: 45
+
+network:
+  allowed:
+    - defaults
+    - node
+    - python
+    - rust
+    - java
+    - dotnet
+
+safe-outputs:
+  add-comment:
+    max: 5
+    target: "*"
+    hide-older-comments: false
+  create-pull-request:
+    draft: true
+    title-prefix: "[Autoloop] "
+    labels: [automation, autoloop]
+    protected-files: fallback-to-issue
+    max: 2
+  push-to-pull-request-branch:
+    target: "*"
+    title-prefix: "[Autoloop] "
+    max: 2
+  create-issue:
+    title-prefix: "[Autoloop] "
+    labels: [automation, autoloop]
+    max: 2
+  update-issue:
+    target: "*"
+    title-prefix: "[Autoloop] "
+    max: 1
+
+tools:
+  web-fetch:
+  github:
+    toolsets: [all]
+  bash: true
+  repo-memory: true
+
+imports:
+  - shared/reporting.md
+
+steps:
+  - name: Check which programs are due
+    run: |
+      python3 - << 'PYEOF'
+      import os, json, re, glob, sys
+      from datetime import datetime, timezone, timedelta
+
+      programs_dir = ".github/autoloop/programs"
+      state_file = ".github/autoloop/state.json"
+      template_file = os.path.join(programs_dir, "example.md")
+
+      # Bootstrap: create programs directory and template if missing
+      if not os.path.isdir(programs_dir):
+          os.makedirs(programs_dir, exist_ok=True)
+          bt = chr(96)  # backtick — avoid literal backticks that break gh-aw compiler
+          template = "\n".join([
+              "<!-- UNCONFIGURED -->",
+              "",
+              "",
+              "",
+              "# Autoloop Program",
+              "",
+              "",
+              "",
+              "## Goal",
+              "",
+              "",
+              "",
+              "REPLACE THIS with your optimization goal.",
+              "",
+              "## Target",
+              "",
+              "",
+              "",
+              "Only modify these files:",
+              f"- {bt}REPLACE_WITH_FILE{bt} -- (describe what this file does)",
+              "",
+              "Do NOT modify:",
+              "- (list files that must not be touched)",
+              "",
+              "## Evaluation",
+              "",
+              "",
+              "",
+              f"{bt}{bt}{bt}bash",
+              "REPLACE_WITH_YOUR_EVALUATION_COMMAND",
+              f"{bt}{bt}{bt}",
+              "",
+              f"The metric is {bt}REPLACE_WITH_METRIC_NAME{bt}. **Lower/Higher is better.** (pick one)",
+              "",
+          ])
+          with open(template_file, "w") as f:
+              f.write(template)
+          # Commit the template so the user can see and edit it
+          os.system(f'git add "{template_file}"')
+          os.system('git commit -m "[Autoloop] Bootstrap: add program template for configuration"')
+          os.system('git push')
+          print(f"BOOTSTRAPPED: created {template_file} and pushed to repo")
+
+      # Find all program files
+      program_files = sorted(glob.glob(os.path.join(programs_dir, "*.md")))
+      if not program_files:
+          # Fall back to single-file locations
+          for path in [".github/autoloop/program.md", "program.md"]:
+              if os.path.isfile(path):
+                  program_files = [path]
+                  break
+
+      if not program_files:
+          print("NO_PROGRAMS_FOUND")
+          os.makedirs("/tmp/gh-aw", exist_ok=True)
+          with open("/tmp/gh-aw/autoloop.json", "w") as f:
+              json.dump({"due": [], "skipped": [], "unconfigured": [], "no_programs": True}, f)
+          sys.exit(0)
+
+      os.makedirs("/tmp/gh-aw", exist_ok=True)
+      now = datetime.now(timezone.utc)
+      due = []
+      skipped = []
+      unconfigured = []
+
+      # Schedule string to timedelta
+      def parse_schedule(s):
+          s = s.strip().lower()
+          m = re.match(r"every\s+(\d+)\s*h", s)
+          if m:
+              return timedelta(hours=int(m.group(1)))
+          m = re.match(r"every\s+(\d+)\s*m", s)
+          if m:
+              return timedelta(minutes=int(m.group(1)))
+          if s == "daily":
+              return timedelta(hours=24)
+          if s == "weekly":
+              return timedelta(days=7)
+          return None  # No per-program schedule — always due
+
+      for pf in program_files:
+          name = os.path.splitext(os.path.basename(pf))[0]
+          with open(pf) as f:
+              content = f.read()
+
+          # Check for the UNCONFIGURED sentinel comment
+          if "<!-- UNCONFIGURED -->" in content:
+              unconfigured.append(name)
+              continue
+
+          # Check for TODO/REPLACE placeholders
+          if re.search(r'\bTODO\b|\bREPLACE', content):
+              unconfigured.append(name)
+              continue
+
+          # Parse optional YAML frontmatter for schedule
+          schedule_delta = None
+          fm_match = re.match(r"^---\s*\n(.*?)\n---\s*\n", content, re.DOTALL)
+          if fm_match:
+              for line in fm_match.group(1).split("\n"):
+                  if line.strip().startswith("schedule:"):
+                      schedule_str = line.split(":", 1)[1].strip()
+                      schedule_delta = parse_schedule(schedule_str)
+
+          # Read lightweight state file (committed to repo, not repo-memory)
+          # state.json tracks: last_run timestamps, pause flags, recent statuses
+          state = {}
+          if os.path.isfile(state_file):
+              try:
+                  with open(state_file) as f:
+                      all_state = json.load(f)
+                  state = all_state.get(name, {})
+              except (json.JSONDecodeError, ValueError):
+                  pass
+
+          last_run = None
+          lr = state.get("last_run")
+          if lr:
+              try:
+                  last_run = datetime.fromisoformat(lr.replace("Z", "+00:00"))
+              except ValueError:
+                  pass
+
+          # Check if paused (e.g., plateau or recurring errors)
+          if state.get("paused"):
+              skipped.append({"name": name, "reason": f"paused: {state.get('pause_reason', 'unknown')}"})
+              continue
+
+          # Auto-pause on plateau: 5+ consecutive rejections
+          recent = state.get("recent_statuses", [])[-5:]
+          if len(recent) >= 5 and all(s == "rejected" for s in recent):
+              skipped.append({"name": name, "reason": "plateau: 5 consecutive rejections"})
+              continue
+
+          # Check if due based on per-program schedule
+          if schedule_delta and last_run:
+              if now - last_run < schedule_delta:
+                  skipped.append({"name": name, "reason": "not due yet",
+                                  "next_due": (last_run + schedule_delta).isoformat()})
+                  continue
+
+          due.append(name)
+
+      result = {"due": due, "skipped": skipped, "unconfigured": unconfigured, "no_programs": False}
+
+      os.makedirs("/tmp/gh-aw", exist_ok=True)
+      with open("/tmp/gh-aw/autoloop.json", "w") as f:
+          json.dump(result, f, indent=2)
+
+      print("=== Autoloop Program Check ===")
+      print(f"Programs due: {due or '(none)'}")
+      print(f"Programs skipped: {[s['name'] for s in skipped] or '(none)'}")
+      print(f"Programs unconfigured: {unconfigured or '(none)'}")
+
+      if not due and not unconfigured:
+          print("\nNo programs due this run. Exiting early.")
+          sys.exit(1)  # Non-zero exit skips the agent step
+      PYEOF
+
+---
+
+# Autoloop
+
+An iterative optimization agent that proposes changes, evaluates them against a metric, and keeps only improvements — running autonomously on a schedule.
+
+## Command Mode
+
+Take heed of **instructions**: "${{ steps.sanitized.outputs.text }}"
+
+If these are non-empty (not ""), then you have been triggered via `/autoloop <instructions>`. The instructions may be:
+- **A one-off directive targeting a specific program**: e.g., `/autoloop training: try a different approach to the loss function`. The text before the colon is the program name (matching a file in `.github/autoloop/programs/`). Execute it as a single iteration for that program, then report results.
+- **A general directive**: e.g., `/autoloop try cosine annealing`. If no program name prefix is given and only one program exists, use that one. If multiple exist, ask which program to target.
+- **A configuration change**: e.g., `/autoloop training: set metric to accuracy instead of loss`. Update the relevant program file and confirm.
+
+Then exit — do not run the normal loop after completing the instructions.
+
+## Multiple Programs
+
+Autoloop supports **multiple independent optimization loops** in the same repository. Each loop is defined by a separate markdown file in `.github/autoloop/programs/`.
+For example:
+
+```
+.github/autoloop/programs/
+├── training.md     ← optimize model training
+├── coverage.md     ← maximize test coverage
+└── build-perf.md   ← minimize build time
+```
+
+Each program runs independently with its own:
+- Goal, target files, and evaluation command
+- Metric tracking and best-metric history
+- Experiment log issue: `[Autoloop: {program-name}] Experiment Log {YYYY-MM}`
+- Branch namespace: `autoloop/{program-name}/iteration-{N}-{slug}`
+- PR title prefix: `[Autoloop: {program-name}]`
+- Repo memory namespace: keyed by program name
+
+On each scheduled run, a lightweight pre-step checks which programs are due (based on per-program schedules and `last_run` timestamps). **If no programs are due, the workflow exits before the agent starts — zero agent cost.** Only due programs get iterated.
+
+### Per-Program Schedule and Timeout
+
+Programs can optionally specify their own schedule and timeout in a YAML frontmatter block at the very top of the file (the pre-step only recognizes frontmatter that starts on the first line, so it takes effect once the sentinel is removed):
+
+```markdown
+---
+schedule: every 1h
+timeout-minutes: 30
+---
+
+# Autoloop Program
+...
+```
+
+- **`schedule`**: Controls how often this program runs. On each workflow trigger, check if the program is due based on its schedule and the `last_run` timestamp in memory. If the program's schedule hasn't elapsed since its last run, skip it. If omitted, the program runs on every workflow trigger.
+- **`timeout-minutes`**: Maximum time for this program's iteration. If omitted, the program shares the workflow's overall timeout.
+
+This lets you run a fast coverage check every hour while running a slow training loop once a day — all from the same workflow.
+
+## Program Definition
+
+Each program file in `.github/autoloop/programs/` defines three things:
+
+1. **Goal**: What the agent is trying to optimize (natural language description)
+2. **Target**: Which files the agent is allowed to modify
+3. **Evaluation**: How to measure whether a change is an improvement
+
+The **program name** is the filename without the `.md` extension (e.g., `training.md` → program name is `training`).
+
+### Setup Guard
+
+A template program file is installed at `.github/autoloop/programs/example.md`. **Programs will not run until the user has edited them.** Each template contains a sentinel line:
+
+```
+<!-- UNCONFIGURED -->
+```
+
+At the start of every run, check each program file for this sentinel. For any program where it is present:
+
+1. **Skip that program — do not run any iterations for it.**
+2. If no setup issue exists for that program, create one titled `[Autoloop: {program-name}] Action required: configure your program` with:
+   - A clear explanation that this program is installed but paused until configured.
+   - A direct link to edit the file on GitHub (use the repository's default branch in the URL).
+   - A brief guide: "Open the file, replace the placeholder sections with your project's goal, target files, and evaluation command, then remove the `<!-- UNCONFIGURED -->` line."
+   - Two or three example programs for inspiration (ML training, test coverage, build performance).
+
+If **all** programs are unconfigured, exit after creating the setup issues. Otherwise, proceed with the configured programs.
+
+### Reading Programs
+
+The pre-step has already determined which programs are due, unconfigured, or skipped. Read `/tmp/gh-aw/autoloop.json` at the start of your run to get:
+
+- **`due`**: List of program names to run iterations for this run.
+- **`unconfigured`**: Programs that still have the sentinel or placeholder content — run the **Setup Guard** for each of these (create setup issues).
+- **`skipped`**: Programs not due yet based on their per-program schedule — ignore these entirely.
+- **`no_programs`**: If `true`, no program files exist at all — create a single issue explaining how to add a program.
+
+For each program in `due`:
+1. Read the program file from `.github/autoloop/programs/{name}.md`.
+2. Parse the three sections: Goal, Target, Evaluation.
+3. Read the current state of all target files.
+4. Read repo memory for that program's metric history (keyed by program name).
+
+## Iteration Loop
+
+Each run executes **one iteration per configured program**. For each program:
+
+### Step 1: Read State
+
+1. Read the program file to understand the goal, targets, and evaluation method.
+2. Read `.github/autoloop/state.json` for this program's `best_metric` and `iteration_count`.
+3. Read repo memory (keyed by program name) for detailed history:
+   - `history`: Summary of recent iterations (last 20).
+   - `rejected_approaches`: Approaches that were tried and failed (to avoid repeating).
+   - `consecutive_errors`: Count of consecutive evaluation failures.
+
+### Step 2: Analyze and Propose
+
+1. Read the target files and understand the current state.
+2. Review the history of previous iterations — what worked, what didn't.
+3. **Think carefully** about what change is most likely to improve the metric. Consider:
+   - What has been tried before and rejected (don't repeat failures).
+   - What the evaluation criteria reward.
+   - Small, targeted changes are more likely to succeed than large rewrites.
+   - If many small optimizations have been exhausted, consider a larger architectural change.
+4. Describe the proposed change in your reasoning before implementing it.
+
+### Step 3: Implement
+
+1. Create a fresh branch: `autoloop/{program-name}/iteration-{N}-{slug}` from the default branch.
+2. Make the proposed changes to the target files only.
+3. **Respect the program constraints**: do not modify files outside the target list.
+
+### Step 4: Evaluate
+
+1. Run the evaluation command specified in the program file.
+2. Parse the metric from the output.
+3. Compare against `best_metric` from memory.
+
+### Step 5: Accept or Reject
+
+**If the metric improved** (or this is the first run establishing a baseline):
+1. Create a draft PR with:
+   - Title: `[Autoloop: {program-name}] Iteration {N}: {one-line description}`
+   - Body includes: what was changed, why, the old metric, the new metric, and the improvement delta.
+   - AI disclosure: `🤖 *This change was proposed and validated by Autoloop.*`
+2. Add an entry to the experiment log issue.
+3. Update repo memory: add to `history`, reset `consecutive_errors` to 0.
+4. Update `state.json`: set `best_metric`, increment `iteration_count`, set `last_run`, append `"accepted"` to `recent_statuses`. **Commit and push.**
+
+**If the metric did not improve** (or evaluation failed):
+1. Do NOT create a PR.
+2. Update repo memory: add to `rejected_approaches` with what was tried, the resulting metric, and why it likely didn't work.
+3. Add a "rejected" entry to the experiment log issue.
+4. Update `state.json`: increment `iteration_count`, set `last_run`, append `"rejected"` to `recent_statuses`. **Commit and push.**
+
+**If evaluation could not run** (build failure, missing dependencies, etc.):
+1. Do NOT create a PR.
+2. Update repo memory: increment `consecutive_errors`.
+3. Add an "error" entry to the experiment log issue.
+4. If `consecutive_errors` reaches 3+, set `paused: true` and `pause_reason` in `state.json`, and create an issue describing the problem.
+5. Update `state.json`: increment `iteration_count`, set `last_run`, append `"error"` to `recent_statuses`. **Commit and push.**
+
+## Experiment Log Issue
+
+Maintain a single open issue **per program** titled `[Autoloop: {program-name}] Experiment Log {YYYY-MM}` as a rolling record of that program's iterations.
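The `state.json` updates prescribed in Step 5 can be sketched as a small helper (hypothetical names; in practice the agent performs this bookkeeping itself before committing the file):

```python
from datetime import datetime, timezone

def record_iteration(state, program, status, metric=None):
    """Update one program's state.json entry after an iteration.

    status is "accepted", "rejected", or "error"; metric is the new
    best value when the iteration was accepted.
    """
    entry = state.setdefault(program, {"iteration_count": 0, "recent_statuses": []})
    entry["iteration_count"] += 1
    entry["last_run"] = datetime.now(timezone.utc).isoformat()
    # Keep only the last 10 statuses, as the pre-step expects
    entry["recent_statuses"] = (entry["recent_statuses"] + [status])[-10:]
    if status == "accepted" and metric is not None:
        entry["best_metric"] = metric  # best only moves on accept
    return state
```

Because the pre-step's plateau check reads the last five entries of `recent_statuses`, trimming the list to ten keeps the file small without losing that signal.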
+
+### Issue Body Format
+
+```markdown
+🤖 *Autoloop — an iterative optimization agent for this repository.*
+
+## Program
+
+**Goal**: {one-line summary from program.md}
+**Target files**: {list of target files}
+**Metric**: {metric name} ({higher/lower} is better)
+**Current best**: {best_metric} (established in iteration {N})
+
+## Iteration History
+
+### Iteration {N} — {YYYY-MM-DD HH:MM UTC} — [Run]({run_url})
+- **Status**: ✅ Accepted / ❌ Rejected / ⚠️ Error
+- **Change**: {one-line description}
+- **Metric**: {value} (previous best: {previous_best}, delta: {delta})
+- **PR**: #{number} (if accepted)
+
+### Iteration {N-1} — {YYYY-MM-DD HH:MM UTC} — [Run]({run_url})
+- **Status**: ❌ Rejected
+- **Change**: {one-line description}
+- **Metric**: {value} (previous best: {previous_best}, delta: {delta})
+- **Reason**: {why it was rejected}
+```
+
+### Format Rules
+
+- Iterations in **reverse chronological order** (newest first).
+- Each iteration heading links to its GitHub Actions run.
+- Use `${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}` for the current run URL.
+- Close the previous month's issue and create a new one at month boundaries.
+- Maximum 50 iterations per issue; create a continuation issue if exceeded.
+
+## State and Memory
+
+Autoloop uses **two persistence layers**:
+
+### 1. State file (`.github/autoloop/state.json`) — lightweight, committed to repo
+
+This file is read by the **pre-step** (before the agent starts) to decide which programs are due. The agent **must update this file and commit it** at the end of every iteration. This is the only way the pre-step can check schedules, plateaus, and pause flags on future runs.
+
+```json
+{
+  "training": {
+    "last_run": "2025-01-15T12:00:00Z",
+    "best_metric": 0.0234,
+    "iteration_count": 17,
+    "paused": false,
+    "pause_reason": null,
+    "recent_statuses": ["accepted", "rejected", "rejected", "accepted", "accepted"]
+  },
+  "coverage": {
+    "last_run": "2025-01-15T06:00:00Z",
+    "best_metric": 78.4,
+    "iteration_count": 5,
+    "paused": false,
+    "pause_reason": null,
+    "recent_statuses": ["accepted", "accepted", "rejected", "accepted", "accepted"]
+  }
+}
+```
+
+**After every iteration** (accepted, rejected, or error), update this program's entry in `state.json`:
+- Set `last_run` to the current UTC timestamp.
+- Update `best_metric` if the iteration was accepted.
+- Increment `iteration_count`.
+- Append the status (`"accepted"`, `"rejected"`, or `"error"`) to `recent_statuses` (keep last 10).
+- Set `paused`/`pause_reason` if needed.
+- **Commit and push** the updated `state.json` to the default branch.
+
+### 2. Repo memory — full history for the agent
+
+Use repo-memory (keyed by program name, e.g., `autoloop/training`) for detailed state the agent needs but the pre-step doesn't:
+
+```json
+{
+  "program_name": "training",
+  "history": [
+    {
+      "iteration": 17,
+      "status": "accepted",
+      "description": "Reduced learning rate warmup from 5 to 3 epochs",
+      "metric": 0.0234,
+      "previous_best": 0.0241,
+      "pr": 42
+    }
+  ],
+  "rejected_approaches": [
+    {
+      "iteration": 16,
+      "description": "Switched from Adam to SGD with momentum",
+      "metric": 0.0298,
+      "reason": "SGD converges slower within the 5-minute budget"
+    }
+  ],
+  "consecutive_errors": 0
+}
+```
+
+## Guidelines
+
+- **One change per iteration.** Keep changes small and targeted. A single hyperparameter tweak, a minor architectural modification, or a focused code optimization. This makes it clear what caused metric changes.
+- **No breaking changes.** Target files must remain functional even if the iteration is rejected.
+- **Respect the evaluation budget.** If the evaluation command has a time constraint (e.g., 5-minute training), respect it. Do not modify evaluation scripts or timeout settings.
+- **Learn from history.** The rejected_approaches list exists to prevent repeating failures. Read it carefully before proposing changes.
+- **Diminishing returns.** If the last 5 consecutive iterations were rejected, post a comment on the experiment log suggesting the user review the program definition — the optimization may have plateaued.
+- **Transparency.** Every PR and comment must include AI disclosure with 🤖.
+- **Safety.** Never modify files outside the target list. Never modify the evaluation script. Never modify program files (except via `/autoloop` command mode).
+- **Read AGENTS.md first**: before starting work, read the repository's `AGENTS.md` file (if present) to understand project-specific conventions.
+- **Build and test**: run any build/test commands before creating PRs. If your changes break the build, reject the iteration.
diff --git a/workflows/autoloop/programs/example.md b/workflows/autoloop/programs/example.md
new file mode 100644
index 0000000..7339bde
--- /dev/null
+++ b/workflows/autoloop/programs/example.md
@@ -0,0 +1,142 @@
+<!-- UNCONFIGURED -->
+
+
+
+
+
+# Autoloop Program
+
+
+
+## Goal
+
+
+
+REPLACE THIS with your optimization goal. For example:
+- "Minimize validation loss on CIFAR-10 within a 5-minute training budget"
+- "Maximize test coverage percentage across the project"
+- "Reduce production Docker image size"
+
+## Target
+
+
+
+Only modify these files:
+- `REPLACE_WITH_FILE_1` — (describe what this file does)
+- `REPLACE_WITH_FILE_2` — (describe what this file does)
+
+Do NOT modify:
+- (list files that should never be touched, e.g., evaluation scripts, data files)
+
+## Evaluation
+
+
+
+Run the following command to evaluate:
+
+```bash
+REPLACE_WITH_YOUR_EVALUATION_COMMAND
+```
+
+The metric is `REPLACE_WITH_METRIC_NAME` from the output.
+**Lower/Higher is better.** (pick one)
+
+A change is accepted if the metric strictly improves over the previous best.
+The first run establishes the baseline.
+
+---
+
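The acceptance rule above (strict improvement over the previous best, with the first run establishing the baseline) can be sketched as follows. Names here are hypothetical; the agent applies this rule when parsing the evaluation output:

```python
import json

def check_improvement(results_path, metric_name, best, lower_is_better=True):
    """Return (accepted, value) for one evaluation run."""
    with open(results_path) as f:
        value = json.load(f)[metric_name]
    if best is None:                     # first run establishes the baseline
        return True, value
    if lower_is_better:
        return value < best, value       # strict: ties are rejected
    return value > best, value
```

Note that a metric equal to the previous best is rejected: only strict improvement moves the ratchet, which prevents no-op changes from being accepted.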