You are running an optimization iteration for the autoagent skill.
Read these files from the sandbox:
- current-guidance.md - The guidance being optimized
- scores.md - History of scores and changes
- scoring.md - How to measure success
- fixtures/test-cases.json - Test inputs
- Read current guidance
- Review score history (last 10 runs)
- Identify patterns in what's been tried
- Note current score
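The history review above can be sketched as a small parser. This is a minimal sketch, assuming scores.md rows follow the `| N | description | score | keep/discard |` layout used elsewhere in this loop; if the table differs, the cell indices would need adjusting.

```python
def recent_scores(scores_md: str, n: int = 10) -> list[int]:
    """Pull the last n scores out of a scores.md table.

    Assumes each run is recorded as a row like:
    | 3 | Tightened rubric wording | 72 | keep |
    """
    scores = []
    for line in scores_md.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # A data row has 4 cells, with a numeric run number and score;
        # header and separator rows fail the isdigit() checks and are skipped.
        if len(cells) == 4 and cells[0].isdigit() and cells[2].isdigit():
            scores.append(int(cells[2]))
    return scores[-n:]
```

The returned list can feed both the pattern review (what's been tried) and the plateau check later in the loop.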
Generate ONE specific edit to the guidance that might improve the score. The edit should:
- Be specific and actionable
- Address a weakness identified in scoring
- Not be identical to recently tried changes
Format:
## Proposed Edit
**Rationale:** Why this change might help
**Change:** [Show exact diff or new text]
Write the edited guidance to current-guidance.md
Use a subagent to run the task with the new guidance:
- Give the subagent current-guidance.md
- Provide test inputs from fixtures/test-cases.json
- Capture the output
Evaluate the output against the scoring.md criteria and assign a score from 0 to 100.
Append to scores.md:
| N | Description of change | SCORE | keep/discard |

where N is the run number (incremented from the last entry).
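The append step could look like the sketch below. Recovering N by counting existing data rows is an assumption (one row per run, numbered from 1); a loop that tracks run numbers elsewhere would pass N in directly.

```python
from pathlib import Path

def append_score_row(scores_path: Path, description: str,
                     score: int, verdict: str) -> int:
    """Append one run's result as a table row; returns the new run number.

    The run number is recovered by counting existing data rows, i.e.
    rows whose first cell is a digit.
    """
    text = scores_path.read_text() if scores_path.exists() else ""
    rows = [l for l in text.splitlines()
            if l.strip().startswith("|")
            and l.strip().strip("|").split("|")[0].strip().isdigit()]
    n = len(rows) + 1
    with scores_path.open("a") as f:
        f.write(f"| {n} | {description} | {score} | {verdict} |\n")
    return n
```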
- If score improved: Keep the edit (current-guidance.md is already updated)
- If score declined: Revert current-guidance.md to previous version
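The keep/revert decision can be sketched as below. The `.prev` backup file is an assumption: it presumes a copy of the pre-edit guidance was saved before the edit was applied, which this loop does not prescribe by name.

```python
import shutil
from pathlib import Path

def keep_or_revert(improved: bool,
                   guidance: Path = Path("current-guidance.md"),
                   backup: Path = Path("current-guidance.md.prev")) -> str:
    """Keep the edit if the score improved, else restore the saved copy.

    `backup` is a hypothetical pre-edit snapshot; any mechanism that
    preserves the previous guidance version would work here.
    """
    if improved:
        return "kept"
    # Score declined: overwrite the edited guidance with the snapshot.
    shutil.copyfile(backup, guidance)
    return "reverted"
```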
If the last 10 scores are identical:
- Log "Plateau detected - pausing"
- Notify user
- Stop the cron (or pause and await user override)
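The plateau check above is a one-liner over the recent score history; a minimal sketch:

```python
def plateau(scores: list[int], window: int = 10) -> bool:
    """True when the last `window` scores are all identical.

    Requires a full window of history, so early runs never
    trigger a false plateau.
    """
    tail = scores[-window:]
    return len(tail) == window and len(set(tail)) == 1
```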
After completing:
- Updated scores.md
- Possibly updated current-guidance.md (if change kept)
- Brief summary of what happened
- Be specific in edits - vague changes won't score well
- Learn from score history - don't repeat failed approaches
- Non-destructive: original guidance stays in guidance-under-test.md