fix: flywheel kill outcome stamping by n1arash · Pull Request #21 · VisionForge-OU/foreman

n1arash · 2026-06-25T07:51:48Z

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…urns (VisionForge-OU#1) Turn-budget exhaustion was the dominant failure in the dogfood soak test (49% of runs killed_turns) because the per-run turn cap had no relationship to the model running the work — a 30-turn cap starves a small/cheap model. - turns.py: pure policy. classify_model (tier by family substring; unknown -> small/generous), effective_turns (exact per-model pin, else floor = max(configured, tier_floor x phase_factor)), resolve_budget. - config.py: turn_budget_by_model / turn_tiers / phase_turn_factors; extension_wall_min (30) + extension_cost_usd (3.0); max_turn_extensions default 2 -> 6 (now a backstop). Validation + round-trip + installer YAML. - runner.should_extend: cumulative wall + cost ceiling is the primary stop; count is a backstop (0 disables it). New run_duration_min helper. - Wired resolve_budget + chain wall/cost tracking into all 3 extension loops (pipeline._spawn, issue_run worker, scheduler _run_agent_with_extensions) and the init spec. - Loud killed_turns: TUI worker_finished flags a non-clean terminal_reason; BuildReport gains a "Turn-killed runs" section populated from usage records. Tests: +38 (test_turns, config, budget_policy, controller, pipeline wiring, scheduler report). Full suite 406 passed. Existing turn-kill tests neutralize the new floor (they target the extension loop, not budget sizing). Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

The dogfood campaign drafted 0 retro proposals while 49% of runs ended killed_turns. Intermediate turn-extension runs (and phase agents) were persisted with a blank outcome, the read side maps blank -> legacy, and cluster_failures skips legacy — so the dominant failure was invisible and retro correctly-but-uselessly found nothing. - metrics: add completed / killed_* / error as first-class outcomes, KILL_OUTCOMES, and terminal_outcome() - state.write_run_record: default a blank outcome from terminal_reason at the one choke point every run flows through (worker intermediates, phase agents, bounced attempts) — no terminal run is ever persisted blank again - retro._signature: cluster the kill reasons (surfacing killed_turns); exclude the non-failures completed / killed_user - retro.propose_for_clusters: a high kill rate (>=20% of runs, >=2 runs) deterministically drafts a concrete, gated fix — killed_turns proposes the turn-budget fix directly - driver.analyze: prepend the deterministic kill-rate proposals (deduped) so a silent analysis agent no longer means "found nothing" Full suite: 417 passed. 11 new tests cover every acceptance criterion, including a regression: a fixture of killed runs yields a non-empty proposal set. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

n1arash and others added 3 commits June 20, 2026 19:40

docs: design spec for model-aware turn budgets (issue VisionForge-OU#1)

b80e0bb

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: flywheel kill outcome stamping#21

fix: flywheel kill outcome stamping#21
n1arash wants to merge 3 commits into
VisionForge-OU:mainfrom
n1arash:fix/flywheel-kill-outcome-stamping

n1arash commented Jun 25, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

n1arash commented Jun 25, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

n1arash commented Jun 25, 2026 •

edited

Loading