Skip to content

fix: flywheel kill outcome stamping#21

Open
n1arash wants to merge 3 commits into
VisionForge-OU:mainfrom
n1arash:fix/flywheel-kill-outcome-stamping
Open

fix: flywheel kill outcome stamping#21
n1arash wants to merge 3 commits into
VisionForge-OU:mainfrom
n1arash:fix/flywheel-kill-outcome-stamping

Conversation

@n1arash

@n1arash n1arash commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

#2

n1arash and others added 3 commits June 20, 2026 19:40
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…urns (VisionForge-OU#1)

Turn-budget exhaustion was the dominant failure in the dogfood soak test
(49% of runs killed_turns) because the per-run turn cap had no relationship
to the model running the work — a 30-turn cap starves a small/cheap model.

- turns.py: pure policy. classify_model (tier by family substring; unknown ->
  small/generous), effective_turns (exact per-model pin, else floor =
  max(configured, tier_floor x phase_factor)), resolve_budget.
- config.py: turn_budget_by_model / turn_tiers / phase_turn_factors;
  extension_wall_min (30) + extension_cost_usd (3.0); max_turn_extensions
  default 2 -> 6 (now a backstop). Validation + round-trip + installer YAML.
- runner.should_extend: cumulative wall + cost ceiling is the primary stop;
  count is a backstop (0 disables it). New run_duration_min helper.
- Wired resolve_budget + chain wall/cost tracking into all 3 extension loops
  (pipeline._spawn, issue_run worker, scheduler _run_agent_with_extensions)
  and the init spec.
- Loud killed_turns: TUI worker_finished flags a non-clean terminal_reason;
  BuildReport gains a "Turn-killed runs" section populated from usage records.

Tests: +38 (test_turns, config, budget_policy, controller, pipeline wiring,
scheduler report). Full suite 406 passed. Existing turn-kill tests neutralize
the new floor (they target the extension loop, not budget sizing).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The dogfood campaign drafted 0 retro proposals while 49% of runs ended
killed_turns. Intermediate turn-extension runs (and phase agents) were
persisted with a blank outcome, the read side maps blank -> legacy, and
cluster_failures skips legacy — so the dominant failure was invisible and
retro correctly-but-uselessly found nothing.

- metrics: add completed / killed_* / error as first-class outcomes,
  KILL_OUTCOMES, and terminal_outcome()
- state.write_run_record: default a blank outcome from terminal_reason at the
  one choke point every run flows through (worker intermediates, phase agents,
  bounced attempts) — no terminal run is ever persisted blank again
- retro._signature: cluster the kill reasons (surfacing killed_turns); exclude
  the non-failures completed / killed_user
- retro.propose_for_clusters: a high kill rate (>=20% of runs, >=2 runs)
  deterministically drafts a concrete, gated fix — killed_turns proposes the
  turn-budget fix directly
- driver.analyze: prepend the deterministic kill-rate proposals (deduped) so a
  silent analysis agent no longer means "found nothing"

Full suite: 417 passed. 11 new tests cover every acceptance criterion,
including a regression: a fixture of killed runs yields a non-empty proposal set.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant