Skip to content

feat(compile): autorewrite front matter via detection-based codemods#476

Open
jamesadevine wants to merge 8 commits intomainfrom
feat/frontmatter-migrations
Open

feat(compile): autorewrite front matter via detection-based codemods#476
jamesadevine wants to merge 8 commits intomainfrom
feat/frontmatter-migrations

Conversation

@jamesadevine
Copy link
Copy Markdown
Collaborator

@jamesadevine jamesadevine commented May 8, 2026

Summary

Every breaking change to the front-matter grammar (e.g. engine: claude-opus-4.5engine: { id: copilot, model: ... }, read-only-service-connectionpermissions:, upload-build-artifactupload-build-attachment) currently produces a hard YAML deserialization error for users on older sources. They have to upgrade by hand before the compiler will even read their file.

This PR introduces a codemod framework — flat, append-only, detection-based transformations that rewrite deprecated front-matter shapes to current ones. The model mirrors gh-aw's pkg/cli/fix_codemods.go; the project we explicitly say we're inspired by in AGENTS.md.

The compiler runs the registry in memory during compile, then — only on a fully successful compile — atomically rewrites the source .md and emits a clear stderr warning. Failed compiles leave the source byte-identical. There is no schema-version field on user front matter; users never see the framework's bookkeeping.

The framework ships with an empty registry; the historical breaking changes can land in dedicated follow-up PRs. See docs/codemods.md for a worked example (engine_id_split).

What's in

  • Codemod data model: { id, summary, introduced_in, apply: fn(&mut Mapping, &Ctx) -> Result<bool> }. apply returns true when it modified the mapping, false for no-op (mapping doesn't match), Err for hard failure.
  • CODEMODS: &[&'static Codemod] static registry in src/compile/codemods/mod.rs. Append-only; order matters when codemods compose. CodemodContext is #[non_exhaustive] for forward-compat.
  • Runner is a flat for-loop. Idempotent codemods can re-run on every compile cheaply when nothing matches.
  • Codemod helpers: take_key, insert_no_overwrite, rename_key(ConflictPolicy). Default policy is error (no silent overwrite). rename_key is transactional — mapping unchanged on the error path.
  • Three-stage parse (parse_markdown_detailed): untyped Mapping → run codemods → typed FrontMatter. Rejects non-mapping top-level YAML. Body preserved byte-for-byte. Leading whitespace before the opening --- preserved.
  • Atomic write helper (tempfile + rename, preserves Unix mode, replaces symlinks). Used for both source rewrites and .lock.yml writes.
  • Compile flow runs full compile first; only on success does it perform a SHA-256 lost-update check + atomic source rewrite + stderr warning.
  • check command bails with <path> contains deprecated front-matter shapes that need codemod fixes. hint: run ado-aw compile to apply the codemods in place. when codemods would fire. Rationale: in-pipeline integrity check pins to its own ado-aw version, so failing locally only impacts a developer running newer ado-aw against an older source — exactly when we want to fail loudly.
  • execute (Stage 3) runs codemods in memory only via log::warn!; never rewrites source.
  • compile_all_pipelines summary line includes ; N source file(s) rewritten by codemods. when any rewrite occurred.

Why codemods (not version-stamped migrations)?

Earlier in the branch's history we shipped a Django/Rails-style versioned migration chain with a schema-version: <u32> field on FrontMatter. After landing it through review, we noticed gh-aw — our explicit reference — uses codemods with no version stamping. We aligned to keep:

  • No user-visible compiler bookkeeping in front matter (gh-aw users never see a version stamp).
  • The compile-time autorewrite UX we already had (better than gh-aw's gh aw fix --write for our users).
  • The same idempotency-by-construction guarantee gh-aw relies on.

Trade-offs

  • Front-matter comments are not preserved when codemods rewrite the source (serde_yaml round-trip drops them). Body markdown is preserved byte-for-byte. Documented prominently in the warning and docs/codemods.md.
  • The rewrite uses serde_yaml's default formatting; quote/scalar styles may be normalized. The skip-rewrite condition (new_content != original) avoids gratuitous diffs when no semantic change occurred.
  • Initial PR ships zero codemods — historical breaking changes can land as one-file follow-up PRs (see the worked example in docs/codemods.md).

Test plan

cargo build clean. cargo test runs 1349 tests, all passing across four layers:

  • Unit (~25 tests in src/compile/codemods/{mod.rs,helpers.rs}): registry id uniqueness, filename-prefix sanity, runner empty/all-no-op/mixed/error-context/idempotency paths, helper conflict policies, transactional rename_key.
  • Type/parse (~10 tests in src/compile/{types.rs,common.rs}): parse_markdown_detailed byte preservation across trailing-newline / no-trailing-newline / blank-lines variants, leading whitespace tolerance, non-mapping top-level rejection, source SHA-256 stability, atomic_write symlink replacement and Unix mode preservation.
  • White-box integration (4 tests in src/compile/codemod_integration_test.rs): exercise the rewrite path end-to-end using a stub Codemod registry injected via crate-private compile_pipeline_with_registry / parse_markdown_detailed_with_registry hooks. Covers stale-source rewrite + body byte-preservation, no-op skip when no codemods apply, no-op skip when codemod returns false, lost-update guard fires on concurrent edits.
  • Black-box CLI subprocess integration (3 tests in tests/codemod_tests.rs): healthy compile of a current source (verifies no rewrite, source byte-identical), full compilecheck round-trip, non-mapping top-level YAML rejection.

Manual smoke test:

$ cat agent.md
---
name: smoke
description: smoke
---
## Body

$ ado-aw compile agent.md
Generated standalone pipeline: agent.lock.yml

$ cat agent.md   # byte-identical, no schema-version added
---
name: smoke
description: smoke
---
## Body

No new clippy warnings beyond the 4 pre-existing ones on main.

Introduces a Django/Rails-style schema-version field on FrontMatter and a
registered migration framework so breaking grammar changes auto-migrate
existing user sources during compile rather than hard-failing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Solid design, well-tested — a couple of small issues worth fixing before merge.


Findings

🐛 Bugs / Logic Issues

  • src/compile/mod.rs — Double "Error: error:" in check_pipeline's bail message

    anyhow::bail!(
        "error: {} is at schema-version {}; ...",   // ← "error:" prefix

    anyhow::bail! already produces an error value; when the top-level handler prints it (typically eprintln!("Error: {:#}", e)), the user will see Error: error: .... The "error:" prefix should be removed from the string literal, and the "hint:" clause should be on a separate line via \n hint: ... (consistent with how hints are typically formatted in CLIs).

  • src/compile/mod.rs — Unnecessary FrontMatter round-trip in compile_pipeline_inner

    When migrations.changed(), a fresh ParsedSource is constructed just to pass to perform_source_rewrite_if_needed:

    let parsed_for_rewrite = common::ParsedSource {
        front_matter: serde_yaml::from_value(serde_yaml::Value::Mapping(
            front_matter_mapping.clone(),
        ))
        .context("Failed to round-trip migrated front matter for rewrite")?,
        ...
    };

    perform_source_rewrite_if_needed only reads parsed.front_matter_mapping, parsed.body_raw, parsed.source_sha256, and parsed.migrations — it never touches parsed.front_matter. So this extra serde_yaml::from_value deserialization is dead work. It's not a correctness bug (the second deserialize should succeed if the first one did in stage 3), but it adds a spurious failure surface: if from_value fails here for any reason, the rewrite aborts after pipeline_yaml has been computed, potentially leaving the user confused about why rewrite failed despite compilation succeeding. Consider either removing the front_matter field from ParsedSource altogether, or making perform_source_rewrite_if_needed take the three fields it actually needs rather than the full struct.

⚠️ Suggestions

  • tests/migration_tests.rs:45 — Misleading comment on rand_suffix

    The comment reads // Add a thread-local counter to avoid intra-process collisions but no such counter is implemented — only nanosecond timestamp is used. Two tests scheduled to the same CPU in rapid succession could theoretically get the same suffix. Since the label arg already disambiguates most cases this is low-risk, but the comment is factually wrong and should either be removed or the counter should be added (e.g. std::sync::atomic::AtomicU64).

  • src/compile/common.rs — Leading whitespace silently stripped on migration rewrite

    parse_markdown_detailed tolerates leading ASCII whitespace (via leading_ws skip), but reconstruct_source always emits "---\n..." with no leading whitespace. If a source file starts with whitespace and a migration applies, the rewrite silently strips that leading whitespace in addition to bumping schema-version. The stderr warning already says "front-matter comments are not preserved" — it should also note that leading whitespace before --- is normalised, to avoid surprising users whose editors/tools add a BOM or leading blank line.

✅ What Looks Good

  • The three-stage parse (untyped Value → migrate_front_matter_with → typed FrontMatter) is exactly the right architecture. Separating the migration pass from typed deserialization means old shapes that no longer parse as FrontMatter can still be upgraded.
  • The rewrite-ordering guarantee is airtight: compile first (in memory), then rewrite source, then write .lock.yml. A failure at any step leaves the other files untouched. The inline comment explains the rationale clearly.
  • Lost-update guard (SHA-256 snapshot + re-read before write) is a clean solution to the TOCTOU race.
  • atomic_write using tempfile::NamedTempFile + persist (same-filesystem rename) is the correct approach; Unix mode preservation is a nice touch.
  • MigrationContext as a #[non_exhaustive] empty struct is the right forward-compat pattern for evolving migration signatures without breaking existing fn pointers.
  • Four-layer test coverage (unit → parse/type → white-box integration with stub registry → black-box CLI subprocess) is thorough and well-organized.
  • CURRENT_SCHEMA_VERSION = 1 + MIGRATIONS.len() as u32 — can never drift from the registry. Elegant.

Generated by Rust PR Reviewer for issue #476 · ● 1M ·

Address code review finding: the runner did `1 + registry.len() as u32`
and `m.to_version == current + 1` without overflow checks. With
realistic registries this is unreachable, but rust panics on overflow
in debug mode and wraps in release. Switch to checked_add so we
surface a clear error either way; preserves existing behavior on all
realistic inputs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Strong design with a few correctness issues worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • check_pipeline double-prefixed error message (src/compile/mod.rs, the new check_pipeline hunk): The anyhow::bail! message starts with "error: ... \nhint: ...". When anyhow formats the error at the call site (typically as "Error: {e:#}"), users see "Error: error: ... hint: ..." — the error: prefix is duplicated. Drop the embedded "error:" prefix from the bail string; it's redundant with anyhow's own formatting.

  • Leading whitespace silently dropped on rewrite (src/compile/common.rs, reconstruct_source): reconstruct_source always emits "---\n..." with no leading bytes. The new parser accepts leading ASCII whitespace before --- (the leading_ws stripping), but that whitespace is never stored in ParsedSource. If a file has leading whitespace and a migration runs, the atomic rewrite will strip it silently — the user's source changes more than just the migrated fields. The stderr warning only mentions "front-matter comments not preserved"; it doesn't mention this stripping. Either strip unconditionally in the parser (like the old content.trim() did) and store a note in ParsedSource, or extend the warning. Given the old parse_markdown did content.trim() anyway, stripping on rewrite is arguably correct behaviour, but it should be called out in the emitted warning and in docs/migrations.md.

⚠️ Suggestions

  • Unnecessary FrontMatter round-trip in compile_pipeline_inner (src/compile/mod.rs): Inside if migrations.changed() you construct parsed_for_rewrite and re-deserialize front_matter_mapping into a typed FrontMatter. But perform_source_rewrite_if_needed only calls reconstruct_source, which uses front_matter_mapping and body_raw — the typed front_matter field is never read. You can either refactor perform_source_rewrite_if_needed to accept the individual fields it actually needs, or document why the full ParsedSource wrapper is kept (e.g., "future-proofing").

  • MigrationContext should be #[non_exhaustive] (src/compile/migrations/mod.rs): The comment says it's kept in the signature so future migrations can receive extra context. Without #[non_exhaustive], adding a required field to MigrationContext is a breaking change for any downstream consumer of the migration crate API (even internal ones — today it's let ctx = MigrationContext {}; in the runner). Since the intent is extensibility, add #[non_exhaustive]. All migration apply functions already take _ctx: &MigrationContext and never construct the context themselves, so this is a zero-diff change for existing code.

  • schema_version field in FrontMatter is read-only dead weight (src/compile/types.rs): The field exists to satisfy deny_unknown_fields (so deserialization doesn't reject a migrated source that already has schema-version: N), which is valid. But the #[allow(dead_code)] annotation is a code smell signalling that the compiler agrees. Consider either (a) adding a validation in compile_pipeline_inner that asserts front_matter.schema_version == migrations::CURRENT_SCHEMA_VERSION after a successful migration (giving the field a purpose), or (b) keeping the allow but adding an explicit comment: // Field exists to satisfy deny_unknown_fields; the authoritative version lives in the untyped mapping.

✅ What Looks Good

  • Atomic write implementation is correct: tempfile in the same directory guarantees a same-filesystem rename; Unix mode preservation; symlink replacement (not symlink-follow); fsync before rename. The test coverage is thorough.
  • Lost-update guard: SHA-256 snapshot taken at parse time, re-verified before persist — exactly right for a tool that can be run concurrently.
  • Registry contiguity validation at runtime (ensure! in the runner) + the migration_filenames_match_from_version test covering the filesystem layout — makes it very hard to introduce a gap silently.
  • Defensive predicates for from_version: 1 are well-documented in docs/migrations.md. The worked example is a model reference.
  • Error chaining throughout uses .with_context(||...) correctly; no silent unwrap()s on user-facing paths.
  • The three-layer test strategy (unit → white-box integration via registry injection → black-box CLI subprocess) is well-structured. The stub registry injection approach avoids polluting the production registry for test coverage.

Generated by Rust PR Reviewer for issue #476 · ● 1.8M ·

Four findings from the Rust PR Reviewer bot:

1. check_pipeline's bail message had a redundant "error:" prefix that
   produced "Error: error: ..." once anyhow's top-level handler added
   its own "Error:" wrapper. Drop the prefix and reformat the hint
   onto an indented continuation line.

2. compile_pipeline_inner did a redundant serde_yaml::from_value
   round-trip just to satisfy ParsedSource.front_matter, even though
   perform_source_rewrite_if_needed never reads that field. Refactor
   the helper to take the four primitive fields it actually uses
   (mapping, body_raw, source_sha256, migrations) instead of the full
   struct. reconstruct_source likewise takes individual fields now.

3. Misleading comment in tests/migration_tests.rs claimed a
   "thread-local counter" was used; only a nanosecond timestamp was.
   Replaced rand_suffix() with timestamp + AtomicU64 monotonic seq so
   parallel tests scheduled in the same nanosecond always get
   distinct dirs.

4. Leading whitespace before the opening `---` was silently stripped
   on migration rewrite. parse_markdown_detailed already tolerates it
   on read; capture it as a new ParsedSource.leading_whitespace field
   and emit it from reconstruct_source so byte-faithful preservation
   extends to whitespace prefixes (BOM-stripped editor blank lines,
   etc.).

All 1369 tests still pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine force-pushed the feat/frontmatter-migrations branch from 0545068 to 72e97c7 Compare May 8, 2026 21:14
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Solid architecture and excellent test coverage — one real bug in rename_key, two minor nits worth fixing.


Findings

🐛 Bugs / Logic Issues

  • src/compile/migrations/helpers.rsrename_key leaves mapping inconsistent on ConflictPolicy::Error

    take_key(m, old) removes the old key from the mapping before the conflict check runs. When both old and new exist and policy is Error, bail! fires but old is already gone — the mapping is left partially mutated:

    // old already removed here ↓
    let Some(old_value) = take_key(m, old) else { ... };
    let new_present = m.contains_key(...);
    match (new_present, policy) {
        (true, ConflictPolicy::Error) => bail!(...),  // old is lost!
    ```
    
    The existing test (`rename_key_new_present_with_error_policy_fails`) only asserts the error message; it doesn't verify the mapping state after the error, so it misses this.
    
    **Practical impact is nil** — the error propagates up through `migrate_front_matter_with` → `parse_markdown_detailed_with_registry`, compilation fails, and `perform_source_rewrite_if_needed` is never called. The on-disk file is never touched. But a future migration that chains multiple `rename_key` calls would see a partially-mutated mapping for any error diagnostics added after the call, and the helper's documented contract ("error on conflict") implies the mapping should be unchanged on error.
    
    Fix: check for `new_present` before calling `take_key`, or re-insert the old value on the error path.

⚠️ Suggestions

  • src/compile/migrations/mod.rs:set_schema_version — key appended to end of YAML block

    Mapping::insert appends the key when absent. For sources that never had schema-version, after migration it appears at the bottom of the front-matter block rather than at the top as shown in the documentation examples and the docs/front-matter.md template. The feature still works correctly, but users comparing the output to docs will see a different position. Consider inserting into a fresh Mapping (schema-version first, then the other keys) inside reconstruct_source so the serialized output matches the canonical format.

  • tests/migration_tests.rs:18 — stale module path in doc comment

    The comment says:

    //! by the white-box tests in `src/compile/migrations/integration_test.rs`,
    

    The actual path is src/compile/migration_integration_test.rs (directly under compile/, not under migrations/).

✅ What Looks Good

  • Three-stage parse (untyped → migrate → typed) is exactly the right architecture for surviving renamed/removed fields that would trip deny_unknown_fields.
  • Lost-update protection with SHA-256 snapshot + re-read before atomic rename is clean and handles the concurrent-editor case correctly.
  • atomic_write with tempfile + rename + sync_all + Unix mode preservation is production-quality; the drive-by fix to .lock.yml writes is a nice bonus.
  • Registry contiguity + filename-prefix tests make it hard to ship a malformed registry silently after a bad rebase.
  • Test depth across four layers (unit, parse, white-box integration, black-box CLI subprocess) is exactly right for a framework that modifies files on disk.
  • check fails on pending migrations and execute migrates in-memory only — both are the right policy choices and are clearly documented.

Generated by Rust PR Reviewer for issue #476 · ● 1.2M ·

Four findings from a fresh PR Reviewer pass:

1. `rename_key` left the mapping partially mutated when
   `ConflictPolicy::Error` fired: `take_key` removed `old` before the
   conflict check ran, so the bail observed a half-mutated state.
   Refactor to be transactional — check both keys + policy first, then
   mutate. Document the invariant in the doc comment and extend the
   existing test to assert both keys are intact on error.

2. `set_schema_version` appended the field to the bottom of the
   mapping (default `Mapping::insert` behavior), so rewritten sources
   showed `schema-version` at the end of the front-matter block,
   diverging from the canonical template in `docs/front-matter.md`.
   Add a `hoist_schema_version` step inside `reconstruct_source` so
   the serialized output places `schema-version` first while
   preserving the relative order of the user's other keys.

3. Doc comment in `tests/migration_tests.rs` referenced the wrong
   path (`migrations/integration_test.rs`); the actual file is
   `migration_integration_test.rs` directly under `compile/`.

4. The `#[allow(dead_code)]` on `FrontMatter::schema_version` was
   noted as a code smell. Add an explicit explanatory comment that
   the field exists solely to satisfy `deny_unknown_fields` and that
   the authoritative version lives in the untyped mapping (set by the
   migration runner before typed deserialization).

All 1370 tests pass. New test: `reconstruct_source_hoists_schema_version_to_top`.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Well-designed, production-ready migration framework — architecture is sound, error handling is thorough, and the test pyramid (unit → white-box integration → black-box CLI) gives strong confidence. A few observations worth noting.

Findings

⚠️ Suggestions

  • src/compile/mod.rs:26-27parse_markdown exported with #[allow(unused_imports)]
    parse_markdown is now backed by parse_markdown_detailed but is exported with #[allow(unused_imports)] and the implementation has #[allow(dead_code)]. If external consumers rely on it that's fine, but if there are none the symbol and its suppressed warnings could quietly rot. Worth confirming whether any downstream consumer (outside this crate) needs it; if not, removing the re-export now avoids future confusion.

  • src/compile/common.rs — sanitized values not persisted on rewrite
    sanitize_config_fields() mutates the typed FrontMatter but the front_matter_mapping (used for the rewrite) is left unsanitized. The in-code comment explicitly acknowledges this intent ("we don't want to silently rewrite user source bytes via sanitize on disk"), which is reasonable. Just worth ensuring this is also documented in docs/migrations.md so future migration authors don't assume the mapping they receive is sanitized. Right now docs/migrations.md is silent on this.

  • src/compile/migrations/mod.rs:86CURRENT_SCHEMA_VERSION and migrate_front_matter both have #[allow(dead_code)]
    These are only dead today because the registry is empty. Once migrations exist, both will be used naturally. The attributes are defensive noise; they'll need removal once the first real migration lands. Low priority, but worth noting for follow-up contributors.

  • tests/migration_tests.rs:69-71 — temp dir not cleaned up on test panic
    let _ = fs::remove_dir_all(&dir) only runs if the test passes to that line. A panicking assertion leaves the temp dir behind. Using tempfile::tempdir() (which auto-removes on drop) or a custom RAII guard would be more robust. Pre-existing pattern in compiler_tests.rs, but this is a new file and an opportunity to use the cleaner pattern.

✅ What Looks Good

  • SHA-256 lost-update guard is elegant: snapshot the hash at parse time, re-verify before writing. Clean and correct.
  • atomic_write: tempfile-in-same-directory + rename is the right approach; mode preservation on Unix and symlink-replacement semantics are both tested.
  • Two-stage parse (untyped mapping → migrate → typed FrontMatter) is exactly the right design — migrations run before typed deserialization so they can fix shapes that deny_unknown_fields would otherwise reject.
  • Runtime contiguity validation (ensure! on from_version/to_version at each step) guards against bad rebases or cherry-picks that could silently skip a migration.
  • #[non_exhaustive] on MigrationContext is good API foresight — future fields can be added without breaking existing apply fn pointers.
  • Empty registry ships: avoids a large historical-migration blast radius in this PR; follow-up PRs can self-heal one breaking change at a time.
  • check bails on pending migrations with a clear hint pointing at compile — exactly the right failure mode for the in-pipeline integrity check.

Generated by Rust PR Reviewer for issue #476 · ● 958K ·

…codemods

Aligns with gh-aw's codemod model. The previous design (versioned
chain + `schema-version` field on user front matter) leaked an
internal compiler concern into user files. gh-aw — the project we
explicitly say we're inspired by in AGENTS.md — uses 41 codemods
with no version stamping; users never see a `schema-version` field.

This refactor brings ado-aw to the same model while keeping our
ergonomic compile-time autorewrite UX (gh-aw uses a separate
`gh aw fix --write` command, we don't).

Changes:

- Drop FrontMatter.schema_version field, default_schema_version fn,
  and the SCHEMA_VERSION_KEY const.
- Rename src/compile/migrations/ to src/compile/codemods/. Rename
  src/compile/migration_integration_test.rs to
  src/compile/codemod_integration_test.rs. Rename
  tests/migration_tests.rs to tests/codemod_tests.rs. Rename
  docs/migrations.md to docs/codemods.md (rewritten from scratch).
- Replace Migration / MigrationContext / MigrationReport /
  AppliedMigration / MIGRATIONS with Codemod / CodemodContext /
  CodemodReport / AppliedCodemod / CODEMODS.
- Codemod.apply signature changes from `fn(&mut Mapping, &Ctx) ->
  Result<()>` to `fn(&mut Mapping, &Ctx) -> Result<bool>` so each
  codemod reports whether it actually modified the mapping.
- Simplify the runner: flat per-codemod loop, no version arithmetic,
  no contiguity checks, no overflow guards, no future-version
  rejection.
- Drop hoist_schema_version from reconstruct_source — no
  schema-version field to hoist.
- Update parse_markdown_detailed, compile_pipeline_inner,
  perform_source_rewrite_if_needed, check_pipeline, run_execute,
  and compile_all_pipelines to use the codemod types and reword
  their messages.
- Helpers (take_key, insert_no_overwrite, rename_key,
  ConflictPolicy) are unchanged — they're as useful for codemods as
  they were for migrations. rename_key already returned
  Result<bool>, matching the new codemod contract.
- Drop ~13 schema-version-specific tests (registry contiguity,
  future version, non-integer, etc.). Add 7 codemod-runner tests
  (empty registry, all-no-op, mixed, error context, idempotency).
  Black-box CLI tests reduced from 10 to 3 (deleted tests asserted
  on schema-version-specific behavior that no longer exists).

PR #476 has not merged; this lands as a follow-up commit on the
same branch so users never see the schema-version concept ship.

All 1349 tests pass. PR title and body to be updated separately.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jamesadevine jamesadevine changed the title feat(compile): autorewrite front matter via versioned migration chain feat(compile): autorewrite front matter via detection-based codemods May 8, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Looks good overall — the framework design is sound, error handling is thorough, and test coverage is comprehensive. Two concrete issues worth addressing before merge.

Findings

🐛 Bugs / Logic Issues

  • docs/codemods.md (line ~48) — documentation inaccuracy: The docs state "Front-matter key order is preserved by serde_yaml's insertion-ordered mapping." This is not accurate after a rename. rename_key calls m.remove(&old_key) then m.insert(new_key, old_value), and serde_yaml::Mapping::insert appends to the end when the key is new. The result is that any renamed key migrates to the bottom of the front-matter block in the rewritten source. For a codemod that renames the top-level name field, the user would see name: suddenly jump to the end of their YAML. The warning message and docs should acknowledge this: "Renamed keys may move to the end of the front-matter block."

  • src/compile/mod.rs (line ~256) — naming clash: The private helper perform_source_rewrite_if_needed has a parameter named codemods: &codemods::CodemodReport that shadows the sibling module codemods. Inside the function body the module isn't used so there's no functional breakage, but a future author adding a codemods::something() call inside the function will get a confusing type error. Renaming the parameter to report or codemod_report removes the ambiguity.

⚠️ Suggestions

  • src/compile/codemods/mod.rscodemod_filenames_match_registry_count test: Currently both the file count and CODEMODS.len() are 0, so the test trivially passes. It will prove its worth once real codemods land — that's fine — but consider noting in the test comment that the assertion is vacuously true until the first codemod is registered, so future reviewers don't mistake a green test for a meaningful invariant right now.

  • src/compile/common.rsparse_markdown_detailed_with_registry is pub(crate) but production callers go through parse_markdown_detailed. The test hook is appropriately scoped, good pattern.

  • src/compile/codemods/helpers.rs#[allow(dead_code)] on take_key, insert_no_overwrite, rename_key, ConflictPolicy: All justified by the empty registry at launch. Might be worth a comment on the pub use re-exports in mod.rs noting these become live once the first codemod lands, so a future reviewer doesn't strip the attributes prematurely.

✅ What Looks Good

  • Atomic write implementation is correct: tempfile created in the destination's parent directory (same filesystem → rename is atomic), fsync before rename, Unix mode preserved, symlinks replaced not followed.
  • Lost-update guard (SHA-256 snapshot at parse → re-read before rename) is a clean defense against the TOCTOU window. The error message is actionable.
  • Three-stage parse (untyped Mapping → codemods → typed FrontMatter) elegantly avoids the "can't even parse deprecated fields" problem without requiring schema-version stamps.
  • CodemodContext is #[non_exhaustive] — correct forward-compatibility choice; adding context fields later won't break existing codemod fn signatures.
  • rename_key is transactional on the error path — the mapping is unchanged when ConflictPolicy::Error fires, so chained codemods can't observe a half-mutated state.
  • execute (Stage 3) in-memory-only behavior with log::warn! instead of source rewrite is the right call for a non-source-of-truth working tree.
  • The test layering (unit → white-box integration with stub registry → black-box CLI subprocess) is thorough and well-structured.

Generated by Rust PR Reviewer for issue #476 · ● 941.4K ·

Two findings + two suggestions from the latest Rust PR Reviewer pass:

1. Naming clash in `perform_source_rewrite_if_needed`: the
   parameter `codemods: &codemods::CodemodReport` shadowed the
   sibling `codemods` module. Functionally fine (the module isn't
   used inside the function body) but a future caller adding a
   `codemods::something()` reference would hit a confusing type
   error. Renamed to `report`.

2. `docs/codemods.md` claimed "Front-matter key order is preserved"
   without qualification. Renamed keys move to the END of the
   front-matter block because `Mapping::insert` appends new keys
   when absent. Updated the doc to call this out, and added the
   same note to the stderr warning emitted when codemods rewrite
   the source.

3. Marked the two registry-health tests
   (`registry_ids_are_unique`, `codemod_filenames_match_registry_count`)
   as vacuously true while CODEMODS is empty, so future reviewers
   don't mistake the green test for a meaningful invariant today.

4. Added a comment on the `pub use helpers::*` re-export and on
   the helper-level `#[allow(dead_code)]`s explaining they're
   defensive against the empty-registry warning surface and should
   be stripped once the first real codemod lands.

Plus, while in the docs, added an "unsanitized input" invariant
to the codemod author contract: the runner sanitizes the typed
`FrontMatter` AFTER codemods run, so the raw `Mapping` codemods
receive is whatever the user wrote. Codemods should treat values
as opaque rather than parse/interpolate them.

All 1349 tests pass. No behavior changes outside the docs and the
warning text.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Looks good overall — the framework is well-designed and well-tested. Two issues worth addressing before merge.


Findings

🐛 Bugs / Logic Issues

  • src/compile/common.rs line 1420-1421 — ambiguous error context after codemod stage

    The typed deserialization in Stage 3 uses the same .context("Failed to parse YAML front matter") message as the Stage 1 untyped parse. If a codemod produces a Mapping that fails typed deserialization (i.e. outputs an unrecognized shape), the user sees the same error they'd see for malformed raw YAML — with no indication that a codemod is at fault. The comment on that block even notes "errors here mean either the user wrote an unsupported shape or a codemod produced invalid output", so the distinction is known but not surfaced.

    Suggested fix:

    serde_yaml::from_value(serde_yaml::Value::Mapping(mapping.clone()))
        .context("Failed to parse YAML front matter (after codemods ran; \
                  check that the applied codemod produced a valid shape)")?;
  • src/compile/codemods/helpers.rs rename_key with PreferNew — triggers source rewrite when new key is unchanged

    When both old and new keys are present and ConflictPolicy::PreferNew is chosen, the function drops old and returns Ok(true). That is logically correct (the mapping was mutated: old was removed), but any codemod that uses PreferNew will fire a source rewrite warning even when the new value was already the current correct value and the user just had a stale leftover old key. This is arguably the right behavior (removing the stale key is a real change), but it should be called out explicitly in the rename_key doc comment so codemod authors don't assume Ok(true) means "we migrated something" vs "we cleaned up a remnant". The current doc only says "Returns Ok(true) when the rename happened (regardless of policy branch)".


⚠️ Suggestions

  • src/compile/codemods/{mod.rs,helpers.rs} — 8+ #[allow(dead_code)] annotations will accumulate

    The comment explains they should be removed when the first real codemod lands, but there's no enforcement mechanism. Consider a TODO(codemods): remove when first real codemod is registered so search-based cleanup remains easy, or link a tracking issue.

  • src/compile/common.rs reconstruct_sourceserde_yaml::to_string on an empty Mapping emits {}\n

    If codemods somehow reduce a mapping to empty, reconstruct_source produces ---\n{}\n---<body>, which is valid YAML but will fail FrontMatter deserialization (missing required name/description). Not a security issue, but the error that surfaces downstream says "Failed to parse YAML front matter" again (see issue 1). Low priority given required fields prevent an empty mapping from reaching this path normally.

  • tests/codemod_tests.rs fresh_temp_dir — leaks temp dirs on test failure

    The white-box integration tests correctly use tempfile::TempDir (RAII cleanup on panic). The CLI tests use a PathBuf-returning fresh_temp_dir with explicit fs::remove_dir_all at end of each test — these leak on assertion failure. Minor inconsistency with the existing pattern.


✅ What Looks Good

  • atomic_write: tempfile + rename + fsync + Unix mode preservation is exactly right. Better than the tokio::fs::write it replaces for the .lock.yml.
  • Lost-update guard: SHA-256 snapshot at parse time + re-read before rename is the correct approach. The error message is actionable.
  • Three-stage parse: untyped → codemods → typed is the correct layering so renamed/removed fields don't cause deserialization errors before codemods get a chance to fix them.
  • rename_key transactional invariant: mapping is unchanged on the error path. Well-tested.
  • #[non_exhaustive] on CodemodContext: good forward-compat discipline — adding context fields won't be a breaking API change.
  • Empty registry ships without footguns: no schema-version stamp, no runtime overhead when nothing matches, and codemod_filenames_match_registry_count will catch the common "added file, forgot registry" mistake.

Generated by Rust PR Reviewer for issue #476 · ● 920.8K ·

Two findings + three suggestions from the latest Rust PR Reviewer pass:

1. Stage-3 typed-deserialize errors after codemods reused the same
   "Failed to parse YAML front matter" context as the Stage-1 raw
   parse, so a codemod that produced an invalid shape was
   indistinguishable from a user typo. Switch to with_context that
   surfaces the applied codemod ids when codemods ran:
   "Failed to parse YAML front matter after applying codemods (id1, id2);
    a codemod likely produced an invalid shape".

2. rename_key with ConflictPolicy::PreferNew returns Ok(true) when
   it drops a stale `old` key even though `new` is unchanged. That
   triggers a source-rewrite warning saying "we migrated" when
   really we "cleaned up a remnant". The behavior is correct (the
   mapping did mutate); the doc was misleading. Rewrote the
   rename_key doc comment to enumerate exactly when Ok(true) fires
   and call out the PreferNew/cleanup case.

3. Added `// TODO(codemods): remove when the first real codemod is
   registered.` markers next to every #[allow(dead_code)] /
   #[allow(unused_imports)] in src/compile/codemods/{mod,helpers}.rs
   so search-based cleanup is trivial when the registry stops being
   empty.

4. tests/codemod_tests.rs leaked temp dirs on assertion failure
   because `let _ = fs::remove_dir_all(&dir)` only ran on the happy
   path. Refactored to return tempfile::TempDir from the helpers
   and rely on RAII cleanup, matching the pattern already used by
   src/compile/codemod_integration_test.rs.

Skipped: the empty-mapping concern in reconstruct_source (a
mapping reduced to empty wouldn't deserialize as FrontMatter
anyway because of required name/description fields, so the
existing typed-deserialize error path catches it; #1 makes that
error informative).

All 1349 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Solid, well-structured framework — one real bug to fix before merge, plus a few minor observations.


Findings

🐛 Bugs / Logic Issues

  • src/compile/common.rs:53 — cross-device link failure for bare filenames

    When path has no directory component (e.g. "agent.md" in CWD), path.parent() returns Some("") which the filter strips to None, falling through to NamedTempFile::new(). That creates the tempfile in std::env::temp_dir() (/tmp), not in .. When /tmp is a tmpfs mount (common on Linux), tmp.persist(path) will fail with EXDEV ("Invalid cross-device link") because rename cannot cross filesystems.

    The doc-comment already states the correct intent ("falls back to ."), but the code doesn't implement it:

    // Bug:
    None => tempfile::NamedTempFile::new()
        .context("failed to create temporary file in current directory")?,
    
    // Fix:
    None => tempfile::NamedTempFile::new_in(".")
        .context("failed to create temporary file in current directory")?,

    In practice today this only fires for bare filenames without a path prefix, but the test suite doesn't cover this case and the docstring contracts it.

⚠️ Suggestions

  • src/compile/common.rsreconstruct_source relies on an undocumented serde_yaml invariant

    The function assumes serde_yaml::to_string always produces a trailing \n (so the output is ---\n{yaml}\n---{body_raw}). This holds for all non-empty mappings today, but it's an implementation detail rather than a guaranteed API contract. A one-line defensive assertion or ensure! would make this explicit and catch a regression immediately if serde_yaml changes behaviour:

    ensure!(
        yaml_serialized.ends_with('\n'),
        "serde_yaml::to_string produced output without trailing newline (unexpected)"
    );
  • src/compile/codemods/mod.rs:95CODEMODS type carries a redundant lifetime

    pub static CODEMODS: &[&'static Codemod] = &[];

    The outer & on a static is itself 'static, so &'static [&'static Codemod] would be unambiguous and match the type used in apply_codemods_with's parameter (&[&'static Codemod]). Not a correctness issue, but the double-'static reads as noise. Minor.

✅ What Looks Good

  • Three-stage parse order (untyped Mapping → codemods → typed FrontMatter) correctly sequences deserialization so codemods can rewrite fields that no longer exist in the typed schema.
  • Compile-then-rewrite ordering: source is never touched on a failed compile, lock file is never touched on a failed source rewrite — the invariant chain is airtight.
  • Lost-update guard (SHA-256 snapshot at parse time, re-read before write) is the right level of protection for a developer-facing CLI tool.
  • rename_key is genuinely transactional — the mapping is left unchanged on the error path.
  • Test coverage is thorough: unit tests in helpers/mod, white-box integration tests via the crate-private registry hook, and black-box CLI subprocess tests. The idempotency test for apply_codemods_with is especially valuable.
  • CodemodContext is #[non_exhaustive] — the right call for a forward-compat context type with an empty initial shape.

Generated by Rust PR Reviewer for issue #476 · ● 1.8M ·

Two fixes from the latest Rust PR Reviewer pass:

1. atomic_write created the tempfile in std::env::temp_dir() when
   the destination path had no directory component (e.g. a bare
   "agent.md" run from cwd). On Linux when /tmp is a tmpfs mount,
   the subsequent persist() rename fails with EXDEV because rename
   cannot cross filesystems. Fix: when path.parent() returns
   None or Some("") (the bare-filename case), fall back to "." so
   the tempfile lives on the same filesystem as the destination.
   Adds a pure-logic regression test for the parent-dir derivation
   that doesn't mutate process cwd (would race with parallel
   tests).

2. reconstruct_source's `format!("---\n{}---{}", ...)` template
   relies on serde_yaml::to_string ending the output with `\n`.
   This holds in practice but is an undocumented invariant. Add a
   defensive `ensure!` so a future serde_yaml change surfaces a
   loud, actionable error rather than silently producing malformed
   YAML.

Skipped: the redundant `'static` annotation on `CODEMODS` (cosmetic,
reviewer marked as minor).

All 1350 tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 8, 2026

🔍 Rust PR Review

Summary: Looks good — well-engineered framework with sound design, comprehensive tests, and one minor suggestion for the helper API docs.

Findings

✅ What Looks Good

  • 3-stage parse is correct: untyped Mapping → codemods → typed FrontMatter is the right order; codemods must run before typed deserialization so they can rewrite shapes that have been removed from the schema. The implementation in parse_markdown_detailed_with_registry cleanly separates these stages.

  • Lost-update guard: Re-reading and re-hashing the file before the atomic rewrite is the right defense against races with editors/concurrent tools. The guard fires correctly and leaves the source byte-identical in the contested case (test perform_source_rewrite_lost_update_guard verifies this).

  • Atomic write semantics: Tempfile-in-same-dir + persist guarantees same-filesystem rename on Linux. Mode preservation on Unix is handled. The comment about EXDEV is accurate and appreciated.

  • compile_pipeline_inner ordering: Source rewrite is performed before the .lock.yml write. This means a crash between the two leaves a newly-rewritten source with a stale lock file — a benign state (re-running compile fixes it) — rather than an updated lock pointing at an unmigrated source.

  • check failure on pending codemods: Failing check when codemods would fire is the right call; it forces compile to be run, which is the only command that holds the write token.

  • #[non_exhaustive] on CodemodContext: Correct forward-compat move. Future codemods can receive context (e.g. source path) without a breaking API change.

  • Error handling: Every fallible function returns anyhow::Result. All error paths have .context() with actionable messages. No unwrap() on user-facing paths.

  • Test coverage: Four layers (unit helpers, unit runner, white-box integration, black-box CLI subprocess) with good edge case coverage — idempotency, lost-update, no-op skip, non-mapping rejection.

⚠️ Suggestions

  • helpers.rs rename_key — key ordering not documented: When rename_key moves old → new, it calls m.remove(&old_key) followed by m.insert(new_key, old_value). Because serde_yaml::Mapping is backed by IndexMap, the insertion appends to the end — the renamed key will always land last in the serialized output, regardless of where old appeared. This is already disclosed in the user-facing eprintln! warning ("Renamed keys may move to the end"), but the rename_key doc comment itself doesn't mention it. Codemod authors consulting the helper API will be surprised. Worth adding one sentence to the doc:

    /// Note: when `old` is present, the resulting `new` key is appended
    /// at the end of the mapping (IndexMap semantics). If key ordering
    /// matters, callers must re-insert surrounding keys manually.
    
  • reconstruct_source — no test for empty mapping: serde_yaml::to_string on an empty Mapping produces {}\n (inline form), not the block form the comment assumes. The ensure! guard for the trailing newline would pass, but the output would be ---\n{}\n---<body> which differs from the expected ---\n---<body>. In practice this can't happen (typed FrontMatter requires name and description), but a doc note or debug_assert!(!front_matter_mapping.is_empty()) would make the invariant explicit.

Generated by Rust PR Reviewer for issue #476 · ● 978.9K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant