diff --git a/AGENTS.md b/AGENTS.md index 02de6d1f..b02514f5 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -61,6 +61,10 @@ Every compiled pipeline runs as three sequential jobs: │ │ │ ├── safe_outputs.rs # Always-on SafeOutputs MCP extension │ │ │ ├── trigger_filters.rs # Trigger filter extension (gate evaluator delivery) │ │ │ └── tests.rs # Extension integration tests +│ │ ├── codemods/ # Front-matter codemods (one file per transformation) +│ │ │ ├── mod.rs # Codemod struct, CODEMODS registry, runner +│ │ │ └── helpers.rs # take_key, insert_no_overwrite, rename_key, ConflictPolicy +│ │ ├── codemod_integration_test.rs # White-box rewrite-path tests (stub registry injection) │ │ └── types.rs # Front matter grammar and types │ ├── init.rs # Repository initialization for AI-first authoring │ ├── execute.rs # Stage 3 safe output execution @@ -192,6 +196,10 @@ index to jump to the right page. - [`docs/filter-ir.md`](docs/filter-ir.md) — filter expression IR specification: `Fact`/`Predicate` types, three-pass compilation (lower → validate → codegen), gate step generation, adding new filter types. +- [`docs/codemods.md`](docs/codemods.md) — front-matter codemod + framework: detection-based transformations, automatic source + rewrite on breaking-change updates, contributor workflow for + adding codemods. - [`docs/local-development.md`](docs/local-development.md) — local development setup notes. diff --git a/docs/codemods.md b/docs/codemods.md new file mode 100644 index 00000000..7c91e17e --- /dev/null +++ b/docs/codemods.md @@ -0,0 +1,386 @@ +# Front-matter Codemods + +_Part of the [ado-aw documentation](../AGENTS.md)._ + +The `ado-aw` compiler keeps user front matter forward-compatible +through **codemods**: small, detection-based transformations that +rewrite deprecated front-matter shapes to the current shape during +`compile`. The model is borrowed from +[gh-aw's codemod registry](https://github.com/github/gh-aw/blob/main/pkg/cli/fix_codemods.go). + +This page is the reference for both users (what codemods mean for me) +and contributors (how to add one). + +## How it works + +### No version field on user files + +Codemods don't stamp anything onto user front matter. There is no +`schema-version` field. Each codemod inspects the mapping, decides +whether the input contains the deprecated shape it knows about, and +either rewrites it or returns "no-op". The whole registry runs on +every compile; codemods that don't match are essentially free. + +### The compile flow + +1. `ado-aw compile` reads the source `.md` and parses the front + matter as an **untyped** `serde_yaml::Mapping`. This step never + trips on removed/renamed fields. +2. The runner walks the registered codemod registry in order. Each + codemod returns `Ok(true)` if it modified the mapping, `Ok(false)` + if it didn't, or `Err` to abort the whole compile. +3. The compiler runs all the usual validation and codegen against + the rewritten, typed `FrontMatter`. +4. **Only on a fully successful compile** AND **only if at least one + codemod fired**, the source `.md` is atomically rewritten and a + warning prints to stderr. A failed compile leaves the source + untouched. +5. The `.lock.yml` is written atomically last. + +### What gets preserved on rewrite + +- **Body markdown** is preserved byte-for-byte (everything after + the closing `---`). +- **Leading whitespace** before the opening `---` is preserved + byte-for-byte (BOM-strippers and editor blank lines). +- **Front-matter key order** is preserved for keys the codemod + doesn't touch (`serde_yaml`'s mapping is insertion-ordered). + Renamed keys, however, move to the **end** of the front-matter + block: `Mapping::insert` appends new keys, so when a codemod + removes `old-key` and inserts `new-key`, the new key lands at + the bottom regardless of where the old one was. The compile + warning calls this out so users aren't surprised. +- **Front-matter comments** are NOT preserved. `serde_yaml` + round-trip drops them. The warning emitted on rewrite calls this + out so it isn't a surprise. If you have important context in a + front-matter comment, move it into the markdown body before + running compile. +- **Quote and scalar styles** in YAML may be normalized. This is + cosmetic. + +### Atomicity and lost-update protection + +The rewrite uses `tempfile + rename` for atomicity (no torn writes). +Before the rename, the runner re-reads the source and compares its +SHA-256 to the snapshot taken at parse time. If the file changed +between parse and rewrite, the runner aborts with a clear error +("source file ... changed during compilation; refusing to apply +codemods") rather than clobbering whoever wrote the file. + +### `check` command behavior + +`ado-aw check` exits non-zero when codemods would fire — there is no +opt-in flag and no warning-only mode. Rationale: compiled pipelines +download the **same** `ado-aw` version that produced them +(`src/data/base.yml`, `src/data/1es-base.yml`), so the in-pipeline +integrity check is internally consistent by construction. The only +time `check` sees pending codemods is when a developer runs a newer +`ado-aw` locally against an older source — exactly when we want to +fail loudly. The fix is `ado-aw compile`, which applies the codemods +in place. + +### `execute` command behavior + +The Stage 3 executor runs codemods in memory only. It never rewrites +the source (the executor's working tree is not the source-of-truth +tree). When at least one codemod would fire, it logs a warning and +continues. + +## Adding a codemod + +You need a codemod whenever you introduce a breaking change to the +front-matter grammar: + +- Renaming a field +- Removing a field +- Changing a field's type or shape +- Adding a required field that didn't exist before +- Changing the meaning of an existing field + +Non-breaking changes (adding an optional field, accepting a new +variant) do **not** need a codemod. + +### File layout + +Codemods live in `src/compile/codemods/`: + +``` +src/compile/codemods/ +├── mod.rs # Framework + CODEMODS registry +├── helpers.rs # take_key, insert_no_overwrite, rename_key, ConflictPolicy +├── 0001_engine_id_split.rs +├── 0002_permissions_field.rs +└── 0003_safeoutput_renames.rs +``` + +The filename prefix is a zero-padded sequence number (``). It +sorts files naturally in directory listings; the **registry order** +in `mod.rs` is what determines runtime order, not the filename. + +### Anatomy of a codemod + +```rust +// src/compile/codemods/0001_engine_id_split.rs + +use anyhow::{bail, Result}; +use serde_yaml::{Mapping, Value}; + +use super::{Codemod, CodemodContext}; + +pub static CODEMOD: Codemod = Codemod { + id: "engine_id_split", + summary: "engine: -> engine: { id: copilot, model: }", + introduced_in: "0.27.0", + apply: apply_codemod, +}; + +fn apply_codemod(fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + // ... your detection-based transformation ... + // Return Ok(true) if the mapping was modified, Ok(false) if the + // shape didn't match (no-op), or Err for a hard failure. +} + +#[cfg(test)] +mod tests { + use super::*; + + #[test] + fn rewrites_legacy_shape() { /* before / after fixture */ } + + #[test] + fn already_current_shape_is_noop() { /* defensive */ } + + #[test] + fn missing_field_is_noop() { /* defensive */ } + + #[test] + fn unexpected_shape_is_rejected() { /* hard error */ } +} +``` + +### Registry append + +Two edits in `src/compile/codemods/mod.rs`: + +```rust +mod m0001_engine_id_split; // <-- add module declaration + +pub static CODEMODS: &[&'static Codemod] = &[ + &m0001_engine_id_split::CODEMOD, // <-- append at the end +]; +``` + +Tests in `mod.rs` enforce that codemod ids are unique and that the +file count matches the registry length. A malformed registry fails +fast at runtime via `with_context`. + +### Author contract (invariants) + +Every codemod must satisfy these properties. We enforce them via +review + per-codemod tests: + +1. **Idempotent.** Running twice produces the same final state as + running once. The runner re-runs the entire registry every + compile — codemods that no longer apply must be no-ops. +2. **Detection-based.** Returns `Ok(false)` and leaves the mapping + untouched when the input does not contain the targeted shape. + Never modifies a mapping that's already current. +3. **Conflict-aware.** When both old and new shapes coexist in the + same source, error with a "manual migration required" message + rather than guess. Use `helpers::rename_key` with + `ConflictPolicy::Error` (the default) to get this for free. +4. **Pure.** No I/O, no env, no time/randomness. (Convention; not + type-enforced.) +5. **Mapping-only.** Cannot inspect the markdown body, the file + path, the lock file, or git state. +6. **Order-aware.** If codemod B depends on shapes produced by + codemod A, A must precede B in the registry. Document the + ordering requirement in B's doc comment. +7. **Receives unsanitized input.** The compiler runs sanitization + (`##vso[` neutralization, control-character stripping, length + limits) on the typed `FrontMatter` *after* codemods run, but the + raw `Mapping` you receive is whatever the user wrote — including + any pipeline-injection attempts, control characters, or + over-length strings. Codemods should therefore treat values as + opaque (move them around, wrap them in objects, etc.) rather + than parse or interpolate them. If a codemod must inspect a + value, treat it defensively. + +### Use the helpers + +Codemods should prefer `helpers::*` over raw `Mapping` manipulation: + +- `take_key(map, "old")` — remove and return. +- `insert_no_overwrite(map, "new", value)` — error on conflict. +- `rename_key(map, "old", "new", ConflictPolicy::Error)` — default + policy is **error**, never silent overwrite. The helper is + transactional: on the error path the mapping is unchanged. + +`rename_key` returns `Result` directly, so it composes with a +codemod's return value: + +```rust +fn apply_codemod(fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + rename_key(fm, "old-key", "new-key", ConflictPolicy::Error) +} +``` + +### What if my change can't be expressed as a Mapping rewrite? + +The codemod `apply` function receives only the front-matter mapping. +It cannot inspect the markdown body, the file path, the lock file, +or git state. If your change requires that information, do not +write a codemod that guesses. Instead, return an `Err` with an +actionable "manual migration required: " message so +the user knows exactly what to fix. + +## Worked example: `engine_id_split` + +This is the codemod that would have caught the `0.17.0` breaking +change (`engine: ` → `engine: { id: copilot, model: }`). +Drop-in template for your own codemod. + +```rust +// src/compile/codemods/0001_engine_id_split.rs + +//! engine: -> engine: { id: copilot, model: } +//! +//! Before 0.17.0 the `engine` field was a model name (e.g. +//! `engine: claude-opus-4.5`). The grammar changed to use engine +//! identifiers (`engine: copilot`), with the model nested in an +//! object form (`engine: { id: copilot, model: }`). + +use anyhow::{bail, Result}; +use serde_yaml::{Mapping, Value}; + +use super::{Codemod, CodemodContext}; + +pub static CODEMOD: Codemod = Codemod { + id: "engine_id_split", + summary: "engine: -> engine: { id: copilot, model: }", + introduced_in: "0.27.0", + apply: apply_codemod, +}; + +/// Engine identifiers that are valid as the simple-form string. When +/// `engine` is a string equal to one of these, the source is already +/// using the current grammar and we leave it alone. +const KNOWN_ENGINE_IDS: &[&str] = &["copilot"]; + +fn apply_codemod(fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + let key = Value::String("engine".to_string()); + + let Some(engine) = fm.get(&key) else { + // No engine field at all — relies on default (copilot) in + // both old and new grammars. Nothing to do. + return Ok(false); + }; + + match engine { + // Already the object form: nothing to migrate. + Value::Mapping(_) => Ok(false), + + // Simple-form string with a known engine identifier + // (e.g. `copilot`): already current. No-op. + Value::String(s) if KNOWN_ENGINE_IDS.contains(&s.as_str()) => Ok(false), + + // Simple-form string that is NOT a known engine identifier: + // it's a *model name* from the old grammar. Wrap in object + // form. + Value::String(s) => { + let model = s.clone(); + let mut object = Mapping::new(); + object.insert( + Value::String("id".to_string()), + Value::String("copilot".to_string()), + ); + object.insert( + Value::String("model".to_string()), + Value::String(model), + ); + fm.insert(key, Value::Mapping(object)); + Ok(true) + } + + // Unexpected shape (number, bool, sequence, …). Refuse rather + // than guess — the user needs to fix this by hand. + other => bail!( + "engine field has unexpected shape (expected string or mapping, \ + got {}); manual migration required", + describe(other) + ), + } +} + +fn describe(v: &Value) -> &'static str { + match v { + Value::Null => "null", + Value::Bool(_) => "bool", + Value::Number(_) => "number", + Value::String(_) => "string", + Value::Sequence(_) => "sequence", + Value::Mapping(_) => "mapping", + Value::Tagged(_) => "tagged", + } +} +``` + +### What this example illustrates + +1. **Detection-first.** Each match arm decides whether the input + matches the codemod's target shape. Three of the four arms + return `Ok(false)` — those are the cases where the source is + already current. +2. **Idempotent by construction.** Once the mapping is in object + form, the first match arm fires and returns `Ok(false)`. The + codemod can run on every compile without harm. +3. **Conflict-aware.** The `Value::Mapping(_)` arm is the + "already-migrated" case. We never overwrite an existing object + form. The `bail!` on unexpected shapes is the manual-migration + escape hatch. +4. **Two-line registry append:** + + ```rust + mod m0001_engine_id_split; + + pub static CODEMODS: &[&'static Codemod] = &[ + &m0001_engine_id_split::CODEMOD, + ]; + ``` + + That's it. The registry-uniqueness and filename-prefix tests + keep passing. + +## Tests + +The codemod framework is covered by three layers of tests: + +- **Unit tests** in `src/compile/codemods/{mod.rs,helpers.rs}` + cover registry health, helper edge cases, and (for shipped + codemods) individual `apply` functions. +- **White-box integration tests** in + `src/compile/codemod_integration_test.rs` exercise the rewrite + path end-to-end (parse → codemods → compile → atomic source + rewrite → lock-file write) using a stub codemod registry + injected via the crate-private `compile_pipeline_with_registry` + and `parse_markdown_detailed_with_registry` hooks. They live + inside `src/` because the production registry is empty and + integration tests in `tests/` cannot link against crate + internals. +- **Black-box CLI tests** in `tests/codemod_tests.rs` spawn the + compiled `ado-aw` binary as a subprocess and assert on the + user-visible behavior of `compile` and `check`. + +When you add a real codemod, ship the per-codemod before/after +fixtures alongside it in the codemod's own file +(`src/compile/codemods/_.rs`). + +## See also + +- [`docs/front-matter.md`](front-matter.md) — the full + front-matter grammar. +- [`docs/extending.md`](extending.md) — broader guidance for + adding features to the compiler, including the requirement to + add a codemod alongside any breaking front-matter change. +- [gh-aw's `pkg/cli/fix_codemods.go`](https://github.com/github/gh-aw/blob/main/pkg/cli/fix_codemods.go) — + the upstream codemod model. diff --git a/docs/extending.md b/docs/extending.md index f4a235f2..1d679d10 100644 --- a/docs/extending.md +++ b/docs/extending.md @@ -9,6 +9,9 @@ When extending the compiler: 1. **New CLI commands**: Add variants to the `Commands` enum in `main.rs` 2. **New compile targets**: Implement the `Compiler` trait in a new file under `src/compile/` 3. **New front matter fields**: Add fields to `FrontMatter` in `src/compile/types.rs` + - **Breaking changes** (renames, removals, type changes, added required fields) + require adding a codemod under `src/compile/codemods/` in the same PR. + See [`docs/codemods.md`](codemods.md). 4. **New template markers**: Handle replacements in the target-specific compiler (e.g., `standalone.rs` or `onees.rs`) 5. **New safe-output tools**: Add to `src/safeoutputs/` — implement `ToolResult`, `Executor`, register in `mod.rs`, `mcp.rs`, `execute.rs` 6. **New first-class tools**: Create `src/tools//` with `mod.rs` and `extension.rs` (CompilerExtension impl). Add `execute.rs` if the tool has Stage 3 runtime logic. Extend `ToolsConfig` in `types.rs`, add collection in `collect_extensions()` diff --git a/src/compile/codemod_integration_test.rs b/src/compile/codemod_integration_test.rs new file mode 100644 index 00000000..399abff9 --- /dev/null +++ b/src/compile/codemod_integration_test.rs @@ -0,0 +1,213 @@ +//! White-box integration tests for the front-matter codemod +//! framework. +//! +//! These tests exercise the rewrite path end-to-end (parse → +//! codemods → compile → atomic source rewrite → lock-file write) +//! using a stub codemod registry. They live inside `src/` because +//! they need access to crate-private hooks +//! (`compile_pipeline_with_registry`, +//! `perform_source_rewrite_if_needed`, +//! `common::parse_markdown_detailed_with_registry`) that the empty +//! production [`super::codemods::CODEMODS`] registry cannot +//! exercise. +//! +//! Black-box subprocess tests of the user-visible CLI behavior live +//! in [`tests/codemod_tests.rs`](../../tests/codemod_tests.rs). + +#![cfg(test)] + +use super::codemods::{Codemod, CodemodContext, ConflictPolicy}; +use super::common; +use super::{compile_pipeline_with_registry, perform_source_rewrite_if_needed}; +use anyhow::Result; +use serde_yaml::Mapping; + +// ─── Stub registry ────────────────────────────────────────────────────────── + +/// Stub codemod: renames the `legacy-name` top-level key to `name`. +/// Drives end-to-end rewrite tests without registering a real codemod. +static TEST_RENAME_LEGACY_NAME: Codemod = Codemod { + id: "test_rename_legacy_name", + summary: "rename `legacy-name` -> `name`", + introduced_in: "test", + apply: rename_legacy_name, +}; + +fn rename_legacy_name(fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + super::codemods::rename_key(fm, "legacy-name", "name", ConflictPolicy::Error) +} + +/// Source whose typed deserialization would fail without a codemod: +/// it has `legacy-name` instead of the required `name` field. +fn stale_source() -> &'static str { + "---\nlegacy-name: my-agent\ndescription: test description\n---\n## Body\nHello.\n" +} + +fn write_temp_md(name: &str, content: &str) -> (tempfile::TempDir, std::path::PathBuf) { + let dir = tempfile::tempdir().expect("tempdir"); + let path = dir.path().join(name); + std::fs::write(&path, content).expect("write temp md"); + (dir, path) +} + +// ─── End-to-end rewrite path ──────────────────────────────────────────────── + +#[tokio::test] +async fn codemod_rewrites_stale_source_and_preserves_body() { + let (dir, source_path) = write_temp_md("agent.md", stale_source()); + + let registry: &[&'static Codemod] = &[&TEST_RENAME_LEGACY_NAME]; + let rewrote = compile_pipeline_with_registry( + &source_path.to_string_lossy(), + None, + true, // skip_integrity + false, // debug_pipeline + registry, + ) + .await + .expect("compile_pipeline_with_registry should succeed"); + + assert!(rewrote, "expected compile to report a source rewrite"); + + // Source file rewritten: contains `name: my-agent`, no + // `legacy-name`. + let rewritten = std::fs::read_to_string(&source_path).expect("read rewritten"); + assert!( + rewritten.contains("name: my-agent"), + "rewritten source missing `name: my-agent`:\n{}", + rewritten + ); + assert!( + !rewritten.contains("legacy-name"), + "rewritten source still contains legacy-name:\n{}", + rewritten + ); + + // Body region byte-identical to the original (everything after + // the closing `---`). + let orig = stale_source(); + let orig_body = &orig[orig.find("\n---\n").unwrap() + 5..]; + let new_body = &rewritten[rewritten.find("\n---\n").unwrap() + 5..]; + assert_eq!(orig_body, new_body, "body region not preserved byte-for-byte"); + + // Lock file generated. + let lock = source_path.with_extension("lock.yml"); + assert!(lock.exists(), "expected {} to exist", lock.display()); + + // Re-running parse with the same stub registry produces no + // further codemod fires — the rewrite moved the source to its + // current shape. + let after = std::fs::read_to_string(&source_path).unwrap(); + let parsed_again = + common::parse_markdown_detailed_with_registry(&after, registry).unwrap(); + assert!( + !parsed_again.codemods.changed(), + "expected post-rewrite source to require no further codemods, but \ + {} fired", + parsed_again.codemods.applied.len() + ); + + drop(dir); +} + +#[tokio::test] +async fn codemod_skip_when_no_stub_registry_runs() { + // With the empty production registry, a healthy source must + // compile without rewriting anything. + let healthy = "---\nname: x\ndescription: y\n---\nbody\n"; + let (dir, source_path) = write_temp_md("agent.md", healthy); + let registry: &[&'static Codemod] = &[]; + let rewrote = compile_pipeline_with_registry( + &source_path.to_string_lossy(), + None, + true, + false, + registry, + ) + .await + .expect("compile should succeed"); + assert!(!rewrote, "expected no rewrite for source that needs no codemods"); + let after = std::fs::read_to_string(&source_path).unwrap(); + assert_eq!(after, healthy, "source must be byte-identical"); + drop(dir); +} + +#[tokio::test] +async fn codemod_no_op_returns_false_does_not_rewrite() { + // A codemod that always returns Ok(false) must not trigger a + // source rewrite even though it ran. + fn noop(_fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + Ok(false) + } + static NOOP_MIG: Codemod = Codemod { + id: "test_noop_returns_false", + summary: "no-op", + introduced_in: "test", + apply: noop, + }; + + let healthy = "---\nname: x\ndescription: y\n---\nbody\n"; + let (dir, source_path) = write_temp_md("agent.md", healthy); + let registry: &[&'static Codemod] = &[&NOOP_MIG]; + let rewrote = compile_pipeline_with_registry( + &source_path.to_string_lossy(), + None, + true, + false, + registry, + ) + .await + .expect("compile should succeed"); + assert!( + !rewrote, + "no-op codemod that returns Ok(false) must not trigger a rewrite" + ); + let after = std::fs::read_to_string(&source_path).unwrap(); + assert_eq!(after, healthy, "source must be byte-identical"); + drop(dir); +} + +// ─── Lost-update guard ────────────────────────────────────────────────────── + +#[tokio::test] +async fn perform_source_rewrite_lost_update_guard() { + // Construct a parsed-source pointing at a file whose hash will + // not match (we mutate it after computing the hash). This + // exercises the lost-update guard directly without needing a + // full compile. + let (dir, source_path) = write_temp_md("agent.md", stale_source()); + let original = std::fs::read_to_string(&source_path).unwrap(); + let registry: &[&'static Codemod] = &[&TEST_RENAME_LEGACY_NAME]; + let parsed = + common::parse_markdown_detailed_with_registry(&original, registry).unwrap(); + assert!(parsed.codemods.changed()); + + // Mutate the source after parse (simulates editor / concurrent + // tool overwriting). + std::fs::write(&source_path, "different bytes\n").unwrap(); + + // Rewrite must refuse. + let result = perform_source_rewrite_if_needed( + &source_path, + &original, + &parsed.leading_whitespace, + &parsed.front_matter_mapping, + &parsed.body_raw, + &parsed.source_sha256, + &parsed.codemods, + ) + .await; + let err = result.expect_err("expected lost-update guard to fire"); + assert!( + format!("{:#}", err).contains("changed during compilation"), + "unexpected error: {:#}", + err + ); + + // The source is left as the interloper wrote it (we did not + // clobber); it is *not* the codemod-rewritten version. + let after = std::fs::read_to_string(&source_path).unwrap(); + assert_eq!(after, "different bytes\n"); + + drop(dir); +} diff --git a/src/compile/codemods/helpers.rs b/src/compile/codemods/helpers.rs new file mode 100644 index 00000000..59331fa2 --- /dev/null +++ b/src/compile/codemods/helpers.rs @@ -0,0 +1,236 @@ +//! Helpers for writing migrations against `serde_yaml::Mapping`. +//! +//! Migrations should prefer these over raw `Mapping` manipulation so that +//! conflicts (e.g. both old and new keys present) are surfaced rather than +//! silently overwritten. + +use anyhow::{bail, Result}; +use serde_yaml::{Mapping, Value}; + +/// Conflict policy used by [`rename_key`] when the destination key is +/// already present. +// TODO(codemods): remove when the first real codemod is registered. +#[allow(dead_code)] +#[derive(Debug, Clone, Copy)] +pub enum ConflictPolicy { + /// Default: error if `new` already exists when renaming `old → new`. + Error, + /// Keep the existing `new` value, drop `old`. + PreferNew, + /// Overwrite `new` with the value from `old`. + PreferOld, +} + +/// Remove and return the value at `key`, or `None` if absent. +// TODO(codemods): remove when the first real codemod is registered. +#[allow(dead_code)] +pub fn take_key(m: &mut Mapping, key: &str) -> Option { + m.remove(Value::String(key.to_string())) +} + +/// Insert `value` at `key`, returning `Err` if the key already exists. +/// +/// Prefer this over `Mapping::insert` in migrations: silent overwrite is +/// almost always a bug when transforming user data. +// TODO(codemods): remove when the first real codemod is registered. +#[allow(dead_code)] +pub fn insert_no_overwrite( + m: &mut Mapping, + key: &str, + value: Value, +) -> Result<()> { + if m.contains_key(Value::String(key.to_string())) { + bail!( + "refusing to overwrite existing key `{}` in front matter", + key + ); + } + m.insert(Value::String(key.to_string()), value); + Ok(()) +} + +/// Rename `old` → `new` according to `policy`. +/// +/// Returns `Ok(true)` when the mapping was mutated in any way: +/// +/// - the `old` key was moved to `new` (the typical rename), OR +/// - both keys were present with [`ConflictPolicy::PreferOld`] and +/// `new` was overwritten, OR +/// - both keys were present with [`ConflictPolicy::PreferNew`] and +/// the stale `old` key was dropped (note: the `new` value did not +/// change, but the mapping still mutated — codemod authors using +/// `PreferNew` should be aware that `Ok(true)` here means +/// "cleaned up a remnant," not "migrated semantic content"). +/// +/// Returns `Ok(false)` when `old` was absent (no-op). +/// +/// The mapping is left **unchanged** on the error path. Callers can +/// rely on this invariant when chaining migrations: a failed rename +/// won't leave the mapping in a half-mutated state for the next call +/// to inspect. +// TODO(codemods): remove when the first real codemod is registered. +#[allow(dead_code)] +pub fn rename_key( + m: &mut Mapping, + old: &str, + new: &str, + policy: ConflictPolicy, +) -> Result { + let old_key = Value::String(old.to_string()); + let new_key = Value::String(new.to_string()); + + if !m.contains_key(&old_key) { + return Ok(false); + } + let new_present = m.contains_key(&new_key); + + match (new_present, policy) { + (true, ConflictPolicy::Error) => { + // Surface the conflict without mutating the mapping. + bail!( + "refusing to rename `{}` -> `{}`: destination key already exists \ + (setting both old and new keys at once is ambiguous)", + old, + new + ); + } + (false, _) | (true, ConflictPolicy::PreferOld) => { + // Move old -> new, replacing any existing new value. + let old_value = m.remove(&old_key).expect("old key checked above"); + m.insert(new_key, old_value); + Ok(true) + } + (true, ConflictPolicy::PreferNew) => { + // Drop old, keep new. + m.remove(&old_key); + Ok(true) + } + } +} + +#[cfg(test)] +mod tests { + use super::*; + + fn map_with(pairs: &[(&str, &str)]) -> Mapping { + let mut m = Mapping::new(); + for (k, v) in pairs { + m.insert( + Value::String((*k).to_string()), + Value::String((*v).to_string()), + ); + } + m + } + + #[test] + fn take_key_present_returns_value_and_removes_it() { + let mut m = map_with(&[("a", "1"), ("b", "2")]); + let v = take_key(&mut m, "a").unwrap(); + assert_eq!(v, Value::String("1".to_string())); + assert!(!m.contains_key(Value::String("a".to_string()))); + assert!(m.contains_key(Value::String("b".to_string()))); + } + + #[test] + fn take_key_absent_returns_none() { + let mut m = map_with(&[("a", "1")]); + assert!(take_key(&mut m, "missing").is_none()); + assert!(m.contains_key(Value::String("a".to_string()))); + } + + #[test] + fn insert_no_overwrite_inserts_when_absent() { + let mut m = Mapping::new(); + insert_no_overwrite(&mut m, "k", Value::String("v".into())).unwrap(); + assert_eq!( + m.get(Value::String("k".into())), + Some(&Value::String("v".into())) + ); + } + + #[test] + fn insert_no_overwrite_errors_on_conflict() { + let mut m = map_with(&[("k", "v1")]); + let err = insert_no_overwrite(&mut m, "k", Value::String("v2".into())) + .unwrap_err(); + assert!( + format!("{}", err).contains("refusing to overwrite"), + "unexpected error message: {}", + err + ); + assert_eq!( + m.get(Value::String("k".into())), + Some(&Value::String("v1".into())) + ); + } + + #[test] + fn rename_key_old_absent_is_noop() { + let mut m = map_with(&[("b", "2")]); + let renamed = rename_key(&mut m, "a", "z", ConflictPolicy::Error).unwrap(); + assert!(!renamed); + assert!(!m.contains_key(Value::String("z".into()))); + } + + #[test] + fn rename_key_new_absent_moves_value() { + let mut m = map_with(&[("old", "v")]); + let renamed = rename_key(&mut m, "old", "new", ConflictPolicy::Error).unwrap(); + assert!(renamed); + assert!(!m.contains_key(Value::String("old".into()))); + assert_eq!( + m.get(Value::String("new".into())), + Some(&Value::String("v".into())) + ); + } + + #[test] + fn rename_key_new_present_with_error_policy_fails() { + let mut m = map_with(&[("old", "v_old"), ("new", "v_new")]); + let err = rename_key(&mut m, "old", "new", ConflictPolicy::Error).unwrap_err(); + assert!( + format!("{}", err).contains("destination key already exists"), + "unexpected error message: {}", + err + ); + // Both keys must remain intact — the helper guarantees the + // mapping is unchanged on the error path. + assert_eq!( + m.get(Value::String("old".into())), + Some(&Value::String("v_old".into())), + "old key must be preserved on error" + ); + assert_eq!( + m.get(Value::String("new".into())), + Some(&Value::String("v_new".into())), + "new key must be preserved on error" + ); + } + + #[test] + fn rename_key_new_present_with_prefer_new_drops_old() { + let mut m = map_with(&[("old", "v_old"), ("new", "v_new")]); + let renamed = + rename_key(&mut m, "old", "new", ConflictPolicy::PreferNew).unwrap(); + assert!(renamed); + assert!(!m.contains_key(Value::String("old".into()))); + assert_eq!( + m.get(Value::String("new".into())), + Some(&Value::String("v_new".into())) + ); + } + + #[test] + fn rename_key_new_present_with_prefer_old_overwrites() { + let mut m = map_with(&[("old", "v_old"), ("new", "v_new")]); + let renamed = + rename_key(&mut m, "old", "new", ConflictPolicy::PreferOld).unwrap(); + assert!(renamed); + assert!(!m.contains_key(Value::String("old".into()))); + assert_eq!( + m.get(Value::String("new".into())), + Some(&Value::String("v_old".into())) + ); + } +} diff --git a/src/compile/codemods/mod.rs b/src/compile/codemods/mod.rs new file mode 100644 index 00000000..6e308eca --- /dev/null +++ b/src/compile/codemods/mod.rs @@ -0,0 +1,349 @@ +//! Front-matter codemod framework. +//! +//! A flat, append-only registry of detection-based transformations +//! that rewrite deprecated front-matter shapes to current ones +//! before typed deserialization. Modeled on gh-aw's codemod registry +//! (`pkg/cli/fix_codemods.go`). +//! +//! See `docs/codemods.md` for the full contributor reference. In +//! short: +//! +//! - Each codemod is a pure function that **detects** an old shape +//! and rewrites it. When the input does not match, it returns +//! `Ok(false)` and leaves the mapping untouched. +//! - Codemods live in `src/compile/codemods/_.rs` and are +//! appended to [`CODEMODS`] in registration order. +//! - Codemods operate on the **untyped** `serde_yaml::Mapping` +//! representation so they can rewrite shapes that no longer match +//! the typed `FrontMatter` (renamed/removed fields, +//! `deny_unknown_fields`). +//! - Codemods MUST be **idempotent**: running twice produces the +//! same final state as running once. +//! - When an old key and a new key coexist in the same source, the +//! codemod errors with a "manual migration required" message +//! rather than guess. The [`helpers::rename_key`] + +//! [`helpers::ConflictPolicy::Error`] pattern gives this for free. +//! +//! Unlike a version-stamped chain, codemods leave no per-source +//! version footprint — there is no `schema-version` field in user +//! front matter. Each codemod is idempotent so re-running the whole +//! registry on every compile is cheap when nothing matches. + +use anyhow::{Context, Result}; +use serde_yaml::Mapping; + +mod helpers; +// `#[allow(unused_imports)]` here mirrors the per-item +// `#[allow(dead_code)]` annotations in `helpers.rs`. While the +// CODEMODS registry is empty, no in-crate caller exercises these +// re-exports — the lint warns. Once the first real codemod lands +// and uses one of these helpers, both the re-export attribute and +// the per-item `dead_code` allows on `helpers.rs` should be +// removed. +// TODO(codemods): remove when the first real codemod is registered. +#[allow(unused_imports)] +pub use helpers::{insert_no_overwrite, rename_key, take_key, ConflictPolicy}; + +/// Forward-compatible context passed to every codemod. Currently +/// empty; we keep it in the signature so future codemods can be +/// given (e.g.) the source path without breaking the function +/// pointer type. +#[non_exhaustive] +pub struct CodemodContext {} + +/// A single front-matter codemod. +/// +/// Codemods are pure functions that detect a specific deprecated +/// shape in the front-matter mapping and rewrite it to the current +/// shape. They must satisfy the following invariants (see +/// `docs/codemods.md`): +/// +/// - **Idempotent.** Running twice produces the same result as +/// running once. +/// - **Detection-based.** Returns `Ok(false)` and leaves the mapping +/// untouched when the input does not contain the targeted shape. +/// - **Conflict-aware.** Errors with an actionable "manual migration +/// required" message when both old and new shapes coexist. +/// - **Pure.** No I/O, no env, no time/randomness. +/// - **Mapping-only.** Cannot inspect the markdown body, file path, +/// lock file, or git state. +pub struct Codemod { + /// Stable snake_case identifier used in logs, errors, and tests. + pub id: &'static str, + /// Human-facing one-line summary, surfaced in the compile warning. + pub summary: &'static str, + /// Compiler version that introduced this codemod (e.g. "0.27.0"). + /// Provenance only; not consumed by the runner. + // TODO(codemods): remove when the first real codemod is registered. + #[allow(dead_code)] + pub introduced_in: &'static str, + /// The transformation. Returns `Ok(true)` when the codemod + /// modified the mapping; `Ok(false)` when the mapping does not + /// match this codemod's detection predicate. Errors propagate up + /// and abort the whole compile. + pub apply: fn(&mut Mapping, &CodemodContext) -> Result, +} + +/// The fixed registry of codemods. Append-only — new codemods land +/// at the end. Order matters when one codemod depends on another +/// having run first (e.g. A renames `foo` → `bar`, B operates on +/// `bar`); idempotency means any codemod can re-run on any source +/// without harm. +pub static CODEMODS: &[&'static Codemod] = &[]; + +/// Result of running the codemod registry on a single front-matter +/// mapping. +#[derive(Debug, Clone)] +pub struct CodemodReport { + /// Ordered list of codemods that ran (returned `Ok(true)`). + pub applied: Vec, +} + +/// Record of a single codemod that fired. Carries enough info for +/// warning emission and tests without re-querying the registry. +#[derive(Debug, Clone)] +pub struct AppliedCodemod { + pub id: &'static str, + pub summary: &'static str, +} + +impl CodemodReport { + /// An empty report (no codemod fired). + // TODO(codemods): remove when the first real codemod is registered. + #[allow(dead_code)] + pub fn empty() -> Self { + Self { + applied: Vec::new(), + } + } + + /// Returns true when at least one codemod ran. + pub fn changed(&self) -> bool { + !self.applied.is_empty() + } + + /// IDs of codemods that ran, in order. Helpful for tests. + // TODO(codemods): remove when the first real codemod is registered. + #[allow(dead_code)] + pub fn applied_ids(&self) -> Vec<&'static str> { + self.applied.iter().map(|a| a.id).collect() + } +} + +/// Apply the registered codemods to `fm`. +/// +/// Equivalent to [`apply_codemods_with`] called with the global +/// [`CODEMODS`] registry. +// TODO(codemods): remove when the first real codemod is registered. +#[allow(dead_code)] +pub fn apply_codemods(fm: &mut Mapping) -> Result { + apply_codemods_with(fm, CODEMODS) +} + +/// Apply an explicit codemod registry. Used by tests with a stub +/// registry; production code calls [`apply_codemods`]. +pub(crate) fn apply_codemods_with( + fm: &mut Mapping, + registry: &[&'static Codemod], +) -> Result { + let mut applied: Vec = Vec::new(); + for c in registry { + let changed = (c.apply)(fm, &CodemodContext {}) + .with_context(|| format!("codemod {} failed", c.id))?; + if changed { + applied.push(AppliedCodemod { + id: c.id, + summary: c.summary, + }); + } + } + Ok(CodemodReport { applied }) +} + +#[cfg(test)] +mod tests { + use super::helpers::ConflictPolicy; + use super::*; + use anyhow::bail; + use serde_yaml::Value; + + // ─── Registry health (operates on the real CODEMODS slice) ──────────── + + #[test] + fn registry_ids_are_unique() { + // Vacuously true while CODEMODS is empty; the assertion + // machinery still compiles so this test guards against + // duplicate ids the moment a real codemod ships. + let mut seen = std::collections::BTreeSet::new(); + for c in CODEMODS { + assert!( + seen.insert(c.id), + "duplicate codemod id `{}` in CODEMODS", + c.id + ); + } + } + + #[test] + fn codemod_filenames_match_registry_count() { + // Vacuously true while CODEMODS is empty (the directory + // contains only `mod.rs` and `helpers.rs`, which are + // skipped). Once a numeric `_.rs` file lands, this + // test asserts the registry was updated to match. + let dir = std::path::Path::new(env!("CARGO_MANIFEST_DIR")) + .join("src/compile/codemods"); + let mut numeric_files: Vec = Vec::new(); + for entry in std::fs::read_dir(&dir).expect("read codemods dir") { + let entry = entry.expect("dir entry"); + let path = entry.path(); + if path.extension().and_then(|e| e.to_str()) != Some("rs") { + continue; + } + let stem = path + .file_stem() + .and_then(|s| s.to_str()) + .expect("utf8 stem") + .to_string(); + if stem == "mod" || stem == "helpers" { + continue; + } + // Expect `_` so files sort naturally in + // directory listings. The registry order is the canonical + // application order; the prefix is purely cosmetic. + let (prefix, _rest) = stem.split_once('_').unwrap_or_else(|| { + panic!( + "codemod file {:?} does not match `_.rs`", + path + ) + }); + let _: u32 = prefix.parse().unwrap_or_else(|_| { + panic!( + "codemod file {:?} has non-numeric prefix `{}`", + path, prefix + ) + }); + numeric_files.push(stem); + } + assert_eq!( + numeric_files.len(), + CODEMODS.len(), + "number of `_.rs` files ({}) does not match \ + CODEMODS.len() ({}); files: {:?}", + numeric_files.len(), + CODEMODS.len(), + numeric_files + ); + } + + // ─── apply_codemods_with ────────────────────────────────────────────── + + fn noop_no_change(_fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + Ok(false) + } + + fn rename_a_to_b(fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + rename_key(fm, "a", "b", ConflictPolicy::Error) + } + + fn always_fail(_fm: &mut Mapping, _ctx: &CodemodContext) -> Result { + bail!("intentional test failure"); + } + + static TEST_CODEMOD_NOOP: Codemod = Codemod { + id: "test_noop", + summary: "no-op codemod for tests", + introduced_in: "test", + apply: noop_no_change, + }; + + static TEST_CODEMOD_RENAME: Codemod = Codemod { + id: "test_rename_a_to_b", + summary: "rename a -> b", + introduced_in: "test", + apply: rename_a_to_b, + }; + + static TEST_CODEMOD_FAIL: Codemod = Codemod { + id: "test_fail", + summary: "always fails", + introduced_in: "test", + apply: always_fail, + }; + + #[test] + fn empty_registry_produces_empty_report() { + let mut m: Mapping = serde_yaml::from_str("name: x\n").unwrap(); + let report = apply_codemods_with(&mut m, &[]).unwrap(); + assert!(!report.changed()); + assert!(report.applied.is_empty()); + } + + #[test] + fn all_no_op_codemods_produce_empty_report() { + let mut m: Mapping = serde_yaml::from_str("name: x\n").unwrap(); + let registry: &[&'static Codemod] = &[&TEST_CODEMOD_NOOP]; + let report = apply_codemods_with(&mut m, registry).unwrap(); + assert!(!report.changed()); + } + + #[test] + fn matching_codemod_appears_in_report() { + let mut m: Mapping = serde_yaml::from_str("a: 1\n").unwrap(); + let registry: &[&'static Codemod] = &[&TEST_CODEMOD_RENAME]; + let report = apply_codemods_with(&mut m, registry).unwrap(); + assert!(report.changed()); + assert_eq!(report.applied_ids(), vec!["test_rename_a_to_b"]); + assert!(!m.contains_key(Value::String("a".into()))); + assert!(m.contains_key(Value::String("b".into()))); + } + + #[test] + fn non_matching_codemod_omitted_from_report() { + // Source has no `a` key, so the rename is a no-op and is + // not listed in the report. + let mut m: Mapping = serde_yaml::from_str("name: x\n").unwrap(); + let registry: &[&'static Codemod] = &[&TEST_CODEMOD_RENAME]; + let report = apply_codemods_with(&mut m, registry).unwrap(); + assert!(!report.changed()); + } + + #[test] + fn mixed_registry_lists_only_changed_codemods() { + let mut m: Mapping = serde_yaml::from_str("a: 1\n").unwrap(); + let registry: &[&'static Codemod] = + &[&TEST_CODEMOD_NOOP, &TEST_CODEMOD_RENAME, &TEST_CODEMOD_NOOP]; + let report = apply_codemods_with(&mut m, registry).unwrap(); + assert_eq!(report.applied_ids(), vec!["test_rename_a_to_b"]); + } + + #[test] + fn codemod_failure_carries_id_in_context() { + let mut m: Mapping = serde_yaml::from_str("a: 1\n").unwrap(); + let registry: &[&'static Codemod] = &[&TEST_CODEMOD_FAIL]; + let err = apply_codemods_with(&mut m, registry).unwrap_err(); + let chain = format!("{:#}", err); + assert!( + chain.contains("codemod test_fail failed"), + "expected codemod context, got: {}", + chain + ); + assert!( + chain.contains("intentional test failure"), + "expected inner error, got: {}", + chain + ); + } + + #[test] + fn idempotent_codemod_runs_safely_twice() { + let registry: &[&'static Codemod] = &[&TEST_CODEMOD_RENAME]; + let mut m: Mapping = serde_yaml::from_str("a: 1\n").unwrap(); + let report1 = apply_codemods_with(&mut m, registry).unwrap(); + assert!(report1.changed()); + let report2 = apply_codemods_with(&mut m, registry).unwrap(); + assert!( + !report2.changed(), + "second run on already-migrated mapping must be a no-op" + ); + } +} diff --git a/src/compile/common.rs b/src/compile/common.rs index dfee1b57..32e882b4 100644 --- a/src/compile/common.rs +++ b/src/compile/common.rs @@ -12,27 +12,268 @@ use crate::allowed_hosts::{CORE_ALLOWED_HOSTS, mcp_required_hosts}; use crate::ecosystem_domains::{get_ecosystem_domains, is_ecosystem_identifier, is_known_ecosystem}; use crate::validate; -/// Parse the markdown file and extract front matter and body -pub fn parse_markdown(content: &str) -> Result<(FrontMatter, String)> { - let content = content.trim(); +/// Atomically write `contents` to `path`. +/// +/// Uses [`tempfile::NamedTempFile`] in the destination's parent +/// directory so the final `persist` is a same-filesystem rename. This +/// guarantees readers either see the old file or the new file in full — +/// never a half-written state. +/// +/// Behavior: +/// +/// - Creates the tempfile in `path.parent()` (falls back to `.` when +/// the parent is empty, matching `tokio::fs::write` semantics). +/// - On Unix, preserves the existing file's mode if the target exists. +/// Otherwise the tempfile keeps its default mode (0o600 from +/// `tempfile`'s implementation). +/// - When the destination is a symlink, the rename replaces the +/// symlink with a regular file (matches `tokio::fs::write`; the +/// symlink target is *not* followed). +pub async fn atomic_write(path: &Path, contents: &str) -> Result<()> { + let path = path.to_path_buf(); + let owned_contents = contents.to_string(); + // tempfile is sync; do the whole thing on a blocking task so we + // don't block the async runtime on large writes / fsync. + tokio::task::spawn_blocking(move || atomic_write_blocking(&path, &owned_contents)) + .await + .context("atomic_write task panicked")? +} + +fn atomic_write_blocking(path: &Path, contents: &str) -> Result<()> { + use std::io::Write; + + // Determine the directory to create the tempfile in. We MUST use + // a path on the same filesystem as the destination so that the + // final `persist` rename is atomic (otherwise it fails with + // EXDEV on Linux when /tmp is a separate tmpfs mount). + // + // - `path.parent() == Some(non-empty)` -> use that parent. + // - `path.parent() == Some("")` (bare filename like "agent.md") + // or `None` -> use the current directory ("."), which is the + // same filesystem as where the file will land. + let parent = path.parent().filter(|p| !p.as_os_str().is_empty()); + let parent_dir: &Path = parent.unwrap_or_else(|| Path::new(".")); + let mut tmp = tempfile::NamedTempFile::new_in(parent_dir).with_context(|| { + format!( + "failed to create temporary file in {}", + parent_dir.display() + ) + })?; + + tmp.write_all(contents.as_bytes()) + .with_context(|| format!("failed to write temporary file for {}", path.display()))?; + tmp.as_file() + .sync_all() + .with_context(|| format!("failed to fsync temporary file for {}", path.display()))?; + + // On Unix, copy the existing file's mode onto the tempfile so + // permissions are preserved across the atomic rename. + #[cfg(unix)] + { + use std::os::unix::fs::PermissionsExt; + if let Ok(meta) = std::fs::metadata(path) { + let mode = meta.permissions().mode(); + std::fs::set_permissions( + tmp.path(), + std::fs::Permissions::from_mode(mode), + ) + .with_context(|| { + format!( + "failed to copy permissions from {} to temp file", + path.display() + ) + })?; + } + } + + tmp.persist(path) + .with_context(|| format!("failed to atomically rename into {}", path.display()))?; + Ok(()) +} + +/// Detailed parse result. Holds enough information to rewrite the +/// source on disk byte-faithfully when codemods apply. +/// +/// See [`parse_markdown_detailed`]. +#[derive(Debug)] +pub struct ParsedSource { + /// Typed front matter, after codemods have been applied to the + /// underlying mapping. + pub front_matter: FrontMatter, + /// Body for compilation, with leading/trailing whitespace trimmed + /// (matches the legacy `parse_markdown` second tuple element). + pub markdown_body: String, + /// Codemod outcome. + pub codemods: super::codemods::CodemodReport, + /// The codemod-rewritten front-matter mapping. Used to + /// reconstruct the source for the on-disk rewrite. + pub front_matter_mapping: serde_yaml::Mapping, + /// Whitespace bytes that appeared before the opening `---` fence, + /// preserved verbatim so that source rewrite is byte-faithful. + /// Empty in the typical case where the file starts with `---`. + pub leading_whitespace: String, + /// The body region exactly as it appeared after the closing `---`, + /// byte-for-byte (no trim). Includes any leading newline. + pub body_raw: String, + /// SHA-256 of the original source bytes (lost-update protection). + pub source_sha256: [u8; 32], +} - if !content.starts_with("---") { +/// Parse the markdown file, run the codemod registry on the front +/// matter in memory, and return both the typed `FrontMatter` and the +/// raw fragments needed to rewrite the source on disk byte-faithfully. +/// +/// Use this from callers that may rewrite the source (the `compile` +/// command). Callers that only want the typed view of the front matter +/// should use the backward-compatible [`parse_markdown`] wrapper. +pub fn parse_markdown_detailed(content: &str) -> Result { + parse_markdown_detailed_with_registry(content, super::codemods::CODEMODS) +} + +/// Variant of [`parse_markdown_detailed`] that allows injecting an +/// explicit codemod registry. Used by tests; production callers go +/// through the no-arg version that reads the global +/// [`super::codemods::CODEMODS`]. +pub(crate) fn parse_markdown_detailed_with_registry( + content: &str, + registry: &[&'static super::codemods::Codemod], +) -> Result { + use sha2::Digest; + + // Lost-update protection: hash the raw input as it was provided, so + // a rewrite path can later re-read the file and compare. + let mut hasher = sha2::Sha256::new(); + hasher.update(content.as_bytes()); + let source_sha256: [u8; 32] = hasher.finalize().into(); + + // Allow leading whitespace before the opening fence (preserves + // historical leniency). We compute a byte offset into `content` so + // that `body_raw` extraction is purely byte-faithful, and we keep + // the whitespace prefix around so that source rewrites preserve + // anything the user (or their editor) put before the opening + // fence. + let leading_ws = content.bytes().take_while(|b| b.is_ascii_whitespace()).count(); + let leading_whitespace = content[..leading_ws].to_string(); + let after_lead = &content[leading_ws..]; + if !after_lead.starts_with("---") { anyhow::bail!("Markdown file must start with YAML front matter (---)"); } - // Find the closing --- - let rest = &content[3..]; - let end_idx = rest + let after_open = &after_lead[3..]; + let end_idx = after_open .find("\n---") .context("Could not find closing --- for front matter")?; - let yaml_content = &rest[..end_idx]; - let markdown_body = rest[end_idx + 4..].trim(); + let yaml_str = &after_open[..end_idx]; + let body_raw_slice = &after_open[end_idx + 4..]; + let body_raw = body_raw_slice.to_string(); + let markdown_body = body_raw_slice.trim().to_string(); + + // Stage 1: parse to untyped Value, reject non-mapping at top level. + let parsed_value: serde_yaml::Value = + serde_yaml::from_str(yaml_str).context("Failed to parse YAML front matter")?; + let mut mapping = match parsed_value { + serde_yaml::Value::Mapping(m) => m, + other => { + anyhow::bail!( + "YAML front matter must be a mapping/object, got {}", + yaml_value_kind(&other) + ); + } + }; + + // Stage 2: run the codemod registry against the untyped mapping. + let report = super::codemods::apply_codemods_with(&mut mapping, registry) + .context("Failed to apply codemods")?; + + // Stage 3: deserialize the (possibly modified) mapping into the + // typed FrontMatter. Errors here mean either the user wrote an + // unsupported shape or a codemod produced invalid output. The + // error context differs by case so the user can tell which. + let front_matter: FrontMatter = serde_yaml::from_value( + serde_yaml::Value::Mapping(mapping.clone()), + ) + .with_context(|| { + if report.changed() { + let ids = report.applied_ids().join(", "); + format!( + "Failed to parse YAML front matter after applying codemods ({}); \ + a codemod likely produced an invalid shape", + ids + ) + } else { + "Failed to parse YAML front matter".to_string() + } + })?; + + Ok(ParsedSource { + front_matter, + markdown_body, + codemods: report, + front_matter_mapping: mapping, + leading_whitespace, + body_raw, + source_sha256, + }) +} + +/// Reconstruct full source content from codemod outputs. +/// +/// Takes the individual fragments rather than the full +/// [`ParsedSource`] so callers that have already destructured the +/// parse don't have to round-trip a fresh `front_matter` through +/// serde just to satisfy the typed field. +/// +/// Output shape: +/// - `leading_whitespace` (typically empty) +/// - `---\n` +/// - the codemod-rewritten YAML mapping (`serde_yaml::to_string` +/// always ends with `\n`); the mapping's existing key order is +/// preserved so user-authored keys keep their original positions +/// - `---` +/// - the original body region byte-for-byte (`body_raw`) +pub fn reconstruct_source( + leading_whitespace: &str, + front_matter_mapping: &serde_yaml::Mapping, + body_raw: &str, +) -> Result { + let yaml_serialized = serde_yaml::to_string(front_matter_mapping) + .context("Failed to serialize codemod-rewritten front matter")?; + // Defensive: the format string assumes the serialized YAML ends + // with `\n` so the closing `---` lands on a new line. This is + // serde_yaml's documented behavior for non-empty mappings, but + // hard-fail loudly if a future version breaks the assumption + // rather than silently producing malformed YAML. + anyhow::ensure!( + yaml_serialized.ends_with('\n'), + "serde_yaml::to_string produced output without trailing newline; \ + cannot reconstruct front-matter block safely" + ); + Ok(format!( + "{}---\n{}---{}", + leading_whitespace, yaml_serialized, body_raw + )) +} - let front_matter: FrontMatter = - serde_yaml::from_str(yaml_content).context("Failed to parse YAML front matter")?; +fn yaml_value_kind(v: &serde_yaml::Value) -> &'static str { + match v { + serde_yaml::Value::Null => "null", + serde_yaml::Value::Bool(_) => "bool", + serde_yaml::Value::Number(_) => "number", + serde_yaml::Value::String(_) => "string", + serde_yaml::Value::Sequence(_) => "sequence", + serde_yaml::Value::Mapping(_) => "mapping", + serde_yaml::Value::Tagged(_) => "tagged", + } +} - Ok((front_matter, markdown_body.to_string())) +/// Backward-compatible parse: returns the typed front matter and the +/// trimmed body. New callers that may rewrite the source on disk +/// should use [`parse_markdown_detailed`] instead. +#[allow(dead_code)] +pub fn parse_markdown(content: &str) -> Result<(FrontMatter, String)> { + let parsed = parse_markdown_detailed(content)?; + Ok((parsed.front_matter, parsed.markdown_body)) } /// Replace a placeholder in the template, preserving the indentation for multi-line content. @@ -2293,6 +2534,214 @@ mod tests { fm } + // ─── atomic_write ───────────────────────────────────────────────────────── + + #[tokio::test] + async fn atomic_write_creates_new_file() { + let dir = tempfile::tempdir().unwrap(); + let path = dir.path().join("a.txt"); + atomic_write(&path, "hello\n").await.unwrap(); + let read = tokio::fs::read_to_string(&path).await.unwrap(); + assert_eq!(read, "hello\n"); + } + + #[tokio::test] + async fn atomic_write_overwrites_existing_file() { + let dir = tempfile::tempdir().unwrap(); + let path = dir.path().join("a.txt"); + tokio::fs::write(&path, "old contents").await.unwrap(); + atomic_write(&path, "new contents").await.unwrap(); + let read = tokio::fs::read_to_string(&path).await.unwrap(); + assert_eq!(read, "new contents"); + } + + #[cfg(unix)] + #[tokio::test] + async fn atomic_write_preserves_unix_mode() { + use std::os::unix::fs::PermissionsExt; + let dir = tempfile::tempdir().unwrap(); + let path = dir.path().join("a.txt"); + tokio::fs::write(&path, "old").await.unwrap(); + std::fs::set_permissions(&path, std::fs::Permissions::from_mode(0o640)).unwrap(); + atomic_write(&path, "new").await.unwrap(); + let mode = std::fs::metadata(&path).unwrap().permissions().mode(); + assert_eq!(mode & 0o777, 0o640, "expected mode 0o640, got {:o}", mode & 0o777); + } + + #[cfg(unix)] + #[tokio::test] + async fn atomic_write_replaces_symlink_with_regular_file() { + let dir = tempfile::tempdir().unwrap(); + let target = dir.path().join("target.txt"); + let link = dir.path().join("link.txt"); + tokio::fs::write(&target, "target").await.unwrap(); + std::os::unix::fs::symlink(&target, &link).unwrap(); + + atomic_write(&link, "via-link").await.unwrap(); + + // Link is now a regular file; target is unchanged. + let link_meta = std::fs::symlink_metadata(&link).unwrap(); + assert!( + !link_meta.file_type().is_symlink(), + "symlink should have been replaced" + ); + assert_eq!(tokio::fs::read_to_string(&link).await.unwrap(), "via-link"); + assert_eq!(tokio::fs::read_to_string(&target).await.unwrap(), "target"); + } + + #[test] + fn atomic_write_parent_derivation_handles_bare_filename() { + // Regression test for the EXDEV bug: a bare filename (no + // directory component) used to fall through to + // `NamedTempFile::new()` which creates the tempfile in the + // system temp dir, breaking persist() across filesystem + // boundaries (e.g. tmpfs `/tmp` on Linux). Verify the + // derivation logic now picks ".". Pure-logic test — no + // filesystem I/O so it's parallel-safe. + let bare = Path::new("agent.md"); + let parent = bare.parent().filter(|p| !p.as_os_str().is_empty()); + let parent_dir: &Path = parent.unwrap_or_else(|| Path::new(".")); + assert_eq!(parent_dir, Path::new(".")); + + let with_dir = Path::new("subdir/agent.md"); + let parent = with_dir.parent().filter(|p| !p.as_os_str().is_empty()); + let parent_dir: &Path = parent.unwrap_or_else(|| Path::new(".")); + assert_eq!(parent_dir, Path::new("subdir")); + + let absolute = Path::new("/tmp/agent.md"); + let parent = absolute.parent().filter(|p| !p.as_os_str().is_empty()); + let parent_dir: &Path = parent.unwrap_or_else(|| Path::new(".")); + assert_eq!(parent_dir, Path::new("/tmp")); + } + + // ─── parse_markdown_detailed ────────────────────────────────────────────── + + fn reconstruct(parsed: &ParsedSource) -> String { + reconstruct_source( + &parsed.leading_whitespace, + &parsed.front_matter_mapping, + &parsed.body_raw, + ) + .unwrap() + } + + #[test] + fn parse_markdown_detailed_preserves_body_byte_for_byte() { + // Case 1: trailing newline. + let original = "---\nname: x\ndescription: y\n---\nbody line\n"; + let parsed = parse_markdown_detailed(original).unwrap(); + assert_eq!(parsed.body_raw, "\nbody line\n"); + let reconstructed = reconstruct(&parsed); + // No migrations ran, so the YAML round-trip is the only + // structural change. The body region is byte-faithful. + assert!(reconstructed.ends_with("---\nbody line\n")); + + // Case 2: no trailing newline at all. + let original = "---\nname: x\ndescription: y\n---\nbody"; + let parsed = parse_markdown_detailed(original).unwrap(); + assert_eq!(parsed.body_raw, "\nbody"); + let reconstructed = reconstruct(&parsed); + assert!(reconstructed.ends_with("---\nbody")); + + // Case 3: empty body, but trailing newline. + let original = "---\nname: x\ndescription: y\n---\n"; + let parsed = parse_markdown_detailed(original).unwrap(); + assert_eq!(parsed.body_raw, "\n"); + let reconstructed = reconstruct(&parsed); + assert!(reconstructed.ends_with("---\n")); + + // Case 4: blank lines between closing fence and content are + // preserved as-is in body_raw. + let original = "---\nname: x\ndescription: y\n---\n\n\n## heading\n\nbody.\n"; + let parsed = parse_markdown_detailed(original).unwrap(); + assert_eq!(parsed.body_raw, "\n\n\n## heading\n\nbody.\n"); + } + + #[test] + fn parse_markdown_detailed_byte_faithful_when_no_codemod_runs() { + // With the registry empty, parsing a healthy source and + // reconstructing it should produce a byte-identical document + // apart from serde_yaml's canonical formatting of the YAML + // mapping. We assert the body region matches exactly. + let original = "---\nname: x\ndescription: y\n---\n## body\n"; + let parsed = parse_markdown_detailed(original).unwrap(); + assert!(!parsed.codemods.changed()); + let reconstructed = reconstruct(&parsed); + // Find the closing fence in both and compare the suffix. + let orig_suffix = &original[original.find("\n---\n").unwrap()..]; + let recon_suffix = &reconstructed[reconstructed.find("\n---\n").unwrap()..]; + assert_eq!(orig_suffix, recon_suffix, "body region must be byte-identical"); + } + + #[test] + fn parse_markdown_detailed_preserves_leading_whitespace() { + // Leading blank lines / spaces (e.g. from editor BOM-strippers + // adding a blank line) must round-trip through reconstruction + // so migration rewrites don't silently normalize them away. + let original = "\n \n---\nname: x\ndescription: y\n---\nbody\n"; + let parsed = parse_markdown_detailed(original).unwrap(); + assert_eq!(parsed.leading_whitespace, "\n \n"); + let reconstructed = reconstruct(&parsed); + assert!( + reconstructed.starts_with("\n \n---\n"), + "expected leading whitespace preserved, got: {:?}", + &reconstructed[..20.min(reconstructed.len())] + ); + assert!(reconstructed.ends_with("---\nbody\n")); + } + + #[test] + fn parse_markdown_detailed_allows_leading_whitespace() { + let original = "\n \n---\nname: x\ndescription: y\n---\nbody\n"; + let parsed = parse_markdown_detailed(original).unwrap(); + assert_eq!(parsed.front_matter.name, "x"); + } + + #[test] + fn parse_markdown_detailed_rejects_missing_open_fence() { + let original = "name: x\ndescription: y\nbody\n"; + let err = parse_markdown_detailed(original).unwrap_err(); + assert!( + format!("{}", err).contains("must start with YAML front matter"), + "got: {}", + err + ); + } + + #[test] + fn parse_markdown_detailed_rejects_non_mapping_top_level() { + let original = "---\n- a\n- b\n---\nbody\n"; + let err = parse_markdown_detailed(original).unwrap_err(); + assert!( + format!("{}", err).contains("must be a mapping"), + "got: {}", + err + ); + } + + #[test] + fn parse_markdown_detailed_rejects_missing_close_fence() { + let original = "---\nname: x\nno-fence"; + let err = parse_markdown_detailed(original).unwrap_err(); + assert!( + format!("{}", err).contains("Could not find closing"), + "got: {}", + err + ); + } + + #[test] + fn parse_markdown_detailed_records_source_hash() { + let a = "---\nname: x\ndescription: y\n---\n"; + let b = "---\nname: x\ndescription: y\n---\nextra\n"; + let pa = parse_markdown_detailed(a).unwrap(); + let pb = parse_markdown_detailed(b).unwrap(); + assert_ne!(pa.source_sha256, pb.source_sha256); + // Hashing is stable over re-parses of the same input. + let pa2 = parse_markdown_detailed(a).unwrap(); + assert_eq!(pa.source_sha256, pa2.source_sha256); + } + // ─── compute_effective_workspace ───────────────────────────────────────── #[test] diff --git a/src/compile/mod.rs b/src/compile/mod.rs index bad700eb..63bfa15a 100644 --- a/src/compile/mod.rs +++ b/src/compile/mod.rs @@ -10,6 +10,9 @@ mod common; pub mod extensions; pub(crate) mod filter_ir; mod gitattributes; +#[cfg(test)] +mod codemod_integration_test; +pub(crate) mod codemods; mod onees; pub(crate) mod pr_filters; mod standalone; @@ -20,7 +23,10 @@ use async_trait::async_trait; use log::{debug, info}; use std::path::{Path, PathBuf}; +#[allow(unused_imports)] pub use common::parse_markdown; +#[allow(unused_imports)] +pub use common::{atomic_write, parse_markdown_detailed, reconstruct_source, ParsedSource}; pub use common::HEADER_MARKER; pub use common::ADO_MCP_ENTRYPOINT; pub use common::ADO_MCP_IMAGE; @@ -57,7 +63,40 @@ pub async fn compile_pipeline( skip_integrity: bool, debug_pipeline: bool, ) -> Result<()> { - compile_pipeline_inner(input_path, output_path, skip_integrity, debug_pipeline, true).await + compile_pipeline_inner( + input_path, + output_path, + skip_integrity, + debug_pipeline, + true, + codemods::CODEMODS, + ) + .await + .map(|_rewrote| ()) +} + +/// Test-only entry point that injects a custom codemod registry. +/// +/// Production code goes through [`compile_pipeline`]. Tests use this +/// to drive end-to-end codemod scenarios without polluting the real +/// [`codemods::CODEMODS`] slice. +#[cfg(test)] +async fn compile_pipeline_with_registry( + input_path: &str, + output_path: Option<&str>, + skip_integrity: bool, + debug_pipeline: bool, + registry: &[&'static codemods::Codemod], +) -> Result { + compile_pipeline_inner( + input_path, + output_path, + skip_integrity, + debug_pipeline, + false, + registry, + ) + .await } /// Internal compile entry point that lets the caller opt out of the @@ -65,13 +104,17 @@ pub async fn compile_pipeline( /// (`compile_all_pipelines`) skip the per-pipeline sync to avoid an /// O(N²)-ish series of full-tree scans and instead perform a single sync /// after the whole batch completes. +/// +/// Returns whether the source `.md` was rewritten by the migration +/// pass. Batch callers use this to surface an aggregate count. async fn compile_pipeline_inner( input_path: &str, output_path: Option<&str>, skip_integrity: bool, debug_pipeline: bool, sync_gitattributes: bool, -) -> Result<()> { + registry: &[&'static codemods::Codemod], +) -> Result { let input_path = Path::new(input_path); info!("Compiling pipeline from: {}", input_path.display()); @@ -82,11 +125,23 @@ async fn compile_pipeline_inner( .with_context(|| format!("Failed to read input file: {}", input_path.display()))?; debug!("Input file size: {} bytes", content.len()); - let (mut front_matter, markdown_body) = parse_markdown(&content)?; + let parsed = common::parse_markdown_detailed_with_registry(&content, registry)?; + let mut front_matter = parsed.front_matter; + let markdown_body = parsed.markdown_body; + let codemod_report = parsed.codemods; + let front_matter_mapping = parsed.front_matter_mapping; + let leading_whitespace = parsed.leading_whitespace; + let body_raw = parsed.body_raw; + let source_sha256 = parsed.source_sha256; // Sanitize all front matter text fields before any further processing. // This neutralizes pipeline command injection (##vso[), strips control // characters, and enforces content limits across all config values. + // + // Note: sanitization mutates the typed FrontMatter in memory but does + // NOT touch the `front_matter_mapping` we keep around for source + // rewrite. That is intentional — we don't want to silently rewrite + // user source bytes via sanitize on disk. use crate::sanitize::SanitizeConfig; front_matter.sanitize_config_fields(); @@ -134,7 +189,8 @@ async fn compile_pipeline_inner( info!("Using {} compiler", compiler.target_name()); - // Compile + // Compile (no source mutation yet — a failure here must leave the + // source byte-identical). let pipeline_yaml = compiler .compile(input_path, &yaml_output_path, &front_matter, &markdown_body, skip_integrity, debug_pipeline) .await?; @@ -142,8 +198,29 @@ async fn compile_pipeline_inner( // Clean up spacing artifacts from empty placeholder replacements let pipeline_yaml = clean_generated_yaml(&pipeline_yaml); - // Write output - tokio::fs::write(&yaml_output_path, &pipeline_yaml) + // Source rewrite: only after a fully-successful compile. + // + // We perform this BEFORE writing the .lock.yml so that a partial + // state (rewritten source + stale lock file, or rewritten lock file + // pointing at unmigrated source) cannot escape: if rewrite fails, + // we abort before the lock file ever gets touched. + let mut rewrote = false; + if codemod_report.changed() { + rewrote = perform_source_rewrite_if_needed( + input_path, + &content, + &leading_whitespace, + &front_matter_mapping, + &body_raw, + &source_sha256, + &codemod_report, + ) + .await?; + } + + // Write output via atomic_write so a crash mid-write cannot leave a + // half-written .lock.yml on disk. + common::atomic_write(&yaml_output_path, &pipeline_yaml) .await .with_context(|| { format!( @@ -169,7 +246,72 @@ async fn compile_pipeline_inner( } } - Ok(()) + Ok(rewrote) +} + +/// Reconstruct the codemod-rewritten source, run the lost-update +/// guard, and atomically rewrite the source `.md` if the content +/// actually changed. Returns whether a write happened. +/// +/// On success, emits the documented stderr warning so users always +/// see when codemods were applied. This warning is *not* gated by +/// `--verbose`/`--debug`. +async fn perform_source_rewrite_if_needed( + input_path: &Path, + original_content: &str, + leading_whitespace: &str, + front_matter_mapping: &serde_yaml::Mapping, + body_raw: &str, + source_sha256: &[u8; 32], + report: &codemods::CodemodReport, +) -> Result { + let new_content = + common::reconstruct_source(leading_whitespace, front_matter_mapping, body_raw)?; + if new_content == original_content { + // Codemods ran but their net effect is a no-op on disk — skip + // the rewrite to avoid gratuitous diffs. + return Ok(false); + } + + // Lost-update guard: re-read the source and compare its hash to + // the snapshot we took when parsing. If anyone else mutated the + // file in the meantime, refuse to clobber their changes. + use sha2::Digest; + let current_bytes = tokio::fs::read(input_path).await.with_context(|| { + format!( + "Failed to re-read source for codemod safety check: {}", + input_path.display() + ) + })?; + let mut hasher = sha2::Sha256::new(); + hasher.update(¤t_bytes); + let current_hash: [u8; 32] = hasher.finalize().into(); + if ¤t_hash != source_sha256 { + anyhow::bail!( + "source file {} changed during compilation; refusing to apply codemods. Re-run `ado-aw compile`.", + input_path.display() + ); + } + + common::atomic_write(input_path, &new_content) + .await + .with_context(|| { + format!( + "Failed to atomically rewrite source after codemods: {}", + input_path.display() + ) + })?; + + eprintln!("warning: applied codemods to {}:", input_path.display()); + for applied in &report.applied { + eprintln!(" - {}: {}", applied.id, applied.summary); + } + eprintln!( + "note: front-matter comments are not preserved when codemods rewrite the source. \ + Renamed keys may move to the end of the front-matter block." + ); + + Ok(true) } /// Locate the repo root containing `output_path`, scan it for all compiled @@ -221,6 +363,7 @@ pub async fn compile_all_pipelines(skip_integrity: bool, debug_pipeline: bool) - let mut success_count = 0; let mut skip_count = 0; let mut fail_count = 0; + let mut rewrote_count = 0; for pipeline in &detected { let source_path = root.join(&pipeline.source); @@ -239,8 +382,13 @@ pub async fn compile_all_pipelines(skip_integrity: bool, debug_pipeline: bool) - let source_str = source_path.to_string_lossy(); let output_str = yaml_output_path.to_string_lossy(); - match compile_pipeline_inner(&source_str, Some(&output_str), skip_integrity, debug_pipeline, false).await { - Ok(()) => success_count += 1, + match compile_pipeline_inner(&source_str, Some(&output_str), skip_integrity, debug_pipeline, false, codemods::CODEMODS).await { + Ok(rewrote) => { + success_count += 1; + if rewrote { + rewrote_count += 1; + } + } Err(e) => { eprintln!( " Error compiling '{}': {:#}", @@ -263,10 +411,17 @@ pub async fn compile_all_pipelines(skip_integrity: bool, debug_pipeline: bool) - } println!(); - println!( - "Done: {} compiled, {} skipped, {} failed.", - success_count, skip_count, fail_count - ); + if rewrote_count > 0 { + println!( + "Done: {} compiled, {} skipped, {} failed; {} source file(s) rewritten by codemods.", + success_count, skip_count, fail_count, rewrote_count + ); + } else { + println!( + "Done: {} compiled, {} skipped, {} failed.", + success_count, skip_count, fail_count + ); + } if fail_count > 0 { anyhow::bail!("{} pipeline(s) failed to compile", fail_count); @@ -349,7 +504,27 @@ pub async fn check_pipeline(pipeline_path: &str) -> Result<()> { ) })?; - let (mut front_matter, markdown_body) = parse_markdown(&content)?; + let parsed = parse_markdown_detailed(&content)?; + + // Pending-migration enforcement: `check` MUST NOT silently let + // a stale source pass. The runtime integrity check inside + // compiled pipelines uses the same ado-aw version that produced + // the pipeline (see src/data/base.yml / 1es-base.yml — they + // download `ado-aw v${COMPILER_VERSION}`), so a check failure + // here only happens locally / in CI when a developer runs a + // newer ado-aw against an older source. That is exactly when + // we want to fail loudly so `ado-aw compile` is run. + if parsed.codemods.changed() { + anyhow::bail!( + "{} contains deprecated front-matter shapes that need codemod fixes.\n \ + hint: run `ado-aw compile {}` to apply the codemods in place.", + source_path.display(), + source_path.display(), + ); + } + + let mut front_matter = parsed.front_matter; + let markdown_body = parsed.markdown_body; use crate::sanitize::SanitizeConfig; front_matter.sanitize_config_fields(); diff --git a/src/main.rs b/src/main.rs index 1b2985e3..e046b4d6 100644 --- a/src/main.rs +++ b/src/main.rs @@ -223,14 +223,28 @@ async fn run_execute( ado_project: Option, dry_run: bool, ) -> Result<()> { - // Read and parse source markdown to get tool configs + // Read and parse source markdown to get tool configs. + // Use parse_markdown_detailed so Stage 3 benefits from in-memory + // codemod fixes when a source has deprecated shapes. Stage 3 must + // NOT rewrite the source file (the executor's working tree is not + // the source-of-truth tree), so we just emit a log warning. let content = tokio::fs::read_to_string(&source) .await .with_context(|| format!("Failed to read source file: {}", source.display()))?; - let (front_matter, _) = compile::parse_markdown(&content) + let parsed = compile::parse_markdown_detailed(&content) .with_context(|| format!("Failed to parse source file: {}", source.display()))?; + if parsed.codemods.changed() { + log::warn!( + "front matter at {} contains deprecated shapes; running with in-memory codemod fixes applied. Run `ado-aw compile {}` to update the source.", + source.display(), + source.display(), + ); + } + + let front_matter = parsed.front_matter; + println!("Loaded tool configs from: {}", source.display()); // Extract tools config before moving front_matter into build_execution_context diff --git a/tests/codemod_tests.rs b/tests/codemod_tests.rs new file mode 100644 index 00000000..eadc6499 --- /dev/null +++ b/tests/codemod_tests.rs @@ -0,0 +1,158 @@ +//! Integration tests for the front-matter codemod framework. +//! +//! These tests spawn the compiled `ado-aw` binary as a subprocess +//! (matching the pattern used in `tests/compiler_tests.rs`) and +//! assert on the user-visible behavior of `compile` and `check` for +//! sources with various front-matter shapes. +//! +//! The codemod registry shipped with this binary is intentionally +//! empty; the rewrite path is exercised by the white-box tests in +//! `src/compile/codemod_integration_test.rs`, which can inject a +//! stub registry. These tests cover the user-facing CLI behaviors +//! that don't require codemod registration: +//! +//! - Healthy current sources compile and `check` cleanly without +//! rewriting the source. +//! - Non-mapping front matter is rejected with a clear message. +//! - The full `compile` -> `check` round-trip succeeds. + +use std::fs; +use std::path::{Path, PathBuf}; +use std::process::Command; +use tempfile::TempDir; + +/// Set up a unique temp directory for each test run. Returned as a +/// `TempDir` so RAII cleans the directory up even if a test panics. +fn fresh_temp_dir() -> TempDir { + tempfile::Builder::new() + .prefix("ado-aw-codemod-tests-") + .tempdir() + .expect("create temp dir") +} + +/// Same as [`fresh_temp_dir`] but also creates an empty `.git/` +/// directory at the root so `ado-aw check` (which walks up to the +/// repo root) can resolve a source path from the compiled lock +/// file's `@ado-aw` header. +fn fresh_git_temp_dir() -> TempDir { + let dir = fresh_temp_dir(); + fs::create_dir(dir.path().join(".git")).expect("create .git dir"); + dir +} + +fn ado_aw_binary() -> PathBuf { + PathBuf::from(env!("CARGO_BIN_EXE_ado-aw")) +} + +/// Run `ado-aw compile `, returning the captured output. +fn run_compile(source: &Path) -> std::process::Output { + Command::new(ado_aw_binary()) + .args(["compile", source.to_str().unwrap()]) + .output() + .expect("Failed to run ado-aw compile") +} + +/// Run `ado-aw check `, returning the captured output. +fn run_check(pipeline: &Path) -> std::process::Output { + Command::new(ado_aw_binary()) + .args(["check", pipeline.to_str().unwrap()]) + .output() + .expect("Failed to run ado-aw check") +} + +/// Write a source file to `dir/agent.md` and return its path. +fn write_source(dir: &Path, content: &str) -> PathBuf { + let path = dir.join("agent.md"); + fs::write(&path, content).expect("write source"); + path +} + +// ─── Healthy compile (no codemods needed) ────────────────────────────────── + +#[test] +fn compile_succeeds_on_current_source() { + let dir = fresh_temp_dir(); + let original = "---\nname: smoketest\ndescription: smoketest description\n---\n## Body\n\nHello.\n"; + let source = write_source(dir.path(), original); + + let output = run_compile(&source); + + assert!( + output.status.success(), + "compile should succeed on healthy source: stdout={:?} stderr={:?}", + String::from_utf8_lossy(&output.stdout), + String::from_utf8_lossy(&output.stderr) + ); + + // Lock file must be generated. + let lock = source.with_extension("lock.yml"); + assert!( + lock.exists(), + "expected compiled YAML at {}", + lock.display() + ); + + // Empty registry + healthy source must NOT rewrite — verify + // byte-identity. + let after = fs::read_to_string(&source).expect("re-read source"); + assert_eq!( + after, original, + "source must be byte-identical after compile when no codemods apply" + ); + + // Stderr should NOT contain a codemod warning. + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + !stderr.contains("warning: applied codemods"), + "no codemod warning expected, got stderr: {}", + stderr + ); +} + +#[test] +fn compile_then_check_round_trip_passes() { + let dir = fresh_git_temp_dir(); + let source = write_source( + dir.path(), + "---\nname: round-trip-agent\ndescription: round-trip\n---\n## Body\n", + ); + + let compile_output = run_compile(&source); + assert!( + compile_output.status.success(), + "compile should succeed: {}", + String::from_utf8_lossy(&compile_output.stderr) + ); + + let lock = source.with_extension("lock.yml"); + assert!(lock.exists(), "expected lock file at {}", lock.display()); + + let check_output = run_check(&lock); + assert!( + check_output.status.success(), + "check should succeed: stdout={:?} stderr={:?}", + String::from_utf8_lossy(&check_output.stdout), + String::from_utf8_lossy(&check_output.stderr) + ); +} + +// ─── Non-mapping front matter ────────────────────────────────────────────── + +#[test] +fn compile_rejects_non_mapping_top_level_yaml() { + let dir = fresh_temp_dir(); + let source = write_source(dir.path(), "---\n- a\n- b\n---\nbody\n"); + + let output = run_compile(&source); + + assert!( + !output.status.success(), + "compile should fail when front matter is a sequence not a mapping" + ); + let stderr = String::from_utf8_lossy(&output.stderr); + assert!( + stderr.contains("must be a mapping"), + "stderr should report non-mapping error, got: {}", + stderr + ); +}