Refactor: Engine type + Orchestrator adoption; request-time label catalog #288

Merged
martsokha merged 3 commits into main from refactor/engine-orchestrator
Jun 27, 2026

Conversation

@martsokha

Member

Summary

Rewires the runtime's per-document pipelines on top of upstream elide::Orchestrator, introduces an Engine type as the single entry point for both long-lived state (registry + codecs) and per-request verbs (analyze / anonymize_with), and switches the label catalog to a request-time builtins-by-name + custom-inline shape.

Engine type + Orchestrator adoption

  • Rename EngineHandle → Engine. Same fields (RegistryHandle + Arc<FormatRegistry>), cheaply cloneable. New verbs:
    • Engine::analyze(&mut UntypedDocumentHandle, &AnalyzerParams) -> Report — builds an Orchestrator, runs detection, returns the editable report.
    • Engine::anonymize_with(handle, spec, &[Policy], ModalityKind, &[(Uuid, RuleAction)], Report) -> () — same construction with policies + body-modality reviewer overrides layered on, runs the redaction phase.
  • Move analyzer/ + anonymizer/ under engine/ as pub(crate). Per-modality compile_* returns just Analyzer<M>; the (now modality-free) Scope is built once via engine/scope.rs::compile_scope and attached to the orchestrator with with_scope.
  • runs/pipeline.rs collapses ~325 → ~225 LOC — outer per-modality switch deleted. analyze_document is one engine.analyze call followed by per-M extraction of report.entities::<M>() to pin the body modality for persistence. apply_document rebuilds the report with Report::new().insert_body::<M>(entities), extracts overrides from records, calls engine.anonymize_with, re-types the handle, re-encodes the bytes.
  • Container parts (PDF embedded images, archive members) are detected by the orchestrator but discarded at the DocBody persistence boundary today. Follow-up: evolve DocBody to retain parts.
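A minimal sketch of the Engine shape described above, assuming stand-in types: only the names Engine, RegistryHandle, FormatRegistry, AnalyzerParams, Report and UntypedDocumentHandle come from this PR, while their fields, the `sharers` helper and the toy '@' detection are invented for illustration. It shows why cloning is cheap (shared state behind `Arc`) and how a per-request verb borrows the engine immutably.

```rust
use std::sync::Arc;

// Stand-in types; their contents are assumptions, not the real definitions.
#[derive(Clone, Default)]
struct RegistryHandle(Arc<()>); // assumed to be a cheap shared handle
#[derive(Debug, Default)]
struct FormatRegistry;
#[derive(Debug, Default)]
struct AnalyzerParams;
#[derive(Debug, Default, PartialEq)]
struct Report {
    entities: Vec<String>,
}
struct UntypedDocumentHandle {
    text: String,
}

/// Long-lived state sits behind shared pointers, so `Clone` only bumps
/// reference counts instead of copying the registry or the codecs.
#[derive(Clone)]
struct Engine {
    registry: RegistryHandle,
    formats: Arc<FormatRegistry>,
}

impl Engine {
    fn new() -> Self {
        Self {
            registry: RegistryHandle::default(),
            formats: Arc::new(FormatRegistry),
        }
    }

    /// Hypothetical helper: how many `Engine` clones share the codec table.
    fn sharers(&self) -> usize {
        Arc::strong_count(&self.formats)
    }

    /// Per-request verb. The real method builds an orchestrator; this toy
    /// version flags every whitespace-separated token containing '@'.
    fn analyze(&self, doc: &mut UntypedDocumentHandle, _params: &AnalyzerParams) -> Report {
        let entities = doc
            .text
            .split_whitespace()
            .filter(|word| word.contains('@'))
            .map(str::to_owned)
            .collect();
        Report { entities }
    }
}
```

A handler would typically clone the engine once per request and call `analyze` on the clone; nothing in the long-lived state is duplicated.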

Upstream elide bump

70e6e8bf → 0ca39cf3. Critical changes absorbed:

  • Report::new() + Report::insert_body::<M>(entities) + insert_part::<P>(id, entities) now public — two-phase cross-process round-trip is supported.
  • Orchestrator::analyze / anonymize_with take &mut UntypedDocumentHandle (was generic <M>) — orchestrator dispatches modality internally.
  • Scope is modality-free now; one scope per orchestrator via with_scope, bundles the label catalog.
  • Methods renamed: analyze_document → analyze, apply → anonymize_with, anonymize_document → anonymize.
  • with_modality(analyzer, anonymizer, scope) → with_modality(analyzer, anonymizer) (scope is global).
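The two-phase round-trip enabled by the public constructors can be pictured with a small type-keyed report. This is a sketch under assumptions, not elide's implementation: the `Modality` trait, the `Text` / `Image` markers, their entity types, and the by-value builder signature are all hypothetical; only the call shape `Report::new().insert_body::<M>(entities)` and the `entities::<M>()` accessor name are taken from the description.

```rust
use std::any::{Any, TypeId};

// Hypothetical modality markers with an associated entity type each.
trait Modality: 'static {
    type Entity: 'static;
}
struct Text;
struct Image;
impl Modality for Text {
    type Entity = (usize, usize); // assumed: byte span
}
impl Modality for Image {
    type Entity = [u32; 4]; // assumed: bounding box
}

/// Report whose body is tagged with the modality that produced it.
#[derive(Default)]
struct Report {
    body: Option<(TypeId, Box<dyn Any>)>,
}

impl Report {
    fn new() -> Self {
        Self::default()
    }

    /// Rebuild a report from persisted entities (phase two of the round-trip).
    fn insert_body<M: Modality>(mut self, entities: Vec<M::Entity>) -> Self {
        let boxed: Box<dyn Any> = Box::new(entities);
        self.body = Some((TypeId::of::<M>(), boxed));
        self
    }

    /// Typed view of the body; `None` when the body belongs to another modality.
    fn entities<M: Modality>(&self) -> Option<&[M::Entity]> {
        let (tag, any) = self.body.as_ref()?;
        if *tag != TypeId::of::<M>() {
            return None;
        }
        any.downcast_ref::<Vec<M::Entity>>().map(Vec::as_slice)
    }
}
```

Probing `entities::<M>()` for each modality in turn is one way a caller could pin the body modality for persistence, since at most one probe returns `Some`.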

Label catalog (request-time builtins + custom)

  • nvisy-core::plan::LabelCatalogParams { builtins: Vec<String>, custom: Vec<LabelSchema> } (new plan/label.rs). AnalyzerParams.label_catalog switches from Vec<LabelSchema> to this type. Defaults empty.
  • nvisy-server::AnalyzerOverrides.label_catalog collapses from CollectionOverride<LabelSchema, LabelSelector> to ScalarOverride<LabelCatalogParams> — the catalog is now plain data, replaced wholesale per request (no collection patches).
  • engine::build_catalog resolves builtins against a process-cached LabelCatalog::with_builtins() (OnceLock); unknown names log tracing::warn! and are skipped. custom schemas insert after.
  • Nvisy.example.toml: shows the new [analyzer.label_catalog] block with builtins = [...] and optional [[custom]].
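The catalog resolution described above can be sketched with the standard library alone. Assumptions: `LabelCatalog` is modelled as a `BTreeMap`, `LabelSchema` has just two invented fields, the builtin names `email` and `phone` are placeholders, `eprintln!` stands in for `tracing::warn!`, and a custom schema that shares a name with a builtin overwrites it (the PR only says customs insert after).

```rust
use std::collections::BTreeMap;
use std::sync::OnceLock;

// Simplified stand-ins; the real types carry more fields.
#[derive(Clone, Debug, PartialEq)]
struct LabelSchema {
    name: String,
    description: String,
}

#[derive(Default)]
struct LabelCatalogParams {
    builtins: Vec<String>,
    custom: Vec<LabelSchema>,
}

type LabelCatalog = BTreeMap<String, LabelSchema>;

/// Process-wide builtin catalog, built once on first use.
fn builtin_catalog() -> &'static LabelCatalog {
    static BUILTINS: OnceLock<LabelCatalog> = OnceLock::new();
    BUILTINS.get_or_init(|| {
        // Placeholder builtin names, not the real list.
        ["email", "phone"]
            .into_iter()
            .map(|name| {
                let schema = LabelSchema {
                    name: name.to_string(),
                    description: format!("builtin {name}"),
                };
                (name.to_string(), schema)
            })
            .collect()
    })
}

/// Resolve builtins by name, skip unknown names with a warning, then add customs.
fn build_catalog(params: &LabelCatalogParams) -> LabelCatalog {
    let mut catalog = LabelCatalog::new();
    for name in &params.builtins {
        match builtin_catalog().get(name) {
            Some(schema) => {
                catalog.insert(name.clone(), schema.clone());
            }
            None => eprintln!("warning: unknown builtin label {name:?}, skipping"),
        }
    }
    for schema in &params.custom {
        catalog.insert(schema.name.clone(), schema.clone());
    }
    catalog
}
```

Because the request carries only names plus inline schemas, replacing the catalog wholesale per request stays cheap: the builtin definitions are looked up from the cached catalog rather than shipped or patched.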

Style sweep

  • Hoist inlined fully-qualified paths to use statements across the engine + server. tracing::* macros + rustdoc reference-link targets stay inline per project convention.
  • Collapse three manual impl Default blocks (AnalyzerOverrides, RecognizerOverrides, EnricherOverrides) to #[derive(Default)] (ScalarOverride::default() / CollectionOverride::default() already produce the right Inherit).
  • Schema-naming fix: #[schemars(rename = "{T}Override")] on ScalarOverride<T> and #[schemars(rename = "{T}CollectionOverride")] on CollectionOverride<T, S> so OpenAPI gets DeduplicationParamsOverride / ScopeParamsOverride / etc. instead of ScalarOverride, ScalarOverride2, ScalarOverride3, ...
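The `impl Default` collapse works because a struct can derive `Default` once every field type has a sensible default. A minimal sketch, assuming `ScalarOverride` is an enum whose default variant is `Inherit`: the `Replace` variant, the `resolve` helper and the trimmed-down field set are hypothetical.

```rust
/// Override that either inherits the server default or replaces it.
#[derive(Debug, Default, PartialEq)]
enum ScalarOverride<T> {
    #[default]
    Inherit,
    Replace(T), // hypothetical variant name
}

impl<T> ScalarOverride<T> {
    /// Apply the override on top of a base value.
    fn resolve(self, base: T) -> T {
        match self {
            ScalarOverride::Inherit => base,
            ScalarOverride::Replace(value) => value,
        }
    }
}

#[derive(Debug, Default, PartialEq)]
struct LabelCatalogParams {
    builtins: Vec<String>,
}

// With every field defaulting to `Inherit`, a manual `impl Default` is redundant.
#[derive(Debug, Default, PartialEq)]
struct AnalyzerOverrides {
    label_catalog: ScalarOverride<LabelCatalogParams>,
}
```

`AnalyzerOverrides::default()` then yields `Inherit` for every field, which is what the deleted manual blocks are described as producing.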

Commits

  1. f515aca5 — refactor(engine): introduce Engine type; adopt elide::Orchestrator
  2. 371c6a9b — feat(plan,engine,server): builtin+custom label catalog; hoist inlined paths
  3. e5050d49 — chore(server): name override schemas by their inner type

Test plan

  • cargo build --workspace
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace
  • RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps
  • cargo deny check all

Follow-ups

  • Evolve DocBody to retain container parts (HashMap<PartId, _>) so PDF-embedded-image OCR + archive-member redactions survive the analyze → review → apply round-trip.
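One possible shape for that follow-up, offered purely as a sketch: the PR only names `DocBody`, `PartId` and `HashMap<PartId, _>`, so the `Entity` fields, the `entities` field and both helper methods below are assumptions.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct PartId(String); // assumed: a stable identifier per container part

#[derive(Clone, Debug, PartialEq)]
struct Entity {
    label: String,
    start: usize,
    end: usize,
}

/// Persisted detection result: body entities plus entities per container part.
#[derive(Clone, Debug, Default, PartialEq)]
struct DocBody {
    entities: Vec<Entity>,
    parts: HashMap<PartId, Vec<Entity>>,
}

impl DocBody {
    /// Record entities detected inside a container part (embedded image, archive member).
    fn insert_part(&mut self, id: PartId, entities: Vec<Entity>) {
        self.parts.entry(id).or_default().extend(entities);
    }

    /// Total number of entities across the body and all parts.
    fn total_entities(&self) -> usize {
        self.entities.len() + self.parts.values().map(Vec::len).sum::<usize>()
    }
}
```

With part entities stored next to the body, the apply phase could feed them back through `insert_part::<P>(id, entities)` when it rebuilds the report.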

🤖 Generated with Claude Code

martsokha and others added 3 commits June 27, 2026 03:02
Replace EngineHandle with `Engine`, the single entry point for both
per-document analyze/apply verbs and the long-lived state (registry +
codecs). Adopt elide's upstream `Orchestrator` for per-document
pipelines now that `Report::new()` + `insert_body::<M>` are public —
the two-phase cross-process round-trip (persist entities, review,
re-apply) is supported.

Layout:

- engine/mod.rs hosts `Engine` and the per-call orchestrator builder.
  Two verbs:
    * `analyze(&mut UntypedDocumentHandle, &AnalyzerParams) -> Report`
    * `anonymize_with(&mut UntypedDocumentHandle, &AnalyzerParams,
                       &[Policy], ModalityKind,
                       &[(Uuid, RuleAction)], Report) -> ()`
  Internal `build_orchestrator` constructs a modality-free `Scope` (now
  bundles the label catalog), compiles the per-modality analyzers, and
  builds each anonymizer by layering catalog → body-modality overrides
  → policy chain on a fresh `Anonymizer<M>`.

- engine/{analyzer,anonymizer}/ moved under engine/ as pub(crate)
  helpers. Per-modality `compile_*` now returns just `Analyzer<M>`
  (scope is global on the orchestrator). Standalone
  `compile_text`/`tabular`/`image`/`audio` in anonymizer/ deleted —
  Engine attaches policies directly via `attach_policies_*`.

- engine/scope.rs: untyped `compile_scope(params, catalog) -> Scope`,
  matching upstream's modality-free Scope shape.

- runs/pipeline.rs: collapses from ~325 → ~225 LOC. Outer per-modality
  switch deleted; `analyze_document` is one `engine.analyze` call
  followed by per-M extraction of `report.entities::<M>()` to pin the
  body modality for persistence. `apply_document` rebuilds a `Report`
  via `Report::new().insert_body::<M>(entities)`, extracts overrides
  from records, calls `engine.anonymize_with`, re-types the handle,
  re-encodes the bytes.

Container parts (PDF embedded images, archive members) are detected by
the orchestrator but discarded at the `DocBody` persistence boundary
today; evolving `DocBody` to carry parts is a follow-up.

Upstream elide bumped `70e6e8bf` → `0ca39cf3` for the public
`Report` constructors + the simplified orchestrator surface
(`analyze` / `anonymize_with` / `anonymize`, untyped handles,
modality-free `Scope`, `with_modality(analyzer, anonymizer)` without
scope).
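To illustrate why untyped handles let the caller-side per-modality switch go away, here is a toy dispatcher. It is a sketch under assumptions: the four modality names follow the `compile_text`/`tabular`/`image`/`audio` helpers mentioned above, while the handle payloads and the returned unit count are invented.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ModalityKind {
    Text,
    Tabular,
    Image,
    Audio,
}

// Hypothetical payloads, one variant per modality.
enum UntypedDocumentHandle {
    Text(String),
    Tabular(Vec<Vec<String>>),
    Image(Vec<u8>),
    Audio(Vec<i16>),
}

struct Orchestrator;

impl Orchestrator {
    /// Non-generic entry point: the match below replaces the switch that
    /// callers previously wrote. Returns the body modality and a toy unit count.
    fn analyze(&self, doc: &mut UntypedDocumentHandle) -> (ModalityKind, usize) {
        match doc {
            UntypedDocumentHandle::Text(text) => {
                (ModalityKind::Text, text.split_whitespace().count())
            }
            UntypedDocumentHandle::Tabular(rows) => {
                (ModalityKind::Tabular, rows.iter().map(Vec::len).sum::<usize>())
            }
            UntypedDocumentHandle::Image(pixels) => (ModalityKind::Image, pixels.len()),
            UntypedDocumentHandle::Audio(samples) => (ModalityKind::Audio, samples.len()),
        }
    }
}
```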

Server + CLI: EngineHandle → Engine (mechanical rename).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… paths

Plan + wire:

- nvisy-core: new `LabelCatalogParams { builtins: Vec<String>,
  custom: Vec<LabelSchema> }` (`plan/label.rs`). `AnalyzerParams.
  label_catalog` switches from `Vec<LabelSchema>` to the new type.
  Defaults empty — deployments may pre-populate via the server-default
  block; the request can scalar-replace it wholesale.
- nvisy-server: `AnalyzerOverrides.label_catalog` collapses from
  `CollectionOverride<LabelSchema, LabelSelector>` to
  `ScalarOverride<LabelCatalogParams>`. `LabelSelector` +
  `label_matches` deleted.
- nvisy-engine: `build_catalog` resolves the per-request catalog by
  looking each `builtins` name up against a process-cached
  `LabelCatalog::with_builtins()` (`OnceLock`); unknown names log a
  warning and are skipped. Customs insert after.

Style sweep:

- Hoist inlined fully-qualified paths to `use` statements across the
  engine + server (registry/fjall_ext, runs/{orchestrate,persist,
  pipeline,state}, keyspace/{context,file,policy},
  engine/{mod,scope,analyzer/common}, server handler/{detections,
  files}). `tracing::*` macros + rustdoc reference-link targets stay
  inline per the standing rule.
- Collapse three manual `impl Default` blocks on
  `AnalyzerOverrides` / `RecognizerOverrides` / `EnricherOverrides`
  to `#[derive(Default)]` (`ScalarOverride::default()` /
  `CollectionOverride::default()` already produce the right Inherit).

Nvisy.example.toml: rewrite the `[analyzer.label_catalog]` block to
show the new `builtins = [...]` + optional `[[custom]]` shape.

nvisy-engine adds `tracing` as a workspace dep for the warn point.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`schemars` falls back to numeric collision names for generic enum
monomorphisations — `ScalarOverride`, `ScalarOverride2`,
`ScalarOverride3`, ... — which read as garbage in the OpenAPI spec
and on the rendered docs.

Add `#[schemars(rename = "{T}Override")]` on `ScalarOverride<T>` and
`#[schemars(rename = "{T}CollectionOverride")]` on
`CollectionOverride<T, S>` so each instantiation gets a distinct,
descriptive schema name (`DeduplicationParamsOverride`,
`ScopeParamsOverride`, `LabelCatalogParamsOverride`,
`NerRecognizerParamsCollectionOverride`, etc.).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@martsokha martsokha self-assigned this Jun 27, 2026
@martsokha martsokha added the refactor, feat, engine, core, server, recognition, and redaction labels Jun 27, 2026
@martsokha martsokha merged commit 40e3ded into main Jun 27, 2026
6 checks passed
@martsokha martsokha deleted the refactor/engine-orchestrator branch June 27, 2026 02:32

Labels

  • core: content model, errors, shared types
  • engine: redaction engine, pipeline runtime, orchestration, configuration
  • feat: request for or implementation of a new feature
  • recognition: pattern, NER, LLM, and OCR backends (elide::recognition::*)
  • redaction: anonymizer, deanonymizer, redaction operators, replacements
  • refactor: code restructuring without behavior change
  • server: server, API handlers, middleware

1 participant