Refactor: Engine type + Orchestrator adoption; request-time label catalog #288
Merged
Replace `EngineHandle` with `Engine`, the single entry point for both
the per-document analyze/apply verbs and the long-lived state (registry +
codecs). Adopt elide's upstream `Orchestrator` for per-document
pipelines now that `Report::new()` and `insert_body::<M>` are public,
which makes the two-phase cross-process round-trip (persist entities,
review, re-apply) possible.
Layout:
- engine/mod.rs hosts `Engine` and the per-call orchestrator builder.
Two verbs:
* `analyze(&mut UntypedDocumentHandle, &AnalyzerParams) -> Report`
* `anonymize_with(&mut UntypedDocumentHandle, &AnalyzerParams, &[Policy], ModalityKind, &[(Uuid, RuleAction)], Report) -> ()`
Internal `build_orchestrator` constructs a modality-free `Scope` (now
bundles the label catalog), compiles the per-modality analyzers, and
builds each anonymizer by layering catalog → body-modality overrides
→ policy chain on a fresh `Anonymizer<M>`.
- engine/{analyzer,anonymizer}/ moved under engine/ as pub(crate)
helpers. Per-modality `compile_*` now returns just `Analyzer<M>`
(scope is global on the orchestrator). Standalone
`compile_text`/`tabular`/`image`/`audio` in anonymizer/ deleted —
Engine attaches policies directly via `attach_policies_*`.
- engine/scope.rs: untyped `compile_scope(params, catalog) -> Scope`,
matching upstream's modality-free Scope shape.
- runs/pipeline.rs: collapses from ~325 → ~225 LOC. Outer per-modality
switch deleted; `analyze_document` is one `engine.analyze` call
followed by per-M extraction of `report.entities::<M>()` to pin the
body modality for persistence. `apply_document` rebuilds a `Report`
via `Report::new().insert_body::<M>(entities)`, extracts overrides
from records, calls `engine.anonymize_with`, re-types the handle,
re-encodes the bytes.
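
The persist / rebuild round-trip described for `runs/pipeline.rs` can be sketched as follows. This is a minimal stand-in, not elide's code: the `Modality` trait, the `Text` / `Image` markers, the entity structs, and the internal layout of `Report` are all assumptions made for illustration. Only the method names `Report::new`, `insert_body::<M>` and `entities::<M>` come from the commit message.

```rust
use std::any::{Any, TypeId};

/// Hypothetical modality marker trait; the real one lives in elide.
pub trait Modality: 'static {
    type Entity: Clone + 'static;
}

pub struct Text;
pub struct Image;

#[derive(Clone, Debug, PartialEq)]
pub struct TextEntity {
    pub start: usize,
    pub end: usize,
    pub label: String,
}

#[derive(Clone, Debug, PartialEq)]
pub struct ImageEntity {
    pub x: u32,
    pub y: u32,
    pub label: String,
}

impl Modality for Text {
    type Entity = TextEntity;
}

impl Modality for Image {
    type Entity = ImageEntity;
}

/// Stand-in report: stores the body entities type-erased, tagged with
/// the modality they were inserted under.
#[derive(Default)]
pub struct Report {
    body: Option<(TypeId, Box<dyn Any>)>,
}

impl Report {
    pub fn new() -> Self {
        Self::default()
    }

    /// Builder-style insert (assumed shape), so calls can be chained
    /// onto `Report::new()`.
    pub fn insert_body<M: Modality>(mut self, entities: Vec<M::Entity>) -> Self {
        self.body = Some((TypeId::of::<M>(), Box::new(entities)));
        self
    }

    /// Returns the body entities only if they were stored under `M`.
    /// Probing each modality in turn is how a caller can "pin" the body
    /// modality before persisting.
    pub fn entities<M: Modality>(&self) -> Option<&[M::Entity]> {
        match &self.body {
            Some((id, boxed)) if *id == TypeId::of::<M>() => boxed
                .downcast_ref::<Vec<M::Entity>>()
                .map(Vec::as_slice),
            _ => None,
        }
    }
}
```

In this sketch the apply phase would load the persisted entities, call `Report::new().insert_body::<Text>(entities)`, and hand the rebuilt report to the anonymize step; a probe with the wrong modality returns `None`.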
Container parts (PDF embedded images, archive members) are detected by
the orchestrator but discarded at the `DocBody` persistence boundary
today; evolving `DocBody` to carry parts is a follow-up.
Upstream elide bumped `70e6e8bf` → `0ca39cf3` for the public
`Report` constructors + the simplified orchestrator surface
(`analyze` / `anonymize_with` / `anonymize`, untyped handles,
modality-free `Scope`, `with_modality(analyzer, anonymizer)` without
scope).
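
To illustrate what "untyped handles" and a "modality-free `Scope`" mean for callers, here is a rough sketch under heavy assumptions. None of these definitions are elide's: the struct fields, the `AnalyzerFn` alias, the `find_labels_in_text` helper, and the extra `ModalityKind` argument to `with_modality` are hypothetical simplifications, and the anonymizer half is left out entirely. The point is only that one scope is attached once and the orchestrator picks the analyzer from a runtime modality tag.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum ModalityKind {
    Text,
    Image,
}

/// Stand-in for an untyped handle: the modality is a runtime tag,
/// not a type parameter.
pub struct UntypedDocumentHandle {
    pub kind: ModalityKind,
    pub bytes: Vec<u8>,
}

/// Stand-in for the modality-free scope that bundles the label catalog.
#[derive(Default)]
pub struct Scope {
    pub labels: Vec<String>,
}

/// Hypothetical analyzer shape: returns the labels it detected.
pub type AnalyzerFn = fn(&UntypedDocumentHandle, &Scope) -> Vec<String>;

#[derive(Default)]
pub struct Orchestrator {
    scope: Scope,
    analyzers: HashMap<ModalityKind, AnalyzerFn>,
}

impl Orchestrator {
    pub fn new() -> Self {
        Self::default()
    }

    /// One scope per orchestrator, shared by every modality.
    pub fn with_scope(mut self, scope: Scope) -> Self {
        self.scope = scope;
        self
    }

    /// Registers an analyzer for one modality; no scope argument here.
    pub fn with_modality(mut self, kind: ModalityKind, analyzer: AnalyzerFn) -> Self {
        self.analyzers.insert(kind, analyzer);
        self
    }

    /// Dispatches on the handle's runtime tag. `&mut` mirrors the
    /// signature quoted above; this sketch does not mutate the handle.
    pub fn analyze(&self, handle: &mut UntypedDocumentHandle) -> Vec<String> {
        match self.analyzers.get(&handle.kind) {
            Some(analyzer) => analyzer(&*handle, &self.scope),
            None => Vec::new(),
        }
    }
}

/// Toy text analyzer: reports every catalog label that occurs verbatim.
pub fn find_labels_in_text(doc: &UntypedDocumentHandle, scope: &Scope) -> Vec<String> {
    let text = String::from_utf8_lossy(&doc.bytes);
    scope
        .labels
        .iter()
        .filter(|label| text.contains(label.as_str()))
        .cloned()
        .collect()
}
```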
Server + CLI: EngineHandle → Engine (mechanical rename).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
… paths
Plan + wire:
- nvisy-core: new `LabelCatalogParams { builtins: Vec<String>,
custom: Vec<LabelSchema> }` (`plan/label.rs`). `AnalyzerParams.
label_catalog` switches from `Vec<LabelSchema>` to the new type.
Defaults empty — deployments may pre-populate via the server-default
block; the request can scalar-replace it wholesale.
- nvisy-server: `AnalyzerOverrides.label_catalog` collapses from
`CollectionOverride<LabelSchema, LabelSelector>` to
`ScalarOverride<LabelCatalogParams>`. `LabelSelector` +
`label_matches` deleted.
- nvisy-engine: `build_catalog` resolves the per-request catalog by
looking each `builtins` name up against a process-cached
`LabelCatalog::with_builtins()` (`OnceLock`); unknown names log a
warning and are skipped. Custom schemas are inserted after the builtins.
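
The resolution order described for `build_catalog` could look roughly like the sketch below. It is an illustration under assumptions, not the nvisy-engine code: the `LabelSchema` fields, the builtin names, the `BTreeMap` backing store, and the returned list of unknown names are hypothetical, and `eprintln!` stands in for `tracing::warn!` so the block needs only the standard library.

```rust
use std::collections::BTreeMap;
use std::sync::OnceLock;

/// Hypothetical schema shape; the real `LabelSchema` lives in nvisy-core.
#[derive(Clone, Debug, PartialEq)]
pub struct LabelSchema {
    pub name: String,
    pub description: String,
}

/// Field names follow the commit message.
#[derive(Clone, Debug, Default)]
pub struct LabelCatalogParams {
    pub builtins: Vec<String>,
    pub custom: Vec<LabelSchema>,
}

/// Process-wide builtin table, built on first use and then shared.
fn builtin_catalog() -> &'static BTreeMap<String, LabelSchema> {
    static BUILTINS: OnceLock<BTreeMap<String, LabelSchema>> = OnceLock::new();
    BUILTINS.get_or_init(|| {
        let mut map = BTreeMap::new();
        // Illustrative entries only.
        for (name, description) in [
            ("email", "Email address"),
            ("person_name", "Name of a natural person"),
        ] {
            map.insert(
                name.to_string(),
                LabelSchema {
                    name: name.to_string(),
                    description: description.to_string(),
                },
            );
        }
        map
    })
}

/// Resolves the per-request catalog: builtins by name first (unknown
/// names are warned about and skipped), then the custom schemas.
/// The second tuple element lists the skipped names.
pub fn build_catalog(params: &LabelCatalogParams) -> (Vec<LabelSchema>, Vec<String>) {
    let mut resolved = Vec::new();
    let mut unknown = Vec::new();
    for name in &params.builtins {
        match builtin_catalog().get(name) {
            Some(schema) => resolved.push(schema.clone()),
            None => {
                eprintln!("warn: unknown builtin label `{name}`, skipping");
                unknown.push(name.clone());
            }
        }
    }
    resolved.extend(params.custom.iter().cloned());
    (resolved, unknown)
}
```

Skipping unknown names instead of failing keeps a request usable when it refers to a builtin that this build does not ship, at the cost of a silently smaller catalog; the warning is the only signal.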
Style sweep:
- Hoist inlined fully-qualified paths to `use` statements across the
engine + server (registry/fjall_ext, runs/{orchestrate,persist,
pipeline,state}, keyspace/{context,file,policy},
engine/{mod,scope,analyzer/common}, server handler/{detections,
files}). `tracing::*` macros + rustdoc reference-link targets stay
inline per the standing rule.
- Collapse three manual `impl Default` blocks on
`AnalyzerOverrides` / `RecognizerOverrides` / `EnricherOverrides`
to `#[derive(Default)]` (`ScalarOverride::default()` /
`CollectionOverride::default()` already produce the right Inherit).
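
As a sketch of why the manual `impl Default` blocks become redundant, assume `ScalarOverride<T>` is an enum whose default variant is `Inherit`. The `Replace` variant name, the `resolve` helper, and the reduced `LabelCatalogParams` / `AnalyzerOverrides` structs below are hypothetical; only `Inherit` and the type names appear in the commit message.

```rust
/// Assumed shape of a scalar override: inherit the server default or
/// replace it wholesale.
#[derive(Clone, Debug, Default, PartialEq)]
pub enum ScalarOverride<T> {
    #[default]
    Inherit,
    Replace(T),
}

impl<T> ScalarOverride<T> {
    /// Hypothetical helper: picks the request value if present,
    /// otherwise falls back to the server default.
    pub fn resolve(self, server_default: T) -> T {
        match self {
            Self::Inherit => server_default,
            Self::Replace(value) => value,
        }
    }
}

/// Reduced stand-in; the real type also carries `custom` schemas.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct LabelCatalogParams {
    pub builtins: Vec<String>,
}

/// Because every field's `Default` is already `Inherit`, the derive
/// yields the same value a hand-written impl would.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct AnalyzerOverrides {
    pub label_catalog: ScalarOverride<LabelCatalogParams>,
}
```

With this shape, a request that omits the field resolves to the server default, and a request that supplies it replaces the whole catalog rather than patching individual entries.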
Nvisy.example.toml: rewrite the `[analyzer.label_catalog]` block to
show the new `builtins = [...]` + optional `[[custom]]` shape.
nvisy-engine adds `tracing` as a workspace dep for the warn point.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
`schemars` falls back to numeric collision names for generic enum
monomorphisations — `ScalarOverride`, `ScalarOverride2`,
`ScalarOverride3`, ... — which read as garbage in the OpenAPI spec
and on the rendered docs.
Add `#[schemars(rename = "{T}Override")]` on `ScalarOverride<T>` and
`#[schemars(rename = "{T}CollectionOverride")]` on
`CollectionOverride<T, S>` so each instantiation gets a distinct,
descriptive schema name (`DeduplicationParamsOverride`,
`ScopeParamsOverride`, `LabelCatalogParamsOverride`,
`NerRecognizerParamsCollectionOverride`, etc.).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Summary

Rewires the runtime's per-document pipelines on top of upstream `elide::Orchestrator`, introduces an `Engine` type as the single entry point for both long-lived state (registry + codecs) and per-request verbs (`analyze` / `anonymize_with`), and switches the label catalog to a request-time `builtins`-by-name + `custom`-inline shape.

`Engine` type + Orchestrator adoption

- `EngineHandle` → `Engine`. Same fields (`RegistryHandle` + `Arc<FormatRegistry>`), cheaply cloneable. New verbs:
  - `Engine::analyze(&mut UntypedDocumentHandle, &AnalyzerParams) -> Report`: builds an `Orchestrator`, runs detection, returns the editable report.
  - `Engine::anonymize_with(handle, spec, &[Policy], ModalityKind, &[(Uuid, RuleAction)], Report) -> ()`: same construction with policies + body-modality reviewer overrides layered on, runs the redaction phase.
- `analyzer/` + `anonymizer/` moved under `engine/` as `pub(crate)`. Per-modality `compile_*` returns just `Analyzer<M>`; the (now modality-free) `Scope` is built once via `engine/scope.rs::compile_scope` and attached to the orchestrator with `with_scope`.
- `runs/pipeline.rs` collapses ~325 → ~225 LOC; the outer per-modality switch is deleted. `analyze_document` is one `engine.analyze` call followed by per-`M` extraction of `report.entities::<M>()` to pin the body modality for persistence. `apply_document` rebuilds the report with `Report::new().insert_body::<M>(entities)`, extracts overrides from records, calls `engine.anonymize_with`, re-types the handle, re-encodes the bytes.
- Container parts (PDF embedded images, archive members) are detected by the orchestrator but discarded at the `DocBody` persistence boundary today. Follow-up: evolve `DocBody` to retain parts.

Upstream elide bump

`70e6e8bf` → `0ca39cf3`. Critical changes absorbed:

- `Report::new()` + `Report::insert_body::<M>(entities)` + `insert_part::<P>(id, entities)` are now public; the two-phase cross-process round-trip is supported.
- `Orchestrator::analyze` / `anonymize_with` take `&mut UntypedDocumentHandle` (was generic `<M>`); the orchestrator dispatches modality internally.
- `Scope` is modality-free now; one scope per orchestrator via `with_scope`, bundles the label catalog.
- `analyze_document` → `analyze`, `apply` → `anonymize_with`, `anonymize_document` → `anonymize`.
- `with_modality(analyzer, anonymizer, scope)` → `with_modality(analyzer, anonymizer)` (scope is global).

Label catalog (request-time builtins + custom)

- `nvisy-core::plan::LabelCatalogParams { builtins: Vec<String>, custom: Vec<LabelSchema> }` (new `plan/label.rs`). `AnalyzerParams.label_catalog` switches from `Vec<LabelSchema>` to this type. Defaults empty.
- `nvisy-server::AnalyzerOverrides.label_catalog` collapses from `CollectionOverride<LabelSchema, LabelSelector>` to `ScalarOverride<LabelCatalogParams>`: the catalog is now plain data, replaced wholesale per request (no collection patches).
- `engine::build_catalog` resolves `builtins` against a process-cached `LabelCatalog::with_builtins()` (`OnceLock`); unknown names log `tracing::warn!` and are skipped. `custom` schemas insert after.
- `Nvisy.example.toml`: shows the new `[analyzer.label_catalog]` block with `builtins = [...]` and optional `[[custom]]`.

Style sweep

- Hoist inlined fully-qualified paths to `use` statements across the engine + server. `tracing::*` macros + rustdoc reference-link targets stay inline per project convention.
- Collapse three manual `impl Default` blocks (`AnalyzerOverrides`, `RecognizerOverrides`, `EnricherOverrides`) to `#[derive(Default)]`; `ScalarOverride::default()` / `CollectionOverride::default()` already produce the right `Inherit`.
- Add `#[schemars(rename = "{T}Override")]` on `ScalarOverride<T>` and `#[schemars(rename = "{T}CollectionOverride")]` on `CollectionOverride<T, S>` so OpenAPI gets `DeduplicationParamsOverride` / `ScopeParamsOverride` / etc. instead of `ScalarOverride`, `ScalarOverride2`, `ScalarOverride3`, ...

Commits

- `f515aca5`: refactor(engine): introduce Engine type; adopt elide::Orchestrator
- `371c6a9b`: feat(plan,engine,server): builtin+custom label catalog; hoist inlined paths
- `e5050d49`: chore(server): name override schemas by their inner type

Test plan

- `cargo build --workspace`
- `cargo clippy --workspace --all-targets -- -D warnings`
- `cargo test --workspace`
- `RUSTDOCFLAGS="-D warnings" cargo doc --workspace --no-deps`
- `cargo deny check all`

Follow-ups

- Evolve `DocBody` to retain container parts (`HashMap<PartId, _>`) so PDF-embedded-image OCR + archive-member redactions survive the analyze → review → apply round-trip.

🤖 Generated with Claude Code