From 0f970661e418be444acf82c355aa475a5ac65533 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 7 May 2026 15:56:45 +0100 Subject: [PATCH 01/29] docs: add design proposal for intrinsic adapter lifecycle (#929) Draft proposal for agreeing the end-state shape of the intrinsic adapter refactor covered by Epic #929. Part I covers problem / goals / terminology / rough end result for high-level review; Part II contains supporting detail (reality breakdown, full thread mapping, observability, docs/tests/tutorials, migration phases, open questions). Branch intended as a preservation point while the design is circulated; not a PR candidate yet. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 243 ++++++++++++++++++ 1 file changed, 243 insertions(+) create mode 100644 docs/dev/issue-929-adapter-design-proposal.md diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md new file mode 100644 index 000000000..f1830a412 --- /dev/null +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -0,0 +1,243 @@ +# Intrinsic Adapter Lifecycle — Design Proposal + +> Status: proposal for shape agreement. +> Addresses Epic #929. +> Structure: **Part I** is for agreeing the problem, goals, terminology, and end state. **Part II** contains the supporting detail — read after Part I lands, not before. + +--- + +# Part I — Summary for agreement + +## 1. The problem we are solving + +Mellea intrinsics — `check_answerability`, `requirement_check`, `find_citations`, the Guardian helpers — let users add specialised capabilities to a base model. Under the hood each one is an **adapter**: a small artefact that specialises the base model for that one task. + +Three sources of friction have accumulated: + +1. 
**Three different kinds of adapter share one class hierarchy.** Local PEFT adapters (weights on disk), Granite Switch "embedded" adapters (weights baked into the base model), and the yet-to-return OpenAI-compatible adapters (weights served behind an API) all try to live under one base class. The code branches on backend identity (`if backend._uses_embedded_adapters:`) to route between them. +2. **Adapter lifecycle is not modelled.** `call_intrinsic` constructs an `IntrinsicAdapter` as a side effect of invoking one, which triggers an unconditional weight download even when no download is needed. The user sees a misleading download error; the real error is masked. There is no concept of "prepare," "activate," "deactivate" as distinct steps. +3. **Small, visible follow-on issues cluster around these two roots** — a five-place model-options hierarchy with a silent-overwrite bug; JSON output keys hardcoded in helpers (`result_json["answerability"]`) that break when an adapter ships a new output schema; the `"requirement-check"` string duplicated across four files; a `CustomIntrinsicAdapter` whose constructor monkey-patches the global catalog with a self-confessed "temporary hack." + +Every thread in #929 is a symptom of not having separated the kinds of adapter and their lifecycles cleanly. + +## 2. What we are trying to achieve + +- **One coherent mental model** of what an adapter is, so users and contributors can reason about intrinsic behaviour without reading the implementation. +- **One code path** through `call_intrinsic` that works regardless of whether the adapter's weights are local, baked into the base model, or hosted on a server. +- **Correct, documented model-option precedence** that does not silently overwrite caller intent. +- **Schema-version safety** so adapters can evolve their output format without breaking callers, and so an adapter whose schema drifts is visible rather than silent. 
+- **First-class custom adapters** — users can ship their own without monkey-patching a global registry. +- **Observable intrinsic calls** so failures during download, activation, or parsing are diagnosable on first run, not after ad-hoc `print` debugging. +- **Parity, not breakage** — high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. + +## 3. Terminology + +Names matter in this design because they appear in user-facing error messages, docs, and telemetry attributes. Glossary for this proposal: + +| Term | Meaning | +| --- | --- | +| **Base model** | The general-purpose LLM (e.g. `ibm-granite/granite-4.1-3b`) that everything runs on top of. | +| **Intrinsic** | A specialised capability — answerability, citations, requirement-check, and so on — invoked via a high-level helper or the `Intrinsic` AST component. | +| **Adapter** | The artefact that implements an intrinsic on top of a base model. In the redesign, `Adapter` is one class composed of three parts (identity, I/O contract, weights binding). | +| **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | +| **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. | +| **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality (see below). Exposes `prepare`, `activate`, `deactivate`, `release`. | +| **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = baked into the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | +| **LoRA / aLoRA** | Two PEFT technologies. LoRA weights always participate; aLoRA only participates after an activation token is seen. Both are adapter types that a given intrinsic may ship as. 
| +| **Role** | A *semantic* label on an adapter distinct from its name — e.g. `requirement_check`, `context_attribution`. Used by callers (the `Requirement` rerouting path) to find "the adapter that plays this role" without hardcoding a name string. | +| **Qualified name** | Today's disambiguator: `<name>_<adapter_type>`. In the redesign, derived on demand from `identity` rather than stored as a field. | +| **Catalog** | The registry of known intrinsics at `mellea/backends/adapters/catalog.py`. Becomes optional and advisory rather than mandatory and monkey-patched. | +| **`io.yaml`** | The YAML file that declares an adapter's input template, output schema, and generation parameters. Lives in the adapter's HuggingFace repo. | + +## 4. Rough end result + +An **Adapter** is a small object composed of three parts: + +``` +Adapter +├── identity — name, adapter type (lora/alora), schema version, optional role +├── io_contract — parsed io.yaml: prompt building, output parsing, model options +└── weights — one of three pluggable bindings (LocalFile, Embedded, ServerMediated) +``` + +The **weights binding** is where the three realities live. It exposes a single verb set — `prepare`, `activate`, `deactivate`, `release` — that every backend calls uniformly. 
Each concrete binding implements those verbs for its reality: + +| Binding | `prepare` | `activate` | `deactivate` | +| --- | --- | --- | --- | +| `LocalFileBinding` (Reality A) | Download from repo → cache path | PEFT `load_adapter` | PEFT `unload_adapter` | +| `EmbeddedBinding` (Reality B) | No-op (weights baked in) | Render `controls` field into chat template | Drop the `controls` field | +| `ServerMediatedBinding` (Reality C) | No-op (or push weights, depending on sub-case) | Set adapter identifier on API request | Unset identifier | + +`call_intrinsic` becomes one flow, no branches on backend type: + +``` +adapter = backend.resolve_adapter(name) +with backend.adapter_scope(adapter): + raw = backend.generate(adapter.io_contract.build_prompt(...)) +return adapter.io_contract.parse(raw) +``` + +From this shape, the seven threads of #929 resolve cleanly. Full mapping is in Part II §8. + +**What users see:** high-level helpers (`check_answerability` etc.) keep their current shape, with the `model_options=` addition that PR #1003 is introducing. Manual adapter construction collapses from four classes to one, with the binding as the pluggable part. Custom intrinsics no longer require monkey-patching the catalog. Detail in Part II §9. + +**What cross-cutting concerns look like:** observability (spans + a schema-drift metric), docs rewrite (`intrinsics_and_adapters.md` is 39 lines describing classes this renames), and a test-parity commitment travel **with** the refactor, not after it. Detail in Part II §10–§11. + +## 5. Decisions needed now + +These gate decomposition; everything else can live in sub-issues once these land. + +1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. +2. 
**Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. +3. **OpenAI Reality C — which concrete shape first?** vLLM-backed LoRA serving (client-known weight file, server-side load) or commercial fine-tunes (fully hosted)? The binding covers both; the first subclass sets the idiom. +4. **Telemetry coupling with #1035.** Land intrinsic spans as part of this refactor, or as a follow-on to PR #1036's Gap 5? Coupling avoids designing content capture twice; decoupling keeps #1036 moving. +5. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. + +--- + +# Part II — Supporting detail + +> For deeper review once Part I lands. Skim headings first. + +## 6. Three realities of "where the weights live" + +### 6.1 Reality A — Local PEFT adapter (today's `IntrinsicAdapter`) + +- Weights are a distinct file Mellea downloads from HuggingFace into the local cache. +- At call time, the backend uses the PEFT library to plug those weights into the base model. +- After the call, the backend can unplug them. +- **Physical weights, runtime activation, downloadable lifecycle.** + +### 6.2 Reality B — Embedded adapter (today's `EmbeddedIntrinsicAdapter`, used by Granite Switch) + +- Weights were **baked into the base model during training**. No separate file to download. +- Activation is done by rendering a `controls` field (structured JSON) into the chat template. The Jinja template places it either at the beginning of the sequence (LoRA technology) or before the generation prompt (aLoRA technology). The model itself routes the request to the right baked-in weights. +- You still need the `io.yaml` for input/output formatting — that's the only artefact the client needs. 
+- **Pre-installed weights, prompt-level activation, no download lifecycle.** + +### 6.3 Reality C — Server-mediated adapter (today's gap, #929 point 5) + +Two plausible sub-cases; design must accommodate both. + +- **C1 — Client-pulled, server-activated**: weights exist as a file on the client side (or somewhere pullable), but activation happens on a remote inference server (e.g. vLLM loads them and exposes them via a LoRA ID or per-request model alias). The client sends either `model=` or a dedicated LoRA field in the API request. PR #543 removed this path because vLLM dropped aLoRA support; #929 point 5 anticipates its return as "a much different implementation." This is the likely near-term shape. +- **C2 — Provider-hosted**: weights live entirely on the provider's infrastructure. The client only ever passes an identifier. Applies to commercial fine-tunes behind OpenAI, Azure, etc. + +Both share: **no local weight loading, API-parameter activation, `io.yaml` still required client-side.** + +## 7. Why the current code is tangled (concrete example) + +Inside `_util.call_intrinsic`: + +```python +if getattr(backend, "_uses_embedded_adapters", False): + adapters = EmbeddedIntrinsicAdapter.from_source(...) +else: + intrinsic_adapter = IntrinsicAdapter(...) # Reality A path +``` + +Three problems: + +1. **`_uses_embedded_adapters` is a backend flag, not an adapter property.** It hard-codes "this backend type → always this adapter type." Reality C needs a third branch, then a fourth if a backend supports both. +2. **The `else` branch calls `obtain_lora` unconditionally** via `IntrinsicAdapter.__init__` → `download_and_get_path`. If the adapter was meant to be a different type, the user sees a misleading download-path error instead of the real cause. +3. **Output parsing assumes one schema.** `result_json["answerability"]` is hardcoded in helpers. 
When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, the parsing helper had to be rewritten and the catalog gained a second entry (`requirement_check` for Granite 3.x, `requirement-check` for Granite 4.x) to support both. + +## 8. Full #929 thread mapping + +| Thread | Resolution | +| --- | --- | +| 1a. Loading/unloading divergence | One `WeightsBinding` verb set; control flow identical across realities. | +| 1b. `obtain_lora` always-called bug | Only `LocalFileBinding.prepare` calls `obtain_lora`; others no-op. | +| 1c. Backend- + adapter-type-specific abstraction | `WeightsBinding` is the adapter-type axis; `AdapterMixin` verbs are the backend axis. | +| 2a. Intrinsic rewriters overwrite options | `Adapter.resolve_model_options()` replaces the five-place merge with one documented stack. | +| 2b/2c. Model-option hierarchy | Five layers enforced in `resolve_model_options` (base model → adapter config → `io.yaml` defaults → `io.yaml` per-intrinsic → caller). | +| 3. Naming consistency | Three-axis identity (`name`, `adapter_type`, `version`) plus explicit `role`. | +| 4a. `call_intrinsic` assumes one output schema | `io_contract.parse()` dispatches on `(name, version)`; helpers see normalised shape. | +| 4b. Per-adapter vs standard schema | `io_contract.parse()` is per-adapter; helpers define the normalised post-parse shape. | +| 4c. Versioning | Schema version declared in `io.yaml` (`schema_version:`); defaults to `v1`. | +| 5. OpenAI backend support | Ships as one or two `ServerMediatedBinding` subclasses. | +| 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. Duplicate `requirement_check` / `requirement-check` entries collapse into one entry with two schema versions. | +| 7. Hardcoded `requirement-check` refs | Callers look up by **role**, not name. | + +## 9. 
What users see — detailed + +**High-level helpers** keep their signatures. The `model_options=` parameter lands via PR #1003: + +```python +score = check_answerability(question, documents, context, backend) +score = check_answerability(question, documents, context, backend, + model_options={"temperature": 0.1}) +``` + +**Manual adapter construction** collapses from four classes (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`, abstract base) to one `Adapter` + a binding: + +```python +# Stock intrinsic from the catalogue: +adapter = Adapter(name="answerability", + weights=LocalFileBinding.from_catalog("answerability")) + +# Custom intrinsic — no catalog monkey-patching: +adapter = Adapter(name="my-thing", + weights=LocalFileBinding(source="myuser/my-adapter", + base_model_name="granite-4.1-3b"), + io_contract=IOContract.from_yaml("./io.yaml")) + +# Granite Switch embedded: +adapter = Adapter(name="answerability", + weights=EmbeddedBinding.from_base_model(backend)) +``` + +**Backend authors** keep `AdapterMixin` as the backend surface, but it exposes only the verbs a backend naturally has: `load_peft_adapter`, `unload_peft_adapter`, `render_controls`, `set_request_adapter`. Bindings call into these verbs. Adding a new reality = adding a new verb + new binding. + +## 10. Observability + +Intrinsic calls have no bespoke instrumentation today. Folding it into the redesign costs one span attribute per verb; retrofitting means re-editing every binding later. + +**Spans** — each `adapter_scope` wraps a child span tree rooted at `intrinsic.call`, with children `intrinsic.prepare`, `intrinsic.activate`, `intrinsic.generate`, `intrinsic.parse`, `intrinsic.deactivate`. Standard attributes: `intrinsic.name`, `intrinsic.version`, `intrinsic.role`, `intrinsic.adapter_type`, `intrinsic.binding_type`, `intrinsic.source`, `intrinsic.target`. Errors set OTel `ERROR` status (aligns with #1035 gap 4). 
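
The span tree described in the paragraph above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the proposed implementation: `SpanRecord`, `span`, and `traced_intrinsic_call` are hypothetical names invented here, and the recorder is a stand-in for a real tracing backend rather than the OpenTelemetry API. It shows only the shape: one `intrinsic.call` root, five child phases in order, `ERROR` status on failure, and `intrinsic.deactivate` still running when parsing fails.

```python
import json
import time
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class SpanRecord:
    """Hypothetical stand-in for a tracing span (not an OpenTelemetry type)."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    status: str = "OK"
    duration_ms: float = 0.0


@contextmanager
def span(name, parent=None, **attributes):
    """Record one span; attach it to `parent` if given; mark ERROR on exception."""
    record = SpanRecord(name=name, attributes=dict(attributes))
    if parent is not None:
        parent.children.append(record)
    start = time.perf_counter()
    try:
        yield record
    except Exception:
        record.status = "ERROR"
        raise
    finally:
        record.duration_ms = (time.perf_counter() - start) * 1000.0


def traced_intrinsic_call(name, version, binding_type, generate, parse, sink):
    """Run generate() then parse() inside the span tree from the proposal.

    `generate` and `parse` are caller-supplied callables standing in for the
    backend call and `io_contract.parse`. The root span is appended to `sink`
    so it can be inspected even when the call raises.
    """
    root_attrs = {
        "intrinsic.name": name,
        "intrinsic.version": version,
        "intrinsic.binding_type": binding_type,
    }
    with span("intrinsic.call", **root_attrs) as root:
        sink.append(root)
        with span("intrinsic.prepare", root):
            pass  # e.g. download for a local-file binding, no-op otherwise
        with span("intrinsic.activate", root):
            pass  # e.g. load weights / render controls / set adapter id
        try:
            with span("intrinsic.generate", root):
                raw = generate()
            with span("intrinsic.parse", root, raw_len=len(raw)):
                return parse(raw)
        finally:
            # Deactivation is recorded whether or not generate/parse succeeded.
            with span("intrinsic.deactivate", root):
                pass


if __name__ == "__main__":
    spans = []
    result = traced_intrinsic_call(
        "answerability", "v1", "LocalFileBinding",
        generate=lambda: '{"answerability": 0.9}',
        parse=json.loads,
        sink=spans,
    )
    print(result, [child.name for child in spans[0].children])
```

Appending the root span to a caller-owned `sink` is a choice made for this sketch so that a failed call remains inspectable; a real integration would presumably rely on the tracing backend's own exporter.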
+ +**Metrics** — an `IntrinsicMetricsPlugin` alongside the existing Token / Latency / Error plugins: +- `mellea.intrinsic.invocations` — counter labelled by name, version, binding type, adapter type, outcome. +- `mellea.intrinsic.phase_duration_ms` — histogram labelled by name, phase. +- `mellea.intrinsic.parse_failures` — counter labelled by name, version. This is the **schema-drift detector**: a climbing counter against a specific `(name, version)` pair means an upstream adapter shipped a schema change without a version bump. + +**Content capture** — gated behind PR #1036's `MELLEA_TRACE_CONTENT` flag. Intrinsics emit `intrinsic.input.kwargs` (structured dict), `intrinsic.output.raw` (raw JSON string), and `intrinsic.output.parsed` (normalised shape) as span events. Different shape from chat `gen_ai.*.message` events because intrinsics have different semantics. + +## 11. Docs, tests, tutorials + +First-class deliverables, not afterthoughts. + +**Docs** — rewrite (not edit) for `docs/dev/intrinsics_and_adapters.md` (39 lines describing classes that get renamed). Update `docs/dev/requirement_aLoRA_rerouting.md` to describe role-based lookup instead of hardcoded strings. User-facing `docs/docs/advanced/intrinsics.md` and examples under `docs/examples/intrinsics/` are breaking-API touched. New dev doc for adapter observability. Update AGENTS.md §13 once normalised post-parse shapes are stable. + +**Tests** — existing intrinsic tests stay green per phase. New tests cover: each binding × each verb (unit); integration matrix `{HF, OpenAI} × {applicable bindings} × {lora, alora where applicable} × {every existing intrinsic}`; per-version parse round-trips (with `requirement-check` v1 / v2 as the worked case); concurrency window correctness; span/metric emission assertions. + +**Tutorials** — three worth writing alongside the refactor: +- "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story. 
+- "Shipping a new schema version without breaking users" — worked example using `requirement-check` v1 → v2. +- "Reading intrinsic telemetry" — short dashboard-building guide. + +**Release notes** separate: no-op for high-level helper users; deprecated-but-shimmed for direct adapter constructors; removed at Phase 4 (see below). + +## 12. Migration (rough shape only) + +Detail deferred until Part I §5 decisions land, but the intended phasing is: + +1. **Phase 0 — parallel types.** Introduce `Adapter` / `WeightsBinding` / `IOContract` alongside existing classes. No call-site changes, tests unchanged. +2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Old classes become deprecation shims. +3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. +4. **Phase 3 — Reality C lands.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. +5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. + +Observability and docs deliverables attach to the phase that first exercises them. + +## 13. Open questions (full list) + +1. **Naming.** `WeightsBinding` vs `ResourceStrategy` vs `AdapterProvider`. Pick one; the term leaks into error messages. +2. **Lifecycle default** — session-scoped or request-scoped (also in Part I §5). +3. **Role vs name.** Free-form `role` string, or a small enum so users can't invent roles backends don't honour? +4. **Reality C idiom.** vLLM LoRA serving first or commercial fine-tunes first (also in Part I §5). +5. **Rewind interaction (PR #1028).** `factuality_detection` / `factuality_correction` mutate context via `context.previous_node`. Belongs on `io_contract.build_prompt` (cleaner) or stay in the helper (smaller migration blast radius)? +6. **Telemetry coupling with #1035** (also in Part I §5). +7. 
**Deprecation window** (also in Part I §5). + +--- + +_Verified against: `mellea/backends/adapters/{adapter,catalog,__init__}.py`, `mellea/stdlib/components/intrinsic/{_util,intrinsic,core,rag,guardian}.py`, `mellea/backends/{openai,huggingface}.py`, `mellea/formatters/granite/intrinsics/input.py`, `mellea/stdlib/requirements/requirement.py`, `docs/dev/{intrinsics_and_adapters,requirement_aLoRA_rerouting}.md`, PRs #543 / #881 / #986 / #994 / #1003 / #1008 / #1028 / #1036, commits `666d646a`, `8b6b8d55`, `c57aba1d`, `8577d092`._ From bdb8a81f28566332b71c0b7349c35910cc37ff6c Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 7 May 2026 16:05:50 +0100 Subject: [PATCH 02/29] docs: add diagrams to adapter design proposal (#929) - ASCII side-by-side of the three "where weights live" realities - Mermaid sequence diagram for the adapter_scope lifecycle - Mermaid tree for the intrinsic.* span hierarchy No content change; visuals make the key shape-agreement points scannable. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 67 ++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index f1830a412..13bcbea73 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -68,6 +68,28 @@ The **weights binding** is where the three realities live. 
It exposes a single v | `EmbeddedBinding` (Reality B) | No-op (weights baked in) | Render `controls` field into chat template | Drop the `controls` field | | `ServerMediatedBinding` (Reality C) | No-op (or push weights, depending on sub-case) | Set adapter identifier on API request | Unset identifier | +Visually, the three realities differ only in where the weights are and how the backend reaches them; the I/O contract is shared: + +``` + Reality A: Local PEFT Reality B: Embedded Reality C: Server-mediated + ───────────────────── ──────────────────── ────────────────────────── + + ┌──────────┐ ┌──────────┐ + │ HF repo │ (weights baked │ server │ + └────┬─────┘ into base model) └────┬─────┘ + │ download │ + ▼ │ + ┌──────────┐ │ + │ cache │ │ + └────┬─────┘ │ + │ PEFT load render `controls` │ adapter_id + ▼ into chat template │ in request + ┌──────────┐ ┌──────────┐ ┌─────▼────┐ + │base+LoRA │ │base model│ │base model│ + │ (local) │ │ (Switch) │ │ (remote) │ + └──────────┘ └──────────┘ └──────────┘ +``` + `call_intrinsic` becomes one flow, no branches on backend type: ``` @@ -77,6 +99,37 @@ with backend.adapter_scope(adapter): return adapter.io_contract.parse(raw) ``` +The lifecycle inside `adapter_scope` is the same for every binding — only the verbs do reality-specific work: + +```mermaid +sequenceDiagram + participant C as Caller + participant B as Backend + participant A as Adapter + participant W as WeightsBinding + participant M as Base Model + + C->>B: check_answerability(...) + B->>A: resolve_adapter(name) + + rect rgb(245, 245, 245) + Note over B,W: adapter_scope(adapter) + B->>W: prepare() + W-->>M: download / no-op + B->>W: activate() + W-->>M: load / render controls / set adapter_id + B->>A: io_contract.build_prompt(...) 
+ B->>M: generate(prompt) + M-->>B: raw output + B->>A: io_contract.parse(raw) + A-->>B: normalised result + B->>W: deactivate() + W-->>M: unload / drop controls / unset + end + + B-->>C: score +``` + From this shape, the seven threads of #929 resolve cleanly. Full mapping is in Part II §8. **What users see:** high-level helpers (`check_answerability` etc.) keep their current shape, with the `model_options=` addition that PR #1003 is introducing. Manual adapter construction collapses from four classes to one, with the binding as the pluggable part. Custom intrinsics no longer require monkey-patching the catalog. Detail in Part II §9. @@ -192,7 +245,19 @@ adapter = Adapter(name="answerability", Intrinsic calls have no bespoke instrumentation today. Folding it into the redesign costs one span attribute per verb; retrofitting means re-editing every binding later. -**Spans** — each `adapter_scope` wraps a child span tree rooted at `intrinsic.call`, with children `intrinsic.prepare`, `intrinsic.activate`, `intrinsic.generate`, `intrinsic.parse`, `intrinsic.deactivate`. Standard attributes: `intrinsic.name`, `intrinsic.version`, `intrinsic.role`, `intrinsic.adapter_type`, `intrinsic.binding_type`, `intrinsic.source`, `intrinsic.target`. Errors set OTel `ERROR` status (aligns with #1035 gap 4). +**Spans** — each `adapter_scope` wraps a child span tree rooted at `intrinsic.call`: + +```mermaid +graph TD + root["intrinsic.call
name, version, role,
binding_type, adapter_type
"] + root --> prep["intrinsic.prepare
LocalFile: download ms"] + root --> act["intrinsic.activate
peft_name / controls / api_id"] + root --> gen["intrinsic.generate
(regular backend span:
tokens, latency)
"] + root --> par["intrinsic.parse
schema_version,
parse_ok, raw_len
"] + root --> deact["intrinsic.deactivate"] +``` + +Standard attributes: `intrinsic.name`, `intrinsic.version`, `intrinsic.role`, `intrinsic.adapter_type`, `intrinsic.binding_type`, `intrinsic.source`, `intrinsic.target`. Errors set OTel `ERROR` status (aligns with #1035 gap 4). **Metrics** — an `IntrinsicMetricsPlugin` alongside the existing Token / Latency / Error plugins: - `mellea.intrinsic.invocations` — counter labelled by name, version, binding type, adapter type, outcome. From 333610b8d7b2be3540860c394130fd56fb7d691d Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 7 May 2026 16:08:19 +0100 Subject: [PATCH 03/29] docs: replace misaligned ASCII reality diagram with Mermaid (#929) The ASCII three-realities comparison had a one-column drift on the Reality C top border. Mermaid flowchart with three subgraphs makes the layout self-aligning and more readable. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 33 +++++++++---------- 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index 13bcbea73..485d4389a 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -70,24 +70,21 @@ The **weights binding** is where the three realities live. 
It exposes a single v Visually, the three realities differ only in where the weights are and how the backend reaches them; the I/O contract is shared: -``` - Reality A: Local PEFT Reality B: Embedded Reality C: Server-mediated - ───────────────────── ──────────────────── ────────────────────────── - - ┌──────────┐ ┌──────────┐ - │ HF repo │ (weights baked │ server │ - └────┬─────┘ into base model) └────┬─────┘ - │ download │ - ▼ │ - ┌──────────┐ │ - │ cache │ │ - └────┬─────┘ │ - │ PEFT load render `controls` │ adapter_id - ▼ into chat template │ in request - ┌──────────┐ ┌──────────┐ ┌─────▼────┐ - │base+LoRA │ │base model│ │base model│ - │ (local) │ │ (Switch) │ │ (remote) │ - └──────────┘ └──────────┘ └──────────┘ +```mermaid +flowchart LR + subgraph A["Reality A — Local PEFT"] + direction TB + A1["HF repo"] -->|"download"| A2["local cache"] + A2 -->|"PEFT load"| A3["base model
+ LoRA"] + end + subgraph B["Reality B — Embedded (Granite Switch)"] + direction TB + B1["base model
(weights baked in)"] -->|"render controls
in chat template"| B2["base model
(activated)"] + end + subgraph C["Reality C — Server-mediated"] + direction TB + C1["remote server
(holds weights)"] -->|"adapter_id
in API request"| C2["base model
(remote)"] + end ``` `call_intrinsic` becomes one flow, no branches on backend type: From 959ef50edca9880037abe82cf6be7f8aecebbd8f Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 7 May 2026 16:11:49 +0100 Subject: [PATCH 04/29] docs: replace 'lands' phrasing with specific verbs (#929) Reword seven instances of "lands" / "landing" to more precise alternatives (merges, ships, is agreed, is added, include). No semantic change. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index 485d4389a..b4b8d3bee 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -2,7 +2,7 @@ > Status: proposal for shape agreement. > Addresses Epic #929. -> Structure: **Part I** is for agreeing the problem, goals, terminology, and end state. **Part II** contains the supporting detail — read after Part I lands, not before. +> Structure: **Part I** is for agreeing the problem, goals, terminology, and end state. **Part II** contains the supporting detail — read after Part I is agreed, not before. --- @@ -135,19 +135,19 @@ From this shape, the seven threads of #929 resolve cleanly. Full mapping is in P ## 5. Decisions needed now -These gate decomposition; everything else can live in sub-issues once these land. +These gate decomposition; everything else can live in sub-issues once these are agreed. 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. 3. 
**OpenAI Reality C — which concrete shape first?** vLLM-backed LoRA serving (client-known weight file, server-side load) or commercial fine-tunes (fully hosted)? The binding covers both; the first subclass sets the idiom. -4. **Telemetry coupling with #1035.** Land intrinsic spans as part of this refactor, or as a follow-on to PR #1036's Gap 5? Coupling avoids designing content capture twice; decoupling keeps #1036 moving. +4. **Telemetry coupling with #1035.** Include intrinsic spans as part of this refactor, or as a follow-on to PR #1036's Gap 5? Coupling avoids designing content capture twice; decoupling keeps #1036 moving. 5. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. --- # Part II — Supporting detail -> For deeper review once Part I lands. Skim headings first. +> For deeper review once Part I is agreed. Skim headings first. ## 6. Three realities of "where the weights live" @@ -210,7 +210,7 @@ Three problems: ## 9. What users see — detailed -**High-level helpers** keep their signatures. The `model_options=` parameter lands via PR #1003: +**High-level helpers** keep their signatures. The `model_options=` parameter is added via PR #1003: ```python score = check_answerability(question, documents, context, backend) @@ -280,12 +280,12 @@ First-class deliverables, not afterthoughts. ## 12. Migration (rough shape only) -Detail deferred until Part I §5 decisions land, but the intended phasing is: +Detail deferred until Part I §5 decisions are agreed, but the intended phasing is: 1. **Phase 0 — parallel types.** Introduce `Adapter` / `WeightsBinding` / `IOContract` alongside existing classes. No call-site changes, tests unchanged. 2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Old classes become deprecation shims. 3. 
**Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. -4. **Phase 3 — Reality C lands.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. +4. **Phase 3 — Reality C ships.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. 5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. Observability and docs deliverables attach to the phase that first exercises them. From 131ae2240dc80abe1e1ff575b24d57d631158c40 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 7 May 2026 16:27:06 +0100 Subject: [PATCH 05/29] docs: address first round of review comments (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Title and prose shift to "adapter" as primary term; "intrinsic" noted as legacy vocabulary with a rename-scope decision in Part I §5 Q6. - Reality B clarified: weights ship in the same HF repo as the base model (not "baked in"); both LoRA and aLoRA supported via the technology field on each embedded adapter. - Reality C reframed: OpenAI-compatible support is already kept for embedded (PR #881); the gap is non-embedded server-mediated, which may or may not re-land in 0.6.0. - Add §10.1 "Why adapters need bespoke observability" with four concrete failure modes, two already seen in production. - Goals elevate customer-built adapters as a first-class target. - Tests section adds an optional per-adapter qualitative effectiveness suite (opt-in via @pytest.mark.qualitative). 
Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 65 +++++++++++++------ 1 file changed, 44 insertions(+), 21 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index b4b8d3bee..551d8a0b9 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -1,8 +1,9 @@ -# Intrinsic Adapter Lifecycle — Design Proposal +# Adapter Lifecycle — Design Proposal > Status: proposal for shape agreement. > Addresses Epic #929. > Structure: **Part I** is for agreeing the problem, goals, terminology, and end state. **Part II** contains the supporting detail — read after Part I is agreed, not before. +> Terminology note: this proposal uses **"adapter"** as the primary term for the thing users add to a backend to gain a specialised capability. "Intrinsic" appears as the legacy name where it still refers to existing Mellea classes (e.g. the `Intrinsic` AST component). The rename strategy is called out in §5. --- @@ -22,12 +23,12 @@ Every thread in #929 is a symptom of not having separated the kinds of adapter a ## 2. What we are trying to achieve -- **One coherent mental model** of what an adapter is, so users and contributors can reason about intrinsic behaviour without reading the implementation. -- **One code path** through `call_intrinsic` that works regardless of whether the adapter's weights are local, baked into the base model, or hosted on a server. +- **One coherent mental model** of what an adapter is, so users and contributors can reason about behaviour without reading the implementation. +- **One code path** through adapter invocation that works regardless of whether the adapter's weights are local, shipped with the base model, or served by a remote server. - **Correct, documented model-option precedence** that does not silently overwrite caller intent. 
- **Schema-version safety** so adapters can evolve their output format without breaking callers, and so an adapter whose schema drifts is visible rather than silent. -- **First-class custom adapters** — users can ship their own without monkey-patching a global registry. -- **Observable intrinsic calls** so failures during download, activation, or parsing are diagnosable on first run, not after ad-hoc `print` debugging. +- **First-class customer adapters** — customers can build and ship their own adapters against the same API as first-party ones, without monkey-patching a global registry and without privileged access to internal APIs. +- **Observable adapter calls** — every phase (download, activation, generation, parse, deactivation) is a distinct span with standard attributes; a per-adapter metrics plugin makes failures visible in dashboards without bespoke instrumentation. - **Parity, not breakage** — high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. ## 3. Terminology @@ -37,16 +38,16 @@ Names matter in this design because they appear in user-facing error messages, d | Term | Meaning | | --- | --- | | **Base model** | The general-purpose LLM (e.g. `ibm-granite/granite-4.1-3b`) that everything runs on top of. | -| **Intrinsic** | A specialised capability — answerability, citations, requirement-check, and so on — invoked via a high-level helper or the `Intrinsic` AST component. | -| **Adapter** | The artefact that implements an intrinsic on top of a base model. In the redesign, `Adapter` is one class composed of three parts (identity, I/O contract, weights binding). | +| **Adapter** | The user-facing term for a specialised capability added to a base model — answerability, citations, requirement-check, etc. In the redesign, `Adapter` is one class composed of three parts (identity, I/O contract, weights binding). This is the primary noun users and docs should reach for. 
| +| **Intrinsic** | Legacy Mellea term for the same concept. Still appears in the current class names (`Intrinsic` AST component, `IntrinsicAdapter`, `mellea.stdlib.components.intrinsic` module). The direction of travel is to fold "intrinsic" language into "adapter" — the rename scope is a decision in §5. | | **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | | **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. | -| **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality (see below). Exposes `prepare`, `activate`, `deactivate`, `release`. | -| **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = baked into the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | -| **LoRA / aLoRA** | Two PEFT technologies. LoRA weights always participate; aLoRA only participates after an activation token is seen. Both are adapter types that a given intrinsic may ship as. | +| **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | +| **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | +| **LoRA / aLoRA** | Two PEFT technologies. LoRA weights always participate; aLoRA only participates after an activation token is seen. A single adapter ships as one or the other (some intrinsics as either); both are supported across all three realities (including embedded — Granite Switch has LoRA and aLoRA adapters in the same repo, `technology` field on each). 
| | **Role** | A *semantic* label on an adapter distinct from its name — e.g. `requirement_check`, `context_attribution`. Used by callers (the `Requirement` rerouting path) to find "the adapter that plays this role" without hardcoding a name string. | | **Qualified name** | Today's disambiguator: `_`. In the redesign, derived on demand from `identity` rather than stored as a field. | -| **Catalog** | The registry of known intrinsics at `mellea/backends/adapters/catalog.py`. Becomes optional and advisory rather than mandatory and monkey-patched. | +| **Catalog** | The registry of known adapters at `mellea/backends/adapters/catalog.py`. Becomes optional and advisory rather than mandatory and monkey-patched. | | **`io.yaml`** | The YAML file that declares an adapter's input template, output schema, and generation parameters. Lives in the adapter's HuggingFace repo. | ## 4. Rough end result @@ -142,6 +143,11 @@ These gate decomposition; everything else can live in sub-issues once these are 3. **OpenAI Reality C — which concrete shape first?** vLLM-backed LoRA serving (client-known weight file, server-side load) or commercial fine-tunes (fully hosted)? The binding covers both; the first subclass sets the idiom. 4. **Telemetry coupling with #1035.** Include intrinsic spans as part of this refactor, or as a follow-on to PR #1036's Gap 5? Coupling avoids designing content capture twice; decoupling keeps #1036 moving. 5. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. +6. **Terminology rename scope.** Three levels of commitment to the "adapter over intrinsic" shift: + a. **Prose only** (docs, error messages, help text). Zero breakage. Recommended unconditionally. + b. **Module rename**: `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. 
Breaking for anyone importing from the submodule path. + c. **AST class rename**: `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias for one release. Breaking for advanced users calling `mfuncs.act(Intrinsic(...))` directly. + Confirm how deep to go. --- @@ -160,19 +166,23 @@ These gate decomposition; everything else can live in sub-issues once these are ### 6.2 Reality B — Embedded adapter (today's `EmbeddedIntrinsicAdapter`, used by Granite Switch) -- Weights were **baked into the base model during training**. No separate file to download. -- Activation is done by rendering a `controls` field (structured JSON) into the chat template. The Jinja template places it either at the beginning of the sequence (LoRA technology) or before the generation prompt (aLoRA technology). The model itself routes the request to the right baked-in weights. -- You still need the `io.yaml` for input/output formatting — that's the only artefact the client needs. -- **Pre-installed weights, prompt-level activation, no download lifecycle.** +- Adapter weights **ship in the same HuggingFace repo as the base model**. They come down with the base-model snapshot and are not fetched separately — confirmed by the fact that `EmbeddedIntrinsicAdapter.from_hub` downloads only `adapter_index.json` + `io_configs/**`, not weight files. The phrase "baked into the base model" is a useful shorthand but imprecise: the weights are still distinct PEFT modules, just co-located and pre-loaded by the inference runtime. +- **Both LoRA and aLoRA are supported.** `adapter_index.json` lists each embedded adapter with a `technology` field (`"lora"` or `"alora"`). The chat template uses that field to place the `controls` JSON at the correct position — beginning of sequence for LoRA, before the generation prompt for aLoRA — so the right adapter is active for the right span of tokens. Granite Switch therefore genuinely carries both technologies; it is not a LoRA-only reality. 
+- On the client side, only `io.yaml` is needed to format inputs and parse outputs. +- **Pre-installed weights, prompt-level activation, no separate download lifecycle.** -### 6.3 Reality C — Server-mediated adapter (today's gap, #929 point 5) +### 6.3 Reality C — Server-mediated adapter (partially a gap today, #929 point 5) -Two plausible sub-cases; design must accommodate both. +The OpenAI-compatible backend **already supports adapters** — but only embedded ones (Granite Switch via Reality B, added in PR #881). The gap this reality addresses is *non-embedded* server-side adapters: the path that PR #543 removed when vLLM dropped aLoRA support and that #929 point 5 describes as "requires discussion." -- **C1 — Client-pulled, server-activated**: weights exist as a file on the client side (or somewhere pullable), but activation happens on a remote inference server (e.g. vLLM loads them and exposes them via a LoRA ID or per-request model alias). The client sends either `model=` or a dedicated LoRA field in the API request. PR #543 removed this path because vLLM dropped aLoRA support; #929 point 5 anticipates its return as "a much different implementation." This is the likely near-term shape. +Whether we actively re-enable this path in 0.6.0 is a design decision (see §5 Q3). The shape is worth naming now so the binding abstraction accommodates it rather than having to be re-opened later. Two plausible sub-cases: + +- **C1 — Client-pulled, server-activated**: weights exist as a file on the client side (or somewhere pullable), but activation happens on a remote inference server (e.g. vLLM loads them and exposes them via a LoRA ID or per-request model alias). The client sends either `model=` or a dedicated LoRA field in the API request. - **C2 — Provider-hosted**: weights live entirely on the provider's infrastructure. The client only ever passes an identifier. Applies to commercial fine-tunes behind OpenAI, Azure, etc.
-Both share: **no local weight loading, API-parameter activation, `io.yaml` still required client-side.** +Both share: **no local weight loading, API-parameter activation, `io.yaml` still required client-side.** The first concrete `ServerMediatedBinding` subclass sets the idiom for the API shape. + +**Intent summary for OpenAI-compatible support:** keep and extend. Embedded support stays. The design leaves a clean slot for non-embedded when we decide to re-add it. ## 7. Why the current code is tangled (concrete example) @@ -240,7 +250,18 @@ adapter = Adapter(name="answerability", ## 10. Observability -Intrinsic calls have no bespoke instrumentation today. Folding it into the redesign costs one span attribute per verb; retrofitting means re-editing every binding later. +### 10.1 Why adapters need bespoke observability + +Adapter calls hide the complexity that matters most when something goes wrong (weight fetching, activation side-effects, schema contracts). Without per-phase instrumentation, four failure modes are hard or impossible to diagnose — and Mellea has already hit the first two in production: + +1. **Masked errors.** The `obtain_lora`-always-called bug (#929 point 1b) showed users a misleading download error while the real cause (adapter-type mismatch) stayed invisible. A span at the `prepare` boundary recording the exception would have surfaced the actual cause on first run. +2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. A `parse_failures` counter labelled by `(name, version)` would have climbed immediately; a parse-status span attribute would have shown every call as "parsed with warnings." +3. **Latency attribution.** "`check_answerability` is slow" is unanswerable today — download, PEFT load, generation, and JSON parse collapse into one backend span. 
Phase-level spans make the culprit obvious in any trace viewer. +4. **Alerting and cost attribution.** OTel `ERROR` status on failed download/activation makes generic dashboards and alerts work. Token counts labelled by adapter answer "which capability is 30% of our spend?" Both impossible today. + +Adding instrumentation now costs one span attribute per verb. Retrofitting after the refactor means re-editing every binding. And during a refactor this wide, the fastest way to spot a regression in a specific reality is a dashboard, not a bug report. + +### 10.2 Spans and metrics **Spans** — each `adapter_scope` wraps a child span tree rooted at `intrinsic.call`: @@ -269,7 +290,9 @@ First-class deliverables, not afterthoughts. **Docs** — rewrite (not edit) for `docs/dev/intrinsics_and_adapters.md` (39 lines describing classes that get renamed). Update `docs/dev/requirement_aLoRA_rerouting.md` to describe role-based lookup instead of hardcoded strings. User-facing `docs/docs/advanced/intrinsics.md` and examples under `docs/examples/intrinsics/` are breaking-API touched. New dev doc for adapter observability. Update AGENTS.md §13 once normalised post-parse shapes are stable. -**Tests** — existing intrinsic tests stay green per phase. New tests cover: each binding × each verb (unit); integration matrix `{HF, OpenAI} × {applicable bindings} × {lora, alora where applicable} × {every existing intrinsic}`; per-version parse round-trips (with `requirement-check` v1 / v2 as the worked case); concurrency window correctness; span/metric emission assertions. +**Tests** — existing adapter tests stay green per phase. New tests cover: each binding × each verb (unit); integration matrix `{HF, OpenAI} × {applicable bindings} × {lora, alora where applicable} × {every existing adapter}`; per-version parse round-trips (with `requirement-check` v1 / v2 as the worked case); concurrency window correctness; span/metric emission assertions. 
+ +**Qualitative effectiveness suite (optional, per-adapter).** The tests above verify plumbing. They do *not* answer "does the answerability adapter actually judge answerability correctly?" A per-adapter qualitative suite (`@pytest.mark.qualitative`, opt-in, kept out of the fast loop) takes a small canonical dataset per adapter and asserts an accuracy floor on its outputs — the same kind of eval a user would run before deploying. Kept cheap (tens of examples, not hundreds) so it fits in a reasonable CI budget. Without this, a refactor can pass every structural test while silently degrading the behaviour users care about. **Tutorials** — three worth writing alongside the refactor: - "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story. From 15e7bf609e316cf51cb83a473f0d131a998aa26c Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 7 May 2026 16:28:31 +0100 Subject: [PATCH 06/29] docs: cite TestBasedEval and BenchDrift for qualitative tests (#929) Replace the generic "qualitative effectiveness suite" description with concrete tools already available: - TestBasedEval (in-tree Mellea component, JSON + LLM-as-judge, has CLI via `m eval run`) as the default per-adapter mechanism. - BenchDrift (IBM/BenchDrift) as an optional variation-testing extension for adapters where phrasing-invariance matters (answerability, context-relevance, requirement-check, Guardian). Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index 551d8a0b9..b22e67d18 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -292,7 +292,14 @@ First-class deliverables, not afterthoughts. **Tests** — existing adapter tests stay green per phase. 
New tests cover: each binding × each verb (unit); integration matrix `{HF, OpenAI} × {applicable bindings} × {lora, alora where applicable} × {every existing adapter}`; per-version parse round-trips (with `requirement-check` v1 / v2 as the worked case); concurrency window correctness; span/metric emission assertions. -**Qualitative effectiveness suite (optional, per-adapter).** The tests above verify plumbing. They do *not* answer "does the answerability adapter actually judge answerability correctly?" A per-adapter qualitative suite (`@pytest.mark.qualitative`, opt-in, kept out of the fast loop) takes a small canonical dataset per adapter and asserts an accuracy floor on its outputs — the same kind of eval a user would run before deploying. Kept cheap (tens of examples, not hundreds) so it fits in a reasonable CI budget. Without this, a refactor can pass every structural test while silently degrading the behaviour users care about. +**Qualitative effectiveness suite (optional, per-adapter).** The tests above verify plumbing. They do *not* answer "does the answerability adapter actually judge answerability correctly?" A per-adapter qualitative suite (`@pytest.mark.qualitative`, opt-in, kept out of the fast loop) takes a small canonical dataset per adapter and asserts an accuracy floor on its outputs. Without this, a refactor can pass every structural test while silently degrading the behaviour users care about. + +Two existing tools fit naturally here and should be preferred over ad-hoc harnesses: + +- **`TestBasedEval`** (in-tree — `mellea/templates/prompts/default/TestBasedEval.jinja2`, documented at `docs.mellea.ai/how-to/unit-test-generative-code`) is Mellea's own LLM-as-judge component. Each adapter gets a JSON file of test cases (`{input, target, guidelines}`); a judge model returns `{"score": 0|1, "justification": "..."}`. 
Runnable from the CLI (`m eval run tests/eval_data/.json --backend ollama --model granite4.1:3b`) so the same fixtures power both CI and interactive debugging. This is the default mechanism for per-adapter qualitative coverage. +- **BenchDrift** (`github.com/IBM/BenchDrift`) addresses a second failure mode: an adapter that works on its canonical phrasing but breaks on semantically-equivalent rephrasings. BenchDrift generates syntactic variations of each test case while preserving meaning, then scores consistency across variations. Worth applying to the adapters where phrasing-invariance is a real concern — answerability, context-relevance, requirement-check, and the Guardian family all qualify. Optional extension rather than baseline coverage, but enabling it per-adapter is a one-config-file step once the `TestBasedEval` fixtures exist. + +Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fit in a reasonable nightly-CI budget. **Tutorials** — three worth writing alongside the refactor: - "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story. From 38b81126751c9b80c913cf2af7e4796d9b41b8cd Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 8 May 2026 10:20:15 +0100 Subject: [PATCH 07/29] docs: second round review edits (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Prominent linked header to Epic #929 + reference table of all related issues/PRs at the top of the doc so context is immediate on share. - §1: add evidence table of seven recent fix-up commits in the adapter area to demonstrate the friction is real. - §2: rewrite customer-adapter goal to distinguish current partial state (m alora train exists; consuming needs hacks; #424 open) from the refactor's first-class intent. - §4.1: new backend × reality matrix showing Reality A/B/C support today and planned (with #1018 for HF + embedded, #27 for vLLM). 
- New §6 "Impact and blast radius": API surface, user-archetype impact table, code reach, release planning, blocking/unblocking, performance notes, risk register. - §6.3: fix vLLM/aLoRA factual error — upstream vLLM never had aLoRA; we ran a custom vLLM build; PR #543 was triggered by upstream declining the aLoRA PR, not by vLLM "dropping" support. Cite #27 as the live tracking item. - §5 reviewer questions: drop telemetry coupling as a reviewer question (move to implementation note); reframe Reality C question around #27 upstream status rather than C1-vs-C2 choice. - §11: acknowledge TestBasedEval and BenchDrift are part of the broader LLM-unit-testing conversation, not invented for this refactor. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 139 ++++++++++++++++-- 1 file changed, 123 insertions(+), 16 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index b22e67d18..f6b9d1c01 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -1,9 +1,33 @@ # Adapter Lifecycle — Design Proposal -> Status: proposal for shape agreement. -> Addresses Epic #929. -> Structure: **Part I** is for agreeing the problem, goals, terminology, and end state. **Part II** contains the supporting detail — read after Part I is agreed, not before. -> Terminology note: this proposal uses **"adapter"** as the primary term for the thing users add to a backend to gain a specialised capability. "Intrinsic" appears as the legacy name where it still refers to existing Mellea classes (e.g. the `Intrinsic` AST component). The rename strategy is called out in §5. +> **Addresses:** [Epic #929 — Fix Intrinsic Adapter Lifecycle & Consistency in Mellea](https://github.com/generative-computing/mellea/issues/929). 
Read the epic first if you haven't — it catalogues the specific threads this proposal tries to resolve coherently rather than individually. +> **Status:** proposal for shape agreement; not a PR candidate. Preserved on a branch for review; once agreed, the content moves into `docs/dev/intrinsics_and_adapters.md` as the current-state doc and this file is deleted. +> **Structure:** **Part I** covers problem, goals, terminology, end state, and the decisions that gate decomposition. **Part II** contains supporting detail — read after Part I is agreed, not before. +> **Terminology:** this proposal uses **"adapter"** as the primary user-facing term; "intrinsic" appears only as the legacy name where it still refers to existing Mellea classes (e.g. the `Intrinsic` AST component). Rename strategy is in §5 Q5. + +## Referenced issues and PRs + +| Ref | Title | Why cited | +| --- | --- | --- | +| [Epic #929](https://github.com/generative-computing/mellea/issues/929) | Fix Intrinsic Adapter Lifecycle & Consistency in Mellea | *the epic this proposal addresses* | +| [#27](https://github.com/generative-computing/mellea/issues/27) | Add support for aloras to remote vllm when vllm supports it | live tracking item for Reality C | +| [#423](https://github.com/generative-computing/mellea/issues/423) | Adapter code is undocumented and over-specialized to Intrinsics | Priority-labelled; overlaps this refactor | +| [#424](https://github.com/generative-computing/mellea/issues/424) | Cannot use intrinsics without uploading them | customer-adapter friction | +| [#543](https://github.com/generative-computing/mellea/pull/543) | revert: remove adapters/intrinsics/alora/lora from openai code | why OpenAI backend lost adapter support | +| [#881](https://github.com/generative-computing/mellea/pull/881) | feat: add embedded adapters (granite switch) to openai backend | why OpenAI backend got Reality B back | +| [#946](https://github.com/generative-computing/mellea/pull/946) | feat: simplify
intrinsics | rework evidence | +| [#972](https://github.com/generative-computing/mellea/pull/972) | fix: model options with intrinsics | rework evidence for #929 point 2 | +| [#979](https://github.com/generative-computing/mellea/pull/979) | fix: key in json returned by policy_guardrails intrinsic | rework evidence for output parsing | +| [#986](https://github.com/generative-computing/mellea/pull/986) | fix: issues introduced by intrinsic changes | rework evidence | +| [#994](https://github.com/generative-computing/mellea/pull/994) | fix: default intrinsic adapter types; granite-switch tests | rework evidence | +| [#1003](https://github.com/generative-computing/mellea/issues/1003) | fix: intrinsic function signatures (model_options on helpers) | high-level helper signature evolution | +| [#1008](https://github.com/generative-computing/mellea/pull/1008) | fix: rewrite requirement_check_to_bool for new schema | worked example for schema-version story | +| [#1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | introduces factuality rewind path | +| [#1035](https://github.com/generative-computing/mellea/issues/1035) | OTel emission gaps | parent for telemetry coordination | +| [#1036](https://github.com/generative-computing/mellea/pull/1036) | feat(telemetry): close five OTel GenAI semconv gaps | in-flight telemetry work to coordinate with | +| [#1018](https://github.com/generative-computing/mellea/issues/1018) | add support for granite-switch / embedded adapters on HF backend | depends on this refactor; explicit sequencing note in issue body | + +**Sequencing note:** #1018's issue body states "May require sorting out some of the issues in #929 first. Or at least creating a comprehensive plan." 
That makes this proposal the gating item: resolve the decisions here, land Phase 0–2 of the migration, then #1018 reduces to a straightforward "add the EmbeddedBinding path to `LocalHFBackend`" change following the established pattern. Attempting #1018 without the refactor re-creates the branching problem on a second backend. --- @@ -21,13 +45,29 @@ Three sources of friction have accumulated: Every thread in #929 is a symptom of not having separated the kinds of adapter and their lifecycles cleanly. +### Evidence that this is friction, not theory + +Seven fix-up commits in the adapter area in recent history, all symptomatic of the design gaps above rather than straightforward feature work: + +| Commit / PR | What it fixed | +| --- | --- | +| `1734900d` | Remove `answer_relevance*` intrinsics and unrelated intrinsic issues. | +| `8b6b8d55` (#972) | Model options with intrinsics (precedence bug surfaced). | +| `c57aba1d` (#986) | Issues introduced by preceding intrinsic changes. | +| `8577d092` (#994) | Default intrinsic adapter types; canned I/O with temperature. | +| `4d372b0e` (#979) | Key in JSON returned by `policy_guardrails` intrinsic. | +| `0617bd96` (#1008) | Rewrote `requirement_check_to_bool` for a changed output schema; flipped `"requirement_check"` → `"requirement-check"` in four files. | +| `75465d29` (#946) | "Simplify intrinsics" — reacting to accumulated complexity. | + +Add the `obtain_lora`-always-called masked error (#929 1b) and the hardcoded `"requirement-check"` references called out in #929 point 7 and PR #1008, and the picture is of a subsystem that receives repeated small-scope fixes rather than a stable abstraction. + ## 2. What we are trying to achieve - **One coherent mental model** of what an adapter is, so users and contributors can reason about behaviour without reading the implementation. 
- **One code path** through adapter invocation that works regardless of whether the adapter's weights are local, shipped with the base model, or served by a remote server. - **Correct, documented model-option precedence** that does not silently overwrite caller intent. - **Schema-version safety** so adapters can evolve their output format without breaking callers, and so an adapter whose schema drifts is visible rather than silent. -- **First-class customer adapters** — customers can build and ship their own adapters against the same API as first-party ones, without monkey-patching a global registry and without privileged access to internal APIs. +- **First-class customer adapters** — customers can build and ship their own adapters against the same API as first-party ones, without monkey-patching a global registry and without privileged access to internal APIs. *Current state is partial:* training infrastructure exists (`m alora train` / `m alora upload`), but consuming a custom adapter today requires either patching the catalog, subclassing `CustomIntrinsicAdapter` (self-confessed "temporary hack"), or uploading to HuggingFace first ([#424](https://github.com/generative-computing/mellea/issues/424)). The refactor closes this to a single declarative path: construct an `Adapter` with a `LocalFileBinding` pointing at any HF repo (or a local path) plus a local or remote `io.yaml`. - **Observable adapter calls** — every phase (download, activation, generation, parse, deactivation) is a distinct span with standard attributes; a per-adapter metrics plugin makes failures visible in dashboards without bespoke instrumentation. - **Parity, not breakage** — high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. @@ -134,21 +174,84 @@ From this shape, the seven threads of #929 resolve cleanly. 
Full mapping is in P **What cross-cutting concerns look like:** observability (spans + a schema-drift metric), docs rewrite (`intrinsics_and_adapters.md` is 39 lines describing classes this renames), and a test-parity commitment travel **with** the refactor, not after it. Detail in Part II §10–§11. +### 4.1 Backend × reality matrix + +Which realities does each backend support today, and where this design takes them: + +| Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | +| ------------------- | :--------------------: | :------------------: | :-------------------------: | +| `LocalHFBackend` | ✅ today | ⏳ [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | +| `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳ [#27](https://github.com/generative-computing/mellea/issues/27) | + +- ✅ = supported; ⏳ = in-scope future work tracked by the linked issue; — = not applicable / not planned. +- Granite Switch (embedded) is the newest addition but is **not** "the premier option": local PEFT via `LocalHFBackend` remains the development/on-prem path and is the only reality that ships with both LoRA and aLoRA today. +- The design keeps every cell that is ✅ working, adds clean paths for the ⏳ cells without ad-hoc branching, and does not preclude new rows (backends) or columns (realities) later. + ## 5. Decisions needed now These gate decomposition; everything else can live in sub-issues once these are agreed. 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. -3. 
**OpenAI Reality C — which concrete shape first?** vLLM-backed LoRA serving (client-known weight file, server-side load) or commercial fine-tunes (fully hosted)? The binding covers both; the first subclass sets the idiom. -4. **Telemetry coupling with #1035.** Include intrinsic spans as part of this refactor, or as a follow-on to PR #1036's Gap 5? Coupling avoids designing content capture twice; decoupling keeps #1036 moving. -5. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. -6. **Terminology rename scope.** Three levels of commitment to the "adapter over intrinsic" shift: +3. **Reality C target shape.** The active work item is [#27](https://github.com/generative-computing/mellea/issues/27) (aLoRA on remote vLLM), paced by upstream vLLM's position. Do we leave the `ServerMediatedBinding` slot empty in 0.6.0 and populate it when #27 unblocks, or invest in a no-op/stub subclass now? Recommendation: leave empty, design the slot, revisit when upstream moves. +4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. +5. **Terminology rename scope.** Three levels of commitment to the "adapter over intrinsic" shift: a. **Prose only** (docs, error messages, help text). Zero breakage. Recommended unconditionally. b. **Module rename**: `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. Breaking for anyone importing from the submodule path. c. **AST class rename**: `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias for one release. Breaking for advanced users calling `mfuncs.act(Intrinsic(...))` directly. Confirm how deep to go. 
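The "alias for one release" idea in options (b) and (c) can be sketched with a module-level `__getattr__` (PEP 562). This is an illustration under stated assumptions, not Mellea's implementation: `AdapterCall` is only the placeholder name floated in option (c), and `legacy_intrinsic`, the helper name, and the warning text are all hypothetical.

```python
import sys
import types
import warnings


class AdapterCall:
    """Hypothetical new name for the AST component (placeholder from option c)."""

    def __init__(self, name: str) -> None:
        self.name = name


def _deprecated_getattr(attr: str):
    """Module-level __getattr__ (PEP 562): resolve the old name with a warning."""
    if attr == "Intrinsic":
        warnings.warn(
            "Intrinsic is deprecated; use AdapterCall instead",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller, not this shim
        )
        return AdapterCall
    raise AttributeError(f"module has no attribute {attr!r}")


# Stand-in module so the sketch is self-contained; in a real package the
# __getattr__ function would simply be defined at the top level of the module.
legacy = types.ModuleType("legacy_intrinsic")
legacy.AdapterCall = AdapterCall
legacy.__getattr__ = _deprecated_getattr
sys.modules["legacy_intrinsic"] = legacy
```

With a shim of this kind, `from legacy_intrinsic import Intrinsic` keeps working and emits a `DeprecationWarning`, so removal after the agreed window becomes a one-line change.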
+> **Implementation note, not a reviewer question:** intrinsic-level observability (§10) should coordinate with the in-flight [#1035](https://github.com/generative-computing/mellea/issues/1035) / [PR #1036](https://github.com/generative-computing/mellea/pull/1036) work so content capture uses the same `MELLEA_TRACE_CONTENT` flag and doesn't get designed twice. Flagged here for awareness; sequenced during implementation. + +## 6. Impact and blast radius + +Scope of this refactor in concrete terms so reviewers can weigh the cost. + +### API surface + +- **Unchanged** — every high-level helper (`check_answerability` etc.) keeps its signature. `m.instruct`, `m.validate`, `m.chat` unaffected. The `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) lands on top, not instead. +- **Deprecated but shimmed for one release** — `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter` public classes. Direct users get `DeprecationWarning` pointing to the new constructor. +- **Optional, was mandatory** — the adapter catalogue. Stays as a convenience resolver, stops being a gate. +- **Possibly moved/renamed** — depends on §5 Q5 (terminology rename scope). + +### User-archetype impact + +| Audience | Impact | +| --- | --- | +| Helper user (`check_answerability`-style calls) | None beyond the `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) and clearer error messages. | +| Advanced user constructing adapters directly | One release of deprecation warnings, then adopt the new `Adapter(name=…, weights=…)` constructor. | +| Customer writing their own adapter | First-class path; no more `CustomIntrinsicAdapter` monkey-patching; no forced catalogue upload. Resolves [#424](https://github.com/generative-computing/mellea/issues/424). 
| +| Backend author | `AdapterMixin` verb set narrows to the natural operations each backend can perform; existing implementations update or use shim methods. | +| Operator / SRE | New spans and metrics per §10; easier diagnosis of adapter failures and cost attribution. | + +### Code reach + +Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catalog,__init__}.py`, `mellea/backends/{huggingface,openai}.py`, `mellea/stdlib/components/intrinsic/*`, `mellea/formatters/granite/intrinsics/*`, `mellea/stdlib/requirements/requirement.py`, `docs/examples/intrinsics/*`, `docs/dev/{intrinsics_and_adapters,requirement_aLoRA_rerouting}.md`. Larger than a typical feature PR; phased per §13 so individual PRs stay reviewable. + +### Release planning + +- **0.6.0 target**: §5 agreement plus Phases 0–2 of the migration (new `Adapter` / `WeightsBinding` / `IOContract` types, call-site adoption, backend narrowing, deprecation shims for old classes, unified model-option precedence, observability per §10, tests per §11). +- **0.6.x follow-on**: [#1018](https://github.com/generative-computing/mellea/issues/1018) (embedded adapters on `LocalHFBackend`), Phase 4 shim removal. +- **Deferred until upstream moves**: Reality C / [#27](https://github.com/generative-computing/mellea/issues/27). + +### Blocking and unblocking + +- **Blocks** [#1018](https://github.com/generative-computing/mellea/issues/1018) (explicitly stated in its issue body). +- **Substantially addresses** [#423](https://github.com/generative-computing/mellea/issues/423) (adapter code undocumented and over-specialised), [#424](https://github.com/generative-computing/mellea/issues/424) (cannot use intrinsics without uploading), all seven threads of [#929](https://github.com/generative-computing/mellea/issues/929). +- **Coordinates with** [PR #1036](https://github.com/generative-computing/mellea/pull/1036) on content-capture semantics. 
+- **Blocked by** upstream vLLM position on aLoRA ([#27](https://github.com/generative-computing/mellea/issues/27)) — and only for Reality C. Parts I–II of this design are not gated on upstream. + +### Performance + +- **Likely neutral or improved.** Session-scoped lifecycle is the proposed default (matches current `LocalHFBackend` behaviour); no additional load/unload cost per call. Unified parsing avoids the double-parse that the current output-normalisation sometimes does. +- **Regression watch**: if §5 Q2 chooses request-scoped, per-call PEFT load/unload becomes a visible cost. Measure before adoption. + +### Risk + +- **Biggest unknown**: whether the unified `resolve_model_options` handles every combination currently in use. Mitigation: keep the five-layer precedence explicit, add per-adapter override documentation, and assert resolved values in tests. +- **Second biggest**: schema-version dispatch (§5.4 in Part II). Worked example is the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change — verifying v1 and v2 both pass through cleanly gates the parsing refactor. +- **Mitigated by**: per-phase test-parity commitment (nothing merges if existing tests regress); observability introduced alongside the refactor so production regressions surface as dashboard signals rather than silent behavioural drift. + --- # Part II — Supporting detail @@ -171,18 +274,22 @@ These gate decomposition; everything else can live in sub-issues once these are - On the client side, only `io.yaml` is needed to format inputs and parse outputs. 
- **Pre-installed weights, prompt-level activation, no separate download lifecycle.** -### 6.3 Reality C — Server-mediated adapter (partially gap today, #929 point 5) +### 6.3 Reality C — Server-mediated adapter (partial gap today) + +The OpenAI-compatible backend **already supports adapters** — but only embedded ones (Granite Switch via Reality B, added in [PR #881](https://github.com/generative-computing/mellea/pull/881)). What's missing is *non-embedded* server-side adapters. + +**The history (corrected):** Mellea previously ran aLoRA adapters through the OpenAI backend against a **custom vLLM build** that carried an aLoRA patch. The upstream vLLM project declined to merge that patch (confirmed in [PR #543](https://github.com/generative-computing/mellea/pull/543)'s review: "the vLLM aLoRA PR will not [be] accepted, so the alora/intrinsics code for openai is now all dead code"), so PR #543 removed the dead path. Upstream vLLM has therefore **never carried** aLoRA support — the right framing is "declined upstream," not "dropped." -The OpenAI-compatible backend **already supports adapters** — but only embedded ones (Granite Switch via Reality B, added in PR #881). The gap this reality addresses is *non-embedded* server-side adapters: the path that PR #543 removed when vLLM dropped aLoRA support and that #929 point 5 describes as "requires discussion." +**Live tracking item:** [Issue #27 "Add support for aloras to remote vllm when vllm supports it"](https://github.com/generative-computing/mellea/issues/27) is the open work item for this reality. It remains open because the upstream situation has not changed. -Whether we actively re-enable this path in 0.6.0 is a design decision (see §5 Q3). The shape is worth naming now so the binding abstraction accommodates it rather than having to be re-opened later. Two plausible sub-cases: +**Scope of this reality:** whatever the eventual technology path, the design slot is the same.
Two sub-cases the binding must accommodate when the path becomes viable: -- **C1 — Client-pulled, server-activated**: weights exist as a file on the client side (or somewhere pullable), but activation happens on a remote inference server (e.g. vLLM loads them and exposes them via a LoRA ID or per-request model alias). The client sends either `model=` or a dedicated LoRA field in the API request. -- **C2 — Provider-hosted**: weights live entirely on the provider's infrastructure. The client only ever passes an identifier. Applies to commercial fine-tunes behind OpenAI, Azure, etc. +- **C1 — Client-pulled, server-activated**: weights exist as a file client-side (or somewhere pullable), but activation happens on a remote inference server which loads them and exposes them via a LoRA ID or per-request model alias. This is the vLLM-shaped path, paced by #27 being unblocked. +- **C2 — Provider-hosted**: weights live entirely on the provider's infrastructure. The client only ever passes an identifier. Applies to commercial fine-tunes behind OpenAI, Azure, etc. Not currently a known target in Mellea. Both share: **no local weight loading, API-parameter activation, `io.yaml` still required client-side.** The first concrete `ServerMediatedBinding` subclass sets the idiom for the API shape. -**Intent summary for OpenAI-compatible support:** keep and extend. Embedded support stays. The design leaves a clean slot for non-embedded when we decide to re-add it. +**Intent summary for OpenAI-compatible support:** keep and extend. Embedded support stays. The design leaves a clean slot for C1 to be populated when #27 is unblocked upstream; C2 is noted for completeness but not a near-term target. ## 7. Why the current code is tangled (concrete example) @@ -294,7 +401,7 @@ First-class deliverables, not afterthoughts. **Qualitative effectiveness suite (optional, per-adapter).** The tests above verify plumbing. 
They do *not* answer "does the answerability adapter actually judge answerability correctly?" A per-adapter qualitative suite (`@pytest.mark.qualitative`, opt-in, kept out of the fast loop) takes a small canonical dataset per adapter and asserts an accuracy floor on its outputs. Without this, a refactor can pass every structural test while silently degrading the behaviour users care about. -Two existing tools fit naturally here and should be preferred over ad-hoc harnesses: +Two existing tools — already part of Mellea's broader LLM-unit-testing conversation rather than bespoke to this refactor — fit naturally here and should be preferred over ad-hoc harnesses: - **`TestBasedEval`** (in-tree — `mellea/templates/prompts/default/TestBasedEval.jinja2`, documented at `docs.mellea.ai/how-to/unit-test-generative-code`) is Mellea's own LLM-as-judge component. Each adapter gets a JSON file of test cases (`{input, target, guidelines}`); a judge model returns `{"score": 0|1, "justification": "..."}`. Runnable from the CLI (`m eval run tests/eval_data/.json --backend ollama --model granite4.1:3b`) so the same fixtures power both CI and interactive debugging. This is the default mechanism for per-adapter qualitative coverage. - **BenchDrift** (`github.com/IBM/BenchDrift`) addresses a second failure mode: an adapter that works on its canonical phrasing but breaks on semantically-equivalent rephrasings. BenchDrift generates syntactic variations of each test case while preserving meaning, then scores consistency across variations. Worth applying to the adapters where phrasing-invariance is a real concern — answerability, context-relevance, requirement-check, and the Guardian family all qualify. Optional extension rather than baseline coverage, but enabling it per-adapter is a one-config-file step once the `TestBasedEval` fixtures exist. 
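The accuracy-floor idea behind the qualitative suite can be sketched in a few lines of plain Python. The fixture fields follow the `{input, target, guidelines}` shape and the verdict fields follow the `{"score": 0|1, "justification": "..."}` shape described above. Everything else is an assumption made for illustration: the fixture contents, the hand-written verdicts, and the helper names `accuracy` and `meets_floor` are invented here and are not part of `TestBasedEval`.

```python
import json
from typing import Iterable

# Illustrative fixture only: field names follow the shape described above,
# the contents are invented for this sketch.
FIXTURE = json.loads("""
[
  {"input": "Doc: Paris is in France. Q: Where is Paris?",
   "target": "answerable", "guidelines": "Score 1 if the verdict matches the target."},
  {"input": "Doc: Paris is in France. Q: Who built the Louvre?",
   "target": "unanswerable", "guidelines": "Score 1 if the verdict matches the target."},
  {"input": "Doc: Water boils at 100 C at sea level. Q: When does water boil?",
   "target": "answerable", "guidelines": "Score 1 if the verdict matches the target."},
  {"input": "Doc: Water boils at 100 C at sea level. Q: What is the price of tea?",
   "target": "unanswerable", "guidelines": "Score 1 if the verdict matches the target."}
]
""")


def accuracy(verdicts: Iterable[dict]) -> float:
    """Fraction of judge verdicts whose score is 1."""
    scores = [int(v["score"]) for v in verdicts]
    if not scores:
        raise ValueError("no verdicts to score")
    return sum(scores) / len(scores)


def meets_floor(verdicts: Iterable[dict], floor: float) -> bool:
    """True when the observed accuracy is at or above the agreed floor."""
    return accuracy(verdicts) >= floor


# Hand-written stand-ins for judge output, one per fixture case; a real run
# would obtain these from the judge model rather than hard-coding them.
verdicts = [
    {"score": 1, "justification": "matches target"},
    {"score": 1, "justification": "matches target"},
    {"score": 0, "justification": "adapter said answerable"},
    {"score": 1, "justification": "matches target"},
]
```

In a test marked `@pytest.mark.qualitative`, the assertion would then be a single `assert meets_floor(verdicts, floor)` with a per-adapter floor, which keeps the threshold visible in review rather than buried in a harness.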
From 68c09b6b1cef76aae0628667a39075552e790444 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 8 May 2026 10:36:33 +0100 Subject: [PATCH 08/29] docs: improve progressive read; expand backend matrix (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Break up top-of-doc banner into visibly separated lines (blank blockquote rows) so fields don't run together on render. - Move the referenced-issues table and sequencing note out of the top (too much noise before the reader gets any narrative) into a late "Appendix — Referenced issues and PRs" section with sub-tables for tracking items, rework evidence, and related work. - §4.1 backend matrix expanded to cover all five Mellea backends: LocalHFBackend, OpenAIBackend, OllamaBackend, WatsonxBackend, LiteLLMBackend. Only the first two are in scope for adapter support under this design; the others are out-of-scope with a stated reason. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 107 ++++++++++++------ 1 file changed, 71 insertions(+), 36 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index f6b9d1c01..fe3ce662b 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -1,33 +1,14 @@ # Adapter Lifecycle — Design Proposal -> **Addresses:** [Epic #929 — Fix Intrinsic Adapter Lifecycle & Consistency in Mellea](https://github.com/generative-computing/mellea/issues/929). Read the epic first if you haven't — it catalogues the specific threads this proposal tries to resolve coherently rather than individually. -> **Status:** proposal for shape agreement; not a PR candidate. Preserved on a branch for review; once agreed, the content moves into `docs/dev/intrinsics_and_adapters.md` as the current-state doc and this file is deleted. 
-> **Structure:** **Part I** covers problem, goals, terminology, end state, and the decisions that gate decomposition. **Part II** contains supporting detail — read after Part I is agreed, not before. -> **Terminology:** this proposal uses **"adapter"** as the primary user-facing term; "intrinsic" appears only as the legacy name where it still refers to existing Mellea classes (e.g. the `Intrinsic` AST component). Rename strategy is in §5 Q6. - -## Referenced issues and PRs - -| Ref | Title | Why cited | -| --- | --- | --- | -| [Epic #929](https://github.com/generative-computing/mellea/issues/929) | Fix Intrinsic Adapter Lifecycle & Consistency in Mellea | *the epic this proposal addresses* | -| [#27](https://github.com/generative-computing/mellea/issues/27) | Add support for aloras to remote vllm when vllm supports it | live tracking item for Reality C | -| [#423](https://github.com/generative-computing/mellea/issues/423) | Adapter code is undocumented and over-specialized to Intrinsics | Priority-labelled; overlaps this refactor | -| [#424](https://github.com/generative-computing/mellea/issues/424) | Cannot use intrinsics without uploading them | customer-adapter friction | -| [#543](https://github.com/generative-computing/mellea/pull/543) | revert: remove adapters/intrinsics/alora/lora from openai code | why OpenAI backend lost adapter support | -| [#881](https://github.com/generative-computing/mellea/pull/881) | feat: add embedded adapters (granite switch) to openai backend | why OpenAI backend got Reality B back | -| [#946](https://github.com/generative-computing/mellea/pull/946) | feat: simplify intrinsics | rework evidence | -| [#972](https://github.com/generative-computing/mellea/pull/972) | fix: model options with intrinsics | rework evidence for #929 point 2 | -| [#979](https://github.com/generative-computing/mellea/pull/979) | fix: key in json returned by policy_guardrails intrinsic | rework evidence for output parsing | -| 
[#986](https://github.com/generative-computing/mellea/pull/986) | fix: issues introduced by intrinsic changes | rework evidence | -| [#994](https://github.com/generative-computing/mellea/pull/994) | fix: default intrinsic adapter types; granite-switch tests | rework evidence | -| [#1003](https://github.com/generative-computing/mellea/issues/1003) | fix: intrinsic function signatures (model_options on helpers) | high-level helper signature evolution | -| [#1008](https://github.com/generative-computing/mellea/pull/1008) | fix: rewrite requirement_check_to_bool for new schema | worked example for schema-version story | -| [#1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | introduces factuality rewind path | -| [#1035](https://github.com/generative-computing/mellea/issues/1035) | OTel emission gaps | parent for telemetry coordination | -| [#1036](https://github.com/generative-computing/mellea/pull/1036) | feat(telemetry): close five OTel GenAI semconv gaps | in-flight telemetry work to coordinate with | -| [#1018](https://github.com/generative-computing/mellea/issues/1018) | add support for granite-switch / embedded adapters on HF backend | depends on this refactor; explicit sequencing note in issue body | - -**Sequencing note:** #1018's issue body states "May require sorting out some of the issues in #929 first. Or at least creating a comprehensive plan." That makes this proposal the gating item: resolve the decisions here, land Phase 0–2 of the migration, then #1018 reduces to a straightforward "add the EmbeddedBinding path to `LocalHFBackend`" change following the established pattern. Attempting #1018 without the refactor re-creates the branching problem on a second backend. +> **Addresses:** [Epic #929 — Fix Intrinsic Adapter Lifecycle & Consistency in Mellea](https://github.com/generative-computing/mellea/issues/929). 
Read the epic first if you haven't; it catalogues the specific threads this proposal tries to resolve coherently rather than individually. +> +> **Status:** proposal for shape agreement; not a PR candidate. Preserved on a branch for review. Once agreed, the content moves into `docs/dev/intrinsics_and_adapters.md` as the current-state doc and this file is deleted. +> +> **Structure:** **Part I** covers the problem, goals, terminology, end state, and the decisions that gate decomposition. **Part II** contains supporting detail — read after Part I is agreed, not before. +> +> **Terminology:** this proposal uses **"adapter"** as the primary user-facing term. "Intrinsic" appears only as the legacy name where it still refers to existing Mellea classes (e.g. the `Intrinsic` AST component). Rename strategy is in §5 Q5. +> +> **Related issues and prior work:** see the appendix at the end of this document for a linked index with annotations. --- @@ -176,16 +157,24 @@ From this shape, the seven threads of #929 resolve cleanly. Full mapping is in P ### 4.1 Backend × reality matrix -Which realities does each backend support today, and where this design takes them: +Mellea currently exposes five backends. Adapter support varies — and is not a goal for every backend. + +| Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | Notes | +| ------------------- | :--------------------: | :------------------: | :-------------------------: | --- | +| `LocalHFBackend` | ✅ today | ⏳ [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. | +| `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳ [#27](https://github.com/generative-computing/mellea/issues/27) | OpenAI-compatible endpoint, including vLLM servers. | +| `OllamaBackend` | — | — | — | Ollama's LoRA/PEFT story is GGUF-based and immature; not a current target. 
| +| `WatsonxBackend` | — | — | — | Would require watsonx-side adapter support; no current plan. | +| `LiteLLMBackend` | — | — | — | Multi-provider shim; adapter support would depend on the underlying provider and is not a coherent single-backend target. Could opportunistically inherit C2 if any wrapped provider exposes fine-tuned identifiers. | + +Legend: ✅ supported today, ⏳ planned future work tracked by the linked issue, — not applicable or not planned. -| Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | -| ------------------- | :--------------------: | :------------------: | :-------------------------: | -| `LocalHFBackend` | ✅ today | ⏳ [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | -| `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳ [#27](https://github.com/generative-computing/mellea/issues/27) | +**What this says about intent:** -- ✅ = supported; ⏳ = in-scope future work tracked by the linked issue; — = not applicable / not planned. +- The two **primary adapter backends are `LocalHFBackend` and `OpenAIBackend`.** The refactor targets these first. - Granite Switch (embedded) is the newest addition but is **not** "the premier option": local PEFT via `LocalHFBackend` remains the development/on-prem path and is the only reality that ships with both LoRA and aLoRA today. -- The design keeps every cell that is ✅ working, adds clean paths for the ⏳ cells without ad-hoc branching, and does not preclude new rows (backends) or columns (realities) later. +- The remaining three backends (`OllamaBackend`, `WatsonxBackend`, `LiteLLMBackend`) are **out of scope for adapter support under this design**. The new `WeightsBinding` abstraction does not preclude adding them later, but no issue currently tracks the intent and the underlying providers do not support the mechanisms Mellea's adapters need. 
+- The design keeps every ✅ cell working, adds clean paths for the ⏳ cells without ad-hoc branching, and leaves empty cells empty rather than stubbing them speculatively. ## 5. Decisions needed now @@ -439,4 +428,50 @@ Observability and docs deliverables attach to the phase that first exercises the --- -_Verified against: `mellea/backends/adapters/{adapter,catalog,__init__}.py`, `mellea/stdlib/components/intrinsic/{_util,intrinsic,core,rag,guardian}.py`, `mellea/backends/{openai,huggingface}.py`, `mellea/formatters/granite/intrinsics/input.py`, `mellea/stdlib/requirements/requirement.py`, `docs/dev/{intrinsics_and_adapters,requirement_aLoRA_rerouting}.md`, PRs #543 / #881 / #986 / #994 / #1003 / #1008 / #1028 / #1036, commits `666d646a`, `8b6b8d55`, `c57aba1d`, `8577d092`._ +# Appendix — Referenced issues and PRs + +Linked index of every issue, PR, and commit cited in this document. Use this to jump to primary sources. + +### Tracking items (open, design-relevant) + +| Ref | Title | Relevance | +| --- | --- | --- | +| [Epic #929](https://github.com/generative-computing/mellea/issues/929) | Fix Intrinsic Adapter Lifecycle & Consistency in Mellea | *the epic this proposal addresses* | +| [#27](https://github.com/generative-computing/mellea/issues/27) | Add support for aloras to remote vllm when vllm supports it | live tracking item for Reality C | +| [#423](https://github.com/generative-computing/mellea/issues/423) | Adapter code is undocumented and over-specialized to Intrinsics | Priority-labelled; overlaps this refactor | +| [#424](https://github.com/generative-computing/mellea/issues/424) | Cannot use intrinsics without uploading them | customer-adapter friction | +| [#1018](https://github.com/generative-computing/mellea/issues/1018) | add support for granite-switch / embedded adapters on HF backend | explicitly sequenced after this refactor | + +### History and rework evidence + +| Ref | Title | Role in this doc | +| --- | --- | --- | +| 
[#543](https://github.com/generative-computing/mellea/pull/543) | revert: remove adapters/intrinsics/alora/lora from openai code | why OpenAI backend lost adapter support (upstream vLLM declined aLoRA PR) | +| [#881](https://github.com/generative-computing/mellea/pull/881) | feat: add embedded adapters (granite switch) to openai backend | why OpenAI backend got Reality B back | +| [#946](https://github.com/generative-computing/mellea/pull/946) | feat: simplify intrinsics | rework evidence | +| [#972](https://github.com/generative-computing/mellea/pull/972) | fix: model options with intrinsics | rework evidence for #929 point 2 | +| [#979](https://github.com/generative-computing/mellea/pull/979) | fix: key in json returned by policy_guardrails intrinsic | rework evidence for output parsing | +| [#986](https://github.com/generative-computing/mellea/pull/986) | fix: issues introduced by intrinsic changes | rework evidence | +| [#994](https://github.com/generative-computing/mellea/pull/994) | fix: default intrinsic adapter types; granite-switch tests | rework evidence | +| [#1008](https://github.com/generative-computing/mellea/pull/1008) | fix: rewrite requirement_check_to_bool for new schema | worked example for the schema-version story | +| [#1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | introduces the factuality rewind path | + +### Related in-flight and planned work + +| Ref | Title | Role in this doc | +| --- | --- | --- | +| [#1003](https://github.com/generative-computing/mellea/issues/1003) | fix: intrinsic function signatures (model_options on helpers) | high-level helper signature evolution | +| [#1035](https://github.com/generative-computing/mellea/issues/1035) | OTel emission gaps | parent for telemetry coordination | +| [PR #1036](https://github.com/generative-computing/mellea/pull/1036) | feat(telemetry): close five OTel GenAI semconv gaps | in-flight telemetry work to coordinate with | + +### 
Sequencing + +**Why [#1018](https://github.com/generative-computing/mellea/issues/1018) waits for this proposal:** + +- #1018's own body states: *"May require sorting out some of the issues in #929 first. Or at least creating a comprehensive plan."* +- Once Part I is agreed and Phase 0–2 of the migration have landed, #1018 reduces to *"add the `EmbeddedBinding` path to `LocalHFBackend`"* following the pattern already used for `OpenAIBackend`. +- Attempting #1018 without this refactor re-creates the same branching problem on a second backend. + +### Verification trail + +Verified against: `mellea/backends/adapters/{adapter,catalog,__init__}.py`, `mellea/stdlib/components/intrinsic/{_util,intrinsic,core,rag,guardian}.py`, `mellea/backends/{openai,huggingface}.py`, `mellea/formatters/granite/intrinsics/input.py`, `mellea/stdlib/requirements/requirement.py`, `docs/dev/{intrinsics_and_adapters,requirement_aLoRA_rerouting}.md`; commits `666d646a`, `8b6b8d55`, `c57aba1d`, `8577d092`, `c6a3e643` (aLoRA → PEFT 0.18.1 migration). From 3cb5ebcacb6c576a43055494c60c06e6cf29ea37 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 8 May 2026 10:50:33 +0100 Subject: [PATCH 09/29] docs: restructure for progressive disclosure (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Reorder to lead readers gently from context → problem → goals → shape → decisions → impact before descending into detail. No content lost; sections rearranged so reviewers can stop at any point and still have learned something useful. Part I changes: - §1 evidence table compressed to one sentence + appendix pointer - §2 goals compressed from 7 detailed bullets to 4 core outcomes - §3 Terminology replaced with a 4-term "key terms" list; full glossary moved to Part II §7 - §4 end result trimmed: keeps the composition box, the three-reality flowchart, and the one-line simplified flow. Binding verb table and lifecycle sequence diagram moved to Part II §9. 
Full backend matrix moved to Part II §10. - §4.1 is now a one-paragraph summary with a §10 pointer Part II changes: - New §7 Terminology (full glossary moved from Part I §3) - §8 Three realities (subsection numbers corrected) - New §9 End-state design detail (binding table + sequence diagram) - New §10 Backend × reality matrix (full version) - §11-§17 are the former §7-§13 with renumbered intra-doc refs Appendix unchanged in content; now reached after a narrative that builds up to it. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 262 ++++++++++-------- 1 file changed, 141 insertions(+), 121 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index fe3ce662b..875ba1992 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -24,52 +24,27 @@ Three sources of friction have accumulated: 2. **Adapter lifecycle is not modelled.** `call_intrinsic` constructs an `IntrinsicAdapter` as a side effect of invoking one, which triggers an unconditional weight download even when no download is needed. The user sees a misleading download error; the real error is masked. There is no concept of "prepare," "activate," "deactivate" as distinct steps. 3. **Small, visible follow-on issues cluster around these two roots** — a five-place model-options hierarchy with a silent-overwrite bug; JSON output keys hardcoded in helpers (`result_json["answerability"]`) that break when an adapter ships a new output schema; the `"requirement-check"` string duplicated across four files; a `CustomIntrinsicAdapter` whose constructor monkey-patches the global catalog with a self-confessed "temporary hack." -Every thread in #929 is a symptom of not having separated the kinds of adapter and their lifecycles cleanly. +Every thread in #929 is a symptom of not having separated the kinds of adapter and their lifecycles cleanly. 
This is not a theoretical concern: **seven fix-up commits have landed in the adapter area in recent history** (full list in the appendix), alongside the `obtain_lora`-always-called masked error and the hardcoded `"requirement-check"` strings flagged by #929 point 7 / PR #1008 — the picture is of a subsystem that receives repeated small-scope fixes rather than a stable abstraction. -### Evidence that this is friction, not theory - -Seven fix-up commits in the adapter area in recent history, all symptomatic of the design gaps above rather than straightforward feature work: - -| Commit / PR | What it fixed | -| --- | --- | -| `1734900d` | Remove `answer_relevance*` intrinsics and unrelated intrinsic issues. | -| `8b6b8d55` (#972) | Model options with intrinsics (precedence bug surfaced). | -| `c57aba1d` (#986) | Issues introduced by preceding intrinsic changes. | -| `8577d092` (#994) | Default intrinsic adapter types; canned I/O with temperature. | -| `4d372b0e` (#979) | Key in JSON returned by `policy_guardrails` intrinsic. | -| `0617bd96` (#1008) | Rewrote `requirement_check_to_bool` for a changed output schema; flipped `"requirement_check"` → `"requirement-check"` in four files. | -| `75465d29` (#946) | "Simplify intrinsics" — reacting to accumulated complexity. | +## 2. What we are trying to achieve -Add the `obtain_lora`-always-called masked error (#929 1b) and the hardcoded `"requirement-check"` references called out in #929 point 7 and PR #1008, and the picture is of a subsystem that receives repeated small-scope fixes rather than a stable abstraction. +Four outcomes, in order of importance. Detail on each lives in Part II; this list is the ask. -## 2. What we are trying to achieve +1. **One adapter model, one code path.** Reasonable from the outside, unified from the inside — no more `if backend._uses_embedded_adapters:` branches. +2. 
**Safe evolution.** Model-option precedence is documented and enforced; adapter output schemas can version without breaking callers. +3. **First-class customer adapters.** Customers can ship their own against the same API as first-party ones — today it requires patching the catalog or subclassing a self-confessed "temporary hack" ([#424](https://github.com/generative-computing/mellea/issues/424)). +4. **Observable and parity-respecting.** Every lifecycle phase is a distinct span; high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. -- **One coherent mental model** of what an adapter is, so users and contributors can reason about behaviour without reading the implementation. -- **One code path** through adapter invocation that works regardless of whether the adapter's weights are local, shipped with the base model, or served by a remote server. -- **Correct, documented model-option precedence** that does not silently overwrite caller intent. -- **Schema-version safety** so adapters can evolve their output format without breaking callers, and so an adapter whose schema drifts is visible rather than silent. -- **First-class customer adapters** — customers can build and ship their own adapters against the same API as first-party ones, without monkey-patching a global registry and without privileged access to internal APIs. *Current state is partial:* training infrastructure exists (`m alora train` / `m alora upload`), but consuming a custom adapter today requires either patching the catalog, subclassing `CustomIntrinsicAdapter` (self-confessed "temporary hack"), or uploading to HuggingFace first ([#424](https://github.com/generative-computing/mellea/issues/424)). The refactor closes this to a single declarative path: construct an `Adapter` with a `LocalFileBinding` pointing at any HF repo (or a local path) plus a local or remote `io.yaml`. 
-- **Observable adapter calls** — every phase (download, activation, generation, parse, deactivation) is a distinct span with standard attributes; a per-adapter metrics plugin makes failures visible in dashboards without bespoke instrumentation. -- **Parity, not breakage** — high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. +## 3. Key terms (brief) -## 3. Terminology +Only the few terms needed to read Part I: -Names matter in this design because they appear in user-facing error messages, docs, and telemetry attributes. Glossary for this proposal: +- **Adapter** — the user-facing term for a specialised capability added to a base model (answerability, requirement-check, etc.). The "adapter" replaces Mellea's legacy term "intrinsic" in prose; legacy class names (`Intrinsic`, `IntrinsicAdapter`) are a migration question, not a meaning question. +- **Base model** — the general-purpose LLM everything runs on top of (e.g. `ibm-granite/granite-4.1-3b`). +- **LoRA / aLoRA** — the two PEFT technologies adapters use. Both are supported. +- **Reality A / B / C** — shorthand introduced in §4 for the three "where the weights live" stories. -| Term | Meaning | -| --- | --- | -| **Base model** | The general-purpose LLM (e.g. `ibm-granite/granite-4.1-3b`) that everything runs on top of. | -| **Adapter** | The user-facing term for a specialised capability added to a base model — answerability, citations, requirement-check, etc. In the redesign, `Adapter` is one class composed of three parts (identity, I/O contract, weights binding). This is the primary noun users and docs should reach for. | -| **Intrinsic** | Legacy Mellea term for the same concept. Still appears in the current class names (`Intrinsic` AST component, `IntrinsicAdapter`, `mellea.stdlib.components.intrinsic` module). The direction of travel is to fold "intrinsic" language into "adapter" — the rename scope is a decision in §5. 
| -| **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | -| **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. | -| **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | -| **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | -| **LoRA / aLoRA** | Two PEFT technologies. LoRA weights always participate; aLoRA only participates after an activation token is seen. A single adapter ships as one or the other (some intrinsics as either); both are supported across all three realities (including embedded — Granite Switch has LoRA and aLoRA adapters in the same repo, `technology` field on each). | -| **Role** | A *semantic* label on an adapter distinct from its name — e.g. `requirement_check`, `context_attribution`. Used by callers (the `Requirement` rerouting path) to find "the adapter that plays this role" without hardcoding a name string. | -| **Qualified name** | Today's disambiguator: `_`. In the redesign, derived on demand from `identity` rather than stored as a field. | -| **Catalog** | The registry of known adapters at `mellea/backends/adapters/catalog.py`. Becomes optional and advisory rather than mandatory and monkey-patched. | -| **`io.yaml`** | The YAML file that declares an adapter's input template, output schema, and generation parameters. Lives in the adapter's HuggingFace repo. | +Full glossary (identity, I/O contract, weights binding, role, qualified name, catalog, io.yaml) is in §7 — needed only when you descend into the detail. ## 4. 
Rough end result @@ -82,15 +57,7 @@ Adapter └── weights — one of three pluggable bindings (LocalFile, Embedded, ServerMediated) ``` -The **weights binding** is where the three realities live. It exposes a single verb set — `prepare`, `activate`, `deactivate`, `release` — that every backend calls uniformly. Each concrete binding implements those verbs for its reality: - -| Binding | `prepare` | `activate` | `deactivate` | -| --- | --- | --- | --- | -| `LocalFileBinding` (Reality A) | Download from repo → cache path | PEFT `load_adapter` | PEFT `unload_adapter` | -| `EmbeddedBinding` (Reality B) | No-op (weights baked in) | Render `controls` field into chat template | Drop the `controls` field | -| `ServerMediatedBinding` (Reality C) | No-op (or push weights, depending on sub-case) | Set adapter identifier on API request | Unset identifier | - -Visually, the three realities differ only in where the weights are and how the backend reaches them; the I/O contract is shared: +The **weights binding** is where the three realities live. It exposes a single verb set — `prepare`, `activate`, `deactivate`, `release` — that every backend calls uniformly. What each verb does per reality lives in §9; the high-level picture is all three realities converging on one shared `io_contract`: ```mermaid flowchart LR @@ -109,7 +76,7 @@ flowchart LR end ``` -`call_intrinsic` becomes one flow, no branches on backend type: +Adapter invocation becomes one flow, no branches on backend type: ``` adapter = backend.resolve_adapter(name) @@ -118,63 +85,15 @@ with backend.adapter_scope(adapter): return adapter.io_contract.parse(raw) ``` -The lifecycle inside `adapter_scope` is the same for every binding — only the verbs do reality-specific work: - -```mermaid -sequenceDiagram - participant C as Caller - participant B as Backend - participant A as Adapter - participant W as WeightsBinding - participant M as Base Model - - C->>B: check_answerability(...) 
- B->>A: resolve_adapter(name) - - rect rgb(245, 245, 245) - Note over B,W: adapter_scope(adapter) - B->>W: prepare() - W-->>M: download / no-op - B->>W: activate() - W-->>M: load / render controls / set adapter_id - B->>A: io_contract.build_prompt(...) - B->>M: generate(prompt) - M-->>B: raw output - B->>A: io_contract.parse(raw) - A-->>B: normalised result - B->>W: deactivate() - W-->>M: unload / drop controls / unset - end - - B-->>C: score -``` - -From this shape, the seven threads of #929 resolve cleanly. Full mapping is in Part II §8. +From this shape, the seven threads of #929 resolve cleanly. Full verb semantics per binding, the lifecycle sequence diagram, and the thread-by-thread mapping are in Part II (§9 and §12). -**What users see:** high-level helpers (`check_answerability` etc.) keep their current shape, with the `model_options=` addition that PR #1003 is introducing. Manual adapter construction collapses from four classes to one, with the binding as the pluggable part. Custom intrinsics no longer require monkey-patching the catalog. Detail in Part II §9. +**What users see:** high-level helpers (`check_answerability` etc.) keep their current shape, with the `model_options=` addition that PR #1003 is introducing. Manual adapter construction collapses from four classes to one, with the binding as the pluggable part. Custom intrinsics no longer require monkey-patching the catalog. Detail in Part II §13. -**What cross-cutting concerns look like:** observability (spans + a schema-drift metric), docs rewrite (`intrinsics_and_adapters.md` is 39 lines describing classes this renames), and a test-parity commitment travel **with** the refactor, not after it. Detail in Part II §10–§11. +**What cross-cutting concerns look like:** observability (spans + a schema-drift metric), docs rewrite (`intrinsics_and_adapters.md` is 39 lines describing classes this renames), and a test-parity commitment travel **with** the refactor, not after it. Detail in Part II §14–§15. 
-### 4.1 Backend × reality matrix +### 4.1 Backend scope -Mellea currently exposes five backends. Adapter support varies — and is not a goal for every backend. - -| Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | Notes | -| ------------------- | :--------------------: | :------------------: | :-------------------------: | --- | -| `LocalHFBackend` | ✅ today | ⏳ [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. | -| `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳ [#27](https://github.com/generative-computing/mellea/issues/27) | OpenAI-compatible endpoint, including vLLM servers. | -| `OllamaBackend` | — | — | — | Ollama's LoRA/PEFT story is GGUF-based and immature; not a current target. | -| `WatsonxBackend` | — | — | — | Would require watsonx-side adapter support; no current plan. | -| `LiteLLMBackend` | — | — | — | Multi-provider shim; adapter support would depend on the underlying provider and is not a coherent single-backend target. Could opportunistically inherit C2 if any wrapped provider exposes fine-tuned identifiers. | - -Legend: ✅ supported today, ⏳ planned future work tracked by the linked issue, — not applicable or not planned. - -**What this says about intent:** - -- The two **primary adapter backends are `LocalHFBackend` and `OpenAIBackend`.** The refactor targets these first. -- Granite Switch (embedded) is the newest addition but is **not** "the premier option": local PEFT via `LocalHFBackend` remains the development/on-prem path and is the only reality that ships with both LoRA and aLoRA today. -- The remaining three backends (`OllamaBackend`, `WatsonxBackend`, `LiteLLMBackend`) are **out of scope for adapter support under this design**. 
The new `WeightsBinding` abstraction does not preclude adding them later, but no issue currently tracks the intent and the underlying providers do not support the mechanisms Mellea's adapters need. -- The design keeps every ✅ cell working, adds clean paths for the ⏳ cells without ad-hoc branching, and leaves empty cells empty rather than stubbing them speculatively. +Of Mellea's five backends (`LocalHFBackend`, `OpenAIBackend`, `OllamaBackend`, `WatsonxBackend`, `LiteLLMBackend`), **the two primary adapter backends are `LocalHFBackend` and `OpenAIBackend`** — those are what this design targets. The remaining three are out of scope for adapter support because the underlying providers do not support the mechanisms Mellea's adapters need today. The `WeightsBinding` abstraction does not preclude adding them later. Full backend × reality matrix with per-backend reasoning is in §10. ## 5. Decisions needed now @@ -190,7 +109,7 @@ These gate decomposition; everything else can live in sub-issues once these are c. **AST class rename**: `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias for one release. Breaking for advanced users calling `mfuncs.act(Intrinsic(...))` directly. Confirm how deep to go. -> **Implementation note, not a reviewer question:** intrinsic-level observability (§10) should coordinate with the in-flight [#1035](https://github.com/generative-computing/mellea/issues/1035) / [PR #1036](https://github.com/generative-computing/mellea/pull/1036) work so content capture uses the same `MELLEA_TRACE_CONTENT` flag and doesn't get designed twice. Flagged here for awareness; sequenced during implementation. 
+> **Implementation note, not a reviewer question:** intrinsic-level observability (§14) should coordinate with the in-flight [#1035](https://github.com/generative-computing/mellea/issues/1035) / [PR #1036](https://github.com/generative-computing/mellea/pull/1036) work so content capture uses the same `MELLEA_TRACE_CONTENT` flag and doesn't get designed twice. Flagged here for awareness; sequenced during implementation. ## 6. Impact and blast radius @@ -211,15 +130,15 @@ Scope of this refactor in concrete terms so reviewers can weigh the cost. | Advanced user constructing adapters directly | One release of deprecation warnings, then adopt the new `Adapter(name=…, weights=…)` constructor. | | Customer writing their own adapter | First-class path; no more `CustomIntrinsicAdapter` monkey-patching; no forced catalogue upload. Resolves [#424](https://github.com/generative-computing/mellea/issues/424). | | Backend author | `AdapterMixin` verb set narrows to the natural operations each backend can perform; existing implementations update or use shim methods. | -| Operator / SRE | New spans and metrics per §10; easier diagnosis of adapter failures and cost attribution. | +| Operator / SRE | New spans and metrics per §14; easier diagnosis of adapter failures and cost attribution. | ### Code reach -Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catalog,__init__}.py`, `mellea/backends/{huggingface,openai}.py`, `mellea/stdlib/components/intrinsic/*`, `mellea/formatters/granite/intrinsics/*`, `mellea/stdlib/requirements/requirement.py`, `docs/examples/intrinsics/*`, `docs/dev/{intrinsics_and_adapters,requirement_aLoRA_rerouting}.md`. Larger than a typical feature PR; phased per §13 so individual PRs stay reviewable. 
+Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catalog,__init__}.py`, `mellea/backends/{huggingface,openai}.py`, `mellea/stdlib/components/intrinsic/*`, `mellea/formatters/granite/intrinsics/*`, `mellea/stdlib/requirements/requirement.py`, `docs/examples/intrinsics/*`, `docs/dev/{intrinsics_and_adapters,requirement_aLoRA_rerouting}.md`. Larger than a typical feature PR; phased per §16 so individual PRs stay reviewable. ### Release planning -- **0.6.0 target**: §5 agreement plus Phases 0–2 of the migration (new `Adapter` / `WeightsBinding` / `IOContract` types, call-site adoption, backend narrowing, deprecation shims for old classes, unified model-option precedence, observability per §10, tests per §11). +- **0.6.0 target**: §5 agreement plus Phases 0–2 of the migration (new `Adapter` / `WeightsBinding` / `IOContract` types, call-site adoption, backend narrowing, deprecation shims for old classes, unified model-option precedence, observability per §14, tests per §15). - **0.6.x follow-on**: [#1018](https://github.com/generative-computing/mellea/issues/1018) (embedded adapters on `LocalHFBackend`), Phase 4 shim removal. - **Deferred until upstream moves**: Reality C / [#27](https://github.com/generative-computing/mellea/issues/27). @@ -238,32 +157,51 @@ Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catal ### Risk - **Biggest unknown**: whether the unified `resolve_model_options` handles every combination currently in use. Mitigation: keep the five-layer precedence explicit, add per-adapter override documentation, and assert resolved values in tests. -- **Second biggest**: schema-version dispatch (§5.4 in Part II). Worked example is the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change — verifying v1 and v2 both pass through cleanly gates the parsing refactor. +- **Second biggest**: schema-version dispatch (§12 and §9 in Part II). 
Worked example is the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change — verifying v1 and v2 both pass through cleanly gates the parsing refactor. - **Mitigated by**: per-phase test-parity commitment (nothing merges if existing tests regress); observability introduced alongside the refactor so production regressions surface as dashboard signals rather than silent behavioural drift. --- # Part II — Supporting detail -> For deeper review once Part I is agreed. Skim headings first. +> For deeper review once Part I is agreed. Part II expands the definitions and the design so that reviewers can pressure-test the specifics. Sections are roughly ordered from "what exactly are we talking about" (terminology, realities, end-state detail) through "why this shape is right" (current tangle, thread mapping) to "what it looks like in practice" (user-facing, observability, docs/tests, migration, open questions). + +## 7. Terminology (full glossary) + +Names matter because they appear in user-facing error messages, docs, and telemetry attributes. The short list for quick reading is in Part I §3; this is the complete reference. + +| Term | Meaning | +| --- | --- | +| **Base model** | The general-purpose LLM (e.g. `ibm-granite/granite-4.1-3b`) that everything runs on top of. | +| **Adapter** | The user-facing term for a specialised capability added to a base model — answerability, citations, requirement-check, etc. In the redesign, `Adapter` is one class composed of three parts (identity, I/O contract, weights binding). This is the primary noun users and docs should reach for. | +| **Intrinsic** | Legacy Mellea term for the same concept. Still appears in the current class names (`Intrinsic` AST component, `IntrinsicAdapter`, `mellea.stdlib.components.intrinsic` module). The direction of travel is to fold "intrinsic" language into "adapter" — the rename scope is a decision in Part I §5. 
| +| **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | +| **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. | +| **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | +| **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | +| **LoRA / aLoRA** | Two PEFT technologies. LoRA weights always participate; aLoRA only participates after an activation token is seen. A single adapter ships as one or the other (some intrinsics as either); both are supported across all three realities (including embedded — Granite Switch has LoRA and aLoRA adapters in the same repo, `technology` field on each). | +| **Role** | A *semantic* label on an adapter distinct from its name — e.g. `requirement_check`, `context_attribution`. Used by callers (the `Requirement` rerouting path) to find "the adapter that plays this role" without hardcoding a name string. | +| **Qualified name** | Today's disambiguator: `_`. In the redesign, derived on demand from `identity` rather than stored as a field. | +| **Catalog** | The registry of known adapters at `mellea/backends/adapters/catalog.py`. Becomes optional and advisory rather than mandatory and monkey-patched. | +| **`io.yaml`** | The YAML file that declares an adapter's input template, output schema, and generation parameters. Lives in the adapter's HuggingFace repo. | -## 6. Three realities of "where the weights live" +## 8. 
Three realities of "where the weights live" -### 6.1 Reality A — Local PEFT adapter (today's `IntrinsicAdapter`) +### 8.1 Reality A — Local PEFT adapter (today's `IntrinsicAdapter`) - Weights are a distinct file Mellea downloads from HuggingFace into the local cache. - At call time, the backend uses the PEFT library to plug those weights into the base model. - After the call, the backend can unplug them. - **Physical weights, runtime activation, downloadable lifecycle.** -### 6.2 Reality B — Embedded adapter (today's `EmbeddedIntrinsicAdapter`, used by Granite Switch) +### 8.2 Reality B — Embedded adapter (today's `EmbeddedIntrinsicAdapter`, used by Granite Switch) - Adapter weights **ship in the same HuggingFace repo as the base model**. They come down with the base-model snapshot and are not fetched separately — confirmed by the fact that `EmbeddedIntrinsicAdapter.from_hub` downloads only `adapter_index.json` + `io_configs/**`, not weight files. The phrase "baked into the base model" is a useful shorthand but imprecise: the weights are still distinct PEFT modules, just co-located and pre-loaded by the inference runtime. - **Both LoRA and aLoRA are supported.** `adapter_index.json` lists each embedded adapter with a `technology` field (`"lora"` or `"alora"`). The chat template uses that field to place the `controls` JSON at the correct position — beginning of sequence for LoRA, before the generation prompt for aLoRA — so the right adapter is active for the right span of tokens. Granite Switch therefore genuinely carries both technologies; it is not a LoRA-only reality. - On the client side, only `io.yaml` is needed to format inputs and parse outputs. 
- **Pre-installed weights, prompt-level activation, no separate download lifecycle.** -### 6.3 Reality C — Server-mediated adapter (partially gap today) +### 8.3 Reality C — Server-mediated adapter (partially gap today) The OpenAI-compatible backend **already supports adapters** — but only embedded ones (Granite Switch via Reality B, added in [PR #881](https://github.com/generative-computing/mellea/pull/881)). What's missing is *non-embedded* server-side adapters. @@ -280,7 +218,75 @@ Both share: **no local weight loading, API-parameter activation, `io.yaml` still **Intent summary for OpenAI-compatible support:** keep and extend. Embedded support stays. The design leaves a clean slot for C1 to be populated when #27 is unblocked upstream; C2 is noted for completeness but not a near-term target. -## 7. Why the current code is tangled (concrete example) +## 9. End-state design detail + +### 9.1 Weights binding verbs per reality + +Each concrete binding implements the four-verb set from Part I §4. The column meanings do not change between realities — only what happens inside the verb does. + +| Binding | `prepare` | `activate` | `deactivate` | +| --- | --- | --- | --- | +| `LocalFileBinding` (Reality A) | Download from repo → cache path | PEFT `load_adapter` | PEFT `unload_adapter` | +| `EmbeddedBinding` (Reality B) | No-op (weights shipped with base model) | Render `controls` field into chat template | Drop the `controls` field | +| `ServerMediatedBinding` (Reality C) | No-op (or push weights, depending on sub-case) | Set adapter identifier on API request | Unset identifier | + +`release()` is implemented per-binding as needed (cache eviction for LocalFile; no-op for the others). 
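The verb table above can be read as an interface. What follows is a minimal Python sketch under stated assumptions, not the proposal's implementation. The names `WeightsBinding`, `LocalFileBinding`, `EmbeddedBinding`, `adapter_scope`, and the backend verbs `load_peft_adapter`, `unload_peft_adapter`, and `render_controls` are taken from this document. `RecordingBackend`, every method body, the fake cache path, and the choice to model "drop the `controls` field" as `render_controls(None)` are hypothetical stand-ins so the sketch runs without PEFT or a model.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager
from typing import Iterator, List, Optional


class RecordingBackend:
    """Hypothetical stand-in backend: records calls instead of touching a model."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def load_peft_adapter(self, path: str) -> None:
        self.calls.append(f"load_peft_adapter:{path}")

    def unload_peft_adapter(self, path: str) -> None:
        self.calls.append(f"unload_peft_adapter:{path}")

    def render_controls(self, name: Optional[str]) -> None:
        # Assumption: passing None stands for "drop the controls field".
        self.calls.append(f"render_controls:{name}")


class WeightsBinding(ABC):
    """The four-verb set; only the bodies differ per reality."""

    @abstractmethod
    def prepare(self, backend: RecordingBackend) -> None: ...

    @abstractmethod
    def activate(self, backend: RecordingBackend) -> None: ...

    @abstractmethod
    def deactivate(self, backend: RecordingBackend) -> None: ...

    def release(self, backend: RecordingBackend) -> None:
        """No-op by default; bindings that hold resources override it."""


class LocalFileBinding(WeightsBinding):
    """Reality A: weights are fetched once, then loaded and unloaded per call."""

    def __init__(self, repo: str) -> None:
        self.repo = repo
        self.cache_path: Optional[str] = None

    def prepare(self, backend: RecordingBackend) -> None:
        if self.cache_path is None:
            # Assumption: a real download is replaced by a fake cache path.
            self.cache_path = f"/tmp/fake-cache/{self.repo}"
            backend.calls.append(f"download:{self.repo}")

    def activate(self, backend: RecordingBackend) -> None:
        if self.cache_path is None:
            raise RuntimeError("prepare() must run before activate()")
        backend.load_peft_adapter(self.cache_path)

    def deactivate(self, backend: RecordingBackend) -> None:
        if self.cache_path is not None:
            backend.unload_peft_adapter(self.cache_path)

    def release(self, backend: RecordingBackend) -> None:
        self.cache_path = None  # stands in for cache eviction


class EmbeddedBinding(WeightsBinding):
    """Reality B: nothing to download; activation is a prompt-level control."""

    def __init__(self, name: str) -> None:
        self.name = name

    def prepare(self, backend: RecordingBackend) -> None:
        pass  # weights ship with the base model

    def activate(self, backend: RecordingBackend) -> None:
        backend.render_controls(self.name)

    def deactivate(self, backend: RecordingBackend) -> None:
        backend.render_controls(None)


@contextmanager
def adapter_scope(backend: RecordingBackend, binding: WeightsBinding) -> Iterator[None]:
    """Run prepare and activate, then always deactivate, even if the body raises."""
    binding.prepare(backend)
    binding.activate(backend)
    try:
        yield
    finally:
        binding.deactivate(backend)


def demo(binding: WeightsBinding) -> List[str]:
    """Drive one call through the scope and return the recorded backend calls."""
    backend = RecordingBackend()
    with adapter_scope(backend, binding):
        backend.calls.append("generate")
    return backend.calls
```

Calling `demo(LocalFileBinding("org/answerability"))` and `demo(EmbeddedBinding("answerability"))` goes through the same `adapter_scope` code path; only the recorded backend calls differ. That is the property the single verb set is meant to provide.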
+ +### 9.2 Lifecycle sequence + +The lifecycle inside `adapter_scope` is the same for every binding — only the verbs do reality-specific work: + +```mermaid +sequenceDiagram + participant C as Caller + participant B as Backend + participant A as Adapter + participant W as WeightsBinding + participant M as Base Model + + C->>B: check_answerability(...) + B->>A: resolve_adapter(name) + + rect rgb(245, 245, 245) + Note over B,W: adapter_scope(adapter) + B->>W: prepare() + W-->>M: download / no-op + B->>W: activate() + W-->>M: load / render controls / set adapter_id + B->>A: io_contract.build_prompt(...) + B->>M: generate(prompt) + M-->>B: raw output + B->>A: io_contract.parse(raw) + A-->>B: normalised result + B->>W: deactivate() + W-->>M: unload / drop controls / unset + end + + B-->>C: score +``` + +## 10. Backend × reality matrix + +Mellea currently exposes five backends. Adapter support varies — and is not a goal for every backend. + +| Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | Notes | +| ------------------- | :--------------------: | :------------------: | :-------------------------: | --- | +| `LocalHFBackend` | ✅ today | ⏳ [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. | +| `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳ [#27](https://github.com/generative-computing/mellea/issues/27) | OpenAI-compatible endpoint, including vLLM servers. | +| `OllamaBackend` | — | — | — | Ollama's LoRA/PEFT story is GGUF-based and immature; not a current target. | +| `WatsonxBackend` | — | — | — | Would require watsonx-side adapter support; no current plan. | +| `LiteLLMBackend` | — | — | — | Multi-provider shim; adapter support would depend on the underlying provider and is not a coherent single-backend target. 
Could opportunistically inherit C2 if any wrapped provider exposes fine-tuned identifiers. | + +Legend: ✅ supported today, ⏳ planned future work tracked by the linked issue, — not applicable or not planned. + +**What this says about intent:** + +- The two **primary adapter backends are `LocalHFBackend` and `OpenAIBackend`.** The refactor targets these first. +- Granite Switch (embedded) is the newest addition but is **not** "the premier option": local PEFT via `LocalHFBackend` remains the development/on-prem path and is the only reality that ships with both LoRA and aLoRA today. +- The remaining three backends (`OllamaBackend`, `WatsonxBackend`, `LiteLLMBackend`) are **out of scope for adapter support under this design**. The `WeightsBinding` abstraction does not preclude adding them later, but no issue currently tracks the intent and the underlying providers do not support the mechanisms Mellea's adapters need. +- The design keeps every ✅ cell working, adds clean paths for the ⏳ cells without ad-hoc branching, and leaves empty cells empty rather than stubbing them speculatively. + +## 11. Why the current code is tangled (concrete example) Inside `_util.call_intrinsic`: @@ -297,7 +303,7 @@ Three problems: 2. **The `else` branch calls `obtain_lora` unconditionally** via `IntrinsicAdapter.__init__` → `download_and_get_path`. If the adapter was meant to be a different type, the user sees a misleading download-path error instead of the real cause. 3. **Output parsing assumes one schema.** `result_json["answerability"]` is hardcoded in helpers. When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, the parsing helper had to be rewritten and the catalog gained a second entry (`requirement_check` for Granite 3.x, `requirement-check` for Granite 4.x) to support both. -## 8. Full #929 thread mapping +## 12. 
Full #929 thread mapping | Thread | Resolution | | --- | --- | @@ -314,7 +320,7 @@ Three problems: | 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. Duplicate `requirement_check` / `requirement-check` entries collapse into one entry with two schema versions. | | 7. Hardcoded `requirement-check` refs | Callers look up by **role**, not name. | -## 9. What users see — detailed +## 13. What users see — detailed **High-level helpers** keep their signatures. The `model_options=` parameter is added via PR #1003: @@ -344,9 +350,9 @@ adapter = Adapter(name="answerability", **Backend authors** keep `AdapterMixin` as the backend surface, but it exposes only the verbs a backend naturally has: `load_peft_adapter`, `unload_peft_adapter`, `render_controls`, `set_request_adapter`. Bindings call into these verbs. Adding a new reality = adding a new verb + new binding. -## 10. Observability +## 14. Observability -### 10.1 Why adapters need bespoke observability +### 14.1 Why adapters need bespoke observability Adapter calls hide the complexity that matters most when something goes wrong (weight fetching, activation side-effects, schema contracts). Without per-phase instrumentation, four failure modes are hard or impossible to diagnose — and Mellea has already hit the first two in production: @@ -357,7 +363,7 @@ Adapter calls hide the complexity that matters most when something goes wrong (w Adding instrumentation now costs one span attribute per verb. Retrofitting after the refactor means re-editing every binding. And during a refactor this wide, the fastest way to spot a regression in a specific reality is a dashboard, not a bug report. 
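The claim that per-phase instrumentation is cheap to add alongside the refactor can be illustrated with a stdlib-only sketch. It is a stand-in under stated assumptions: it is neither OpenTelemetry nor Mellea's tracing code. `SpanRecorder`, `phase`, `traced_call`, and the child span names (`intrinsic.prepare` and so on) are hypothetical. Only the root span name `intrinsic.call` and the `intrinsic.name` attribute are taken from this proposal.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Span:
    name: str
    attributes: Dict[str, str]
    parent: Optional[str] = None
    duration_s: float = 0.0
    error: Optional[str] = None


@dataclass
class SpanRecorder:
    """Hypothetical in-memory recorder; spans are stored in completion order."""

    spans: List[Span] = field(default_factory=list)
    _stack: List[str] = field(default_factory=list)

    @contextmanager
    def phase(self, name: str, attributes: Optional[Dict[str, str]] = None) -> Iterator[Span]:
        parent = self._stack[-1] if self._stack else None
        span = Span(name=name, attributes=dict(attributes or {}), parent=parent)
        self._stack.append(name)
        start = time.perf_counter()
        try:
            yield span
        except Exception as exc:
            # Record which phase failed, then let the error propagate.
            span.error = type(exc).__name__
            raise
        finally:
            span.duration_s = time.perf_counter() - start
            self._stack.pop()
            self.spans.append(span)


def traced_call(recorder: SpanRecorder, adapter_name: str, fail_in: Optional[str] = None) -> None:
    """Wrap each lifecycle verb in its own child span under one root span."""
    attrs = {"intrinsic.name": adapter_name}
    with recorder.phase("intrinsic.call", attrs):
        for verb in ("prepare", "activate", "generate", "parse", "deactivate"):
            with recorder.phase(f"intrinsic.{verb}", attrs):
                if verb == fail_in:
                    # Simulated failure so the sketch can show error attribution.
                    raise RuntimeError(f"{verb} failed")
```

With one span per verb, a failure injected via `fail_in="activate"` is recorded on the `intrinsic.activate` span and on the root, while `intrinsic.prepare` stays clean. That separation is what would make a masked download error distinguishable from an activation error.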
-### 10.2 Spans and metrics +### 14.2 Spans and metrics **Spans** — each `adapter_scope` wraps a child span tree rooted at `intrinsic.call`: @@ -380,7 +386,7 @@ Standard attributes: `intrinsic.name`, `intrinsic.version`, `intrinsic.role`, `i **Content capture** — gated behind PR #1036's `MELLEA_TRACE_CONTENT` flag. Intrinsics emit `intrinsic.input.kwargs` (structured dict), `intrinsic.output.raw` (raw JSON string), and `intrinsic.output.parsed` (normalised shape) as span events. Different shape from chat `gen_ai.*.message` events because intrinsics have different semantics. -## 11. Docs, tests, tutorials +## 15. Docs, tests, tutorials First-class deliverables, not afterthoughts. @@ -404,7 +410,7 @@ Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fi **Release notes** separate: no-op for high-level helper users; deprecated-but-shimmed for direct adapter constructors; removed at Phase 4 (see below). -## 12. Migration (rough shape only) +## 16. Migration (rough shape only) Detail deferred until Part I §5 decisions are agreed, but the intended phasing is: @@ -416,7 +422,7 @@ Detail deferred until Part I §5 decisions are agreed, but the intended phasing Observability and docs deliverables attach to the phase that first exercises them. -## 13. Open questions (full list) +## 17. Open questions (full list) 1. **Naming.** `WeightsBinding` vs `ResourceStrategy` vs `AdapterProvider`. Pick one; the term leaks into error messages. 2. **Lifecycle default** — session-scoped or request-scoped (also in Part I §5). @@ -456,6 +462,20 @@ Linked index of every issue, PR, and commit cited in this document. 
Use this to | [#1008](https://github.com/generative-computing/mellea/pull/1008) | fix: rewrite requirement_check_to_bool for new schema | worked example for the schema-version story | | [#1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | introduces the factuality rewind path | +#### Rework evidence in detail + +Seven recent fix-up commits in the adapter area, all symptomatic of the design gaps described in §1 rather than straightforward feature work. Referenced from §1 as evidence that this is friction, not theory: + +| Commit / PR | What it fixed | +| --- | --- | +| `1734900d` | Remove `answer_relevance*` intrinsics and unrelated intrinsic issues. | +| `8b6b8d55` ([#972](https://github.com/generative-computing/mellea/pull/972)) | Model options with intrinsics (precedence bug surfaced). | +| `c57aba1d` ([#986](https://github.com/generative-computing/mellea/pull/986)) | Issues introduced by preceding intrinsic changes. | +| `8577d092` ([#994](https://github.com/generative-computing/mellea/pull/994)) | Default intrinsic adapter types; canned I/O with temperature. | +| `4d372b0e` ([#979](https://github.com/generative-computing/mellea/pull/979)) | Key in JSON returned by `policy_guardrails` intrinsic. | +| `0617bd96` ([#1008](https://github.com/generative-computing/mellea/pull/1008)) | Rewrote `requirement_check_to_bool` for a changed output schema; flipped `"requirement_check"` → `"requirement-check"` in four files. | +| `75465d29` ([#946](https://github.com/generative-computing/mellea/pull/946)) | "Simplify intrinsics" — reacting to accumulated complexity. 
| + ### Related in-flight and planned work | Ref | Title | Role in this doc | From 6b303d8134f1f22215cf0b5fdde342bba0fb61d1 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 8 May 2026 10:54:02 +0100 Subject: [PATCH 10/29] =?UTF-8?q?docs:=20review=20fixes=20=E2=80=94=20remo?= =?UTF-8?q?ve=20'lands'=20vocab,=20resolve=20Q3=20dup,=20soften=20release?= =?UTF-8?q?=20version=20(#929)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Remove three re-emergences of the "lands"/"landed" status vocabulary (§1, §6 API surface, appendix sequencing note). - §17 open-question 4 now back-references §5 Q3 rather than duplicating it with slightly different framing. Reviewers see one question about Reality C, not two. - §6 release planning no longer version-pins ("0.6.0 target") since §16 explicitly says the detail is deferred. Uses "target release (minor, exact number TBD)" / "follow-on minor release" instead. Review feedback captured from an independent review pass. Further should-fix items (§11 overlap with §1, §4 density, ⏳ symbol conflation of different blocking types) left for the author's judgement. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index 875ba1992..dfb9f567e 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -24,7 +24,7 @@ Three sources of friction have accumulated: 2. **Adapter lifecycle is not modelled.** `call_intrinsic` constructs an `IntrinsicAdapter` as a side effect of invoking one, which triggers an unconditional weight download even when no download is needed. The user sees a misleading download error; the real error is masked. There is no concept of "prepare," "activate," "deactivate" as distinct steps. 3. 
**Small, visible follow-on issues cluster around these two roots** — a five-place model-options hierarchy with a silent-overwrite bug; JSON output keys hardcoded in helpers (`result_json["answerability"]`) that break when an adapter ships a new output schema; the `"requirement-check"` string duplicated across four files; a `CustomIntrinsicAdapter` whose constructor monkey-patches the global catalog with a self-confessed "temporary hack." -Every thread in #929 is a symptom of not having separated the kinds of adapter and their lifecycles cleanly. This is not a theoretical concern: **seven fix-up commits have landed in the adapter area in recent history** (full list in the appendix), alongside the `obtain_lora`-always-called masked error and the hardcoded `"requirement-check"` strings flagged by #929 point 7 / PR #1008 — the picture is of a subsystem that receives repeated small-scope fixes rather than a stable abstraction. +Every thread in #929 is a symptom of not having separated the kinds of adapter and their lifecycles cleanly. This is not a theoretical concern: **seven fix-up commits have been merged in the adapter area in recent history** (full list in the appendix), alongside the `obtain_lora`-always-called masked error and the hardcoded `"requirement-check"` strings flagged by #929 point 7 / PR #1008 — the picture is of a subsystem that receives repeated small-scope fixes rather than a stable abstraction. ## 2. What we are trying to achieve @@ -117,7 +117,7 @@ Scope of this refactor in concrete terms so reviewers can weigh the cost. ### API surface -- **Unchanged** — every high-level helper (`check_answerability` etc.) keeps its signature. `m.instruct`, `m.validate`, `m.chat` unaffected. The `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) lands on top, not instead. +- **Unchanged** — every high-level helper (`check_answerability` etc.) keeps its signature. `m.instruct`, `m.validate`, `m.chat` unaffected. 
The `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) arrives on top, not instead. - **Deprecated but shimmed for one release** — `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter` public classes. Direct users get `DeprecationWarning` pointing to the new constructor. - **Optional, was mandatory** — the adapter catalogue. Stays as a convenience resolver, stops being a gate. - **Possibly moved/renamed** — depends on §5 Q5 (terminology rename scope). @@ -138,8 +138,8 @@ Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catal ### Release planning -- **0.6.0 target**: §5 agreement plus Phases 0–2 of the migration (new `Adapter` / `WeightsBinding` / `IOContract` types, call-site adoption, backend narrowing, deprecation shims for old classes, unified model-option precedence, observability per §14, tests per §15). -- **0.6.x follow-on**: [#1018](https://github.com/generative-computing/mellea/issues/1018) (embedded adapters on `LocalHFBackend`), Phase 4 shim removal. +- **Target release (minor, exact number TBD)**: §5 agreement plus Phases 0–2 of the migration (new `Adapter` / `WeightsBinding` / `IOContract` types, call-site adoption, backend narrowing, deprecation shims for old classes, unified model-option precedence, observability per §14, tests per §15). +- **Follow-on minor release**: [#1018](https://github.com/generative-computing/mellea/issues/1018) (embedded adapters on `LocalHFBackend`), Phase 4 shim removal. - **Deferred until upstream moves**: Reality C / [#27](https://github.com/generative-computing/mellea/issues/27). ### Blocking and unblocking @@ -427,7 +427,7 @@ Observability and docs deliverables attach to the phase that first exercises the 1. **Naming.** `WeightsBinding` vs `ResourceStrategy` vs `AdapterProvider`. Pick one; the term leaks into error messages. 2. **Lifecycle default** — session-scoped or request-scoped (also in Part I §5). 3. 
**Role vs name.** Free-form `role` string, or a small enum so users can't invent roles backends don't honour? -4. **Reality C idiom.** vLLM LoRA serving first or commercial fine-tunes first (also in Part I §5). +4. **Reality C idiom.** Back-reference to Part I §5 Q3 — no separate question here; the sub-case framing (C1 = vLLM-backed, C2 = commercial fine-tunes) is in §8.3. 5. **Rewind interaction (PR #1028).** `factuality_detection` / `factuality_correction` mutate context via `context.previous_node`. Belongs on `io_contract.build_prompt` (cleaner) or stay in the helper (smaller migration blast radius)? 6. **Telemetry coupling with #1035** (also in Part I §5). 7. **Deprecation window** (also in Part I §5). @@ -489,7 +489,7 @@ Seven recent fix-up commits in the adapter area, all symptomatic of the design g **Why [#1018](https://github.com/generative-computing/mellea/issues/1018) waits for this proposal:** - #1018's own body states: *"May require sorting out some of the issues in #929 first. Or at least creating a comprehensive plan."* -- Once Part I is agreed and Phase 0–2 of the migration have landed, #1018 reduces to *"add the `EmbeddedBinding` path to `LocalHFBackend`"* following the pattern already used for `OpenAIBackend`. +- Once Part I is agreed and Phase 0–2 of the migration have merged, #1018 reduces to *"add the `EmbeddedBinding` path to `LocalHFBackend`"* following the pattern already used for `OpenAIBackend`. - Attempting #1018 without this refactor re-creates the same branching problem on a second backend. ### Verification trail From 8d9de331280fb0ad37fdd7887c60a67e756bca59 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 8 May 2026 10:59:21 +0100 Subject: [PATCH 11/29] docs: apply review should-fix items (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - §4 simplified-invocation pseudocode moved to Part II §9.1; Part I §4 now stays at the shape-agreement level without a code block. 
- §9 sub-numbering adjusted: §9.1 invocation, §9.2 verbs, §9.3 sequence. - §11 rewritten to name the structural cause (backend-keyed dispatch) rather than re-enumerate the three symptoms already listed in §1. - §10 backend matrix uses distinct ⏳🔽 (blocked by this proposal) and ⏳🔼 (blocked upstream) symbols so reviewers see the difference at a glance; legend updated accordingly. - §5 Q5 split into independent Q5a/Q5b/Q5c so each rename level can be agreed or declined on its own. - §6 "Optional, was mandatory — the adapter catalogue" expanded with a concrete clause explaining what changes for callers. - §17 item 5 now includes a one-sentence gloss of what rewind means before raising the placement question. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/issue-929-adapter-design-proposal.md | 54 +++++++++---------- 1 file changed, 27 insertions(+), 27 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index dfb9f567e..ee624ed21 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -76,16 +76,7 @@ flowchart LR end ``` -Adapter invocation becomes one flow, no branches on backend type: - -``` -adapter = backend.resolve_adapter(name) -with backend.adapter_scope(adapter): - raw = backend.generate(adapter.io_contract.build_prompt(...)) -return adapter.io_contract.parse(raw) -``` - -From this shape, the seven threads of #929 resolve cleanly. Full verb semantics per binding, the lifecycle sequence diagram, and the thread-by-thread mapping are in Part II (§9 and §12). +Adapter invocation becomes one flow, with no branches on backend type. From this shape, the seven threads of #929 resolve cleanly. The simplified invocation pseudocode, the per-binding verb semantics, the lifecycle sequence diagram, and the thread-by-thread mapping are in Part II (§9 and §12). **What users see:** high-level helpers (`check_answerability` etc.) 
keep their current shape, with the `model_options=` addition that PR #1003 is introducing. Manual adapter construction collapses from four classes to one, with the binding as the pluggable part. Custom intrinsics no longer require monkey-patching the catalog. Detail in Part II §13. @@ -103,11 +94,10 @@ These gate decomposition; everything else can live in sub-issues once these are 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. 3. **Reality C target shape.** The active work item is [#27](https://github.com/generative-computing/mellea/issues/27) (aLoRA on remote vLLM), paced by upstream vLLM's position. Do we leave the `ServerMediatedBinding` slot empty in 0.6.0 and populate it when #27 unblocks, or invest in a no-op/stub subclass now? Recommendation: leave empty, design the slot, revisit when upstream moves. 4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. -5. **Terminology rename scope.** Three levels of commitment to the "adapter over intrinsic" shift: - a. **Prose only** (docs, error messages, help text). Zero breakage. Recommended unconditionally. - b. **Module rename**: `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. Breaking for anyone importing from the submodule path. - c. **AST class rename**: `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias for one release. Breaking for advanced users calling `mfuncs.act(Intrinsic(...))` directly. - Confirm how deep to go. +5. **Terminology rename scope.** Three independent yes/no decisions for the "adapter over intrinsic" shift — each can be taken or declined on its own merits: + - **Q5a. 
Prose rename** — shift docs, error messages, help text, and dev-doc vocabulary to "adapter." Zero breakage. Recommended unconditionally. + - **Q5b. Module rename** — rename `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. Breaking for anyone importing from the submodule path. + - **Q5c. AST class rename** — rename `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias for one release. Breaking for advanced users calling `mfuncs.act(Intrinsic(...))` directly. > **Implementation note, not a reviewer question:** intrinsic-level observability (§14) should coordinate with the in-flight [#1035](https://github.com/generative-computing/mellea/issues/1035) / [PR #1036](https://github.com/generative-computing/mellea/pull/1036) work so content capture uses the same `MELLEA_TRACE_CONTENT` flag and doesn't get designed twice. Flagged here for awareness; sequenced during implementation. @@ -119,7 +109,7 @@ Scope of this refactor in concrete terms so reviewers can weigh the cost. - **Unchanged** — every high-level helper (`check_answerability` etc.) keeps its signature. `m.instruct`, `m.validate`, `m.chat` unaffected. The `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) arrives on top, not instead. - **Deprecated but shimmed for one release** — `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter` public classes. Direct users get `DeprecationWarning` pointing to the new constructor. -- **Optional, was mandatory** — the adapter catalogue. Stays as a convenience resolver, stops being a gate. +- **Optional, was mandatory** — the adapter catalogue. Callers no longer have to register custom adapters in `catalog.py` before use; the catalogue stays as a convenience resolver for first-party names, not a precondition. - **Possibly moved/renamed** — depends on §5 Q5 (terminology rename scope). 
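The "deprecated but shimmed for one release" item in the API-surface list could look roughly like the sketch below. This is a minimal illustration, not the proposed implementation: the `Adapter` constructor signature, the `LocalFileBinding` fields, and the shim's argument names are hypothetical. Only the class names and the use of `DeprecationWarning` come from the proposal.

```python
import warnings
from dataclasses import dataclass


@dataclass
class LocalFileBinding:
    """Hypothetical stand-in for the Reality A binding (weights on disk)."""
    path: str


@dataclass
class Adapter:
    """Hypothetical stand-in for the unified adapter class."""
    name: str
    weights_binding: LocalFileBinding


class IntrinsicAdapter(Adapter):
    """Legacy name kept for one release; warns, then behaves like Adapter."""

    def __init__(self, name: str, path: str) -> None:
        warnings.warn(
            "IntrinsicAdapter is deprecated; construct Adapter(...) with a "
            "LocalFileBinding instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(name=name, weights_binding=LocalFileBinding(path))
```

Subclassing the new class keeps `isinstance(x, Adapter)` true for legacy objects, so call sites can migrate to the new type before direct constructors do.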
### User-archetype impact @@ -220,7 +210,20 @@ Both share: **no local weight loading, API-parameter activation, `io.yaml` still ## 9. End-state design detail -### 9.1 Weights binding verbs per reality +### 9.1 Simplified invocation + +Adapter invocation collapses to a single flow with no branching on backend type: + +``` +adapter = backend.resolve_adapter(name) +with backend.adapter_scope(adapter): + raw = backend.generate(adapter.io_contract.build_prompt(...)) +return adapter.io_contract.parse(raw) +``` + +Every verb that varies per reality lives inside `adapter_scope` (see §9.3); the outer flow is the same whether the adapter is a local PEFT file, an embedded Granite Switch adapter, or a server-mediated one. + +### 9.2 Weights binding verbs per reality Each concrete binding implements the four-verb set from Part I §4. The column meanings do not change between realities — only what happens inside the verb does. @@ -232,7 +235,7 @@ Each concrete binding implements the four-verb set from Part I §4. The column m `release()` is implemented per-binding as needed (cache eviction for LocalFile; no-op for the others). -### 9.2 Lifecycle sequence +### 9.3 Lifecycle sequence The lifecycle inside `adapter_scope` is the same for every binding — only the verbs do reality-specific work: @@ -271,13 +274,13 @@ Mellea currently exposes five backends. Adapter support varies — and is not a | Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | Notes | | ------------------- | :--------------------: | :------------------: | :-------------------------: | --- | -| `LocalHFBackend` | ✅ today | ⏳ [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. | -| `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳ [#27](https://github.com/generative-computing/mellea/issues/27) | OpenAI-compatible endpoint, including vLLM servers. 
| +| `LocalHFBackend` | ✅ today | ⏳🔽 [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. | +| `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳🔼 [#27](https://github.com/generative-computing/mellea/issues/27) | OpenAI-compatible endpoint, including vLLM servers. | | `OllamaBackend` | — | — | — | Ollama's LoRA/PEFT story is GGUF-based and immature; not a current target. | | `WatsonxBackend` | — | — | — | Would require watsonx-side adapter support; no current plan. | | `LiteLLMBackend` | — | — | — | Multi-provider shim; adapter support would depend on the underlying provider and is not a coherent single-backend target. Could opportunistically inherit C2 if any wrapped provider exposes fine-tuned identifiers. | -Legend: ✅ supported today, ⏳ planned future work tracked by the linked issue, — not applicable or not planned. +Legend: ✅ supported today; ⏳🔽 planned, blocked by this proposal (downstream); ⏳🔼 planned, blocked by an upstream dependency outside Mellea; — not applicable or not planned. **What this says about intent:** @@ -288,7 +291,7 @@ Legend: ✅ supported today, ⏳ planned future work tracked by the linked issue ## 11. Why the current code is tangled (concrete example) -Inside `_util.call_intrinsic`: +Part I §1 listed the symptoms; this section names the *structural* cause. The single piece of code that most clearly shows it is the branch in `_util.call_intrinsic`: ```python if getattr(backend, "_uses_embedded_adapters", False): @@ -297,11 +300,8 @@ else: intrinsic_adapter = IntrinsicAdapter(...) # Reality A path ``` -Three problems: +This is a **backend-keyed dispatch** where the branching key (`_uses_embedded_adapters`) is a property of the backend rather than of the adapter. 
Every new reality forces a new branch, and the `else` path is not a generic fallback — it is the Reality A path, so it unconditionally calls `obtain_lora` whether or not the adapter needs downloading. The three symptoms in §1 (misleading download errors, rigid output parsing, hardcoded role strings) are *all* consequences of this same shape: "the adapter doesn't know what kind it is, so the call site guesses." The new design flips this: the **binding** says what kind it is, and the backend simply executes its verbs. -1. **`_uses_embedded_adapters` is a backend flag, not an adapter property.** It hard-codes "this backend type → always this adapter type." Reality C needs a third branch, then a fourth if a backend supports both. -2. **The `else` branch calls `obtain_lora` unconditionally** via `IntrinsicAdapter.__init__` → `download_and_get_path`. If the adapter was meant to be a different type, the user sees a misleading download-path error instead of the real cause. -3. **Output parsing assumes one schema.** `result_json["answerability"]` is hardcoded in helpers. When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, the parsing helper had to be rewritten and the catalog gained a second entry (`requirement_check` for Granite 3.x, `requirement-check` for Granite 4.x) to support both. ## 12. Full #929 thread mapping @@ -428,7 +428,7 @@ Observability and docs deliverables attach to the phase that first exercises the 2. **Lifecycle default** — session-scoped or request-scoped (also in Part I §5). 3. **Role vs name.** Free-form `role` string, or a small enum so users can't invent roles backends don't honour? 4. **Reality C idiom.** Back-reference to Part I §5 Q3 — no separate question here; the sub-case framing (C1 = vLLM-backed, C2 = commercial fine-tunes) is in §8.3. -5. 
**Rewind interaction (PR #1028).** `factuality_detection` / `factuality_correction` mutate context via `context.previous_node`. Belongs on `io_contract.build_prompt` (cleaner) or stay in the helper (smaller migration blast radius)? +5. **Rewind interaction (PR #1028).** Some helpers — specifically `factuality_detection` and `factuality_correction` — need to re-format the conversation so that documents are attached to the *last assistant message* rather than earlier in the history. They currently do this by walking back through `context.previous_node`. Question: does that rewind logic belong on `io_contract.build_prompt` (cleaner separation of concerns) or stay in the helper functions (smaller migration blast radius)? 6. **Telemetry coupling with #1035** (also in Part I §5). 7. **Deprecation window** (also in Part I §5). From e4fd2fc6a688e98c4732f9a664c1fd7d07fe39e4 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Wed, 13 May 2026 11:03:24 +0100 Subject: [PATCH 12/29] docs: incorporate Jacob and Paul feedback into adapter design proposal (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Add "Proposal at a glance" summary at top of Part I so the end-state and five gate decisions are visible immediately without reading the full problem/goals preamble (Paul's readability concern) - Q2: record Jacob's position — auto-load yes, auto-unload no; note LocalHFBackend is primarily single-user/local so multi-tenancy concern is lower priority - Q3: update Reality C framing — vLLM path confirmed blocked by Paul; recommendation is to design the slot, leave it empty, not invest in stubs - Q5: surface Jacob's competing model ("Intrinsic" must stay as IBM term; "adapter" = backend artefact, "intrinsic" = user abstraction) and Paul's "create new, deprecate old" preference; present as open question requiring alignment rather than three independent yes/no flags - Add backend consumption detail: clarify that the weights binding (not the 
backend) owns activation logic — EmbeddedBinding renders controls, LocalFileBinding calls PEFT load_adapter; backend calls binding.activate() uniformly (Jacob's missing detail) - Note granite-common / granite-formatters boundary constraint on io_contract.build_prompt() and io_contract.parse() (Jacob) - Flag io_contract naming as open discussion (Jacob prefers io_config) - Update §8.3 Reality C: vLLM status confirmed blocked, not "deferred" - Update §10 LocalHFBackend: note individual/local use context (Paul) Assisted-by: Claude Code --- docs/dev/issue-929-adapter-design-proposal.md | 53 +++++++++++++++---- 1 file changed, 44 insertions(+), 9 deletions(-) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/issue-929-adapter-design-proposal.md index ee624ed21..3d0be0f3c 100644 --- a/docs/dev/issue-929-adapter-design-proposal.md +++ b/docs/dev/issue-929-adapter-design-proposal.md @@ -14,6 +14,32 @@ # Part I — Summary for agreement +## Proposal at a glance + +**What changes:** three separate adapter classes (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`) collapse into one `Adapter`: + +``` +Adapter = identity + io_contract + weights_binding +``` + +The `weights_binding` is pluggable — `LocalFileBinding`, `EmbeddedBinding`, or `ServerMediatedBinding` — each exposing the same four verbs (`prepare`, `activate`, `deactivate`, `release`). The backend calls these uniformly; it does not branch on adapter type. + +**What stays the same:** all high-level helpers (`check_answerability`, `requirement_check`, etc.) keep their current signatures. Deprecated classes are shimmed for one release. + +**Five decisions gate decomposition:** + +| # | Question | Status | +| --- | --- | --- | +| Q1 | Does the `Adapter = identity + io_contract + weights` shape hold? | Open | +| Q2 | Lifecycle default: session-scoped (no auto-unload) or request-scoped? 
| Lean: session-scoped, no auto-unload | +| Q3 | Reality C (server-mediated): design slot now, fill later — or leave fully empty? | Lean: design slot, leave empty | +| Q4 | Deprecation window for old classes? | Lean: 1 minor release (~4–6 weeks) | +| Q5 | Terminology: replace "intrinsic" with "adapter", or keep both with distinct meanings? | Open — competing positions received | + +Detail on each in §5. + +--- + ## 1. The problem we are solving Mellea intrinsics — `check_answerability`, `requirement_check`, `find_citations`, the Guardian helpers — let users add specialised capabilities to a base model. Under the hood each one is an **adapter**: a small artefact that specialises the base model for that one task. @@ -91,13 +117,18 @@ Of Mellea's five backends (`LocalHFBackend`, `OpenAIBackend`, `OllamaBackend`, ` These gate decomposition; everything else can live in sub-issues once these are agreed. 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. -2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. -3. **Reality C target shape.** The active work item is [#27](https://github.com/generative-computing/mellea/issues/27) (aLoRA on remote vLLM), paced by upstream vLLM's position. Do we leave the `ServerMediatedBinding` slot empty in 0.6.0 and populate it when #27 unblocks, or invest in a no-op/stub subclass now? Recommendation: leave empty, design the slot, revisit when upstream moves. +2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. 
**Position received (Jacob):** auto-load yes, auto-unload no — once activated, leave the adapter loaded; let the caller or session teardown trigger explicit `release()`. The multi-tenancy concern is reduced for `LocalHFBackend`, which is primarily a single-user/local backend (see §10). +3. **Reality C target shape.** The aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked: vLLM has declined to upstream aLoRA support (see §8.3 for history). **Position received (Paul):** leave non-switch vLLM adapters alone; no near-term path there. Recommendation: design the `ServerMediatedBinding` slot so the interface is clean when/if the upstream situation changes, but leave it empty and do not invest in stubs. 4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. -5. **Terminology rename scope.** Three independent yes/no decisions for the "adapter over intrinsic" shift — each can be taken or declined on its own merits: - - **Q5a. Prose rename** — shift docs, error messages, help text, and dev-doc vocabulary to "adapter." Zero breakage. Recommended unconditionally. - - **Q5b. Module rename** — rename `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. Breaking for anyone importing from the submodule path. - - **Q5c. AST class rename** — rename `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias for one release. Breaking for advanced users calling `mfuncs.act(Intrinsic(...))` directly. +5. **Terminology rename scope.** Feedback received challenges the framing in this proposal. Three competing positions: + - **This proposal's model:** "adapter" replaces "intrinsic" as the primary user-facing term; `Intrinsic` AST class and module path are renamed with shims. 
+ - **Jacob's model:** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. + - **Paul's preference:** create a new `Adapter` API alongside the existing intrinsic API and deprecate the old — implement new, don't rename. + + These three models need alignment before §5 Q1 can be finalised. The three original sub-questions remain relevant once the higher-level question is resolved: + - **Q5a. Prose rename** — shift docs, error messages, help text to "adapter." Zero breakage. Likely agreed regardless of model chosen. + - **Q5b. Module rename** — rename `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. Breaking for submodule importers. + - **Q5c. AST class rename** — rename `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias. If Jacob's model is adopted, Q5c answer is "no rename" — `Intrinsic` stays as the AST component name and receives a precise definition alongside the new `Adapter` class. > **Implementation note, not a reviewer question:** intrinsic-level observability (§14) should coordinate with the in-flight [#1035](https://github.com/generative-computing/mellea/issues/1035) / [PR #1036](https://github.com/generative-computing/mellea/pull/1036) work so content capture uses the same `MELLEA_TRACE_CONTENT` flag and doesn't get designed twice. Flagged here for awareness; sequenced during implementation. @@ -166,7 +197,7 @@ Names matter because they appear in user-facing error messages, docs, and teleme | **Adapter** | The user-facing term for a specialised capability added to a base model — answerability, citations, requirement-check, etc. In the redesign, `Adapter` is one class composed of three parts (identity, I/O contract, weights binding). 
This is the primary noun users and docs should reach for. | | **Intrinsic** | Legacy Mellea term for the same concept. Still appears in the current class names (`Intrinsic` AST component, `IntrinsicAdapter`, `mellea.stdlib.components.intrinsic` module). The direction of travel is to fold "intrinsic" language into "adapter" — the rename scope is a decision in Part I §5. | | **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | -| **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. | +| **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. *Name under discussion: Jacob prefers `io_config`; `io_contract` is used throughout this proposal but is not final.* | | **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | | **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | | **LoRA / aLoRA** | Two PEFT technologies. LoRA weights always participate; aLoRA only participates after an activation token is seen. A single adapter ships as one or the other (some intrinsics as either); both are supported across all three realities (including embedded — Granite Switch has LoRA and aLoRA adapters in the same repo, `technology` field on each). | @@ -197,7 +228,7 @@ The OpenAI-compatible backend **already supports adapters** — but only embedde **The history (corrected):** Mellea previously ran aLoRA adapters through the OpenAI backend against a **custom vLLM build** that carried an aLoRA patch. 
The upstream vLLM project declined to merge that patch (confirmed in [PR #543](https://github.com/generative-computing/mellea/pull/543)'s review: "the vLLM aLoRA PR will not [be] accepted, so the alora/intrinsics code for openai is now all dead code"), so PR #543 removed the dead path. Upstream vLLM has therefore **never carried** aLoRA support — the right framing is "declined upstream," not "dropped." -**Live tracking item:** [Issue #27 "Add support for aloras to remote vllm when vllm supports it"](https://github.com/generative-computing/mellea/issues/27) is the open work item for this reality. It remains open because the upstream situation has not changed. +**Current status (confirmed by Paul):** The aLoRA-on-vLLM path is blocked. vLLM has declined the upstream aLoRA patch, and there is no known path to change this. [Issue #27](https://github.com/generative-computing/mellea/issues/27) remains open to track any change in upstream position, but it is not a near-term delivery target. The design slot in this proposal exists as an interface commitment — if the upstream situation ever changes, here is the clean implementation path — not as an active work item. **Scope of this reality:** whatever the eventual technology path, the design slot is the same. Two sub-cases the binding must accommodate when the path becomes viable: @@ -223,6 +254,8 @@ return adapter.io_contract.parse(raw) Every verb that varies per reality lives inside `adapter_scope` (see §9.3); the outer flow is the same whether the adapter is a local PEFT file, an embedded Granite Switch adapter, or a server-mediated one. +> **Boundary constraint:** `io_contract.build_prompt()` and `io_contract.parse()` must delegate to `granite-common` / `granite-formatters` for all `io.yaml` handling and parsing. The `IOContract` class in Mellea wraps these libraries; it does not re-implement their logic. (Jacob's requirement — keep `io.yaml` parsing in the granite-common / granite-formatters boundary.) 
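The uniform outer flow described above can be sketched as a context manager. This is a minimal sketch under stated assumptions, not Mellea's implementation: `WeightsBinding` and `RecordingBinding` are hypothetical names introduced for illustration, and only the verb names (`prepare`, `activate`, `deactivate`, `release`) and `adapter_scope` come from this proposal.

```python
from contextlib import contextmanager
from typing import Iterator, List, Protocol


class WeightsBinding(Protocol):
    """Hypothetical four-verb interface; only the verb names come from the proposal."""

    def prepare(self) -> None: ...
    def activate(self) -> None: ...
    def deactivate(self) -> None: ...
    def release(self) -> None: ...


class RecordingBinding:
    """Illustrative stand-in: records each verb instead of touching real weights."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def prepare(self) -> None:
        self.calls.append("prepare")

    def activate(self) -> None:
        self.calls.append("activate")

    def deactivate(self) -> None:
        self.calls.append("deactivate")

    def release(self) -> None:
        self.calls.append("release")


@contextmanager
def adapter_scope(binding: WeightsBinding) -> Iterator[WeightsBinding]:
    """Run the same verb sequence for any binding; no branch on binding type."""
    binding.prepare()
    binding.activate()
    try:
        yield binding
    finally:
        # Deactivate even if generation raised. release() is deliberately
        # not called here; in this sketch it stays an explicit caller action.
        binding.deactivate()


binding = RecordingBinding()
with adapter_scope(binding):
    binding.calls.append("generate")  # placeholder for the backend call
print(binding.calls)
```

Leaving `release()` out of the scope is one possible reading of the "auto-load yes, auto-unload no" position; whether `deactivate` belongs in the scope for every binding is a design choice this sketch does not settle.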
+ ### 9.2 Weights binding verbs per reality Each concrete binding implements the four-verb set from Part I §4. The column meanings do not change between realities — only what happens inside the verb does. @@ -235,6 +268,8 @@ Each concrete binding implements the four-verb set from Part I §4. The column m `release()` is implemented per-binding as needed (cache eviction for LocalFile; no-op for the others). +> **Which class knows an adapter doesn't need PEFT activation? The binding does — not the backend.** `EmbeddedBinding.activate()` renders `controls` JSON into the chat template; `LocalFileBinding.activate()` calls PEFT `load_adapter`. The backend calls `binding.activate()` uniformly and has no conditional on binding type. This is the mechanism that eliminates the `if getattr(backend, "_uses_embedded_adapters", False):` branch (§11). When embedded-adapter support is later added to `LocalHFBackend` ([#1018](https://github.com/generative-computing/mellea/issues/1018)), the backend does not need to learn about embedding — it calls the same verbs, and `EmbeddedBinding` handles the difference. The backend only needs the verb interface. (Addressing Jacob's review question on backend consumption.) + ### 9.3 Lifecycle sequence The lifecycle inside `adapter_scope` is the same for every binding — only the verbs do reality-specific work: @@ -274,7 +309,7 @@ Mellea currently exposes five backends. Adapter support varies — and is not a | Backend | Reality A (Local PEFT) | Reality B (Embedded) | Reality C (Server-mediated) | Notes | | ------------------- | :--------------------: | :------------------: | :-------------------------: | --- | -| `LocalHFBackend` | ✅ today | ⏳🔽 [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. | +| `LocalHFBackend` | ✅ today | ⏳🔽 [#1018](https://github.com/generative-computing/mellea/issues/1018) | — | Primary local backend; only one with aLoRA support today. 
Primarily used for individual/local deployments rather than multi-tenant environments (Paul). | | `OpenAIBackend` | — | ✅ today ([#881](https://github.com/generative-computing/mellea/pull/881)) | ⏳🔼 [#27](https://github.com/generative-computing/mellea/issues/27) | OpenAI-compatible endpoint, including vLLM servers. | | `OllamaBackend` | — | — | — | Ollama's LoRA/PEFT story is GGUF-based and immature; not a current target. | | `WatsonxBackend` | — | — | — | Would require watsonx-side adapter support; no current plan. | From 04454c7519eaee22d6a8cfc87eccb573d705d480 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 09:47:37 +0100 Subject: [PATCH 13/29] docs: relocate proposal to docs/dev/proposals/ ahead of Draft PR (#929) Moves docs/dev/issue-929-adapter-design-proposal.md to docs/dev/proposals/929-adapter-lifecycle.md to introduce a proposals/ subdir convention and to make the file's draft/review status legible from its path. The proposal itself is unchanged by this commit. 
Assisted-by: Claude Code Signed-off-by: Nigel Jones --- .../929-adapter-lifecycle.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename docs/dev/{issue-929-adapter-design-proposal.md => proposals/929-adapter-lifecycle.md} (100%) diff --git a/docs/dev/issue-929-adapter-design-proposal.md b/docs/dev/proposals/929-adapter-lifecycle.md similarity index 100% rename from docs/dev/issue-929-adapter-design-proposal.md rename to docs/dev/proposals/929-adapter-lifecycle.md From 2d570f87b2c32d5ddcdd573a1eda160cc5152cd1 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 11:46:04 +0100 Subject: [PATCH 14/29] docs: realign post-feedback drift; Q4 deprecation framing; Q5 labels (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Five contradictions / drift fixes plus the matching §13 comment cleanup, all from feedback round 3 (Jake commit comment + Paul Slack): - Status callout (line 5): drop "not a PR candidate" framing now that the Draft PR is the review channel; minimal "proposal" wording. - Terminology preamble (line 9): proposal's working position is now Jake's split — Intrinsic = user-facing term, Adapter = backend artefact. Drops the rename framing. - §3 Key terms: split single Adapter glossary entry into Intrinsic + Adapter, matching the new lean. - §5 Q5 labels: "This proposal's model" → "Original proposal model"; "Jacob's model" → "Current lean (Jacob's framing)". Positions and ordering unchanged. The PR thread continues to resolve Q5. - §5 Q4 body: add Working position (Jake's at-least-one-release framing with Paul's 4-6 week unit) and a breakage sub-question coupled to Q5. - §13 example comments: distinguish the adapter (artefact) from the intrinsic (capability) cleanly; code unchanged. 
Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 3d0be0f3c..37c5459dc 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -2,11 +2,11 @@ > **Addresses:** [Epic #929 — Fix Intrinsic Adapter Lifecycle & Consistency in Mellea](https://github.com/generative-computing/mellea/issues/929). Read the epic first if you haven't; it catalogues the specific threads this proposal tries to resolve coherently rather than individually. > -> **Status:** proposal for shape agreement; not a PR candidate. Preserved on a branch for review. Once agreed, the content moves into `docs/dev/intrinsics_and_adapters.md` as the current-state doc and this file is deleted. +> **Status:** proposal. Design docs produced during implementation will live under `docs/dev/`. > > **Structure:** **Part I** covers the problem, goals, terminology, end state, and the decisions that gate decomposition. **Part II** contains supporting detail — read after Part I is agreed, not before. > -> **Terminology:** this proposal uses **"adapter"** as the primary user-facing term. "Intrinsic" appears only as the legacy name where it still refers to existing Mellea classes (e.g. the `Intrinsic` AST component). Rename strategy is in §5 Q5. +> **Terminology:** **"Intrinsic"** is the user-facing term: helpers, input/output parsing, the `Intrinsic` AST component (matching IBM's terminology). **"Adapter"** is the backend artefact: the weights loaded by a backend. > > **Related issues and prior work:** see the appendix at the end of this document for a linked index with annotations. @@ -65,7 +65,8 @@ Four outcomes, in order of importance. 
Detail on each lives in Part II; this lis Only the few terms needed to read Part I: -- **Adapter** — the user-facing term for a specialised capability added to a base model (answerability, requirement-check, etc.). The "adapter" replaces Mellea's legacy term "intrinsic" in prose; legacy class names (`Intrinsic`, `IntrinsicAdapter`) are a migration question, not a meaning question. +- **Intrinsic** — the user-facing capability: helper functions like `check_answerability`, the `Intrinsic` AST component, and input/output parsing. Implemented by an adapter. +- **Adapter** — the backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity and I/O contract. - **Base model** — the general-purpose LLM everything runs on top of (e.g. `ibm-granite/granite-4.1-3b`). - **LoRA / aLoRA** — the two PEFT technologies adapters use. Both are supported. - **Reality A / B / C** — shorthand introduced in §4 for the three "where the weights live" stories. @@ -119,10 +120,10 @@ These gate decomposition; everything else can live in sub-issues once these are 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. **Position received (Jacob):** auto-load yes, auto-unload no — once activated, leave the adapter loaded; let the caller or session teardown trigger explicit `release()`. The multi-tenancy concern is reduced for `LocalHFBackend`, which is primarily a single-user/local backend (see §10). 3. **Reality C target shape.** The aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked: vLLM has declined to upstream aLoRA support (see §8.3 for history). 
**Position received (Paul):** leave non-switch vLLM adapters alone; no near-term path there. Recommendation: design the `ServerMediatedBinding` slot so the interface is clean when/if the upstream situation changes, but leave it empty and do not invest in stubs. -4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? One minor release is the default; confirm. +4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? **Working position:** at least one minor release; longer if user impact warrants (Jake's framing). One minor release ≈ 4–6 weeks (Paul). **Sub-question:** can this land without breakage at all? Under the current Q5 lean, `IntrinsicAdapter` could stay as a re-export of `Adapter`, which would remove the deprecation-window pressure entirely. 5. **Terminology rename scope.** Feedback received challenges the framing in this proposal. Three competing positions: - - **This proposal's model:** "adapter" replaces "intrinsic" as the primary user-facing term; `Intrinsic` AST class and module path are renamed with shims. - - **Jacob's model:** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. + - **Original proposal model:** "adapter" replaces "intrinsic" as the primary user-facing term; `Intrinsic` AST class and module path are renamed with shims. + - **Current lean (Jacob's framing):** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. 
- **Paul's preference:** create a new `Adapter` API alongside the existing intrinsic API and deprecate the old — implement new, don't rename. These three models need alignment before §5 Q1 can be finalised. The three original sub-questions remain relevant once the higher-level question is resolved: @@ -368,17 +369,17 @@ score = check_answerability(question, documents, context, backend, **Manual adapter construction** collapses from four classes (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`, abstract base) to one `Adapter` + a binding: ```python -# Stock intrinsic from the catalogue: +# Adapter for the answerability intrinsic (from catalogue): adapter = Adapter(name="answerability", weights=LocalFileBinding.from_catalog("answerability")) -# Custom intrinsic — no catalog monkey-patching: +# Adapter for a custom intrinsic — no catalog monkey-patching: adapter = Adapter(name="my-thing", weights=LocalFileBinding(source="myuser/my-adapter", base_model_name="granite-4.1-3b"), io_contract=IOContract.from_yaml("./io.yaml")) -# Granite Switch embedded: +# Adapter for the Granite Switch embedded variant: adapter = Adapter(name="answerability", weights=EmbeddedBinding.from_base_model(backend)) ``` From 93aa780e2ac46f9dbade192390fde4a3f6ca94cb Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 11:54:02 +0100 Subject: [PATCH 15/29] docs: define schema_version, surface granite-common dependency (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Jake req 1 from his commit-comment review (8d9de33#r184732700): - §7 glossary: add Schema version row, flagged as a proposed addition to io.yaml that requires granite-common / granite-formatters team agreement (they own the io.yaml format). - §17 open questions: new Q8 surfacing the cross-team negotiation — worth suggesting to that team, or do they have another approach? 
Plus a bundled cleanup of two §7 entries the previous round of contradiction fixes missed: - §7 Adapter row: rewrite to match the new Q5 lean (adapter = backend artefact, not user-facing term). - §7 Intrinsic row: rewrite as user-facing capability, no longer "legacy" terminology. - Order swapped so §7 matches §3 (Intrinsic first, Adapter second). Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 37c5459dc..d47601e71 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -195,9 +195,10 @@ Names matter because they appear in user-facing error messages, docs, and teleme | Term | Meaning | | --- | --- | | **Base model** | The general-purpose LLM (e.g. `ibm-granite/granite-4.1-3b`) that everything runs on top of. | -| **Adapter** | The user-facing term for a specialised capability added to a base model — answerability, citations, requirement-check, etc. In the redesign, `Adapter` is one class composed of three parts (identity, I/O contract, weights binding). This is the primary noun users and docs should reach for. | -| **Intrinsic** | Legacy Mellea term for the same concept. Still appears in the current class names (`Intrinsic` AST component, `IntrinsicAdapter`, `mellea.stdlib.components.intrinsic` module). The direction of travel is to fold "intrinsic" language into "adapter" — the rename scope is a decision in Part I §5. | +| **Intrinsic** | The user-facing capability: helper functions (`check_answerability`, `requirement_check`, the Guardian helpers), the `Intrinsic` AST component, input/output parsing. Backed by an adapter. The name is kept as IBM's terminology — current Q5 lean (see Part I §5). 
| +| **Adapter** | The backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity, I/O contract, and weights binding. The user-facing **Intrinsic** wraps an adapter to provide helpers and parsing. In the redesign, the class hierarchy collapses from four (`IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` + abstract base) to one `Adapter` + a pluggable binding. | | **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | +| **Schema version** | *Proposed addition; not in `io.yaml` today.* A label in an adapter's `io.yaml` (`schema_version:`, default `v1`) identifying the shape of its output. Bumped when an adapter's output keys, nesting, or types change. The I/O contract dispatches its parser on `(name, schema_version)` so v1 and v2 can coexist; helpers consume a normalised post-parse shape. **Adoption requires agreement from the granite-common / granite-formatters team** (who own the `io.yaml` format). | | **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. *Name under discussion: Jacob prefers `io_config`; `io_contract` is used throughout this proposal but is not final.* | | **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | | **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | @@ -467,6 +468,7 @@ Observability and docs deliverables attach to the phase that first exercises the 5. 
**Rewind interaction (PR #1028).** Some helpers — specifically `factuality_detection` and `factuality_correction` — need to re-format the conversation so that documents are attached to the *last assistant message* rather than earlier in the history. They currently do this by walking back through `context.previous_node`. Question: does that rewind logic belong on `io_contract.build_prompt` (cleaner separation of concerns) or stay in the helper functions (smaller migration blast radius)? 6. **Telemetry coupling with #1035** (also in Part I §5). 7. **Deprecation window** (also in Part I §5). +8. **`schema_version` field in `io.yaml`.** §4, §9, and §12 all assume the `io.yaml` parsed by granite-common / granite-formatters carries a `schema_version`. It doesn't today, so this is asking that team to add a field. Worth suggesting to them? Or do they have another approach to versioning? --- From 067e31afb946cdda819d2010939d976f2a10f140 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 11:56:19 +0100 Subject: [PATCH 16/29] docs: sane HF defaults for Adapter construction (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Jake req 2 from his commit-comment review: identity, io_contract, and weights are tightly coupled, so when an adapter's weights come from a HuggingFace repo, the io_contract should default to the io.yaml in that same repo — callers shouldn't have to pass io_contract= explicitly in the common case. - §4: add "Sane defaults" callout after the Adapter composition diagram. - §13: update the custom-intrinsic example to show the default path (no explicit io_contract); the override path is shown as a commented-out variant. 
Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 9 ++++++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index d47601e71..b9a077e05 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -84,6 +84,8 @@ Adapter └── weights — one of three pluggable bindings (LocalFile, Embedded, ServerMediated) ``` +**Sane defaults:** when an adapter's weights come from a HuggingFace repo, the `io_contract` defaults to the `io.yaml` in that same repo. Callers rarely pass `io_contract=` explicitly. Identity, I/O contract, and weights are tightly coupled by design; the defaults treat them as a unit. + The **weights binding** is where the three realities live. It exposes a single verb set — `prepare`, `activate`, `deactivate`, `release` — that every backend calls uniformly. What each verb does per reality lives in §9; the high-level picture is all three realities converging on one shared `io_contract`: ```mermaid @@ -374,11 +376,12 @@ score = check_answerability(question, documents, context, backend, adapter = Adapter(name="answerability", weights=LocalFileBinding.from_catalog("answerability")) -# Adapter for a custom intrinsic — no catalog monkey-patching: +# Adapter for a custom intrinsic — io.yaml auto-loaded from the same HF repo: adapter = Adapter(name="my-thing", weights=LocalFileBinding(source="myuser/my-adapter", - base_model_name="granite-4.1-3b"), - io_contract=IOContract.from_yaml("./io.yaml")) + base_model_name="granite-4.1-3b")) +# To override io_contract with a local file: +# adapter = Adapter(..., io_contract=IOContract.from_yaml("./io.yaml")) # Adapter for the Granite Switch embedded variant: adapter = Adapter(name="answerability", From 19f99cc9b1f3af7618d506bfefa904927de0b2be Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 12:09:44 +0100 
Subject: [PATCH 17/29] =?UTF-8?q?docs:=20weight/schema=20versioning=20?= =?UTF-8?q?=E2=80=94=20HF=20commits=20as=20version,=20schema=5Fversion=20n?= =?UTF-8?q?arrowed=20(#929)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Jake req 3 from his commit-comment review: adapters may update their weights and/or output schemas; helpers must respond. Reframed around HF commit SHA as the existing version mechanism, rather than inventing a Mellea-side concept: - §2.2 (Safe evolution): weights versioned by HF commit SHA; output schemas stable in the common case; rare breaking schema change handled by pinning, schema_version dispatch, or helpers raising on mismatch. - §7 Schema version glossary: narrowed to "parser-dispatch field for breaking schema changes only"; flags that granite-common may already have a versioning mechanism we should reuse. - §9 callout: weight updates use HF commit SHA; prepare() resolves configured revision and refreshes cache; refresh policy open. - §17 Q9: weight-refresh policy as a focused open question (per-session refresh + long-running-process exception). Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index b9a077e05..8212c75b2 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -57,7 +57,7 @@ Every thread in #929 is a symptom of not having separated the kinds of adapter a Four outcomes, in order of importance. Detail on each lives in Part II; this list is the ask. 1. **One adapter model, one code path.** Reasonable from the outside, unified from the inside — no more `if backend._uses_embedded_adapters:` branches. -2. 
**Safe evolution.** Model-option precedence is documented and enforced; adapter output schemas can version without breaking callers. +2. **Safe evolution.** Model-option precedence is documented and enforced. Adapter weights are versioned by HF commit SHA — Mellea can pin to a specific revision for stability or track latest for newest weights (refresh policy in §17 Q9). Output schemas are stable in the common case (new weights, same schema); the rare breaking schema change is handled either by pinning, by `schema_version` parser dispatch (§17 Q8), or by helpers raising on mismatch (Jake req 4). Helpers like `check_answerability` see a normalised result regardless of underlying churn. 3. **First-class customer adapters.** Customers can ship their own against the same API as first-party ones — today it requires patching the catalog or subclassing a self-confessed "temporary hack" ([#424](https://github.com/generative-computing/mellea/issues/424)). 4. **Observable and parity-respecting.** Every lifecycle phase is a distinct span; high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. @@ -200,7 +200,7 @@ Names matter because they appear in user-facing error messages, docs, and teleme | **Intrinsic** | The user-facing capability: helper functions (`check_answerability`, `requirement_check`, the Guardian helpers), the `Intrinsic` AST component, input/output parsing. Backed by an adapter. The name is kept as IBM's terminology — current Q5 lean (see Part I §5). | | **Adapter** | The backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity, I/O contract, and weights binding. The user-facing **Intrinsic** wraps an adapter to provide helpers and parsing. In the redesign, the class hierarchy collapses from four (`IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` + abstract base) to one `Adapter` + a pluggable binding. 
| | **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | -| **Schema version** | *Proposed addition; not in `io.yaml` today.* A label in an adapter's `io.yaml` (`schema_version:`, default `v1`) identifying the shape of its output. Bumped when an adapter's output keys, nesting, or types change. The I/O contract dispatches its parser on `(name, schema_version)` so v1 and v2 can coexist; helpers consume a normalised post-parse shape. **Adoption requires agreement from the granite-common / granite-formatters team** (who own the `io.yaml` format). | +| **Schema version** | *Proposed parser-dispatch field for breaking schema changes only.* For routine weight updates the HF commit SHA is the version (no new field needed). `schema_version` would only earn its keep if the granite team ships a *breaking* output-schema change (different keys, nesting, or types) and unpinned callers need graceful v1↔v2 parser dispatch. **Open** (§17 Q8) — granite-common may already have a versioning mechanism we should reuse instead of inventing this. | | **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. *Name under discussion: Jacob prefers `io_config`; `io_contract` is used throughout this proposal but is not final.* | | **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | | **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | @@ -274,6 +274,8 @@ Each concrete binding implements the four-verb set from Part I §4. The column m > **Which class knows an adapter doesn't need PEFT activation? 
The binding does — not the backend.** `EmbeddedBinding.activate()` renders `controls` JSON into the chat template; `LocalFileBinding.activate()` calls PEFT `load_adapter`. The backend calls `binding.activate()` uniformly and has no conditional on binding type. This is the mechanism that eliminates the `if getattr(backend, "_uses_embedded_adapters", False):` branch (§11). When embedded-adapter support is later added to `LocalHFBackend` ([#1018](https://github.com/generative-computing/mellea/issues/1018)), the backend does not need to learn about embedding — it calls the same verbs, and `EmbeddedBinding` handles the difference. The backend only needs the verb interface. (Addressing Jacob's review question on backend consumption.) +> **Weight updates:** weights are versioned by HF commit SHA. `prepare()` resolves the configured revision (`main` by default, or a pinned SHA) and refreshes the local cache when upstream has moved. Refresh policy and the long-running-process exception are open (§17 Q9). + ### 9.3 Lifecycle sequence The lifecycle inside `adapter_scope` is the same for every binding — only the verbs do reality-specific work: @@ -472,6 +474,7 @@ Observability and docs deliverables attach to the phase that first exercises the 6. **Telemetry coupling with #1035** (also in Part I §5). 7. **Deprecation window** (also in Part I §5). 8. **`schema_version` field in `io.yaml`.** §4, §9, and §12 all assume the `io.yaml` parsed by granite-common / granite-formatters carries a `schema_version`. It doesn't today, so this is asking that team to add a field. Worth suggesting to them? Or do they have another approach to versioning? +9. **Weight-refresh policy.** Adapter weights are versioned by HF commit SHA. When Mellea is configured to track latest (no pin), how often does `prepare()` re-resolve the upstream revision? 
Per-session-start is the natural answer; long-running processes (sessions spanning a release) need either an explicit `refresh()` API or accept stale weights until restart. --- From f4d0ca3e9f4c90a1c3a9ff46a5b09dcd529c2ca3 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 12:12:58 +0100 Subject: [PATCH 18/29] docs: helpers raise on output-schema mismatch (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Jake req 4 from his commit-comment review: top-level helpers (check_answerability etc.) should raise an exception when the adapter's output structure doesn't match the helper's expectations. - §13: add "Validation on parse" paragraph naming AdapterSchemaMismatchError as the exception, with name + observed keys + expected keys in the message. - §14.1 silent-schema-drift bullet: tie the existing parse_failures counter to the new exception — counter is the dashboard signal, exception is the runtime signal. Closes the silent-drift loophole that PR #1008's requirement_check_to_bool regression illustrated. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 8212c75b2..05c9e2072 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -371,6 +371,8 @@ score = check_answerability(question, documents, context, backend, model_options={"temperature": 0.1}) ``` +**Validation on parse.** Helpers declare their expected output shape; `io_contract.parse()` validates against it and raises `AdapterSchemaMismatchError` on mismatch — with `name`, observed keys, and expected keys in the message. Schema drift is loud, not silent. (Jake req 4.) 
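A rough sketch of the validation step described above, using the `requirement-check` drift from PR #1008 as the example. Only the exception name `AdapterSchemaMismatchError` and its three message fields come from this proposal; the `validate_output` helper, its signature, and the choice of `ValueError` as the base class are assumptions made for illustration.

```python
from typing import Any, Iterable, Mapping


class AdapterSchemaMismatchError(ValueError):
    """Raised when parsed adapter output lacks the keys a helper expects."""

    def __init__(self, name: str, observed: Iterable[str], expected: Iterable[str]) -> None:
        self.name = name
        self.observed = sorted(observed)
        self.expected = sorted(expected)
        super().__init__(
            f"adapter {name!r}: observed keys {self.observed} "
            f"do not match expected keys {self.expected}"
        )


def validate_output(
    name: str, parsed: Mapping[str, Any], expected_keys: Iterable[str]
) -> Mapping[str, Any]:
    """Hypothetical helper: return `parsed` unchanged, or raise on missing keys."""
    expected = set(expected_keys)
    missing = expected - set(parsed)
    if missing:
        raise AdapterSchemaMismatchError(name, parsed.keys(), expected)
    return parsed


# Old output shape: passes validation.
validate_output("requirement-check", {"requirement_likelihood": 0.9}, {"requirement_likelihood"})

# New output shape seen by a helper that still expects the old key: fails loudly.
try:
    validate_output(
        "requirement-check",
        {"requirement_check": {"score": 0.9}},
        {"requirement_likelihood"},
    )
except AdapterSchemaMismatchError as err:
    print(err)
```

This sketch checks top-level keys only. Validating nesting or value types would need a fuller shape declaration, which the proposal leaves open.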
+ **Manual adapter construction** collapses from four classes (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`, abstract base) to one `Adapter` + a binding: ```python @@ -399,7 +401,7 @@ adapter = Adapter(name="answerability", Adapter calls hide the complexity that matters most when something goes wrong (weight fetching, activation side-effects, schema contracts). Without per-phase instrumentation, four failure modes are hard or impossible to diagnose — and Mellea has already hit the first two in production: 1. **Masked errors.** The `obtain_lora`-always-called bug (#929 point 1b) showed users a misleading download error while the real cause (adapter-type mismatch) stayed invisible. A span at the `prepare` boundary recording the exception would have surfaced the actual cause on first run. -2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. A `parse_failures` counter labelled by `(name, version)` would have climbed immediately; a parse-status span attribute would have shown every call as "parsed with warnings." +2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. Under Jake req 4 (helpers raise on schema mismatch), this would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change — the caller gets a named error instead of a silently wrong value. The `parse_failures` counter labelled by `(name, version)` is the dashboard signal; the exception is the runtime signal. 3. 
**Latency attribution.** "`check_answerability` is slow" is unanswerable today — download, PEFT load, generation, and JSON parse collapse into one backend span. Phase-level spans make the culprit obvious in any trace viewer. 4. **Alerting and cost attribution.** OTel `ERROR` status on failed download/activation makes generic dashboards and alerts work. Token counts labelled by adapter answer "which capability is 30% of our spend?" Both impossible today. From e89c8c0a033cc4530a11aaa631fdc22aa200c2c8 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 12:16:33 +0100 Subject: [PATCH 19/29] docs: version pinning for auto-loaded adapters (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Jake req 5 from his commit-comment review: auto-loaded adapters should pin to a known-good revision rather than blindly tracking upstream's default branch. - §17 Q10: pinning policy as a new open question; recommendation is pin by default via catalogue entries' recorded SHAs, with revision="main" as explicit opt-in to track latest. - §13: catalogue example comment now spells out that the catalogue entry carries a pinned HF commit SHA, with the override pattern. Closes the loop on auto-loading reproducibility — couples cleanly with Q9 weight-refresh policy (pinned adapters skip the refresh check). 
Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 05c9e2072..dff348220 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -376,9 +376,11 @@ score = check_answerability(question, documents, context, backend, **Manual adapter construction** collapses from four classes (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`, abstract base) to one `Adapter` + a binding: ```python -# Adapter for the answerability intrinsic (from catalogue): +# Adapter for the answerability intrinsic (auto-loaded from catalogue; pinned revision): adapter = Adapter(name="answerability", weights=LocalFileBinding.from_catalog("answerability")) +# Catalogue entry includes a pinned HF commit SHA (Jake req 5). +# Pass revision="main" to LocalFileBinding directly to override and track latest. # Adapter for a custom intrinsic — io.yaml auto-loaded from the same HF repo: adapter = Adapter(name="my-thing", @@ -477,6 +479,7 @@ Observability and docs deliverables attach to the phase that first exercises the 7. **Deprecation window** (also in Part I §5). 8. **`schema_version` field in `io.yaml`.** §4, §9, and §12 all assume the `io.yaml` parsed by granite-common / granite-formatters carries a `schema_version`. It doesn't today, so this is asking that team to add a field. Worth suggesting to them? Or do they have another approach to versioning? 9. **Weight-refresh policy.** Adapter weights are versioned by HF commit SHA. When Mellea is configured to track latest (no pin), how often does `prepare()` re-resolve the upstream revision? Per-session-start is the natural answer; long-running processes (sessions spanning a release) need either an explicit `refresh()` API or accept stale weights until restart. +10. 
**Version pinning for auto-loaded adapters.** When an adapter is auto-loaded from the catalogue (caller didn't specify a revision), should Mellea pin to a known-good revision (the catalogue entry's recorded SHA) or track upstream's default branch? **Recommendation:** pin by default — catalogue entries record a pinned SHA; `revision="main"` is an explicit opt-in to track latest. Pinning gives reproducibility; explicit tracking gives latest weights at the cost of behaviour drift between runs. (Jake req 5; coupled to Q9 weight-refresh policy.) --- From 3bd7dd588d1578f72cc031528a03f4eedfc33dcb Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 14:55:18 +0100 Subject: [PATCH 20/29] docs: post-review consistency fixes for schema-version reframing (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Six drift fixes from a coherence review of the doc + one banned-word correction. All flow from the Jake req 1 schema_version reframing (commit 19f99cc9) which left several places still asserting schema_version dispatch as if it were a settled mechanism. - §4 composition diagram: schema version field now qualified as "(proposed; see §7 / §17 Q8)". - §12 row 4c: rewritten to layered framing — HF commit SHA is the version; breaking schema changes handled by pinning + Jake req 4; schema_version dispatch optional layer. - §14.2 metrics text: "without a version bump" rephrased as "at a new HF revision that the local parser doesn't yet handle"; ties parse_failures counter to AdapterSchemaMismatchError. - §6 Risk row: rewritten to layered framing matching §12 row 4c; worked example reframed around AdapterSchemaMismatchError. - §14.2 spans diagram: parse span attributes now show "revision, schema_version*" with asterisk denoting proposed. - §15 tutorial title: "Shipping a new schema version" → "Handling a breaking schema change without breaking users"; body lists the three layers. 
- §5 Q4 body: pre-existing "can this land" → "can this ship" (banned-word cleanup). Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index dff348220..22bb1ee06 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -79,7 +79,7 @@ An **Adapter** is a small object composed of three parts: ``` Adapter -├── identity — name, adapter type (lora/alora), schema version, optional role +├── identity — name, adapter type (lora/alora), schema version (proposed; see §7 / §17 Q8), optional role ├── io_contract — parsed io.yaml: prompt building, output parsing, model options └── weights — one of three pluggable bindings (LocalFile, Embedded, ServerMediated) ``` @@ -122,7 +122,7 @@ These gate decomposition; everything else can live in sub-issues once these are 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. 2. **Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. **Position received (Jacob):** auto-load yes, auto-unload no — once activated, leave the adapter loaded; let the caller or session teardown trigger explicit `release()`. The multi-tenancy concern is reduced for `LocalHFBackend`, which is primarily a single-user/local backend (see §10). 3. **Reality C target shape.** The aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked: vLLM has declined to upstream aLoRA support (see §8.3 for history). **Position received (Paul):** leave non-switch vLLM adapters alone; no near-term path there. 
Recommendation: design the `ServerMediatedBinding` slot so the interface is clean when/if the upstream situation changes, but leave it empty and do not invest in stubs. -4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? **Working position:** at least one minor release; longer if user impact warrants (Jake's framing). One minor release ≈ 4–6 weeks (Paul). **Sub-question:** can this land without breakage at all? Under the current Q5 lean, `IntrinsicAdapter` could stay as a re-export of `Adapter`, which would remove the deprecation-window pressure entirely. +4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? **Working position:** at least one minor release; longer if user impact warrants (Jake's framing). One minor release ≈ 4–6 weeks (Paul). **Sub-question:** can this ship without breakage at all? Under the current Q5 lean, `IntrinsicAdapter` could stay as a re-export of `Adapter`, which would remove the deprecation-window pressure entirely. 5. **Terminology rename scope.** Feedback received challenges the framing in this proposal. Three competing positions: - **Original proposal model:** "adapter" replaces "intrinsic" as the primary user-facing term; `Intrinsic` AST class and module path are renamed with shims. - **Current lean (Jacob's framing):** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. @@ -181,7 +181,7 @@ Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catal ### Risk - **Biggest unknown**: whether the unified `resolve_model_options` handles every combination currently in use. 
Mitigation: keep the five-layer precedence explicit, add per-adapter override documentation, and assert resolved values in tests. -- **Second biggest**: schema-version dispatch (§12 and §9 in Part II). Worked example is the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change — verifying v1 and v2 both pass through cleanly gates the parsing refactor. +- **Second biggest**: handling breaking schema changes from upstream. Three layers: pinning (avoid the risk), `schema_version` parser dispatch (§17 Q8, proposed), helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4, loud safety net). Worked example: the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change, rather than silently returning `False`. - **Mitigated by**: per-phase test-parity commitment (nothing merges if existing tests regress); observability introduced alongside the refactor so production regressions surface as dashboard signals rather than silent behavioural drift. --- @@ -356,7 +356,7 @@ This is a **backend-keyed dispatch** where the branching key (`_uses_embedded_ad | 3. Naming consistency | Three-axis identity (`name`, `adapter_type`, `version`) plus explicit `role`. | | 4a. `call_intrinsic` assumes one output schema | `io_contract.parse()` dispatches on `(name, version)`; helpers see normalised shape. | | 4b. Per-adapter vs standard schema | `io_contract.parse()` is per-adapter; helpers define the normalised post-parse shape. | -| 4c. Versioning | Schema version declared in `io.yaml` (`schema_version:`); defaults to `v1`. | +| 4c. Versioning | HF commit SHA is the version (every push = new revision; pin via `revision="..."` for stability). 
Breaking schema changes (rare) handled by pinning + helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4); `schema_version` parser dispatch (§17 Q8) is an optional layer if the granite team adds the field. | | 5. OpenAI backend support | Ships as one or two `ServerMediatedBinding` subclasses. | | 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. Duplicate `requirement_check` / `requirement-check` entries collapse into one entry with two schema versions. | | 7. Hardcoded `requirement-check` refs | Callers look up by **role**, not name. | @@ -419,7 +419,7 @@ graph TD root --> prep["intrinsic.prepare
LocalFile: download ms"] root --> act["intrinsic.activate
peft_name / controls / api_id"] root --> gen["intrinsic.generate
(regular backend span:
tokens, latency)
"] - root --> par["intrinsic.parse
schema_version,
parse_ok, raw_len
"] + root --> par["intrinsic.parse
revision, schema_version*,
parse_ok, raw_len
"] root --> deact["intrinsic.deactivate"] ``` @@ -428,7 +428,7 @@ Standard attributes: `intrinsic.name`, `intrinsic.version`, `intrinsic.role`, `i **Metrics** — an `IntrinsicMetricsPlugin` alongside the existing Token / Latency / Error plugins: - `mellea.intrinsic.invocations` — counter labelled by name, version, binding type, adapter type, outcome. - `mellea.intrinsic.phase_duration_ms` — histogram labelled by name, phase. -- `mellea.intrinsic.parse_failures` — counter labelled by name, version. This is the **schema-drift detector**: a climbing counter against a specific `(name, version)` pair means an upstream adapter shipped a schema change without a version bump. +- `mellea.intrinsic.parse_failures` — counter labelled by name, revision. This is the **schema-drift detector**: a climbing counter against a specific `(name, revision)` pair means an upstream adapter pushed a breaking schema change at a new HF revision that the local parser doesn't yet handle. Each increment matches an `AdapterSchemaMismatchError` raised at the call site (Jake req 4). **Content capture** — gated behind PR #1036's `MELLEA_TRACE_CONTENT` flag. Intrinsics emit `intrinsic.input.kwargs` (structured dict), `intrinsic.output.raw` (raw JSON string), and `intrinsic.output.parsed` (normalised shape) as span events. Different shape from chat `gen_ai.*.message` events because intrinsics have different semantics. @@ -451,7 +451,7 @@ Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fi **Tutorials** — three worth writing alongside the refactor: - "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story. -- "Shipping a new schema version without breaking users" — worked example using `requirement-check` v1 → v2. 
+- "Handling a breaking schema change without breaking users" — worked example using `requirement-check` v1 → v2; covers HF revision pinning, `AdapterSchemaMismatchError` (Jake req 4), and `schema_version` dispatch if §17 Q8 is adopted. - "Reading intrinsic telemetry" — short dashboard-building guide. **Release notes** separate: no-op for high-level helper users; deprecated-but-shimmed for direct adapter constructors; removed at Phase 4 (see below). From 43dc32da3a0374da3d12fe99e2b0ff71d449b6e5 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 15:00:03 +0100 Subject: [PATCH 21/29] =?UTF-8?q?docs:=20update=20=C2=A716=20phasing=20wit?= =?UTF-8?q?h=20Jake=20reqs=20and=20open-question=20dependencies=20(#929)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Targeted patches to bring §16 in line with the work added in commits 93aa780e through 3bd7dd58: - Phase 0: type list flagged as Q5-conditional (user-facing Intrinsic class only if Jake's split is adopted); catalogue entries gain pinned HF revision SHAs (Jake req 5; Q10). - Phase 1: helpers gain output validation raising AdapterSchemaMismatchError (Jake req 4). - Phase 2: bindings implement the four-verb set; LocalFileBinding.prepare resolves HF revision per Q9 weight-refresh policy. Phase 3 and Phase 4 unchanged. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 22bb1ee06..f0efa1708 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -460,9 +460,9 @@ Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fi Detail deferred until Part I §5 decisions are agreed, but the intended phasing is: -1. 
**Phase 0 — parallel types.** Introduce `Adapter` / `WeightsBinding` / `IOContract` alongside existing classes. No call-site changes, tests unchanged. -2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Old classes become deprecation shims. -3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. +1. **Phase 0 — parallel types.** Introduce the new types (`Adapter`, `WeightsBinding`, `IOContract`, plus a user-facing `Intrinsic` class if Q5 is settled on Jake's split) alongside existing classes. Catalogue entries gain pinned HF revision SHAs (Jake req 5; Q10). No call-site changes, tests unchanged. +2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Helpers gain output validation raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4). Old classes become deprecation shims. +3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Bindings implement `prepare` / `activate` / `deactivate` / `release` per reality; `LocalFileBinding.prepare` resolves the configured HF revision (Q9 weight-refresh policy). Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. 4. **Phase 3 — Reality C ships.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. 5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. 
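
The four-verb set named in Phase 2 can be sketched as a structural interface. This is a rough illustration only: the verb names and `WeightsBinding` come from this proposal, while the method signatures, the `RecordingBinding` stand-in, and `run_once` are hypothetical. It assumes the session-scoped lifecycle (auto-load yes, auto-unload no), so a call deactivates the adapter but never releases it.

```python
from typing import Protocol


class WeightsBinding(Protocol):
    """Lifecycle verbs from Phase 2; the signatures are assumptions."""

    def prepare(self) -> None: ...
    def activate(self) -> None: ...
    def deactivate(self) -> None: ...
    def release(self) -> None: ...


class RecordingBinding:
    """Toy binding that records verb calls; stands in for a real binding."""

    def __init__(self, revision="main"):
        self.revision = revision
        self.prepared = False
        self.active = False
        self.calls = []

    def prepare(self):
        # A real local-file binding would resolve the revision and fetch
        # weights here; repeated calls are a no-op once prepared.
        if not self.prepared:
            self.prepared = True
            self.calls.append("prepare")

    def activate(self):
        self.prepare()
        self.active = True
        self.calls.append("activate")

    def deactivate(self):
        self.active = False
        self.calls.append("deactivate")

    def release(self):
        # Explicit teardown: triggered by the caller or session end,
        # never automatically after a call.
        self.active = False
        self.prepared = False
        self.calls.append("release")


def run_once(binding: WeightsBinding) -> None:
    """One intrinsic call: prepare, activate, generate/parse, deactivate."""
    binding.prepare()
    binding.activate()
    try:
        pass  # generation and parsing would happen here
    finally:
        binding.deactivate()


binding = RecordingBinding()
run_once(binding)
run_once(binding)  # weights stay prepared, so no second "prepare"
```

Keeping `release()` out of `run_once` is what makes the second call skip the download step; a request-scoped deployment would call it in the `finally` block instead.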
From ed48a2081e8e144bda8a76e77150eeb2bad0181f Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 15:02:09 +0100 Subject: [PATCH 22/29] =?UTF-8?q?docs:=20resolve=20"version"=20ambiguity?= =?UTF-8?q?=20across=20=C2=A712,=20=C2=A714=20(#929)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Final-review pass caught residual "version" ambiguity from the schema_version reframe. Under the layered model (HF commit SHA = the unambiguous version mechanism; schema_version = proposed parser-dispatch field for breaking schema changes only; AdapterSchemaMismatchError = loud safety net), the doc now uses "revision" consistently for the HF commit SHA and reserves "schema_version" for its narrowed proposed purpose. Touched in this commit: - §12 row 3: identity tuple uses `revision` not `version`. - §12 row 4a: parse() validates per Jake req 4; schema_version dispatch reframed as optional (Q8-conditional) layer. - §12 row 6: catalog cleanup row reframed around the v1→v2 PR #1008 worked example using Jake req 4 + optional schema_version dispatch. - §14.1 silent-schema-drift bullet: parse_failures counter labelled by (name, revision), matching §14.2. - §14.2 root span attributes: revision (not version). - §14.2 standard attributes list: intrinsic.revision (not intrinsic.version). - §14.2 invocations counter: labelled by name, revision, ... Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index f0efa1708..30f18b4d7 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -353,12 +353,12 @@ This is a **backend-keyed dispatch** where the branching key (`_uses_embedded_ad | 1c. 
Backend- + adapter-type-specific abstraction | `WeightsBinding` is the adapter-type axis; `AdapterMixin` verbs are the backend axis. | | 2a. Intrinsic rewriters overwrite options | `Adapter.resolve_model_options()` replaces the five-place merge with one documented stack. | | 2b/2c. Model-option hierarchy | Five layers enforced in `resolve_model_options` (base model → adapter config → `io.yaml` defaults → `io.yaml` per-intrinsic → caller). | -| 3. Naming consistency | Three-axis identity (`name`, `adapter_type`, `version`) plus explicit `role`. | -| 4a. `call_intrinsic` assumes one output schema | `io_contract.parse()` dispatches on `(name, version)`; helpers see normalised shape. | +| 3. Naming consistency | Three-axis identity (`name`, `adapter_type`, `revision`) plus explicit `role`. | +| 4a. `call_intrinsic` assumes one output schema | `io_contract.parse()` validates the output shape and raises `AdapterSchemaMismatchError` on mismatch (Jake req 4); helpers see a normalised shape. Dispatch on `(name, schema_version)` is an optional layer if §17 Q8 is adopted. | | 4b. Per-adapter vs standard schema | `io_contract.parse()` is per-adapter; helpers define the normalised post-parse shape. | | 4c. Versioning | HF commit SHA is the version (every push = new revision; pin via `revision="..."` for stability). Breaking schema changes (rare) handled by pinning + helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4); `schema_version` parser dispatch (§17 Q8) is an optional layer if the granite team adds the field. | | 5. OpenAI backend support | Ships as one or two `ServerMediatedBinding` subclasses. | -| 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. Duplicate `requirement_check` / `requirement-check` entries collapse into one entry with two schema versions. | +| 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). 
Custom adapters bypass it; no monkey-patching. Duplicate `requirement_check` / `requirement-check` entries collapse into one entry; the v1 → v2 output-schema change (PR #1008) is handled by Jake req 4 + optional `schema_version` dispatch (§17 Q8). | | 7. Hardcoded `requirement-check` refs | Callers look up by **role**, not name. | ## 13. What users see — detailed @@ -403,7 +403,7 @@ adapter = Adapter(name="answerability", Adapter calls hide the complexity that matters most when something goes wrong (weight fetching, activation side-effects, schema contracts). Without per-phase instrumentation, four failure modes are hard or impossible to diagnose — and Mellea has already hit the first two in production: 1. **Masked errors.** The `obtain_lora`-always-called bug (#929 point 1b) showed users a misleading download error while the real cause (adapter-type mismatch) stayed invisible. A span at the `prepare` boundary recording the exception would have surfaced the actual cause on first run. -2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. Under Jake req 4 (helpers raise on schema mismatch), this would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change — the caller gets a named error instead of a silently wrong value. The `parse_failures` counter labelled by `(name, version)` is the dashboard signal; the exception is the runtime signal. +2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. 
Under Jake req 4 (helpers raise on schema mismatch), this would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change — the caller gets a named error instead of a silently wrong value. The `parse_failures` counter labelled by `(name, revision)` is the dashboard signal; the exception is the runtime signal. 3. **Latency attribution.** "`check_answerability` is slow" is unanswerable today — download, PEFT load, generation, and JSON parse collapse into one backend span. Phase-level spans make the culprit obvious in any trace viewer. 4. **Alerting and cost attribution.** OTel `ERROR` status on failed download/activation makes generic dashboards and alerts work. Token counts labelled by adapter answer "which capability is 30% of our spend?" Both impossible today. @@ -415,7 +415,7 @@ Adding instrumentation now costs one span attribute per verb. Retrofitting after ```mermaid graph TD - root["intrinsic.call
name, version, role,
binding_type, adapter_type
"] + root["intrinsic.call
name, revision, role,
binding_type, adapter_type
"] root --> prep["intrinsic.prepare
LocalFile: download ms"] root --> act["intrinsic.activate
peft_name / controls / api_id"] root --> gen["intrinsic.generate
(regular backend span:
tokens, latency)
"] @@ -423,10 +423,10 @@ graph TD root --> deact["intrinsic.deactivate"] ``` -Standard attributes: `intrinsic.name`, `intrinsic.version`, `intrinsic.role`, `intrinsic.adapter_type`, `intrinsic.binding_type`, `intrinsic.source`, `intrinsic.target`. Errors set OTel `ERROR` status (aligns with #1035 gap 4). +Standard attributes: `intrinsic.name`, `intrinsic.revision`, `intrinsic.role`, `intrinsic.adapter_type`, `intrinsic.binding_type`, `intrinsic.source`, `intrinsic.target`. Errors set OTel `ERROR` status (aligns with #1035 gap 4). **Metrics** — an `IntrinsicMetricsPlugin` alongside the existing Token / Latency / Error plugins: -- `mellea.intrinsic.invocations` — counter labelled by name, version, binding type, adapter type, outcome. +- `mellea.intrinsic.invocations` — counter labelled by name, revision, binding type, adapter type, outcome. - `mellea.intrinsic.phase_duration_ms` — histogram labelled by name, phase. - `mellea.intrinsic.parse_failures` — counter labelled by name, revision. This is the **schema-drift detector**: a climbing counter against a specific `(name, revision)` pair means an upstream adapter pushed a breaking schema change at a new HF revision that the local parser doesn't yet handle. Each increment matches an `AdapterSchemaMismatchError` raised at the call site (Jake req 4). From ec9ffeedc4cb7f50621b0b7168c935d2d20489aa Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 15:15:43 +0100 Subject: [PATCH 23/29] =?UTF-8?q?docs:=20aggressive=20resolution=20cleanup?= =?UTF-8?q?=20=E2=80=94=20promote=20leans=20to=20resolved=20(#929)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit After this commit, only 3 questions are genuinely open: Part I §5 Q1 (shape agreement), Part I §5 Q5 (terminology rename scope), and §17 Q4 (schema_version cross-team with granite-common). Everything else has either explicit reviewer agreement or is a working position reviewers can push back on. 
Part I §5: - Q2 Lifecycle: promoted "Lean" → "Resolved (Jake)" — auto-load yes, auto-unload no. Body rewritten to lead with the resolution. - Q3 Reality C: promoted "Lean" → "Resolved (Paul)" — design slot, leave empty; vLLM blocked confirmed. - Q4 Deprecation: promoted "Lean" → "Resolved (Paul, Jake)" — 1 minor release ≈ 4–6 weeks, longer if user impact warrants. Sub-question on breakage-free shipping kept open. - TL;DR table updated to match. §17 (now "Open questions and implementation positions"): - Removed redundant pointer rows (old Q2, Q4, Q6, Q7 — all back-refs to Part I §5). - Old Q1 (WeightsBinding name) → new Q1 [Position]. - Old Q3 (role vs name) → new Q2 [Position] — free-form string with advisory known-roles registry; pure enum rejected. - Old Q5 (rewind interaction) → new Q3 [Resolved] — Jake confirmed on PR #1080 the helpers placement is reasonable. - Old Q8 (schema_version cross-team) → new Q4 [Open] — only genuinely open §17 item. - Old Q9 (weight-refresh policy) → new Q5 [Position] — per-session-start + explicit refresh() for long-running. - Old Q10 (pinning default) → new Q6 [Position] — pin by default. - Body cross-refs throughout updated for renumbering. Each Resolved/Position entry now cites who agreed (where applicable) so reviewers see what's been discussed vs unilaterally asserted. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 60 ++++++++++----------- 1 file changed, 29 insertions(+), 31 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 30f18b4d7..6fe00051f 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -30,11 +30,11 @@ The `weights_binding` is pluggable — `LocalFileBinding`, `EmbeddedBinding`, or | # | Question | Status | | --- | --- | --- | -| Q1 | Does the `Adapter = identity + io_contract + weights` shape hold? 
| Open | -| Q2 | Lifecycle default: session-scoped (no auto-unload) or request-scoped? | Lean: session-scoped, no auto-unload | -| Q3 | Reality C (server-mediated): design slot now, fill later — or leave fully empty? | Lean: design slot, leave empty | -| Q4 | Deprecation window for old classes? | Lean: 1 minor release (~4–6 weeks) | -| Q5 | Terminology: replace "intrinsic" with "adapter", or keep both with distinct meanings? | Open — competing positions received | +| Q1 | Does the `Adapter = identity + io_contract + weights` shape hold? | **Open** — Jake wants more discussion | +| Q2 | Lifecycle default | **Resolved**: session-scoped, no auto-unload (Jake) | +| Q3 | Reality C (server-mediated): design slot or leave empty? | **Resolved**: design slot, leave empty (Paul; vLLM blocked) | +| Q4 | Deprecation window for old classes | **Resolved**: 1 minor release ≈ 4–6 weeks; longer if user impact warrants (Paul, Jake) | +| Q5 | Terminology: replace "intrinsic" with "adapter", or keep both with distinct meanings? | **Open** — three competing positions | Detail on each in §5. @@ -57,7 +57,7 @@ Every thread in #929 is a symptom of not having separated the kinds of adapter a Four outcomes, in order of importance. Detail on each lives in Part II; this list is the ask. 1. **One adapter model, one code path.** Reasonable from the outside, unified from the inside — no more `if backend._uses_embedded_adapters:` branches. -2. **Safe evolution.** Model-option precedence is documented and enforced. Adapter weights are versioned by HF commit SHA — Mellea can pin to a specific revision for stability or track latest for newest weights (refresh policy in §17 Q9). Output schemas are stable in the common case (new weights, same schema); the rare breaking schema change is handled either by pinning, by `schema_version` parser dispatch (§17 Q8), or by helpers raising on mismatch (Jake req 4). Helpers like `check_answerability` see a normalised result regardless of underlying churn. +2. 
**Safe evolution.** Model-option precedence is documented and enforced. Adapter weights are versioned by HF commit SHA — Mellea can pin to a specific revision for stability or track latest for newest weights (refresh policy in §17 Q5). Output schemas are stable in the common case (new weights, same schema); the rare breaking schema change is handled either by pinning, by `schema_version` parser dispatch (§17 Q4), or by helpers raising on mismatch (Jake req 4). Helpers like `check_answerability` see a normalised result regardless of underlying churn. 3. **First-class customer adapters.** Customers can ship their own against the same API as first-party ones — today it requires patching the catalog or subclassing a self-confessed "temporary hack" ([#424](https://github.com/generative-computing/mellea/issues/424)). 4. **Observable and parity-respecting.** Every lifecycle phase is a distinct span; high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. @@ -79,7 +79,7 @@ An **Adapter** is a small object composed of three parts: ``` Adapter -├── identity — name, adapter type (lora/alora), schema version (proposed; see §7 / §17 Q8), optional role +├── identity — name, adapter type (lora/alora), schema version (proposed; see §7 / §17 Q4), optional role ├── io_contract — parsed io.yaml: prompt building, output parsing, model options └── weights — one of three pluggable bindings (LocalFile, Embedded, ServerMediated) ``` @@ -120,9 +120,9 @@ Of Mellea's five backends (`LocalHFBackend`, `OpenAIBackend`, `OllamaBackend`, ` These gate decomposition; everything else can live in sub-issues once these are agreed. 1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. -2. 
**Adapter lifecycle default — session-scoped or request-scoped?** Today's HF backend keeps adapters loaded once added; request-scoped load/unload is safer for multi-tenancy but costs latency on a 7B base. **Position received (Jacob):** auto-load yes, auto-unload no — once activated, leave the adapter loaded; let the caller or session teardown trigger explicit `release()`. The multi-tenancy concern is reduced for `LocalHFBackend`, which is primarily a single-user/local backend (see §10). -3. **Reality C target shape.** The aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked: vLLM has declined to upstream aLoRA support (see §8.3 for history). **Position received (Paul):** leave non-switch vLLM adapters alone; no near-term path there. Recommendation: design the `ServerMediatedBinding` slot so the interface is clean when/if the upstream situation changes, but leave it empty and do not invest in stubs. -4. **Deprecation window.** How long do `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` stay as shims before removal? **Working position:** at least one minor release; longer if user impact warrants (Jake's framing). One minor release ≈ 4–6 weeks (Paul). **Sub-question:** can this ship without breakage at all? Under the current Q5 lean, `IntrinsicAdapter` could stay as a re-export of `Adapter`, which would remove the deprecation-window pressure entirely. +2. **Adapter lifecycle — session-scoped, no auto-unload.** **Resolved (Jake):** auto-load yes, auto-unload no. Once activated, the adapter stays loaded; the caller or session teardown triggers explicit `release()`. The multi-tenancy concern is reduced because `LocalHFBackend` is primarily a single-user/local backend (see §10). Request-scoped lifecycle remains an opt-in for deployments that need per-call isolation. +3. 
**Reality C target shape — design slot, leave empty.** **Resolved (Paul):** the aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked — vLLM has declined to upstream aLoRA support (see §8.3 for history). The `ServerMediatedBinding` slot is designed so the interface is clean if the upstream situation ever changes, but the implementation stays empty and we don't invest in stubs. +4. **Deprecation window — at least 1 minor release; longer if user impact warrants.** **Resolved (Paul, Jake):** Paul confirms 1 minor release ≈ 4–6 weeks is sufficient, extendable if needed; Jake notes the final length depends on how many users are impacted. **Sub-question (open):** can this ship without breakage at all? Under Q5's current lean, `IntrinsicAdapter` could stay as a re-export of `Adapter`, which would remove the deprecation-window pressure entirely. 5. **Terminology rename scope.** Feedback received challenges the framing in this proposal. Three competing positions: - **Original proposal model:** "adapter" replaces "intrinsic" as the primary user-facing term; `Intrinsic` AST class and module path are renamed with shims. - **Current lean (Jacob's framing):** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. @@ -181,7 +181,7 @@ Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catal ### Risk - **Biggest unknown**: whether the unified `resolve_model_options` handles every combination currently in use. Mitigation: keep the five-layer precedence explicit, add per-adapter override documentation, and assert resolved values in tests. -- **Second biggest**: handling breaking schema changes from upstream. 
Three layers: pinning (avoid the risk), `schema_version` parser dispatch (§17 Q8, proposed), helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4, loud safety net). Worked example: the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change, rather than silently returning `False`. +- **Second biggest**: handling breaking schema changes from upstream. Three layers: pinning (avoid the risk), `schema_version` parser dispatch (§17 Q4, proposed), helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4, loud safety net). Worked example: the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change, rather than silently returning `False`. - **Mitigated by**: per-phase test-parity commitment (nothing merges if existing tests regress); observability introduced alongside the refactor so production regressions surface as dashboard signals rather than silent behavioural drift. --- @@ -200,7 +200,7 @@ Names matter because they appear in user-facing error messages, docs, and teleme | **Intrinsic** | The user-facing capability: helper functions (`check_answerability`, `requirement_check`, the Guardian helpers), the `Intrinsic` AST component, input/output parsing. Backed by an adapter. The name is kept as IBM's terminology — current Q5 lean (see Part I §5). | | **Adapter** | The backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity, I/O contract, and weights binding. The user-facing **Intrinsic** wraps an adapter to provide helpers and parsing. In the redesign, the class hierarchy collapses from four (`IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` + abstract base) to one `Adapter` + a pluggable binding. 
| | **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | -| **Schema version** | *Proposed parser-dispatch field for breaking schema changes only.* For routine weight updates the HF commit SHA is the version (no new field needed). `schema_version` would only earn its keep if the granite team ships a *breaking* output-schema change (different keys, nesting, or types) and unpinned callers need graceful v1↔v2 parser dispatch. **Open** (§17 Q8) — granite-common may already have a versioning mechanism we should reuse instead of inventing this. | +| **Schema version** | *Proposed parser-dispatch field for breaking schema changes only.* For routine weight updates the HF commit SHA is the version (no new field needed). `schema_version` would only earn its keep if the granite team ships a *breaking* output-schema change (different keys, nesting, or types) and unpinned callers need graceful v1↔v2 parser dispatch. **Open** (§17 Q4) — granite-common may already have a versioning mechanism we should reuse instead of inventing this. | | **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. *Name under discussion: Jacob prefers `io_config`; `io_contract` is used throughout this proposal but is not final.* | | **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | | **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | @@ -274,7 +274,7 @@ Each concrete binding implements the four-verb set from Part I §4. The column m > **Which class knows an adapter doesn't need PEFT activation? 
The binding does — not the backend.** `EmbeddedBinding.activate()` renders `controls` JSON into the chat template; `LocalFileBinding.activate()` calls PEFT `load_adapter`. The backend calls `binding.activate()` uniformly and has no conditional on binding type. This is the mechanism that eliminates the `if getattr(backend, "_uses_embedded_adapters", False):` branch (§11). When embedded-adapter support is later added to `LocalHFBackend` ([#1018](https://github.com/generative-computing/mellea/issues/1018)), the backend does not need to learn about embedding — it calls the same verbs, and `EmbeddedBinding` handles the difference. The backend only needs the verb interface. (Addressing Jacob's review question on backend consumption.) -> **Weight updates:** weights are versioned by HF commit SHA. `prepare()` resolves the configured revision (`main` by default, or a pinned SHA) and refreshes the local cache when upstream has moved. Refresh policy and the long-running-process exception are open (§17 Q9). +> **Weight updates:** weights are versioned by HF commit SHA. `prepare()` resolves the configured revision (`main` by default, or a pinned SHA) and refreshes the local cache when upstream has moved. Refresh policy and the long-running-process exception are open (§17 Q5). ### 9.3 Lifecycle sequence @@ -354,11 +354,11 @@ This is a **backend-keyed dispatch** where the branching key (`_uses_embedded_ad | 2a. Intrinsic rewriters overwrite options | `Adapter.resolve_model_options()` replaces the five-place merge with one documented stack. | | 2b/2c. Model-option hierarchy | Five layers enforced in `resolve_model_options` (base model → adapter config → `io.yaml` defaults → `io.yaml` per-intrinsic → caller). | | 3. Naming consistency | Three-axis identity (`name`, `adapter_type`, `revision`) plus explicit `role`. | -| 4a. 
`call_intrinsic` assumes one output schema | `io_contract.parse()` validates the output shape and raises `AdapterSchemaMismatchError` on mismatch (Jake req 4); helpers see a normalised shape. Dispatch on `(name, schema_version)` is an optional layer if §17 Q8 is adopted. | +| 4a. `call_intrinsic` assumes one output schema | `io_contract.parse()` validates the output shape and raises `AdapterSchemaMismatchError` on mismatch (Jake req 4); helpers see a normalised shape. Dispatch on `(name, schema_version)` is an optional layer if §17 Q4 is adopted. | | 4b. Per-adapter vs standard schema | `io_contract.parse()` is per-adapter; helpers define the normalised post-parse shape. | -| 4c. Versioning | HF commit SHA is the version (every push = new revision; pin via `revision="..."` for stability). Breaking schema changes (rare) handled by pinning + helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4); `schema_version` parser dispatch (§17 Q8) is an optional layer if the granite team adds the field. | +| 4c. Versioning | HF commit SHA is the version (every push = new revision; pin via `revision="..."` for stability). Breaking schema changes (rare) handled by pinning + helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4); `schema_version` parser dispatch (§17 Q4) is an optional layer if the granite team adds the field. | | 5. OpenAI backend support | Ships as one or two `ServerMediatedBinding` subclasses. | -| 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. Duplicate `requirement_check` / `requirement-check` entries collapse into one entry; the v1 → v2 output-schema change (PR #1008) is handled by Jake req 4 + optional `schema_version` dispatch (§17 Q8). | +| 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. 
Duplicate `requirement_check` / `requirement-check` entries collapse into one entry; the v1 → v2 output-schema change (PR #1008) is handled by Jake req 4 + optional `schema_version` dispatch (§17 Q4). | | 7. Hardcoded `requirement-check` refs | Callers look up by **role**, not name. | ## 13. What users see — detailed @@ -451,7 +451,7 @@ Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fi **Tutorials** — three worth writing alongside the refactor: - "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story. -- "Handling a breaking schema change without breaking users" — worked example using `requirement-check` v1 → v2; covers HF revision pinning, `AdapterSchemaMismatchError` (Jake req 4), and `schema_version` dispatch if §17 Q8 is adopted. +- "Handling a breaking schema change without breaking users" — worked example using `requirement-check` v1 → v2; covers HF revision pinning, `AdapterSchemaMismatchError` (Jake req 4), and `schema_version` dispatch if §17 Q4 is adopted. - "Reading intrinsic telemetry" — short dashboard-building guide. **Release notes** separate: no-op for high-level helper users; deprecated-but-shimmed for direct adapter constructors; removed at Phase 4 (see below). @@ -460,26 +460,24 @@ Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fi Detail deferred until Part I §5 decisions are agreed, but the intended phasing is: -1. **Phase 0 — parallel types.** Introduce the new types (`Adapter`, `WeightsBinding`, `IOContract`, plus a user-facing `Intrinsic` class if Q5 is settled on Jake's split) alongside existing classes. Catalogue entries gain pinned HF revision SHAs (Jake req 5; Q10). No call-site changes, tests unchanged. +1. **Phase 0 — parallel types.** Introduce the new types (`Adapter`, `WeightsBinding`, `IOContract`, plus a user-facing `Intrinsic` class if Q5 is settled on Jake's split) alongside existing classes. 
Catalogue entries gain pinned HF revision SHAs (Jake req 5; §17 Q6). No call-site changes, tests unchanged. 2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Helpers gain output validation raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4). Old classes become deprecation shims. -3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Bindings implement `prepare` / `activate` / `deactivate` / `release` per reality; `LocalFileBinding.prepare` resolves the configured HF revision (Q9 weight-refresh policy). Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. +3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Bindings implement `prepare` / `activate` / `deactivate` / `release` per reality; `LocalFileBinding.prepare` resolves the configured HF revision (§17 Q5 weight-refresh policy). Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. 4. **Phase 3 — Reality C ships.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. 5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. Observability and docs deliverables attach to the phase that first exercises them. -## 17. Open questions (full list) - -1. **Naming.** `WeightsBinding` vs `ResourceStrategy` vs `AdapterProvider`. Pick one; the term leaks into error messages. -2. **Lifecycle default** — session-scoped or request-scoped (also in Part I §5). -3. **Role vs name.** Free-form `role` string, or a small enum so users can't invent roles backends don't honour? -4. **Reality C idiom.** Back-reference to Part I §5 Q3 — no separate question here; the sub-case framing (C1 = vLLM-backed, C2 = commercial fine-tunes) is in §8.3. -5. 
**Rewind interaction (PR #1028).** Some helpers — specifically `factuality_detection` and `factuality_correction` — need to re-format the conversation so that documents are attached to the *last assistant message* rather than earlier in the history. They currently do this by walking back through `context.previous_node`. Question: does that rewind logic belong on `io_contract.build_prompt` (cleaner separation of concerns) or stay in the helper functions (smaller migration blast radius)? -6. **Telemetry coupling with #1035** (also in Part I §5). -7. **Deprecation window** (also in Part I §5). -8. **`schema_version` field in `io.yaml`.** §4, §9, and §12 all assume the `io.yaml` parsed by granite-common / granite-formatters carries a `schema_version`. It doesn't today, so this is asking that team to add a field. Worth suggesting to them? Or do they have another approach to versioning? -9. **Weight-refresh policy.** Adapter weights are versioned by HF commit SHA. When Mellea is configured to track latest (no pin), how often does `prepare()` re-resolve the upstream revision? Per-session-start is the natural answer; long-running processes (sessions spanning a release) need either an explicit `refresh()` API or accept stale weights until restart. -10. **Version pinning for auto-loaded adapters.** When an adapter is auto-loaded from the catalogue (caller didn't specify a revision), should Mellea pin to a known-good revision (the catalogue entry's recorded SHA) or track upstream's default branch? **Recommendation:** pin by default — catalogue entries record a pinned SHA; `revision="main"` is an explicit opt-in to track latest. Pinning gives reproducibility; explicit tracking gives latest weights at the cost of behaviour drift between runs. (Jake req 5; coupled to Q9 weight-refresh policy.) +## 17. 
Open questions and implementation positions + +Items marked **[Open]** need decision; **[Position]** is the proposal's working answer (reviewers can push back); **[Resolved]** has explicit reviewer agreement. Decisions that gate decomposition are in Part I §5; this section is for implementation-level questions and positions. + +1. **Naming `WeightsBinding`** [Position]. Used throughout the doc; alternatives `ResourceStrategy` / `AdapterProvider` were considered. `WeightsBinding` is concrete (says what it binds) and unambiguous in error messages. +2. **Role vs name** [Position]. `role` is a free-form string with an advisory known-roles registry (e.g., `mellea.backends.adapters.roles.KNOWN_ROLES`). Backends warn on unknown roles but accept any string. Pure enum was considered but rejected — it would lock role names at library-release time. +3. **Rewind interaction (PR #1028)** [Resolved] (Jake on PR #1080). The rewind logic in `_resolve_question` / `_resolve_response` stays in the helpers. Phase 1 of this refactor can revisit moving it to `io_contract.build_prompt` if cleaner separation is wanted; not gating. +4. **`schema_version` field in `io.yaml`** [Open] — cross-team. §4, §7, §9, and §12 all assume the `io.yaml` parsed by granite-common / granite-formatters carries a `schema_version`. It doesn't today, so this is asking that team to add a field. Worth suggesting to them? Or do they have another approach to versioning? +5. **Weight-refresh policy** [Position]. Adapter weights are versioned by HF commit SHA. `prepare()` re-resolves the upstream revision at session start; long-running processes (sessions spanning a release) opt into an explicit `refresh()` API. Default cadence matches the session-scoped lifecycle (Part I §5 Q2 Resolved). +6. **Version pinning for auto-loaded adapters** [Position]. When an adapter is auto-loaded from the catalogue (caller didn't specify a revision), Mellea pins to the catalogue entry's recorded SHA. 
`revision="main"` is an explicit opt-in to track latest. Pinning gives reproducibility; explicit tracking gives latest weights at the cost of behaviour drift between runs. (Jake req 5; coupled to Q5 weight-refresh policy.) --- From be93aafb9b14f64306193f3f4e1129b853f9325a Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 15:29:47 +0100 Subject: [PATCH 24/29] docs: make Q4↔Q5 deprecation coupling explicit (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit §6 and §16 Phase 4 previously asserted "shimmed for one release" / "shim removal after one minor release" unconditionally. But under Q5's current lean (Jake's split — Intrinsic stays as the user-facing class, Adapter is the backend artefact), the old types can stay as re-exports indefinitely with no deprecation needed at all. Two small additions surface this coupling: - §6 API surface: the deprecated-but-shimmed bullet now notes it applies only if Q5 settles on rename or new-API-alongside. - §16 Phase 4: shim removal now flagged as skipped if Q5 settles on Jake's split. Q4 itself remains Resolved (≥1 minor release if there is a deprecation); the new framing just makes clear that whether there's deprecation at all is Q5-conditional. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 6fe00051f..55823ff49 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -142,7 +142,7 @@ Scope of this refactor in concrete terms so reviewers can weigh the cost. ### API surface - **Unchanged** — every high-level helper (`check_answerability` etc.) keeps its signature. `m.instruct`, `m.validate`, `m.chat` unaffected.
The `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) arrives on top, not instead. -- **Deprecated but shimmed for one release** — `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter` public classes. Direct users get `DeprecationWarning` pointing to the new constructor. +- **Deprecated but shimmed for one release** — `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter` public classes. Direct users get `DeprecationWarning` pointing to the new constructor. *(Applies if Q5 settles on rename or new-API-alongside; under Jake's split, the old types stay as re-exports indefinitely with no deprecation needed.)* - **Optional, was mandatory** — the adapter catalogue. Callers no longer have to register custom adapters in `catalog.py` before use; the catalogue stays as a convenience resolver for first-party names, not a precondition. - **Possibly moved/renamed** — depends on §5 Q5 (terminology rename scope). @@ -464,7 +464,7 @@ Detail deferred until Part I §5 decisions are agreed, but the intended phasing 2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Helpers gain output validation raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4). Old classes become deprecation shims. 3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Bindings implement `prepare` / `activate` / `deactivate` / `release` per reality; `LocalFileBinding.prepare` resolves the configured HF revision (§17 Q5 weight-refresh policy). Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. 4. **Phase 3 — Reality C ships.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. -5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. +5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. 
*(Skipped if Q5 settles on Jake's split — re-exports stay.)* Observability and docs deliverables attach to the phase that first exercises them. From 7bfbbbba21d6eea2e00af6f401f5be549aef505f Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Fri, 15 May 2026 15:33:48 +0100 Subject: [PATCH 25/29] docs: collapse Q5 to a binary endpoint choice (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The original three Q5 positions (Original proposal / Jake's split / Paul's preference) effectively split into two endpoints: - Original and Paul share the same end-state (Adapter as the user-facing term). They differ only on transition mechanism — rename with deprecation shims (Original) vs create-new-alongside with gradual deprecation (Paul). - Jake's split is the only structurally different endpoint (both names survive permanently with distinct meanings). Q5 now reads as a binary endpoint choice with a sub-decision on transition mechanism if the first endpoint wins. This makes the actual fork visible to reviewers instead of asking them to weigh in on three near-equivalents. - TL;DR row Q5 status: "two endpoints; mechanism sub-decision" - §5 Q5 body: restructured from three bullets to two endpoints + an inline mechanism sub-decision under endpoint 1. No reviewer feedback dropped — Original's framing and Paul's framing are both preserved as named mechanism variants. 
Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 55823ff49..89897f050 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -34,7 +34,7 @@ The `weights_binding` is pluggable — `LocalFileBinding`, `EmbeddedBinding`, or | Q2 | Lifecycle default | **Resolved**: session-scoped, no auto-unload (Jake) | | Q3 | Reality C (server-mediated): design slot or leave empty? | **Resolved**: design slot, leave empty (Paul; vLLM blocked) | | Q4 | Deprecation window for old classes | **Resolved**: 1 minor release ≈ 4–6 weeks; longer if user impact warrants (Paul, Jake) | -| Q5 | Terminology: replace "intrinsic" with "adapter", or keep both with distinct meanings? | **Open** — three competing positions | +| Q5 | Terminology: replace "intrinsic" with "adapter", or keep both with distinct meanings? | **Open** — two endpoints; mechanism sub-decision | Detail on each in §5. @@ -123,15 +123,16 @@ These gate decomposition; everything else can live in sub-issues once these are 2. **Adapter lifecycle — session-scoped, no auto-unload.** **Resolved (Jake):** auto-load yes, auto-unload no. Once activated, the adapter stays loaded; the caller or session teardown triggers explicit `release()`. The multi-tenancy concern is reduced because `LocalHFBackend` is primarily a single-user/local backend (see §10). Request-scoped lifecycle remains an opt-in for deployments that need per-call isolation. 3. **Reality C target shape — design slot, leave empty.** **Resolved (Paul):** the aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked — vLLM has declined to upstream aLoRA support (see §8.3 for history). 
The `ServerMediatedBinding` slot is designed so the interface is clean if the upstream situation ever changes, but the implementation stays empty and we don't invest in stubs. 4. **Deprecation window — at least 1 minor release; longer if user impact warrants.** **Resolved (Paul, Jake):** Paul confirms 1 minor release ≈ 4–6 weeks is sufficient, extendable if needed; Jake notes the final length depends on how many users are impacted. **Sub-question (open):** can this ship without breakage at all? Under Q5's current lean, `IntrinsicAdapter` could stay as a re-export of `Adapter`, which would remove the deprecation-window pressure entirely. -5. **Terminology rename scope.** Feedback received challenges the framing in this proposal. Three competing positions: - - **Original proposal model:** "adapter" replaces "intrinsic" as the primary user-facing term; `Intrinsic` AST class and module path are renamed with shims. +5. **Terminology — "Adapter" alone, or both names with split meaning?** Two competing endpoints: + - **"Adapter" as the user-facing term going forward.** "Adapter" replaces "Intrinsic" in user-facing code. *Transition mechanism is a sub-decision:* + - *Rename with shims* (original proposal): rename `Intrinsic` AST class and module path to `Adapter` (or similar); old names aliased as deprecation shims for one release. + - *Create new alongside* (Paul's preference): stand up a new `Adapter` API beside the existing intrinsic API; old API gradually deprecated. Don't rename. - **Current lean (Jacob's framing):** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. - - **Paul's preference:** create a new `Adapter` API alongside the existing intrinsic API and deprecate the old — implement new, don't rename. 
- These three models need alignment before §5 Q1 can be finalised. The three original sub-questions remain relevant once the higher-level question is resolved: - - **Q5a. Prose rename** — shift docs, error messages, help text to "adapter." Zero breakage. Likely agreed regardless of model chosen. + These two endpoints need alignment before §5 Q1 can be finalised. The three original sub-questions remain relevant if the first endpoint is chosen: + - **Q5a. Prose rename** — shift docs, error messages, help text to "adapter." Zero breakage. Likely agreed regardless of endpoint chosen. - **Q5b. Module rename** — rename `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. Breaking for submodule importers. - - **Q5c. AST class rename** — rename `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias. If Jacob's model is adopted, Q5c answer is "no rename" — `Intrinsic` stays as the AST component name and receives a precise definition alongside the new `Adapter` class. + - **Q5c. AST class rename** — rename `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias. If Jacob's split is adopted, Q5c answer is "no rename" — `Intrinsic` stays as the AST component name and receives a precise definition alongside the new `Adapter` class. > **Implementation note, not a reviewer question:** intrinsic-level observability (§14) should coordinate with the in-flight [#1035](https://github.com/generative-computing/mellea/issues/1035) / [PR #1036](https://github.com/generative-computing/mellea/pull/1036) work so content capture uses the same `MELLEA_TRACE_CONTENT` flag and doesn't get designed twice. Flagged here for awareness; sequenced during implementation. 
From ca9c86efe3797c5673b0e39e87dfe0e51516386e Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Mon, 18 May 2026 08:58:53 +0100 Subject: [PATCH 26/29] docs: fold #1003 / closed PR #1028 work into this epic (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit PR #1028 (akihikokuroda — implementing issue #1003) was closed 2026-05-15 in favour of folding the work into this epic. What's preserved from #1003 / PR #1028: - Phase 1 absorbs the helper signature work: model_options= on all top-level helpers; documents= keyword-only on factuality_detection / factuality_correction. - The contested portion of PR #1028 (manual-Message document detection via _resolve_response scanning Message._docs when documents=None) is now an explicit Phase 1 design call as §17 Q3 sub-question (b) — "scan context for Messages with _docs" vs (a) "require explicit documents=". Doc edits: - §4, §6 (twice), §13: reframe model_options= as folded in from #1003, not arriving on top from a separate PR. Add documents= alongside. - §16 Phase 1: explicitly name the #1003 helper signature work. - §17 Q3: restructure into two sub-parts — rewind placement (Resolved per Jake on PR #1080) and manual-Message detection (Open). - Appendix: update #1003 row, add a row for closed PR #1028. Assisted-by: Claude Code Signed-off-by: Nigel Jones --- docs/dev/proposals/929-adapter-lifecycle.md | 17 ++++++++++------- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 89897f050..3dfb2c2c1 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -107,7 +107,7 @@ flowchart LR Adapter invocation becomes one flow, with no branches on backend type. From this shape, the seven threads of #929 resolve cleanly. 
The simplified invocation pseudocode, the per-binding verb semantics, the lifecycle sequence diagram, and the thread-by-thread mapping are in Part II (§9 and §12). -**What users see:** high-level helpers (`check_answerability` etc.) keep their current shape, with the `model_options=` addition that PR #1003 is introducing. Manual adapter construction collapses from four classes to one, with the binding as the pluggable part. Custom intrinsics no longer require monkey-patching the catalog. Detail in Part II §13. +**What users see:** high-level helpers (`check_answerability` etc.) keep their current shape, with the `model_options=` and `documents=` additions that fold in here from #1003 (PR #1028 was closed 2026-05-15 in favour of this epic). Manual adapter construction collapses from four classes to one, with the binding as the pluggable part. Custom intrinsics no longer require monkey-patching the catalog. Detail in Part II §13. **What cross-cutting concerns look like:** observability (spans + a schema-drift metric), docs rewrite (`intrinsics_and_adapters.md` is 39 lines describing classes this renames), and a test-parity commitment travel **with** the refactor, not after it. Detail in Part II §14–§15. @@ -142,7 +142,7 @@ Scope of this refactor in concrete terms so reviewers can weigh the cost. ### API surface -- **Unchanged** — every high-level helper (`check_answerability` etc.) keeps its signature. `m.instruct`, `m.validate`, `m.chat` unaffected. The `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) arrives on top, not instead. +- **Unchanged** — every high-level helper (`check_answerability` etc.) keeps its signature. `m.instruct`, `m.validate`, `m.chat` unaffected. The `model_options=` and `documents=` additions from [#1003](https://github.com/generative-computing/mellea/issues/1003) (PR #1028 closed 2026-05-15 in favour of this epic) ship as part of Phase 1. 
- **Deprecated but shimmed for one release** — `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter` public classes. Direct users get `DeprecationWarning` pointing to the new constructor. *(Applies if Q5 settles on rename or new-API-alongside; under Jake's split, the old types stay as re-exports indefinitely with no deprecation needed.)* - **Optional, was mandatory** — the adapter catalogue. Callers no longer have to register custom adapters in `catalog.py` before use; the catalogue stays as a convenience resolver for first-party names, not a precondition. - **Possibly moved/renamed** — depends on §5 Q5 (terminology rename scope). @@ -151,7 +151,7 @@ Scope of this refactor in concrete terms so reviewers can weigh the cost. | Audience | Impact | | --- | --- | -| Helper user (`check_answerability`-style calls) | None beyond the `model_options=` addition from [#1003](https://github.com/generative-computing/mellea/issues/1003) and clearer error messages. | +| Helper user (`check_answerability`-style calls) | None beyond the `model_options=` / `documents=` additions from [#1003](https://github.com/generative-computing/mellea/issues/1003) and clearer error messages. | | Advanced user constructing adapters directly | One release of deprecation warnings, then adopt the new `Adapter(name=…, weights=…)` constructor. | | Customer writing their own adapter | First-class path; no more `CustomIntrinsicAdapter` monkey-patching; no forced catalogue upload. Resolves [#424](https://github.com/generative-computing/mellea/issues/424). | | Backend author | `AdapterMixin` verb set narrows to the natural operations each backend can perform; existing implementations update or use shim methods. | @@ -364,7 +364,7 @@ This is a **backend-keyed dispatch** where the branching key (`_uses_embedded_ad ## 13. What users see — detailed -**High-level helpers** keep their signatures. 
The `model_options=` parameter is added via PR #1003: +**High-level helpers** keep their signatures. The `model_options=` parameter (and `documents=` keyword on `factuality_detection` / `factuality_correction`) is added in Phase 1, folding in #1003 (PR #1028 closed in favour of this epic): ```python score = check_answerability(question, documents, context, backend) @@ -462,7 +462,7 @@ Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fi Detail deferred until Part I §5 decisions are agreed, but the intended phasing is: 1. **Phase 0 — parallel types.** Introduce the new types (`Adapter`, `WeightsBinding`, `IOContract`, plus a user-facing `Intrinsic` class if Q5 is settled on Jake's split) alongside existing classes. Catalogue entries gain pinned HF revision SHAs (Jake req 5; §17 Q6). No call-site changes, tests unchanged. -2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Helpers gain output validation raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4). Old classes become deprecation shims. +2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Helpers gain output validation raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4). #1003 helper signature work folds in here: `model_options=` on all top-level helpers; `documents=` keyword-only on `factuality_detection` / `factuality_correction`. Old classes become deprecation shims. 3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Bindings implement `prepare` / `activate` / `deactivate` / `release` per reality; `LocalFileBinding.prepare` resolves the configured HF revision (§17 Q5 weight-refresh policy). Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. 4. 
**Phase 3 — Reality C ships.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. 5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. *(Skipped if Q5 settles on Jake's split — re-exports stay.)* @@ -475,7 +475,9 @@ Items marked **[Open]** need decision; **[Position]** is the proposal's working 1. **Naming `WeightsBinding`** [Position]. Used throughout the doc; alternatives `ResourceStrategy` / `AdapterProvider` were considered. `WeightsBinding` is concrete (says what it binds) and unambiguous in error messages. 2. **Role vs name** [Position]. `role` is a free-form string with an advisory known-roles registry (e.g., `mellea.backends.adapters.roles.KNOWN_ROLES`). Backends warn on unknown roles but accept any string. Pure enum was considered but rejected — it would lock role names at library-release time. -3. **Rewind interaction (PR #1028)** [Resolved] (Jake on PR #1080). The rewind logic in `_resolve_question` / `_resolve_response` stays in the helpers. Phase 1 of this refactor can revisit moving it to `io_contract.build_prompt` if cleaner separation is wanted; not gating. +3. **Rewind interaction (formerly PR #1028).** Two parts: + - **Where rewind logic lives** [Resolved] (Jake on PR #1080): the rewind in `_resolve_question` / `_resolve_response` stays in the helpers. Phase 1 can revisit moving it to `io_contract.build_prompt` if cleaner separation is wanted; not gating. + - **Manual-Message document detection** [Open] — Phase 1 design call. When `documents=None` is passed to `factuality_detection` / `factuality_correction`, how should the helper find user-supplied documents? Two approaches: (a) require explicit `documents=` and don't scan the conversation; (b) scan the context for `Message`s with `_docs` and extract from there. 
PR #1028 implemented (b) via `_resolve_response`'s manual-Message fallback; the design call was deferred when #1028 closed (2026-05-15) so the work isn't lost. 4. **`schema_version` field in `io.yaml`** [Open] — cross-team. §4, §7, §9, and §12 all assume the `io.yaml` parsed by granite-common / granite-formatters carries a `schema_version`. It doesn't today, so this is asking that team to add a field. Worth suggesting to them? Or do they have another approach to versioning? 5. **Weight-refresh policy** [Position]. Adapter weights are versioned by HF commit SHA. `prepare()` re-resolves the upstream revision at session start; long-running processes (sessions spanning a release) opt into an explicit `refresh()` API. Default cadence matches the session-scoped lifecycle (Part I §5 Q2 Resolved). 6. **Version pinning for auto-loaded adapters** [Position]. When an adapter is auto-loaded from the catalogue (caller didn't specify a revision), Mellea pins to the catalogue entry's recorded SHA. `revision="main"` is an explicit opt-in to track latest. Pinning gives reproducibility; explicit tracking gives latest weights at the cost of behaviour drift between runs. (Jake req 5; coupled to Q5 weight-refresh policy.) 
@@ -528,7 +530,8 @@ Seven recent fix-up commits in the adapter area, all symptomatic of the design g | Ref | Title | Role in this doc | | --- | --- | --- | -| [#1003](https://github.com/generative-computing/mellea/issues/1003) | fix: intrinsic function signatures (model_options on helpers) | high-level helper signature evolution | +| [#1003](https://github.com/generative-computing/mellea/issues/1003) | fix: intrinsic function signatures | folded into Phase 1 of this epic; PR #1028 closed 2026-05-15 | +| [PR #1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | closed 2026-05-15 in favour of folding into this epic; manual-Message detection carried forward as Phase 1 design call (§17 Q3) | | [#1035](https://github.com/generative-computing/mellea/issues/1035) | OTel emission gaps | parent for telemetry coordination | | [PR #1036](https://github.com/generative-computing/mellea/pull/1036) | feat(telemetry): close five OTel GenAI semconv gaps | in-flight telemetry work to coordinate with | From 4ee6961b236d26aa15c7c5c409063a1b06fc6f37 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Tue, 19 May 2026 21:19:01 +0100 Subject: [PATCH 27/29] docs: incorporate Jake's review round 2 (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit - Q1: mark Resolved — shape holds; divergence between identity/io_contract/weights is the point - Q2: reword to reflect two-level lifecycle (session-scoped prepare/release; per-call auto-activate/deactivate) - Q5: IBM retiring "Intrinsic"; adopt AdapterBasedComponent as agreed placeholder name; update preamble, §3, §7, Q4 sub-question - §9.1: clarify build_prompt() returns a Component-compatible object - §15: remove TestBasedEval/BenchDrift recommendation; goal stays, approach left open Assisted-by: Claude Code --- docs/dev/proposals/929-adapter-lifecycle.md | 43 ++++++++------------- 1 file changed, 17 insertions(+), 26 deletions(-) diff --git 
a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 3dfb2c2c1..1b58b642a 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -6,7 +6,7 @@ > > **Structure:** **Part I** covers the problem, goals, terminology, end state, and the decisions that gate decomposition. **Part II** contains supporting detail — read after Part I is agreed, not before. > -> **Terminology:** **"Intrinsic"** is the user-facing term: helpers, input/output parsing, the `Intrinsic` AST component (matching IBM's terminology). **"Adapter"** is the backend artefact: the weights loaded by a backend. +> **Terminology:** **"Adapter"** is the backend artefact: the weights loaded by a backend. The user-facing layer — helpers, input/output parsing, the AST component — is referred to as **`AdapterBasedComponent`** throughout this document. This is a placeholder name: IBM is retiring "Intrinsic" but has not yet confirmed the replacement; Mellea will adopt whatever name is settled upstream. > > **Related issues and prior work:** see the appendix at the end of this document for a linked index with annotations. @@ -30,11 +30,11 @@ The `weights_binding` is pluggable — `LocalFileBinding`, `EmbeddedBinding`, or | # | Question | Status | | --- | --- | --- | -| Q1 | Does the `Adapter = identity + io_contract + weights` shape hold? | **Open** — Jake wants more discussion | -| Q2 | Lifecycle default | **Resolved**: session-scoped, no auto-unload (Jake) | +| Q1 | Does the `Adapter = identity + io_contract + weights` shape hold? | **Resolved** (Jake): shape holds | +| Q2 | Lifecycle default | **Resolved** (Jake): session-scoped loading; per-call auto-activate/deactivate | | Q3 | Reality C (server-mediated): design slot or leave empty? 
| **Resolved**: design slot, leave empty (Paul; vLLM blocked) | | Q4 | Deprecation window for old classes | **Resolved**: 1 minor release ≈ 4–6 weeks; longer if user impact warrants (Paul, Jake) | -| Q5 | Terminology: replace "intrinsic" with "adapter", or keep both with distinct meanings? | **Open** — two endpoints; mechanism sub-decision | +| Q5 | Terminology: name for the user-facing layer | **Resolved** (Jake): two-layer split adopted; user-facing layer named `AdapterBasedComponent` as placeholder pending IBM's final decision | Detail on each in §5. @@ -65,7 +65,7 @@ Four outcomes, in order of importance. Detail on each lives in Part II; this lis Only the few terms needed to read Part I: -- **Intrinsic** — the user-facing capability: helper functions like `check_answerability`, the `Intrinsic` AST component, and input/output parsing. Implemented by an adapter. +- **AdapterBasedComponent** *(placeholder name)* — the user-facing capability: helper functions like `check_answerability`, the AST component, and input/output parsing. Implemented by an adapter. IBM is retiring "Intrinsic" and the replacement name is not yet confirmed; this document uses `AdapterBasedComponent` until that decision lands. - **Adapter** — the backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity and I/O contract. - **Base model** — the general-purpose LLM everything runs on top of (e.g. `ibm-granite/granite-4.1-3b`). - **LoRA / aLoRA** — the two PEFT technologies adapters use. Both are supported. @@ -119,20 +119,16 @@ Of Mellea's five backends (`LocalHFBackend`, `OpenAIBackend`, `OllamaBackend`, ` These gate decomposition; everything else can live in sub-issues once these are agreed. -1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. Yes / no / what's missing. -2. 
**Adapter lifecycle — session-scoped, no auto-unload.** **Resolved (Jake):** auto-load yes, auto-unload no. Once activated, the adapter stays loaded; the caller or session teardown triggers explicit `release()`. The multi-tenancy concern is reduced because `LocalHFBackend` is primarily a single-user/local backend (see §10). Request-scoped lifecycle remains an opt-in for deployments that need per-call isolation. +1. **Does the end-state shape (§4) hold?** Three realities, `Adapter = identity + io_contract + weights`, role-based lookup for rerouting. **Resolved (Jake):** shape holds. In most cases identity / io_contract / weights will be colocated, but allowing divergence is the point — it enables separation of how weights are fetched from the adapter's functional definition. +2. **Adapter lifecycle — session-scoped loading, per-call activate/deactivate.** **Resolved (Jake):** two-level lifecycle. Weight loading (`prepare`) is session-scoped — the adapter is loaded once and held until explicit `release()` at session teardown. Activation/deactivation (`activate`/`deactivate`) is call-scoped — auto-wrapped around each generation. This matches the §9.3 sequence diagram. The multi-tenancy concern is reduced because `LocalHFBackend` is primarily a single-user/local backend (see §10). Request-scoped lifecycle (including `prepare`/`release` per call) remains an opt-in for deployments that need per-call isolation. 3. **Reality C target shape — design slot, leave empty.** **Resolved (Paul):** the aLoRA-on-vLLM path ([#27](https://github.com/generative-computing/mellea/issues/27)) is currently blocked — vLLM has declined to upstream aLoRA support (see §8.3 for history). The `ServerMediatedBinding` slot is designed so the interface is clean if the upstream situation ever changes, but the implementation stays empty and we don't invest in stubs. -4. 
**Deprecation window — at least 1 minor release; longer if user impact warrants.** **Resolved (Paul, Jake):** Paul confirms 1 minor release ≈ 4–6 weeks is sufficient, extendable if needed; Jake notes the final length depends on how many users are impacted. **Sub-question (open):** can this ship without breakage at all? Under Q5's current lean, `IntrinsicAdapter` could stay as a re-export of `Adapter`, which would remove the deprecation-window pressure entirely. -5. **Terminology — "Adapter" alone, or both names with split meaning?** Two competing endpoints: - - **"Adapter" as the user-facing term going forward.** "Adapter" replaces "Intrinsic" in user-facing code. *Transition mechanism is a sub-decision:* - - *Rename with shims* (original proposal): rename `Intrinsic` AST class and module path to `Adapter` (or similar); old names aliased as deprecation shims for one release. - - *Create new alongside* (Paul's preference): stand up a new `Adapter` API beside the existing intrinsic API; old API gradually deprecated. Don't rename. - - **Current lean (Jacob's framing):** keep "Intrinsic" — it is IBM's term and must survive. The semantic split is: "adapter" = the backend artefact (weights loaded by the backend); "intrinsic" = the user-facing abstraction (helper functions, input/output parsing, classes). Both names stay, with distinct meanings. - - These two endpoints need alignment before §5 Q1 can be finalised. The three original sub-questions remain relevant if the first endpoint is chosen: - - **Q5a. Prose rename** — shift docs, error messages, help text to "adapter." Zero breakage. Likely agreed regardless of endpoint chosen. - - **Q5b. Module rename** — rename `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter`, with the old path re-exported for one release. Breaking for submodule importers. - - **Q5c. AST class rename** — rename `Intrinsic` → something like `AdapterCall`, with `Intrinsic` as an alias. 
If Jacob's split is adopted, Q5c answer is "no rename" — `Intrinsic` stays as the AST component name and receives a precise definition alongside the new `Adapter` class. +4. **Deprecation window — at least 1 minor release; longer if user impact warrants.** **Resolved (Paul, Jake):** Paul confirms 1 minor release ≈ 4–6 weeks is sufficient, extendable if needed; Jake notes the final length depends on how many users are impacted. **Sub-question (open):** can this ship without breakage at all? IBM is retiring "Intrinsic" (see Q5), so `IntrinsicAdapter` cannot stay as a permanent re-export — it will eventually need to go. The question is whether the deprecation shim for `IntrinsicAdapter` → `AdapterBasedComponent` (placeholder) can be deferred until the upstream name is confirmed, effectively separating the structural refactor from the naming change. +5. **Terminology — two-layer split, with `AdapterBasedComponent` as placeholder.** **Resolved (Jake):** the conceptual split is agreed. "Adapter" is the backend artefact (weights loaded by a backend). The user-facing layer — helper functions, input/output parsing, the AST component — is a distinct abstraction. IBM is retiring the "Intrinsic" name but has not yet confirmed its replacement; until that decision lands, Mellea will use **`AdapterBasedComponent`** as the working placeholder name throughout the codebase and docs. + + Three implementation sub-questions follow from this: + - **Q5a. Prose rename** — shift docs, error messages, and help text away from "Intrinsic" to `AdapterBasedComponent` (or the final IBM name once known). Zero breakage. + - **Q5b. Module rename** — rename `mellea.stdlib.components.intrinsic` → `mellea.stdlib.components.adapter_based_component` (placeholder path), with the old path re-exported for one release. Breaking for submodule importers. + - **Q5c. AST class rename** — rename `Intrinsic` AST component → `AdapterBasedComponent`, with `Intrinsic` as a deprecation alias for one release. 
> **Implementation note, not a reviewer question:** intrinsic-level observability (§14) should coordinate with the in-flight [#1035](https://github.com/generative-computing/mellea/issues/1035) / [PR #1036](https://github.com/generative-computing/mellea/pull/1036) work so content capture uses the same `MELLEA_TRACE_CONTENT` flag and doesn't get designed twice. Flagged here for awareness; sequenced during implementation. @@ -198,7 +194,7 @@ Names matter because they appear in user-facing error messages, docs, and teleme | Term | Meaning | | --- | --- | | **Base model** | The general-purpose LLM (e.g. `ibm-granite/granite-4.1-3b`) that everything runs on top of. | -| **Intrinsic** | The user-facing capability: helper functions (`check_answerability`, `requirement_check`, the Guardian helpers), the `Intrinsic` AST component, input/output parsing. Backed by an adapter. The name is kept as IBM's terminology — current Q5 lean (see Part I §5). | +| **AdapterBasedComponent** *(placeholder)* | The user-facing capability: helper functions (`check_answerability`, `requirement_check`, the Guardian helpers), the AST component, input/output parsing. Backed by an adapter. IBM is retiring "Intrinsic" and has not yet confirmed the replacement name; `AdapterBasedComponent` is used throughout this document as a placeholder (see Part I §5). | | **Adapter** | The backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity, I/O contract, and weights binding. The user-facing **Intrinsic** wraps an adapter to provide helpers and parsing. In the redesign, the class hierarchy collapses from four (`IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` + abstract base) to one `Adapter` + a pluggable binding. | | **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. 
| | **Schema version** | *Proposed parser-dispatch field for breaking schema changes only.* For routine weight updates the HF commit SHA is the version (no new field needed). `schema_version` would only earn its keep if the granite team ships a *breaking* output-schema change (different keys, nesting, or types) and unpinned callers need graceful v1↔v2 parser dispatch. **Open** (§17 Q4) — granite-common may already have a versioning mechanism we should reuse instead of inventing this. | @@ -259,7 +255,7 @@ return adapter.io_contract.parse(raw) Every verb that varies per reality lives inside `adapter_scope` (see §9.3); the outer flow is the same whether the adapter is a local PEFT file, an embedded Granite Switch adapter, or a server-mediated one. -> **Boundary constraint:** `io_contract.build_prompt()` and `io_contract.parse()` must delegate to `granite-common` / `granite-formatters` for all `io.yaml` handling and parsing. The `IOContract` class in Mellea wraps these libraries; it does not re-implement their logic. (Jacob's requirement — keep `io.yaml` parsing in the granite-common / granite-formatters boundary.) +> **Boundary constraint:** `io_contract.build_prompt()` and `io_contract.parse()` must delegate to `granite-common` / `granite-formatters` for all `io.yaml` handling and parsing. The `IOContract` class in Mellea wraps these libraries; it does not re-implement their logic. (Jacob's requirement — keep `io.yaml` parsing in the granite-common / granite-formatters boundary.) `build_prompt()` returns a `Component`-compatible prompt object — not a raw string — consistent with the rest of Mellea's prompt pipeline. ### 9.2 Weights binding verbs per reality @@ -443,12 +439,7 @@ First-class deliverables, not afterthoughts. **Qualitative effectiveness suite (optional, per-adapter).** The tests above verify plumbing. They do *not* answer "does the answerability adapter actually judge answerability correctly?" 
A per-adapter qualitative suite (`@pytest.mark.qualitative`, opt-in, kept out of the fast loop) takes a small canonical dataset per adapter and asserts an accuracy floor on its outputs. Without this, a refactor can pass every structural test while silently degrading the behaviour users care about. -Two existing tools — already part of Mellea's broader LLM-unit-testing conversation rather than bespoke to this refactor — fit naturally here and should be preferred over ad-hoc harnesses: - -- **`TestBasedEval`** (in-tree — `mellea/templates/prompts/default/TestBasedEval.jinja2`, documented at `docs.mellea.ai/how-to/unit-test-generative-code`) is Mellea's own LLM-as-judge component. Each adapter gets a JSON file of test cases (`{input, target, guidelines}`); a judge model returns `{"score": 0|1, "justification": "..."}`. Runnable from the CLI (`m eval run tests/eval_data/.json --backend ollama --model granite4.1:3b`) so the same fixtures power both CI and interactive debugging. This is the default mechanism for per-adapter qualitative coverage. -- **BenchDrift** (`github.com/IBM/BenchDrift`) addresses a second failure mode: an adapter that works on its canonical phrasing but breaks on semantically-equivalent rephrasings. BenchDrift generates syntactic variations of each test case while preserving meaning, then scores consistency across variations. Worth applying to the adapters where phrasing-invariance is a real concern — answerability, context-relevance, requirement-check, and the Guardian family all qualify. Optional extension rather than baseline coverage, but enabling it per-adapter is a one-config-file step once the `TestBasedEval` fixtures exist. - -Kept cheap (tens of test cases per adapter, not hundreds) so qualitative runs fit in a reasonable nightly-CI budget. +The implementation approach for this suite is intentionally left open — start simple and file a separate proposal if a more structured approach (e.g. `TestBasedEval`, `BenchDrift`) is warranted. 
**Tutorials** — three worth writing alongside the refactor: - "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story. From 59d7a35a03c6629fd75efcede2be4f0b4da0ce95 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 21 May 2026 18:40:53 +0100 Subject: [PATCH 28/29] docs: resolve §17 Q3 + Q4; defer schema_version to #1111 (#929) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Q3 (auto-context document discovery): closed [Open] → [Resolved] per Jake on PR #1080, 2026-05-20. Helpers read documents from ordinary conversation context; PR #1028's `_docs`-scanning fallback is shelved. Direction lifted; mechanism refined. Q4 (`schema_version` in io.yaml): closed [Open] → [Resolved — Defer to #1111]. Mellea does not introduce `schema_version`. Forward-compatibility preserved: helpers raise `AdapterSchemaMismatchError` only when parse cannot yield the helper's declared output contract; benign additions do not trigger. Output-schema versioning is tracked separately in #1111. Cascaded consistency edits across §3, §6 risk, §7 terminology table (removed `Schema version` row, simplified Identity), §12 mapping, §13 user-visible behaviour, §14.1 telemetry diagram, §15 documentation, §16 phasing, §18 references — all replace `schema_version` parser-dispatch references with the contract-mismatch + pinning story. Assisted-by: Claude Code --- docs/dev/proposals/929-adapter-lifecycle.md | 33 ++++++++++----------- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/docs/dev/proposals/929-adapter-lifecycle.md b/docs/dev/proposals/929-adapter-lifecycle.md index 1b58b642a..9aea3021c 100644 --- a/docs/dev/proposals/929-adapter-lifecycle.md +++ b/docs/dev/proposals/929-adapter-lifecycle.md @@ -57,7 +57,7 @@ Every thread in #929 is a symptom of not having separated the kinds of adapter a Four outcomes, in order of importance.
Detail on each lives in Part II; this list is the ask. 1. **One adapter model, one code path.** Reasonable from the outside, unified from the inside — no more `if backend._uses_embedded_adapters:` branches. -2. **Safe evolution.** Model-option precedence is documented and enforced. Adapter weights are versioned by HF commit SHA — Mellea can pin to a specific revision for stability or track latest for newest weights (refresh policy in §17 Q5). Output schemas are stable in the common case (new weights, same schema); the rare breaking schema change is handled either by pinning, by `schema_version` parser dispatch (§17 Q4), or by helpers raising on mismatch (Jake req 4). Helpers like `check_answerability` see a normalised result regardless of underlying churn. +2. **Safe evolution.** Model-option precedence is documented and enforced. Adapter weights are versioned by HF commit SHA — Mellea can pin to a specific revision for stability or track latest for newest weights (refresh policy in §17 Q5). Output schemas are stable in the common case (new weights, same schema); the rare breaking schema change is handled by pinning the HF revision and by helpers raising `AdapterSchemaMismatchError` when parse cannot yield the helper's declared output contract (Jake req 4). Forward-compatible additions (e.g. an extra optional field) do not trigger the error — only contract-breaking deltas do. Helpers like `check_answerability` see a normalised result regardless of underlying churn. Output-schema versioning beyond this is tracked in [#1111](https://github.com/generative-computing/mellea/issues/1111) (§17 Q4). 3. **First-class customer adapters.** Customers can ship their own against the same API as first-party ones — today it requires patching the catalog or subclassing a self-confessed "temporary hack" ([#424](https://github.com/generative-computing/mellea/issues/424)). 4. 
**Observable and parity-respecting.** Every lifecycle phase is a distinct span; high-level helpers (`check_answerability` etc.) keep their shape; manual adapter construction becomes simpler, not harder. @@ -79,7 +79,7 @@ An **Adapter** is a small object composed of three parts: ``` Adapter -├── identity — name, adapter type (lora/alora), schema version (proposed; see §7 / §17 Q4), optional role +├── identity — name, adapter type (lora/alora), optional role ├── io_contract — parsed io.yaml: prompt building, output parsing, model options └── weights — one of three pluggable bindings (LocalFile, Embedded, ServerMediated) ``` @@ -178,7 +178,7 @@ Files and modules touched, approximate: `mellea/backends/adapters/{adapter,catal ### Risk - **Biggest unknown**: whether the unified `resolve_model_options` handles every combination currently in use. Mitigation: keep the five-layer precedence explicit, add per-adapter override documentation, and assert resolved values in tests. -- **Second biggest**: handling breaking schema changes from upstream. Three layers: pinning (avoid the risk), `schema_version` parser dispatch (§17 Q4, proposed), helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4, loud safety net). Worked example: the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change, rather than silently returning `False`. +- **Second biggest**: handling breaking schema changes from upstream. Two layers: pinning (avoid the risk), and helpers raising `AdapterSchemaMismatchError` when parse cannot yield the helper's declared output contract (Jake req 4, loud safety net). Forward-compatible additions do not trigger the error. 
Worked example: the [#1008](https://github.com/generative-computing/mellea/pull/1008) `requirement-check` change would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change, rather than silently returning `False`. (Output-schema versioning is tracked separately in [#1111](https://github.com/generative-computing/mellea/issues/1111) — §17 Q4.) - **Mitigated by**: per-phase test-parity commitment (nothing merges if existing tests regress); observability introduced alongside the refactor so production regressions surface as dashboard signals rather than silent behavioural drift. --- @@ -196,8 +196,7 @@ Names matter because they appear in user-facing error messages, docs, and teleme | **Base model** | The general-purpose LLM (e.g. `ibm-granite/granite-4.1-3b`) that everything runs on top of. | | **AdapterBasedComponent** *(placeholder)* | The user-facing capability: helper functions (`check_answerability`, `requirement_check`, the Guardian helpers), the AST component, input/output parsing. Backed by an adapter. IBM is retiring "Intrinsic" and has not yet confirmed the replacement name; `AdapterBasedComponent` is used throughout this document as a placeholder (see Part I §5). | | **Adapter** | The backend artefact: the weights loaded by a backend (LoRA / aLoRA / embedded), with its identity, I/O contract, and weights binding. The user-facing **Intrinsic** wraps an adapter to provide helpers and parsing. In the redesign, the class hierarchy collapses from four (`IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` + abstract base) to one `Adapter` + a pluggable binding. | -| **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), schema version, and optional role. | -| **Schema version** | *Proposed parser-dispatch field for breaking schema changes only.* For routine weight updates the HF commit SHA is the version (no new field needed). 
`schema_version` would only earn its keep if the granite team ships a *breaking* output-schema change (different keys, nesting, or types) and unpinned callers need graceful v1↔v2 parser dispatch. **Open** (§17 Q4) — granite-common may already have a versioning mechanism we should reuse instead of inventing this. | +| **Identity** | The part of an adapter that says *what it is*: name (e.g. `answerability`), adapter type (`lora` / `alora`), and optional role. | | **I/O contract** | The parsed `io.yaml` — prompt template, output parser, model-option defaults. Always present, same shape regardless of reality. *Name under discussion: Jacob prefers `io_config`; `io_contract` is used throughout this proposal but is not final.* | | **Weights binding** | The part of an adapter that says *how its weights are made available*. Three subclasses, one per reality. Exposes `prepare`, `activate`, `deactivate`, `release`. | | **Reality A / B / C** | Shorthand for the three "where the weights live" stories: A = local PEFT file, B = shipped with the base model (Granite Switch), C = server-mediated (future OpenAI/vLLM). | @@ -351,11 +350,11 @@ This is a **backend-keyed dispatch** where the branching key (`_uses_embedded_ad | 2a. Intrinsic rewriters overwrite options | `Adapter.resolve_model_options()` replaces the five-place merge with one documented stack. | | 2b/2c. Model-option hierarchy | Five layers enforced in `resolve_model_options` (base model → adapter config → `io.yaml` defaults → `io.yaml` per-intrinsic → caller). | | 3. Naming consistency | Three-axis identity (`name`, `adapter_type`, `revision`) plus explicit `role`. | -| 4a. `call_intrinsic` assumes one output schema | `io_contract.parse()` validates the output shape and raises `AdapterSchemaMismatchError` on mismatch (Jake req 4); helpers see a normalised shape. Dispatch on `(name, schema_version)` is an optional layer if §17 Q4 is adopted. | +| 4a. 
`call_intrinsic` assumes one output schema | `io_contract.parse()` validates the output shape and raises `AdapterSchemaMismatchError` when parse cannot yield the declared contract (Jake req 4); forward-compatible additions do not trigger the error. Helpers see a normalised shape. | | 4b. Per-adapter vs standard schema | `io_contract.parse()` is per-adapter; helpers define the normalised post-parse shape. | -| 4c. Versioning | HF commit SHA is the version (every push = new revision; pin via `revision="..."` for stability). Breaking schema changes (rare) handled by pinning + helpers raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4); `schema_version` parser dispatch (§17 Q4) is an optional layer if the granite team adds the field. | +| 4c. Versioning | HF commit SHA is the version (every push = new revision; pin via `revision="..."` for stability). Breaking schema changes (rare) handled by pinning and by helpers raising `AdapterSchemaMismatchError` when parse cannot yield the declared contract (Jake req 4). Output-schema versioning beyond this is tracked separately in [#1111](https://github.com/generative-computing/mellea/issues/1111) (§17 Q4). | | 5. OpenAI backend support | Ships as one or two `ServerMediatedBinding` subclasses. | -| 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. Duplicate `requirement_check` / `requirement-check` entries collapse into one entry; the v1 → v2 output-schema change (PR #1008) is handled by Jake req 4 + optional `schema_version` dispatch (§17 Q4). | +| 6. Catalog cleanup | Catalog becomes optional resolver (`LocalFileBinding.from_catalog(name)`). Custom adapters bypass it; no monkey-patching. 
Duplicate `requirement_check` / `requirement-check` entries collapse into one entry; the v1 → v2 output-schema change (PR #1008) is handled by Jake req 4 (helper raises when parse cannot yield the declared contract); pinning the prior HF revision is the avoidance path. | | 7. Hardcoded `requirement-check` refs | Callers look up by **role**, not name. | ## 13. What users see — detailed @@ -368,7 +367,7 @@ score = check_answerability(question, documents, context, backend, model_options={"temperature": 0.1}) ``` -**Validation on parse.** Helpers declare their expected output shape; `io_contract.parse()` validates against it and raises `AdapterSchemaMismatchError` on mismatch — with `name`, observed keys, and expected keys in the message. Schema drift is loud, not silent. (Jake req 4.) +**Validation on parse.** Helpers declare their expected output shape; `io_contract.parse()` validates against it and raises `AdapterSchemaMismatchError` when the parse cannot yield the helper's declared output contract — with `name`, observed keys, and expected keys in the message. Forward-compatible additions (an extra optional field the parser ignores) do not trigger the error; contract-breaking deltas (missing required field, type change on a depended-on key) do. Schema drift is loud, not silent. (Jake req 4.) **Manual adapter construction** collapses from four classes (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`, abstract base) to one `Adapter` + a binding: @@ -400,7 +399,7 @@ adapter = Adapter(name="answerability", Adapter calls hide the complexity that matters most when something goes wrong (weight fetching, activation side-effects, schema contracts). Without per-phase instrumentation, four failure modes are hard or impossible to diagnose — and Mellea has already hit the first two in production: 1. 
**Masked errors.** The `obtain_lora`-always-called bug (#929 point 1b) showed users a misleading download error while the real cause (adapter-type mismatch) stayed invisible. A span at the `prepare` boundary recording the exception would have surfaced the actual cause on first run. -2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. Under Jake req 4 (helpers raise on schema mismatch), this would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change — the caller gets a named error instead of a silently wrong value. The `parse_failures` counter labelled by `(name, revision)` is the dashboard signal; the exception is the runtime signal. +2. **Silent schema drift.** When PR #1008 changed `requirement-check` output from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, `requirement_check_to_bool` silently returned `False` for every call until someone noticed. Under Jake req 4 (helpers raise when parse cannot yield the declared contract), this would have surfaced as `AdapterSchemaMismatchError` on the first call after the schema change — the caller gets a named error instead of a silently wrong value. The `parse_failures` counter labelled by `(name, revision)` is the dashboard signal; the exception is the runtime signal. 3. **Latency attribution.** "`check_answerability` is slow" is unanswerable today — download, PEFT load, generation, and JSON parse collapse into one backend span. Phase-level spans make the culprit obvious in any trace viewer. 4. **Alerting and cost attribution.** OTel `ERROR` status on failed download/activation makes generic dashboards and alerts work. Token counts labelled by adapter answer "which capability is 30% of our spend?" Both impossible today. 
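The per-phase instrumentation argued for above can be illustrated with a minimal, stdlib-only sketch. It is not the proposal's telemetry code and does not use the OpenTelemetry API. `PhaseRecord`, `phase`, and `failing_prepare` are hypothetical names introduced only to show how recording the exception at the `prepare` boundary keeps the real cause visible.

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class PhaseRecord:
    """One lifecycle phase: its name, how long it took, and how it failed (if it did)."""
    name: str
    duration_s: float
    error: Optional[str] = None


@contextmanager
def phase(records: List[PhaseRecord], name: str) -> Iterator[None]:
    """Record one lifecycle phase, including the exception that ended it."""
    start = time.perf_counter()
    error: Optional[str] = None
    try:
        yield
    except Exception as exc:
        # Capture the real cause at the phase boundary, then re-raise it unchanged.
        error = f"{type(exc).__name__}: {exc}"
        raise
    finally:
        records.append(PhaseRecord(name, time.perf_counter() - start, error))


def failing_prepare() -> None:
    # Hypothetical failure standing in for an adapter-type mismatch.
    raise ValueError("adapter type mismatch")


records: List[PhaseRecord] = []
try:
    with phase(records, "intrinsic.prepare"):
        failing_prepare()
except ValueError:
    pass

# A phase that succeeds is recorded with no error attached.
with phase(records, "intrinsic.parse"):
    pass
```

A real implementation would presumably map each record onto a span with an error status. The sketch only shows why a boundary per phase makes both the failing phase and its latency attributable.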
@@ -416,7 +415,7 @@ graph TD
     root --> prep["intrinsic.prepare<br/>LocalFile: download ms"]
     root --> act["intrinsic.activate<br/>peft_name / controls / api_id"]
     root --> gen["intrinsic.generate<br/>(regular backend span:<br/>tokens, latency)"]
-    root --> par["intrinsic.parse<br/>revision, schema_version*,<br/>parse_ok, raw_len"]
+    root --> par["intrinsic.parse<br/>revision, parse_ok, raw_len"]
     root --> deact["intrinsic.deactivate"]
 ```
 
@@ -443,7 +442,7 @@ The implementation approach for this suite is intentionally left open — start
 
 **Tutorials** — three worth writing alongside the refactor:
 - "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story.
-- "Handling a breaking schema change without breaking users" — worked example using `requirement-check` v1 → v2; covers HF revision pinning, `AdapterSchemaMismatchError` (Jake req 4), and `schema_version` dispatch if §17 Q4 is adopted.
+- "Handling a breaking schema change without breaking users" — worked example using `requirement-check` v1 → v2; covers HF revision pinning and `AdapterSchemaMismatchError` (Jake req 4).
 - "Reading intrinsic telemetry" — short dashboard-building guide.
 
 **Release notes** separate: no-op for high-level helper users; deprecated-but-shimmed for direct adapter constructors; removed at Phase 4 (see below).
@@ -453,7 +452,7 @@ The implementation approach for this suite is intentionally left open — start
 Detail deferred until Part I §5 decisions are agreed, but the intended phasing is:
 
 1. **Phase 0 — parallel types.** Introduce the new types (`Adapter`, `WeightsBinding`, `IOContract`, plus a user-facing `Intrinsic` class if Q5 is settled on Jake's split) alongside existing classes. Catalogue entries gain pinned HF revision SHAs (Jake req 5; §17 Q6). No call-site changes, tests unchanged.
-2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types. Helpers gain output validation raising `AdapterSchemaMismatchError` on parse mismatch (Jake req 4). #1003 helper signature work folds in here: `model_options=` on all top-level helpers; `documents=` keyword-only on `factuality_detection` / `factuality_correction`. Old classes become deprecation shims.
+2. **Phase 1 — callers move.** `_util.call_intrinsic`, requirement rerouting, and each helper switch to new types.
Helpers gain output validation raising `AdapterSchemaMismatchError` when parse cannot yield the declared contract (Jake req 4). #1003 helper signature work folds in here: `model_options=` on all top-level helpers; `documents=` keyword-only on `factuality_detection` / `factuality_correction`. Auto-context document discovery for `documents=None` lifted from PR #1028 (§17 Q3); mechanism refined per intrinsics-team guidance — helpers read documents from ordinary conversation context, not from a `_docs`-specific scan path. Old classes become deprecation shims. 3. **Phase 2 — backends move.** `AdapterMixin` narrows to the new verb set. Bindings implement `prepare` / `activate` / `deactivate` / `release` per reality; `LocalFileBinding.prepare` resolves the configured HF revision (§17 Q5 weight-refresh policy). Backends drop per-call `_simplify_and_merge` in favour of `resolve_model_options`. 4. **Phase 3 — Reality C ships.** `ServerMediatedBinding` subclass(es) written; OpenAI backend drops `_uses_embedded_adapters` hard-code. 5. **Phase 4 — shim removal.** After one minor release with deprecation warnings. *(Skipped if Q5 settles on Jake's split — re-exports stay.)* @@ -468,8 +467,8 @@ Items marked **[Open]** need decision; **[Position]** is the proposal's working 2. **Role vs name** [Position]. `role` is a free-form string with an advisory known-roles registry (e.g., `mellea.backends.adapters.roles.KNOWN_ROLES`). Backends warn on unknown roles but accept any string. Pure enum was considered but rejected — it would lock role names at library-release time. 3. **Rewind interaction (formerly PR #1028).** Two parts: - **Where rewind logic lives** [Resolved] (Jake on PR #1080): the rewind in `_resolve_question` / `_resolve_response` stays in the helpers. Phase 1 can revisit moving it to `io_contract.build_prompt` if cleaner separation is wanted; not gating. - - **Manual-Message document detection** [Open] — Phase 1 design call. 
When `documents=None` is passed to `factuality_detection` / `factuality_correction`, how should the helper find user-supplied documents? Two approaches: (a) require explicit `documents=` and don't scan the conversation; (b) scan the context for `Message`s with `_docs` and extract from there. PR #1028 implemented (b) via `_resolve_response`'s manual-Message fallback; the design call was deferred when #1028 closed (2026-05-15) so the work isn't lost. -4. **`schema_version` field in `io.yaml`** [Open] — cross-team. §4, §7, §9, and §12 all assume the `io.yaml` parsed by granite-common / granite-formatters carries a `schema_version`. It doesn't today, so this is asking that team to add a field. Worth suggesting to them? Or do they have another approach to versioning? + - **Document discovery when `documents=None`** [Resolved] (Jake on PR #1080, 2026-05-20). When `documents=None` is passed to `factuality_detection` / `factuality_correction`, the helper auto-discovers user-supplied documents from the conversation context rather than requiring an explicit `documents=` argument. The auto-discovery direction was the contribution of PR #1028; intrinsics-team guidance refines the *mechanism*: documents flow through ordinary conversation context (no `_docs`-scanning fallback path needed). Phase 1 implements helpers that read whatever documents are present in the context they receive; populating that context is the caller's responsibility (explicit `documents=`, prior `Message`s, retrieval, …). PR #1028's specific `_resolve_response` `_docs`-scanning code is shelved. +4. **Output-schema versioning** [Resolved — Defer to [#1111](https://github.com/generative-computing/mellea/issues/1111)]. This refactor assumes `io.yaml` does **not** carry a `schema_version` field, and Mellea does not introduce one. Forward-compatibility is preserved: helpers only raise `AdapterSchemaMismatchError` when parse cannot yield the declared contract, not on benign additions. 
Versioning is tracked separately in #1111; promote to in-progress when the trigger conditions documented there are hit. 5. **Weight-refresh policy** [Position]. Adapter weights are versioned by HF commit SHA. `prepare()` re-resolves the upstream revision at session start; long-running processes (sessions spanning a release) opt into an explicit `refresh()` API. Default cadence matches the session-scoped lifecycle (Part I §5 Q2 Resolved). 6. **Version pinning for auto-loaded adapters** [Position]. When an adapter is auto-loaded from the catalogue (caller didn't specify a revision), Mellea pins to the catalogue entry's recorded SHA. `revision="main"` is an explicit opt-in to track latest. Pinning gives reproducibility; explicit tracking gives latest weights at the cost of behaviour drift between runs. (Jake req 5; coupled to Q5 weight-refresh policy.) @@ -500,7 +499,7 @@ Linked index of every issue, PR, and commit cited in this document. Use this to | [#979](https://github.com/generative-computing/mellea/pull/979) | fix: key in json returned by policy_guardrails intrinsic | rework evidence for output parsing | | [#986](https://github.com/generative-computing/mellea/pull/986) | fix: issues introduced by intrinsic changes | rework evidence | | [#994](https://github.com/generative-computing/mellea/pull/994) | fix: default intrinsic adapter types; granite-switch tests | rework evidence | -| [#1008](https://github.com/generative-computing/mellea/pull/1008) | fix: rewrite requirement_check_to_bool for new schema | worked example for the schema-version story | +| [#1008](https://github.com/generative-computing/mellea/pull/1008) | fix: rewrite requirement_check_to_bool for new schema | worked example for the contract-mismatch story (Jake req 4) | | [#1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | introduces the factuality rewind path | #### Rework evidence in detail @@ -522,7 +521,7 @@ Seven recent fix-up commits in 
the adapter area, all symptomatic of the design g | Ref | Title | Role in this doc | | --- | --- | --- | | [#1003](https://github.com/generative-computing/mellea/issues/1003) | fix: intrinsic function signatures | folded into Phase 1 of this epic; PR #1028 closed 2026-05-15 | -| [PR #1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | closed 2026-05-15 in favour of folding into this epic; manual-Message detection carried forward as Phase 1 design call (§17 Q3) | +| [PR #1028](https://github.com/generative-computing/mellea/pull/1028) | feat: normalize intrinsics interfaces | closed 2026-05-15 in favour of folding into this epic. Two threads inherited: (1) #1003 helper signatures → Phase 1 (already scoped); (2) auto-context document discovery → Phase 1 (§17 Q3 Resolved 2026-05-20; mechanism refined to ordinary-context reading, not `_docs` scanning). | | [#1035](https://github.com/generative-computing/mellea/issues/1035) | OTel emission gaps | parent for telemetry coordination | | [PR #1036](https://github.com/generative-computing/mellea/pull/1036) | feat(telemetry): close five OTel GenAI semconv gaps | in-flight telemetry work to coordinate with | From 3c95223bea24b6fc36c6b09588b261fc062e7031 Mon Sep 17 00:00:00 2001 From: Nigel Jones Date: Thu, 21 May 2026 21:03:50 +0100 Subject: [PATCH 29/29] docs: add issue breakdown plan for epic #929 Combines decomposition rationale, Mermaid dependency diagram, filing waves, pre-flight checklist, and full issue bodies for all 11 sub-issues (v3.1) into a single planning document alongside the design proposal. 
Refs #929 Assisted-by: Claude Code --- docs/dev/proposals/929-issue-plan.md | 1275 ++++++++++++++++++++++++++ 1 file changed, 1275 insertions(+) create mode 100644 docs/dev/proposals/929-issue-plan.md diff --git a/docs/dev/proposals/929-issue-plan.md b/docs/dev/proposals/929-issue-plan.md new file mode 100644 index 000000000..2403671ba --- /dev/null +++ b/docs/dev/proposals/929-issue-plan.md @@ -0,0 +1,1275 @@ +# Epic #929 — Issue Breakdown Plan + +> **Companion to:** [929-adapter-lifecycle.md](./929-adapter-lifecycle.md) (the design proposal) +> **Parent epic:** [#929](https://github.com/generative-computing/mellea/issues/929) +> **PR:** [#1080](https://github.com/generative-computing/mellea/pull/1080) +> **Status:** review draft — issues to be filed after final approval +> +> This document combines the decomposition rationale, dependency diagram, +> filing plan, and full issue bodies for all 11 sub-issues. After issues +> are filed and PR #1080 closes, this becomes a historical artefact. + +--- + +# Part 1 — Overview + +**Parent:** [#929](https://github.com/generative-computing/mellea/issues/929) — Fix Intrinsic Adapter Lifecycle & Consistency in Mellea +**Design proposal:** PR [#1080](https://github.com/generative-computing/mellea/pull/1080) +**Detailed breakdown:** [`proposed-issues.v3.md`](./proposed-issues.v3.md) (full issue bodies, acceptance criteria, test plans) +**Date:** 2026-05-21 + +--- + +## Background and rationale + +The adapter/intrinsic area has produced seven fix-up commits in a short period, three silent bugs, and blocked two features (#1018, #27). The root cause is a single structural problem: code that should know what *kind* of adapter it is (local PEFT file, Granite Switch embedded, server-mediated) spreads that decision across every call site via `isinstance` branching. Adding a new backend reality forces a new branch everywhere. 
+ +The refactor replaces a four-class hierarchy (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`, abstract `Adapter`) with a single composable type: + +``` +Adapter +├── identity — name, adapter_type (lora|alora), optional role +├── io_contract — wraps granite-common / granite-formatters; builds prompt, parses output +└── weights — pluggable WeightsBinding: LocalFile / Embedded / ServerMediated +``` + +The binding says what kind of adapter it is. The backend executes four verbs (`prepare`, `activate`, `deactivate`, `release`) uniformly — no branching. + +### Why decompose at all? + +A single "refactor everything" PR would be: unreviewable (thousands of LOC), all-or-nothing (any regression blocks the whole change), impossible to parallelise, and catastrophic to rebase if upstream keeps moving. The decomposition ensures: + +- **Each PR is reviewable** — scoped to one file, one abstraction surface, or one coherent concern. Reviewers see a clear before/after, not a diff that touches 30 files. +- **Breakage is isolated** — each phase keeps existing tests green; a regression in one PR does not block all other work. +- **Work can be parallelised** — after the two bottleneck PRs (1.A and 2.1) merge, three to four independent PRs can proceed simultaneously. +- **Stacked PRs are minimised** — the only true stacks are 0.1→1.A and 1.A→2.1. Everything else in each wave is independent. +- **Incremental user-visible improvement** — each phase delivers working, shippable code; the epic does not need to land as a big-bang release.
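The composable shape described under "Background and rationale" (one `Adapter`, a pluggable binding, four verbs executed uniformly) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation planned for issue 0.1. `RecordingBinding` and `run_with_adapter` are hypothetical names, the `ctx` argument is treated as opaque, and `io_contract` is left out for brevity.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Callable, List, Literal, Optional


@dataclass(frozen=True)
class Identity:
    name: str
    adapter_type: Literal["lora", "alora"]
    role: Optional[str] = None


class WeightsBinding(ABC):
    """Says how the weights become available; one subclass per reality."""

    @abstractmethod
    def prepare(self) -> None: ...

    @abstractmethod
    def activate(self, ctx: Any) -> None: ...

    @abstractmethod
    def deactivate(self, ctx: Any) -> None: ...

    @abstractmethod
    def release(self) -> None: ...


class RecordingBinding(WeightsBinding):
    """Hypothetical stand-in binding that only records which verbs ran."""

    def __init__(self, label: str, log: List[str]) -> None:
        self.label, self.log = label, log

    def prepare(self) -> None:
        self.log.append(f"{self.label}:prepare")

    def activate(self, ctx: Any) -> None:
        self.log.append(f"{self.label}:activate")

    def deactivate(self, ctx: Any) -> None:
        self.log.append(f"{self.label}:deactivate")

    def release(self) -> None:
        self.log.append(f"{self.label}:release")


@dataclass
class Adapter:
    identity: Identity
    weights: WeightsBinding  # io_contract omitted to keep the sketch short


def run_with_adapter(adapter: Adapter, ctx: Any, generate: Callable[[], str]) -> str:
    """Drive any binding through the same verbs; no isinstance checks."""
    adapter.weights.prepare()
    adapter.weights.activate(ctx)
    try:
        return generate()
    finally:
        # Deactivate even if generation raises, so the backend is left clean.
        adapter.weights.deactivate(ctx)
```

The point of the design choice is that the caller never inspects the binding type: adding a new reality means adding a `WeightsBinding` subclass, not a new branch at every call site. `release` is deliberately not called here, since the plan places it at session teardown.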
+ +--- + +## Decomposition shape — 11 issues + +| Issue | Phase | Title | Depends on | Blocks | +|---|---|---|---|---| +| **0.1** | 0 | Introduce `Adapter` / `Identity` / `IOContract` / `WeightsBinding` scaffolding (+ `AdapterBasedComponent` placeholder, `KNOWN_ROLES`) | — | 1.A, 2.1, 2.2, 2.3 | +| **0.2** | 0 | Pin catalogue entries to HF revision SHAs (+ deduplicate requirement_check entries) | — | 2.2 | +| **1.A** | 1 | Internal migration with shims — old classes inherit from new `Adapter`; `call_intrinsic` rewritten | 0.1 | 1.B, 1.C, 1.D, 2.1 | +| **1.B** | 1 | RAG helpers (`rag.py` whole-file, 6 helpers) migrate to new types | 1.A | — | +| **1.C** | 1 | `requirement_check` / `requirement_check_to_bool` migrate to new types; schema-mismatch becomes loud | 1.A | — | +| **1.D** | 1 | `guardian.py` whole-file migration (4 helpers + `documents=` keyword-only + auto-context discovery) | 1.A | — | +| **2.1** | 2 | `AdapterMixin` verb rename/narrow + `resolve_model_options` centralisation + `IntrinsicMetricsPlugin` | 1.A | 2.2, 2.3 | +| **2.2** | 2 | `LocalFileBinding` implements verbs (PEFT/aLoRA path) + `from_catalog()` + spans + integration tests | 0.1, 0.2, 2.1 | 4.1 | +| **2.3** | 2 | `EmbeddedBinding` implements verbs (Granite Switch path) + spans | 0.1, 2.1 | 4.1 | +| **3.1** | 3 | `ServerMediatedBinding` implementation — **blocked on upstream #27 (vLLM aLoRA)** | Phase 2 + #27 | — | +| **4.1** | 4 | Remove deprecation shims; rewrite `docs/dev/intrinsics_and_adapters.md`; write 3 tutorials | 1.A + Phase 2 + 1 minor release | — | +| **[#1111](https://github.com/generative-computing/mellea/issues/1111)** | cross | Output-schema versioning (already filed, deferred) | 0.1 (needs `AdapterSchemaMismatchError`) + trigger conditions | — | + +--- + +## Dependency diagram + +```mermaid +graph TD + A["0.1 — New types scaffolding\n(Adapter/Identity/IOContract/WeightsBinding\nAdapterBasedComponent placeholder\nKNOWN_ROLES registry)"] + B["0.2 — Catalogue HF revision 
pinning\n(+ deduplicate requirement_check entries)"] + C["1.A — Internal migration + shims\n(call_intrinsic rewritten\nold classes inherit from new Adapter\nadapter_scope hook point)"] + D["1.B — rag.py whole-file migration\n(6 helpers + model_options=\n+ output validation)"] + E["1.C — requirement_check migration\n(loud schema-mismatch\nbreaking: no more silent False)"] + F["1.D — guardian.py whole-file migration\n(4 helpers + documents= kw-only\n+ auto-context discovery)"] + G["2.1 — AdapterMixin verb rename/narrow\n(+ resolve_model_options\n+ IntrinsicMetricsPlugin\n+ adapter_observability.md)"] + H["2.2 — LocalFileBinding implements verbs\n(PEFT/aLoRA path\n+ from_catalog() classmethod\n+ spans + integration tests\n+ docs/examples update)"] + I["2.3 — EmbeddedBinding implements verbs\n(Granite Switch path\n+ spans\n+ AGENTS.md update)"] + J["3.1 — ServerMediatedBinding\n⛔ blocked on upstream #27"] + K["4.1 — Remove shims\n+ rewrite intrinsics_and_adapters.md\n+ 3 tutorials\n(after 1 minor release)"] + L["#1111 — Output-schema versioning\n(already filed, deferred)"] + + A --> C + A --> G + A --> H + A --> I + A --> L + + B --> H + + C --> D + C --> E + C --> F + C --> G + C --> K + + G --> H + G --> I + + H --> K + I --> K + + I --> J + H --> J + + style J fill:#ffcccc,stroke:#cc0000 + style K fill:#ffe0b2,stroke:#e65100 + style L fill:#e8f5e9,stroke:#388e3c +``` + +--- + +## Filing waves + +For a single developer, the waves define the serialisation constraint. Within each wave, issues are independent — the first to get a PR open does not block the others. + +**Wave 1** (start immediately, parallel): +→ 0.1, 0.2 — both independent; no shared files. + +**Wave 2** (after 0.1 has a draft PR): +→ 1.A — the only Phase 1 bottleneck. Kept deliberately narrow (~300–500 LOC) so it merges quickly. + +**Wave 3** (after 1.A merges — four issues run fully in parallel): +→ 1.B, 1.C, 1.D (independent helper files), and 2.1 (backend mixin) in parallel. 
+ +**Wave 4** (after 2.1 merges): +→ 2.2, 2.3 — independent bindings for the two active realities. + +**Tracking / deferred** (file any time, mark blocked): +→ 3.1 (blocked on #27), 4.1 (deferred one minor release after shims introduced in 1.A). + +The two bottlenecks (1.A, 2.1) are the only serialisation points. Kept small by design. + +--- + +## Pre-flight — open issues and PRs to resolve first + +| Item | Overlap | Action | +|---|---|---| +| **PR #935** — Guardian docs migration | Touches same docs as 1.D/2.2/2.3 | Merge before starting 1.D | +| **PR #1078** — intrinsic tests / safeguards (fixes #1029) | Adds formatter test data that 0.2/1.C build on | Merge before starting 0.2/1.C; close #1029 | +| **#1094** — Migrate session example off deprecated GuardianCheck | Same file as 1.D scope | Close with 1.D (include in 1.D scope) | +| **#1071** — Guardian-backed Requirement subclass (new feature) | Adds guardian code in old style | Do not start before 1.D merges | + +--- + +## What each phase delivers + +| Phase | Delivered | Breakage risk | +|---|---|---| +| 0 | New type vocabulary; catalogue with pinned SHAs; `KNOWN_ROLES` advisory registry | None for users — purely additive | +| 1 | All helpers on new types; `model_options=` on all helpers; loud schema-mismatch errors; deprecation warnings on old construction | `requirement_check_to_bool` stops returning silent `False` (intentional breaking change; flagged in changelog) | +| 2 | Fully working `LocalFileBinding` and `EmbeddedBinding`; observability spans + metrics; `from_catalog()` API; backend mixin cleaned up | `AdapterMixin` verb rename (downstream backends must update; migration table in changelog) | +| 3 | `ServerMediatedBinding` (when unblocked) | Internal to OpenAI backend | +| 4 | Shims removed; full docs rewrite; three tutorials | Old class imports break (by design, after deprecation window) | + +--- + +## Cross-cutting deliverables (not phase-specific) + +Every issue includes its own tests. 
The end state across all phases also delivers: + +- **Telemetry:** `IntrinsicMetricsPlugin` (2.1) + per-verb OTel spans `intrinsic.call / prepare / activate / parse / deactivate` (2.2, 2.3). `parse_failures` counter is the schema-drift detector — a climbing count against `(name, revision)` means an upstream adapter pushed a breaking schema change. Content capture via `MELLEA_TRACE_CONTENT` gate (consistent with #1035 / PR #1036). +- **Docs:** `docs/dev/requirement_aLoRA_rerouting.md` (1.A), `docs/docs/advanced/intrinsics.md` (2.2, 2.3), `docs/dev/adapter_observability.md` (2.1+), `AGENTS.md §13` (2.3), full rewrite of `docs/dev/intrinsics_and_adapters.md` (4.1), three tutorials (4.1). +- **Examples:** `docs/examples/intrinsics/` updated at every helper-migration PR (1.B, 1.C, 1.D) and the new construction pattern added at 2.2. +- **Output-schema versioning:** tracked separately in [#1111](https://github.com/generative-computing/mellea/issues/1111); unblocked when `AdapterSchemaMismatchError` exists (0.1). + +--- + +*Detailed issue bodies (problem, agreed design, scope, out of scope, acceptance criteria, test plan, risks, breaking changes, impinging issues, references) are in [`proposed-issues.v3.md`](./proposed-issues.v3.md) in this directory.* + +--- + +# Part 2 — Full issue bodies + +# Per-helper-file migration template + +Issues 1.B, 1.C, 1.D share the same shape. Each issue body says "follows the per-helper-file migration template" plus the file-specific deltas. + +### Common shape per migrated file + +Each per-file PR does four things, in this order, on a single helper file: + +1. **Migrate construction.** Replace internal use of `IntrinsicAdapter(...)` / `EmbeddedIntrinsicAdapter(...)` with direct construction of `Adapter(identity=..., io_contract=..., weights=...)` from the new types introduced in 0.1. +2. **Normalise signature.** Add `model_options: dict | None = None` as a keyword argument to every helper in the file. 
(File-specific: 1.D's factuality helpers also add `documents=` keyword-only — see 1.D.) +3. **Add output validation (Jake req 4).** Declare each helper's expected output contract; wire `io_contract.parse()` to raise `AdapterSchemaMismatchError` when parse cannot yield that contract. Forward-compatible additions (extra optional fields the parser ignores) do NOT raise — only contract-breaking deltas (missing required field, type change on a depended-on key) do. +4. **Update docs and examples.** The Phase 1 per-file PRs are when helpers gain new parameters and contracts; docs and examples must ship with the code, not after. Each per-file PR is responsible for: (a) updating the docstring for every helper it touches with the declared output contract and any new parameters; (b) updating `docs/examples/intrinsics/` examples that call helpers in this file; (c) adding a brief note to the PR description pointing at any user-facing docs page that needs a follow-up rewrite in Phase 2. + +### Common acceptance criteria + +- [ ] All helpers in the file construct their `Adapter` using the new types (no `IntrinsicAdapter(...)` calls in helper code) +- [ ] All helpers accept `model_options: dict | None = None` (file-specific extras as noted in each issue) +- [ ] Each helper's output is validated against a declared contract; `AdapterSchemaMismatchError` raised on contract-break, NOT on benign additions +- [ ] Existing helper tests pass (behavioural neutrality is the bar) +- [ ] New tests cover: (a) declared contract enforced — feed a synthetic output missing a required field, assert the error; (b) forward-compat — feed an output with an extra optional field, assert it does NOT raise +- [ ] Docstrings updated: every helper in the file documents its declared output contract and any new parameters +- [ ] `docs/examples/intrinsics/` examples that call helpers in this file updated to use `model_options=` where applicable; examples pass `uv run pytest docs/examples/intrinsics/` (or 
skipped with correct marker if backend-gated) +- [ ] `ruff format`, `ruff check`, `mypy` clean +- [ ] DeprecationWarning suppression: callers can still construct `IntrinsicAdapter(...)` (shim from 1.A) but emit a `DeprecationWarning` pointing at the new construction pattern + +### Common test plan + +- Existing happy-path tests pass unchanged +- New test: `AdapterSchemaMismatchError` raised on synthetic missing-field output (per helper) +- New test: forward-compatible addition (extra optional field) does NOT raise (per helper) +- Docstring spot-check: `help(check_answerability)` (or equivalent for this file's helpers) shows the declared contract and new parameters + +### Why per-file PRs + +Each PR is scoped to a single helper file (one coherent concern: "this file's helpers are now on the new types"). They run in parallel after 1.A merges, share no files, and the first one to merge sets the pattern reviewers can lean on for the rest. + +--- + +# Phase 0 — parallel types + +## 0.1 — Introduce `Adapter` / `Identity` / `IOContract` / `WeightsBinding` scaffolding (+ `AdapterBasedComponent` placeholder) + +**Parent:** #929 · **Blocks:** 1.A, 2.1, 2.2, 2.3 · **Phase:** 0 + +### Problem + +Today's adapter hierarchy is `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` plus an abstract base class `Adapter` (in `mellea/backends/adapters/adapter.py`). The split is by *where the weights live* (local PEFT file vs embedded in the base model vs server-mediated), but that distinction leaks into every caller as `isinstance` branching: `_util.call_intrinsic`, requirement rerouting, every helper, and the backends themselves all have separate code paths per subclass. + +This branchy structure was the root cause of seven recent fix-up commits in the adapter area (`8b6b8d55`, `c57aba1d`, `8577d092`, `4d372b0e`, `0617bd96`, `75465d29`, `1734900d`). It also blocks adding new realities cleanly — see #1018 (granite-switch on HF backend). 
+ +Separately, IBM is retiring the term "Intrinsic" but has not confirmed the replacement name. Mellea agreed to use **`AdapterBasedComponent`** as a placeholder until that decision lands upstream. + +### Agreed design + +Replace the four-class hierarchy with a single `Adapter` composed of three parts, the third of which is a pluggable weights binding: + +``` +Adapter (new shape) +├── identity — name, adapter_type (lora|alora), optional role +├── io_contract — input/output handling; wraps granite-common / granite-formatters +└── weights — pluggable WeightsBinding subclass (LocalFile / Embedded / ServerMediated) +``` + +**Naming-collision note (critical for implementer):** the existing `Adapter` ABC at `mellea/backends/adapters/adapter.py:24` has the same name as the proposed new type. The new type is introduced under a new module (`_core.py` or equivalent) and re-exported. Until 4.1 deletes the old shims, the old `Adapter` ABC and the new `Adapter` dataclass coexist. Implementer may either (a) introduce the new type under a different module path and alias as `Adapter` in the public surface, or (b) move the old ABC to a private name. Document the choice in the PR description. + +**Types to introduce:** + +- `Adapter` — dataclass holding `identity`, `io_contract`, `weights` +- `Identity` — dataclass holding `name: str`, `adapter_type: Literal["lora", "alora"]`, `role: str | None = None` +- `IOContract` — ABC with two methods: + - `build_prompt(...) -> Component` — builds the prompt object (must return a `Component`-compatible object, not a raw string). Delegates `io.yaml` handling to granite-common / granite-formatters; does not re-implement that logic. + - `parse(raw: str) -> dict` — parses adapter output. Raises `AdapterSchemaMismatchError` only when parse cannot yield the helper's declared output contract. Forward-compatible additions do NOT raise. 
+- `WeightsBinding` — ABC with four verbs (all abstract): + - `prepare(self) -> None` — fetches/loads weights + - `activate(self, ctx) -> None` — switches the adapter on for a generation + - `deactivate(self, ctx) -> None` — switches it off + - `release(self) -> None` — drops weights at session teardown +- Three stub subclasses raising `NotImplementedError` on each verb: + - `LocalFileBinding` — Reality A + - `EmbeddedBinding` — Reality B + - `ServerMediatedBinding` — Reality C (blocked on #27) +- `AdapterSchemaMismatchError` exception class with attributes: `name`, `observed_keys`, `expected_keys`. Message format: `"Adapter '{name}' output cannot satisfy declared contract. Observed keys: {observed_keys}; expected: {expected_keys}."` + +**Placeholder module (folded in from former issue 0.3):** + +- New module path: `mellea.stdlib.components.adapter_based_component` (placeholder) +- Re-exports today's `Intrinsic` class as `AdapterBasedComponent` +- Old import path `mellea.stdlib.components.intrinsic` stays valid +- Module docstring notes the placeholder rationale and that the module will be renamed when IBM confirms the post-"Intrinsic" name, with one minor release of overlap + +**`KNOWN_ROLES` advisory registry (§17 Q2):** + +- New constant: `mellea/backends/adapters/roles.py` — `KNOWN_ROLES: frozenset[str]` containing the initial known role strings (e.g. 
`"requirement-check"`, `"answerability"`, `"guardian"`, `"factuality"`) +- `Identity` construction warns (`UserWarning`) when `role` is set to a value not in `KNOWN_ROLES`; does not reject it — `role` stays free-form +- The registry is advisory: downstream code and new adapter authors consult it to avoid typos; it is not a schema enforcement point + +### Scope + +- New module: `mellea/backends/adapters/_core.py` (or equivalent) with the new types +- New module: `mellea/backends/adapters/roles.py` with `KNOWN_ROLES` +- New module: `mellea/stdlib/components/adapter_based_component/__init__.py` re-exporting `Intrinsic` as `AdapterBasedComponent` +- Imports the new types and `KNOWN_ROLES` into `mellea/backends/adapters/__init__.py` for downstream use +- Existing `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` / `CustomIntrinsicAdapter` and the existing `Adapter` ABC are **not** modified in this issue (1.A handles those) + +### Out of scope + +- Any caller migration (1.A and per-file issues) +- Any binding verb implementation beyond `NotImplementedError` (Phase 2) +- Removal of old classes (4.1) +- Catalogue revision pinning (0.2) +- Renaming the AST class itself or rewriting prose (sequenced when IBM confirms the post-"Intrinsic" name) +- Observability spans on the verbs (added when Phase 2 implements them) + +### Acceptance criteria + +- [ ] `Adapter`, `Identity`, `IOContract`, `WeightsBinding` types exist and are importable from `mellea.backends.adapters` +- [ ] `IOContract` ABC enforces both `build_prompt` and `parse` as abstract +- [ ] `WeightsBinding` ABC enforces all four verbs as abstract +- [ ] `LocalFileBinding`, `EmbeddedBinding`, `ServerMediatedBinding` exist as concrete subclasses, each raising `NotImplementedError` on each verb +- [ ] `AdapterSchemaMismatchError` exists, carries the three attributes, formats messages correctly +- [ ] `from mellea.stdlib.components.adapter_based_component import AdapterBasedComponent` works +- [ ] `AdapterBasedComponent is 
Intrinsic` evaluates True (same class object, not a wrapper) +- [ ] Existing imports from `mellea.stdlib.components.intrinsic` continue to work +- [ ] Naming-collision resolution (old `Adapter` ABC vs new `Adapter` dataclass) documented in PR description +- [ ] `KNOWN_ROLES` importable from `mellea.backends.adapters`; `Identity(role="unknown-role")` emits a `UserWarning`; `Identity(role="answerability")` does not warn +- [ ] Unit tests cover: type construction, ABC enforcement (cannot instantiate without overriding), `AdapterSchemaMismatchError` formatting, both placeholder import paths, `KNOWN_ROLES` warning behaviour +- [ ] Existing tests pass unchanged (no caller migration in this issue) +- [ ] `ruff format`, `ruff check`, `mypy` clean + +### Test plan + +New tests under `test/backends/adapters/test_core_types.py`: +- `test_adapter_dataclass_construction` +- `test_identity_validation` — adapter_type literal enforcement +- `test_io_contract_abc_enforcement` — cannot instantiate without overriding methods +- `test_weights_binding_abc_enforcement` — same for the four verbs +- `test_stub_binding_subclasses_raise_not_implemented` — each verb on each subclass +- `test_adapter_schema_mismatch_error_format` — message string includes name + observed + expected keys + +New test under `test/stdlib/components/test_adapter_based_component.py`: +- `test_adapter_based_component_is_intrinsic` — alias and original are the same class +- `test_both_import_paths_work` — old and new module imports succeed + +New tests under `test/backends/adapters/test_roles.py`: +- `test_known_roles_is_frozenset` +- `test_unknown_role_warns` — `Identity(name="x", adapter_type="lora", role="typo-role")` emits `UserWarning` +- `test_known_role_does_not_warn` +- `test_none_role_does_not_warn` — role is optional + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Naming collision: existing `Adapter` ABC and new `Adapter` dataclass share the name | Implementer documents the chosen 
resolution (alias vs old-rename) in PR description; reviewers confirm before merge | +| New types diverge subtly from granite-common / granite-formatters expectations | `IOContract.build_prompt` delegates rather than re-implements; tests assert delegation, not re-implementation | +| `AdapterSchemaMismatchError` swallowed somewhere upstream and turned back into silent False | Exception attributes are deliberate; 1.C tests will assert it propagates through `requirement_check_to_bool` | +| `AdapterBasedComponent` placeholder name leaks into user-facing prose / docs prematurely | Module docstring explicitly tags it as a placeholder; prose rewrites are out-of-scope here | + +### Breaking Changes + +None at the public-API level. Internal contributors who import `Adapter` from `mellea.backends.adapters.adapter` (the old ABC) may need to update if the implementer chooses to move that ABC to a private name — flagged in PR description. + +### Impinging Issues / PRs + +- #1018 — granite-switch on HF backend; blocked behind the new types becoming the canonical extension point +- #1080 (this proposal) — closes once issues filed +- #1111 — already filed (output-schema versioning); the `AdapterSchemaMismatchError` introduced here is the surface that #1111's versioning will eventually wrap + +### References + +- PR #1080 design proposal §4 (rough end result), §9 (end-state design detail), §9.2 (weights binding verbs per reality), Part I §5 Q5 (placeholder rationale) +- Jake req 4 (helpers raise on contract mismatch) — see also #1111 + +--- + +## 0.2 — Pin catalogue entries to HF revision SHAs + +**Parent:** #929 · **Blocks:** 2.2 (revision-aware `prepare`) · **Phase:** 0 + +### Problem + +The intrinsic catalogue (`mellea/backends/adapters/catalog.py`) does not record which *revision* of an upstream HF repository it expects. 
When upstream pushes new weights, every Mellea install silently picks them up — and if those weights have a different output schema, the helper that depends on the old schema breaks silently. + +PR #1008 is the worked example: `requirement-check` output changed from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}` upstream. `requirement_check_to_bool` returned `False` for every call until someone noticed. + +Verified state on main (2026-05-21): `IntriniscsCatalogEntry` has fields `name`, `internal_name`, `repo_id`, `adapter_types` — no `revision`. There are 14 catalogue entries across `_RAG_REPO`, `_CORE_REPO`, `_CORE_R1_REPO`, `_GUARDIAN_REPO`. Note: the type name typo `IntriniscsCatalogEntry` (transposed `s` and `i`) is an intentional convention to preserve; do not "fix" it as part of this issue. + +### Agreed design + +Each catalogue entry gains a `revision` field pinned to a specific 40-character HF commit SHA. Mellea pins to that SHA when auto-loading the adapter. Callers can opt into tracking-latest by passing `revision="main"` explicitly, accepting the behavioural-drift risk. + +This is Jake req 5. + +**Catalogue deduplication (thread 6):** The catalogue currently carries two entries for the same adapter: `requirement_check` (underscore) and `requirement-check` (hyphen). The design resolves thread 6 by making the catalogue an optional resolver, keeping one canonical entry. Collapse to a single `requirement_check` entry with `role="requirement-check"` set on the `Identity`; the role-based lookup introduced in 1.A routes correctly regardless of the key used at construction time. This removes dead state from the catalogue before 1.A adds the role-based lookup. 
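The pinning rule above (a 40-character commit SHA, or `"main"` as the explicit opt-in) can be sketched as a small check. This is only an illustration: the name `validate_revision`, the use of `ValueError`, and the lowercase-hex assumption are hypothetical, and `None` handling is deliberately left out because it remains the implementer's choice.

```python
import re

# Hypothetical helper: the real validator's name and location are up to the implementer.
_SHA_RE = re.compile(r"[0-9a-f]{40}")


def validate_revision(revision: str) -> str:
    """Return `revision` unchanged if it is a pinned SHA or the literal "main"."""
    if revision == "main" or _SHA_RE.fullmatch(revision):
        return revision
    raise ValueError(
        f"revision must be a 40-character lowercase hex SHA or 'main', got {revision!r}"
    )
```

Running a check of this kind once, when the catalogue entry is constructed, keeps it off the per-access path.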
+ +### Scope + +- `mellea/backends/adapters/catalog.py` — add `revision` field to `IntriniscsCatalogEntry`, populate every entry with the current upstream HF SHA at the time of this PR +- Validation: `revision` must be a 40-character lowercase hex string OR the literal `"main"` +- Validation function lives next to the catalogue type +- Collapse the `requirement_check` / `requirement-check` duplicate catalogue entries into one canonical `requirement_check` entry (with `role="requirement-check"`) +- Update any catalogue-construction examples in `docs/examples/` and `test/` to include the new field and remove the duplicate entry reference + +### Out of scope + +- Any change to `prepare()` behaviour (issue 2.2 implements `LocalFileBinding.prepare` to *use* the pinned revision) +- Refresh policies for long-running sessions (issue 2.2; see also #1111) +- Auto-bumping the SHA when upstream pushes (manual maintenance for now) +- "Fixing" the `IntriniscsCatalogEntry` typo (preserve as-is) + +### Acceptance criteria + +- [ ] All 14 catalogue entries have a `revision` field with a 40-char hex SHA matching the current upstream +- [ ] Revision validation rejects malformed values (too short, non-hex, etc.) 
with a clear error +- [ ] `"main"` is accepted as the explicit opt-in for tracking-latest +- [ ] Tests cover: valid SHA accepted, malformed SHA rejected, `"main"` accepted, `None` handling documented (implementer's choice — accept-as-main or reject-with-error — must be tested explicitly) +- [ ] `requirement_check` and `requirement-check` catalogue entries collapsed to one (`requirement_check` with `role="requirement-check"`); no duplicate key +- [ ] Existing tests pass; helpers continue to function unchanged (the new field is metadata only at this stage) +- [ ] `ruff format`, `ruff check`, `mypy` clean + +### Test plan + +New tests under `test/backends/adapters/test_catalog_revision.py`: +- `test_catalog_entries_have_revision` — every entry has the field set to a valid value +- `test_revision_validation_rejects_malformed` — short, long, non-hex +- `test_revision_validation_accepts_main_literal` +- `test_revision_round_trip` — construct entry, retrieve, assert preserved +- `test_no_duplicate_requirement_check_entry` — catalogue has exactly one entry matching the requirement_check/requirement-check family + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| SHA pinned at PR-write time is already stale by merge time | Reviewer re-fetches upstream HEAD just before merge; documented in PR description | +| Implementer pins to a revision whose schema is ALREADY broken vs current Mellea code | Validation is mechanical; behavioural correctness verified by existing helper tests passing post-merge | +| Field added but no enforcement until 2.2 ships | Documented as "metadata-only at this stage" in module docstring; 2.2 dependency relationship called out | +| Pydantic validator perf hit if called on every catalogue access | Catalogue is constructed once at import; validator runs at construction, not access | + +### Breaking Changes + +None for end users. 
Anyone constructing `IntriniscsCatalogEntry` directly (downstream forks, tests outside Mellea) must add the new field or accept the implementer's `None` handling. + +### Impinging Issues / PRs + +- PR #1008 — the schema flip that motivated this; reference in PR description +- #1111 — versioning for output schemas (deferred); this issue is the upstream half (pin the input weights), #1111 is the downstream half (version the output contract) + +### References + +- PR #1080 design proposal §17 Q6 (version pinning for auto-loaded adapters), §6 risk discussion +- PR #1008 — worked example of silent schema drift +- Jake req 5 + +--- + +# Phase 1 — callers move + +## 1.A — Internal migration with shims (Phase 1 foundation) + +**Parent:** #929 · **Depends on:** 0.1 · **Blocks:** 1.B, 1.C, 1.D, 2.1, 4.1 · **Phase:** 1 + +### Problem + +`mellea/stdlib/components/intrinsic/_util.py:call_intrinsic` and the requirement-rerouting code in `mellea/stdlib/requirements/requirement.py` both branch on the old `IntrinsicAdapter` / `EmbeddedIntrinsicAdapter` subclasses. Until these internal callers operate on the new `Adapter` type from 0.1, no helper can migrate. + +External users may also be constructing the old classes directly (e.g. for custom intrinsics). Migrating internal callers without a backward-compat path would break them. + +### Agreed design + +This issue does two tightly-coupled things in one PR — splitting them creates ordering pain and conflicting branch state. Combined, they form a single coherent change: "internal code now operates on `Adapter`; old constructors keep working via subclass shims." + +**(a) Old classes become inheriting shims.** `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, and `CustomIntrinsicAdapter` are restructured so they: + +- **Inherit from `Adapter`** (the new dataclass from 0.1) — `isinstance(x, IntrinsicAdapter)` continues to work, and any `isinstance(x, Adapter)` check is also satisfied. 
+- Translate constructor arguments into the equivalent `Identity` + `IOContract` + `WeightsBinding` triple, then call `Adapter.__init__` with that triple. +- Emit a `DeprecationWarning` once per construction site (`stacklevel=2`), pointing at the new construction pattern. +- Carry no behavioural state of their own — every method delegates to the inherited `Adapter` machinery. + +**(b) Internal callers operate on `Adapter`.** `_util.call_intrinsic` and requirement rerouting are rewritten: + +```python +adapter = backend.resolve_adapter(name) +with backend.adapter_scope(adapter): + raw = backend.generate(adapter.io_contract.build_prompt(...)) +return adapter.io_contract.parse(raw) +``` + +`adapter_scope` wraps `activate()` / `deactivate()` per call. `prepare()` happens at session start (issue 2.2); `release()` at session teardown. + +Role-based lookup for requirement rerouting uses `Identity.role` instead of `isinstance` branching on subclass. + +Backend-side: `resolve_adapter` and `adapter_scope` are new methods on the abstract backend. Real implementations come in Phase 2; for now, existing backends grow stub implementations that delegate to the old code paths via a temporary internal shim. This stub is intentional — it lets internal callers migrate while Phase 2 fills in the backend verbs. + +### Why this is one issue / one PR + +- Splitting (a) and (b) creates an ordering question with no clean answer. +- Combined, the change is ~300–500 LOC across `_util.py`, `requirement.py`, three old class definitions, and the abstract backend stub. Reviewable as a single coherent PR. +- Once merged, every Phase 1 per-file PR becomes a small, independent change touching one helper file each. 
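The two halves of this change can be sketched together under heavy simplification. Everything here is a stand-in: `Adapter` is reduced to two fields, `RecordingBinding` is a hypothetical test double rather than a real `WeightsBinding`, the old constructor's `intrinsic_name` argument is assumed, and `adapter_scope` is written as a free function instead of a backend method. It shows the eventual activate/deactivate wrapping, not the pass-through stub that ships in 1.A.

```python
import warnings
from contextlib import contextmanager
from dataclasses import dataclass


class RecordingBinding:
    """Hypothetical stand-in for a WeightsBinding; records the verbs called on it."""

    def __init__(self):
        self.calls = []

    def activate(self, ctx):
        self.calls.append("activate")

    def deactivate(self, ctx):
        self.calls.append("deactivate")


@dataclass
class Adapter:
    """Reduced stand-in for the 0.1 dataclass (identity and io_contract omitted)."""

    name: str
    weights: RecordingBinding


class IntrinsicAdapter(Adapter):
    """Shim: translate the old constructor arguments, warn, then defer to Adapter."""

    def __init__(self, intrinsic_name: str):  # assumed old-style signature
        warnings.warn(
            "IntrinsicAdapter is deprecated; construct Adapter(...) directly",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        super().__init__(name=intrinsic_name, weights=RecordingBinding())


@contextmanager
def adapter_scope(adapter, ctx=None):
    """Activate on entry and always deactivate on exit, even if generation fails."""
    adapter.weights.activate(ctx)
    try:
        yield adapter
    finally:
        adapter.weights.deactivate(ctx)
```

Because the shim subclasses `Adapter`, both `isinstance` checks hold, and the `finally` clause is what keeps an adapter from staying active after a failed generation.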
+ +### Scope + +- `mellea/backends/adapters/__init__.py` — old classes restructured as shims inheriting from `Adapter` +- `mellea/stdlib/components/intrinsic/_util.py` — `call_intrinsic` rewritten +- `mellea/stdlib/requirements/requirement.py` — rerouting rewritten +- Abstract backend (and concrete backend stubs): `resolve_adapter`, `adapter_scope` methods added; backed by temporary internal delegation to old code paths. `adapter_scope` is **the future telemetry parent** — Phase 2 will wrap it in an `intrinsic.call` OTel span; stubs here do not add instrumentation yet, but the hook point must exist at the right call-site boundary so Phase 2 can instrument in one place. +- `docs/dev/requirement_aLoRA_rerouting.md` — update to describe role-based lookup (using `Identity.role`) instead of the previous hardcoded `requirement-check` string; this is the direct resolution of thread 7 in the design proposal's thread mapping + +### Out of scope + +- Any helper file migration (1.B–D) +- Backend verb implementations (2.2, 2.3) +- `AdapterMixin` rename/narrow (2.1) +- Final shim removal (4.1) + +### Acceptance criteria + +- [ ] `IntrinsicAdapter(...)` returns a subclass instance that satisfies both `isinstance(x, IntrinsicAdapter)` and `isinstance(x, Adapter)` +- [ ] Same for `EmbeddedIntrinsicAdapter` and `CustomIntrinsicAdapter` +- [ ] Each old constructor emits exactly one `DeprecationWarning` per call (not per import), with `stacklevel=2` +- [ ] `_util.call_intrinsic` operates on `Adapter`; no `isinstance` branching on old subclasses +- [ ] Requirement rerouting uses `Identity.role` +- [ ] `adapter_scope` exists at the correct call-site boundary (ready for Phase 2 span wrapping); implementation is a pass-through context manager at this stage +- [ ] `docs/dev/requirement_aLoRA_rerouting.md` updated to describe role-based lookup; markdownlint passes +- [ ] All existing tests pass — behavioural neutrality is the bar +- [ ] Explicit test for "external user constructs old 
class" path still works (drop-in replaceability) +- [ ] `ruff format`, `ruff check`, `mypy` clean + +### Test plan + +- Existing helper and backend tests pass without modification +- New tests under `test/backends/adapters/test_old_class_shims.py`: + - `test_old_classes_inherit_from_adapter` + - `test_old_constructor_emits_deprecation_warning` — once per call, `stacklevel=2` + - `test_old_constructor_drop_in_replaceable` — construct via old API, assert behaviour matches direct `Adapter(...)` construction +- New tests for the role-based lookup with multiple registered roles + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Shims silently change behaviour vs old classes | Behavioural-neutrality test suite (existing tests) is the gate; explicit drop-in replaceability test | +| `DeprecationWarning` spam for users with deep call stacks | `stacklevel=2`; documentation of the migration path in the warning message | +| `resolve_adapter` / `adapter_scope` stubs in backends drift from real Phase-2 behaviour | Stubs delegate to old code paths so behaviour is unchanged; Phase-2 PRs replace internals while keeping the surface stable | +| External users override `_simplify_and_merge` or other internals on old classes | Document in PR that subclassing old classes is unsupported for the deprecation period; surface in changelog | +| 300–500 LOC PR runs into review fatigue | PR description leads with "this is the only Phase-1 bottleneck"; reviewers know everything else is parallel after this | + +### Breaking Changes + +- **Subclassing the old classes** — anyone subclassing `IntrinsicAdapter` etc. and overriding internal methods may break. The public constructor signature is preserved; only internal structure changes. +- **`isinstance` checks against the old `Adapter` ABC** — if anyone external relies on the abstract base class identity, the implementer's resolution from 0.1 (alias vs rename) determines whether this breaks. 
+ +### Impinging Issues / PRs + +- #1018 — granite-switch on HF backend; uses the new `resolve_adapter` / `adapter_scope` surface introduced here +- PR #972 — historic precedence bug in `_simplify_and_merge`; this issue does not touch that code path (2.1 does) +- #1080 §17 Q3 — auto-context discovery decision; consumed by 1.D, not here + +### References + +- PR #1080 §9 (end-state design detail), §11 (why current code is tangled), §16 Phase 1 first step +- #929 thread mapping rows 1, 2, 3 + +--- + +## 1.B — RAG helpers (`rag.py` whole-file) per-file migration + +**Parent:** #929 · **Depends on:** 1.A · **Phase:** 1 + +### Problem + +`mellea/stdlib/components/intrinsic/rag.py` contains six helpers in one file: `check_answerability`, `rewrite_question`, `clarify_query`, `find_citations`, `check_context_relevance`, `flag_hallucinated_content`. All currently construct via `IntrinsicAdapter(...)` (now a deprecation shim after 1.A). Signatures do not consistently accept `model_options=`. None has output validation — schema drift from upstream `_RAG_REPO` weights would silently change behaviour. + +Per-helper PRs would all touch the same file and serialise behind one another. One PR migrates the whole file. + +### Agreed design + +**Follows the per-helper-file migration template** (top of this document). File-specific deltas: + +- **Six helpers, six declared output contracts** — each helper declares its own contract; implementer confirms each against current weights before writing the contract. +- **Forward-compat tolerance applies per helper** — extra optional fields in upstream output do NOT raise. +- **No `documents=` parameter on any of these** — that's specific to factuality (1.D). 
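The difference between a contract break and a forward-compatible addition can be illustrated with a short sketch. The function `parse_with_contract` and the key-to-type mapping it takes are hypothetical, and the exception class is a local stand-in that follows the attributes and message format given in 0.1; invalid JSON is not handled here.

```python
import json


class AdapterSchemaMismatchError(Exception):
    """Local stand-in following the attributes and message format from 0.1."""

    def __init__(self, name, observed_keys, expected_keys):
        self.name = name
        self.observed_keys = observed_keys
        self.expected_keys = expected_keys
        super().__init__(
            f"Adapter '{name}' output cannot satisfy declared contract. "
            f"Observed keys: {observed_keys}; expected: {expected_keys}."
        )


def parse_with_contract(name, raw, required):
    """Parse raw JSON and enforce `required`, a mapping of key -> expected type.

    Keys that are not listed in `required` are ignored, so extra optional
    fields pass through untouched.
    """
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise AdapterSchemaMismatchError(name, [], sorted(required))
    for key, expected_type in required.items():
        # A missing key yields None, which fails the isinstance check too.
        if not isinstance(data.get(key), expected_type):
            raise AdapterSchemaMismatchError(name, sorted(data), sorted(required))
    return data
```

With a hypothetical contract of `{"answerability": str}`, an output that adds an unrelated field parses normally, while one that drops `answerability` or changes its type raises.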
+ +### Scope + +- `mellea/stdlib/components/intrinsic/rag.py` +- Tests under `test/stdlib/components/intrinsic/test_rag.py` (or split per helper if that file exists) + +### Out of scope + +- Other helper files (1.C, 1.D) +- Backend changes (Phase 2) +- Removing the old `IntrinsicAdapter` shim (4.1) +- "Fixing" or refactoring helper signatures beyond the additive `model_options=` (preserve existing positional args) + +### Acceptance criteria + +See common acceptance criteria. Plus, for each of the six helpers: + +- [ ] Constructs via `Adapter(identity=..., io_contract=..., weights=...)` +- [ ] Accepts `model_options: dict | None = None` +- [ ] Has a declared output contract documented in its docstring +- [ ] Forward-compat: synthetic output with an extra optional field does NOT raise +- [ ] Contract-break: synthetic output missing a required field raises `AdapterSchemaMismatchError` + +### Test plan + +See common test plan, applied per helper. Each helper gets its own pair of contract tests (pass and fail). + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Implementer guesses an output contract that doesn't match current weights | Each helper's contract is verified against current `_RAG_REPO` weights before writing the test; PR description documents the verification | +| Six contract changes in one PR creates a large diff | Diff is mechanical (one pattern repeated six times); reviewer can spot-check 1-2 helpers and trust the rest | +| `flag_hallucinated_content` returns a boolean-like that downstream callers coerce informally | Document expected output type explicitly in helper docstring; coercion responsibility stays with caller | + +### Breaking Changes + +None at signature level (additive `model_options=`). Behaviour is neutral; only adds the `AdapterSchemaMismatchError` that previously did not exist (callers that swallowed `KeyError` will now see `AdapterSchemaMismatchError` — surface in changelog). 
+ +### Impinging Issues / PRs + +- PR #1080 §13 (what users see) — design surface for these helpers +- Any open issue against a specific RAG helper (implementer should `gh issue list --search "rag" --state open` before starting and reference any active threads) + +### References + +- PR #1080 §13, Jake req 4 +- Per-helper-file migration template (top of this document) + +--- + +## 1.C — `requirement_check` per-helper migration + +**Parent:** #929 · **Depends on:** 1.A · **Phase:** 1 + +### Problem + +`requirement_check` and `requirement_check_to_bool` (in `mellea/stdlib/requirements/requirement.py`) currently route through `IntrinsicAdapter(...)`. PR #1008 is the canonical evidence that schema drift here is real and previously silent — the output schema flipped from `{"requirement_likelihood": 0.9}` to `{"requirement_check": {"score": 0.9}}`, and `requirement_check_to_bool` returned `False` for every call until someone noticed. + +Today (verified on main 2026-05-21): `requirement_check_to_bool` uses the post-#1008 schema `req_dict["requirement_check"]["score"]` and returns `False` with a warning on schema mismatch — silent failure mode. + +### Agreed design + +**Follows the per-helper-file migration template.** File-specific deltas: + +- **Output contract:** the helper's declared output is the post-#1008 shape `{"requirement_check": {"score": float}}`. Implementer confirms against current weights. +- **Schema-mismatch is loud here in particular** — this helper is the design's worked example. If a future upstream change breaks the contract, callers must see `AdapterSchemaMismatchError` immediately rather than silently-wrong booleans. +- **`requirement_check_to_bool` wrapper** stops returning `False` on parse failure — it now propagates `AdapterSchemaMismatchError`. This IS a behavioural change vs main; flagged below as a breaking change. 
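A rough sketch of the loud-failure behaviour follows. `score_to_bool` is a hypothetical stand-in for the contract check inside `requirement_check_to_bool`, whose real signature is not reproduced here; the `threshold` parameter and the reduced exception class are assumptions made for illustration.

```python
class AdapterSchemaMismatchError(Exception):
    """Reduced stand-in for the exception introduced in 0.1."""

    def __init__(self, name, observed_keys, expected_keys):
        self.name = name
        self.observed_keys = observed_keys
        self.expected_keys = expected_keys
        super().__init__(f"Adapter '{name}' output cannot satisfy declared contract.")


def score_to_bool(parsed: dict, threshold: float = 0.5) -> bool:
    """Map {"requirement_check": {"score": float}} to a bool, raising on any other shape."""
    inner = parsed.get("requirement_check")
    score = inner.get("score") if isinstance(inner, dict) else None
    # bool is a subclass of int, so exclude it before the numeric check.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise AdapterSchemaMismatchError(
            "requirement-check", sorted(parsed), ["requirement_check"]
        )
    return score >= threshold
```

Under this sketch the pre-#1008 shape `{"requirement_likelihood": 0.9}` raises instead of collapsing to `False`.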
+ +### Scope + +- `mellea/stdlib/components/intrinsic/` — wherever `requirement_check` lives +- `mellea/stdlib/requirements/requirement.py` — `requirement_check_to_bool` +- Tests covering both functions + +### Out of scope + +- Other helpers (1.B, 1.D) +- Auto-bumping catalogue revision when upstream pushes (project-wide, not here) +- Catalogue deduplication — the `requirement_check` / `requirement-check` duplicate entries are collapsed in issue 0.2; by the time this issue is worked, only the `requirement_check` entry (with `role="requirement-check"`) exists in the catalogue. This issue resolves whichever canonical entry the role-based lookup (from 1.A) surfaces. + +### Acceptance criteria + +See common acceptance criteria. Plus: + +- [ ] `requirement_check(...)` returns the expected shape on happy path +- [ ] `requirement_check_to_bool(...)` returns `bool` on happy path +- [ ] `requirement_check_to_bool(...)` propagates `AdapterSchemaMismatchError` on schema-break (NOT silent `False`) +- [ ] Synthetic output `{"requirement_check": {"score": 0.9}}` parses correctly +- [ ] Synthetic output `{"requirement_likelihood": 0.9}` (the pre-#1008 shape) raises `AdapterSchemaMismatchError` + +### Test plan + +See common test plan. Plus a regression test specifically named after #1008 demonstrating that the pre-#1008 shape now raises rather than silently coerces to `False`. 
+ +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Callers depending on the silent-`False` behaviour break | Flagged as breaking change in changelog; behaviour change documented in PR description; prior silent failure was a bug, not a feature | +| 0.2 has not yet merged, so both `requirement_check` and `requirement-check` catalogue entries still exist (3.2/3.3 compat) | Deduplication stays in 0.2; helper resolves whichever entry the role-based lookup from 1.A surfaces | +| `requirement_check_to_bool` is on a hot path (every requirement check) | Validation is one extra dict access on the parsed output; negligible perf cost | + +### Breaking Changes + +- **`requirement_check_to_bool` no longer returns silent `False` on parse failure** — it now raises `AdapterSchemaMismatchError`. Any caller that relied on the `False` return to mean "requirement not met" after a parse failure must now handle the exception explicitly. Surface in changelog. + +### Impinging Issues / PRs + +- PR #1008 — the original schema flip; reference in PR description +- **PR #1078** — open PR adding canned I/O test data for `requirement-check` and `uncertainty` formatters (under `test/formatters/granite/`). Merge before starting 1.C; its test data is the foundation the 1.C tests build on. +- 0.2 — catalogue revision pinning; once 0.2 ships, the upstream half of "no silent drift" is in place; 1.C is the downstream half + +### References + +- PR #1080 §14.1 #2 (silent schema drift worked example) +- PR #1008 — the original schema flip +- Jake req 4 + +--- + +## 1.D — `guardian.py` whole-file migration (Guardian + factuality + auto-context) + +**Parent:** #929 · **Depends on:** 1.A · **Phase:** 1 + +### Problem + +`mellea/stdlib/components/intrinsic/guardian.py` (verified on main, 221 lines) contains FOUR helpers in one file: + +- `policy_guardrails` +- `guardian_check` (Guardian family core) +- `factuality_detection` +- `factuality_correction` + +All currently construct via the old API. 
Splitting into separate PRs (Guardian vs factuality) would force same-file conflicts. One PR migrates the whole file. + +Additional file-specific concerns: + +- Factuality helpers historically take `documents` as a positional argument — inconsistent with kw-only patterns elsewhere. +- Factuality return type changed from `float` to `str` per #1003 (closed PR #1028 inherited the direction). +- There is no auto-discovery today: callers must pass documents explicitly even when those documents are already in the conversation context. + +### Agreed design + +**Follows the per-helper-file migration template.** File-specific deltas beyond the template: + +**(i) Family-level migration of all four helpers in `guardian.py`.** Each helper has its own declared output contract; validate per-helper. + +**(ii) `documents=` keyword-only on factuality helpers.** `factuality_detection` and `factuality_correction` accept `documents: list[str] | None = None` as keyword-only. Default `None` triggers auto-discovery (see iv). Other Guardian helpers (`policy_guardrails`, `guardian_check`) do NOT take `documents=`. + +**(iii) Factuality return type `str`.** Per #1003 / closed PR #1028, both factuality functions return `str` (was `float`). Update tests accordingly. + +**(iv) Auto-context document discovery (factuality only).** When `documents=None`, the helper auto-discovers user-supplied documents from the conversation context. **Mechanism:** documents flow through ordinary conversation context (e.g. as part of a `Message`'s content). The helper reads whatever documents are present in the context it receives. The caller is responsible for *populating* that context — via explicit `documents=`, prior `Message`s, retrieval, etc. **No `_docs`-specific extraction path.** This adopts the direction of PR #1028 but not its specific code path; per intrinsics-team guidance recorded in PR #1080 §17 Q3 (2026-05-20). + +### Why one PR + +These four helpers all live in `guardian.py`. 
Three theoretical PRs (Guardian-core / factuality-pair / auto-context) would conflict on the same file. Combined, the work is one PR ~300–400 LOC, reviewable as one coherent change anchored on "everything in `guardian.py` is on the new types and consistent." + +### Scope + +- `mellea/stdlib/components/intrinsic/guardian.py` +- Tests under `test/stdlib/components/intrinsic/test_guardian.py` (or split per helper if that structure already exists) + +### Out of scope + +- Other helper files (1.B, 1.C) +- Reviving #1028's `_docs` scanning code (explicitly shelved) +- Modifying how callers populate context (caller's responsibility, by design) +- Adding new Guardian capabilities + +### Acceptance criteria + +See common acceptance criteria. Plus, per helper: + +**`policy_guardrails`, `guardian_check`:** +- [ ] Construct via new types +- [ ] Accept `model_options: dict | None = None` +- [ ] Each has a declared output contract; contract-break raises, forward-compat does not +- [ ] No `documents=` parameter + +**`factuality_detection`, `factuality_correction`:** +- [ ] Construct via new types +- [ ] Accept `model_options: dict | None = None` +- [ ] `documents: list[str] | None = None` is keyword-only; positional second arg raises `TypeError` +- [ ] Return type is `str` (per #1003) +- [ ] `documents=[...]` works (explicit pass-through, auto-discovery skipped) +- [ ] `documents=None` with documents in conversation context works (auto-discovery picks them up) +- [ ] No `_docs`-specific code path exists in this file +- [ ] Behaviour when no documents are anywhere (`documents=None` and no context documents) is documented and tested explicitly — implementer's choice between sentinel return and explicit error + +### Test plan + +See common test plan, applied per helper. 
Plus: + +- `test_factuality_detection_explicit_documents` — pass-through +- `test_factuality_detection_auto_discovery_from_context` — picks up documents from prior Messages +- `test_factuality_detection_no_documents_anywhere` — implementer-chosen behaviour, documented in test name +- `test_factuality_documents_kwonly` — positional second arg raises `TypeError` +- `test_factuality_return_type_is_str` + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Auto-context discovery picks up documents the user did not intend as factuality sources | Helper docstring documents the contract: "any document present in the context is a candidate"; caller-side responsibility framing | +| `documents=` becoming kw-only is a breaking change for callers using positional form | Surface in changelog; deprecation period not feasible (kw-only is mechanical, can't dual-support cleanly without runtime introspection); call out as a breaking change with migration example | +| Return-type change `float → str` is already in place from #1003; this PR formalises the contract | Test specifically asserts `str`; fail-loud if a future change tries to revert | +| Four helpers in one PR creates a wide diff | Reviewer can spot-check Guardian helpers and factuality helpers as two coherent sections | +| Auto-discovery interacts poorly with #1080's broader context model | Document the mechanism explicitly; if the broader context model changes, this helper updates with it | + +### Breaking Changes + +- **`factuality_detection` / `factuality_correction` `documents=` becomes keyword-only.** Positional callers must update. +- **Return type already changed in main per #1003** — this PR codifies it as a contract; no further user impact. 
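**Migration sketch (illustrative only).** The keyword-only change can be shown with a stand-in function. The `context` first parameter, the placeholder return string, and the fallback to `context` are assumptions made for this example; they are not the real helper's signature or behaviour. The only point demonstrated is that a bare `*` makes `documents` keyword-only, so the old positional form raises `TypeError`.

```python
from __future__ import annotations


def factuality_detection(
    context: list[str], *, documents: list[str] | None = None
) -> str:
    """Stand-in for the real helper (hypothetical signature and body)."""
    # Explicit documents win; otherwise fall back to what the context holds.
    sources = documents if documents is not None else context
    # Placeholder result: the real helper returns the adapter's str output.
    return f"checked against {len(sources)} document(s)"


ctx = ["The Eiffel Tower is in Paris."]

# Before: positional second argument. After this change it raises TypeError.
positional_error = None
try:
    factuality_detection(ctx, ["doc A"])  # type: ignore[misc]
except TypeError as exc:
    positional_error = exc

# After: pass documents by keyword, or omit them to rely on the context.
explicit = factuality_detection(ctx, documents=["doc A", "doc B"])
implicit = factuality_detection(ctx)
```

Callers migrating from the positional form only need to add the `documents=` keyword; no other change is implied by this sketch.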
+ +### Impinging Issues / PRs + +- PR #1080 §17 Q3 (resolved 2026-05-20) — auto-context decision +- PR #1028 (closed) — direction inherited, mechanism shelved +- #1003 — original signature/return-type scope +- intrinsics-team guidance recorded in §17 Q3 +- **#1094** — "Migrate creating_a_new_type_of_session.py example off deprecated GuardianCheck" — close with this issue; include the example migration in this PR's scope (add `docs/examples/sessions/creating_a_new_type_of_session.py` to scope) +- **#1071** — "feat: intrinsic-backed Requirement subclass for Guardian safety validation" — DO NOT start before this issue merges; once merged, the new `Adapter` types from Phase 0/1 are the right foundation for that feature +- **PR #935** — open docs PR touching `docs/docs/advanced/intrinsics.md` and guardian examples; coordinate or merge before starting this issue to avoid doc conflicts + +### References + +- PR #1080 §13 (what users see), §17 Q3 +- Per-helper-file migration template (top of this document) + +--- + +# Phase 2 — backends move + +## 2.1 — `AdapterMixin` verb rename/narrow + `resolve_model_options` centralisation + +**Parent:** #929 · **Depends on:** 1.A · **Blocks:** 2.2, 2.3 · **Phase:** 2 + +### Problem + +Two related backend-surface concerns ship together: + +**(a) Mixin verb mismatch.** Today's `AdapterMixin` (verified on main, `mellea/backends/adapters/adapter.py:240`) exposes five verbs: `base_model_name`, `add_adapter`, `load_adapter`, `unload_adapter`, `list_adapters`. The proposed verbs in PR #1080 §13 are different in shape and naming: `load_peft_adapter`, `unload_peft_adapter`, `render_controls`, `set_request_adapter`. This is a **rename + introduce + narrow**, not a pure narrow. The existing verbs do not map 1:1 onto the proposed verbs. + +**(b) Per-call option merging.** Each backend calls `_simplify_and_merge` on every adapter call to combine model options from various sources (caller, helper defaults, backend defaults). 
Duplicated logic and a known source of precedence bugs (PR #972 was a fix). + +Both concerns touch the same backend mixin surface and the same set of backend implementations (`LocalHFBackend`, `OpenAIBackend`). Combining keeps the backend-surface change as one coherent PR. + +### Agreed design + +**(a) Mixin verb rename/narrow.** `AdapterMixin` ends up with the proposed verb set: + +``` +AdapterMixin (post-2.1): + load_peft_adapter / unload_peft_adapter # for LocalFile reality + render_controls # for Embedded reality + set_request_adapter # for ServerMediated reality +``` + +Implementer maps existing implementations onto the new verbs (`load_adapter` → `load_peft_adapter` is the obvious starting point; map case-by-case). `base_model_name` and `list_adapters` are removed if unused after 1.A; otherwise relocated or kept with documented justification. + +**Bindings (Phase 2.2/2.3) call into these from their `prepare`/`activate`/etc. implementations.** + +**(b) Option resolution.** New utility `mellea/backends/_options.py:resolve_model_options` (or similar) documents and implements the precedence: explicit caller-passed `model_options=` > helper defaults > backend defaults. Each adapter-supporting backend's adapter call path replaces `_simplify_and_merge` with `resolve_model_options`. + +### Scope + +- `mellea/backends/adapters/adapter.py` — `AdapterMixin` verb rename + narrow +- New utility: `mellea/backends/_options.py:resolve_model_options` +- Backends affected: `LocalHFBackend`, `OpenAIBackend` (the two adapter-supporting backends per PR #1080 §10) — verb implementations renamed/narrowed; `_simplify_and_merge` calls replaced +- Backends NOT affected for adapters: `OllamaBackend`, `WatsonxBackend`, `LiteLLMBackend` (no adapter support) +- **`IntrinsicMetricsPlugin`** — new plugin at `mellea/core/plugins/intrinsic_metrics.py` alongside the existing `TokenMetricsPlugin` / `LatencyMetricsPlugin` / error plugins. 
Registers three OTel metrics: + - `mellea.intrinsic.invocations` — counter; labels: `name`, `revision`, `binding_type`, `adapter_type`, `outcome` (`success` / `schema_error` / `error`) + - `mellea.intrinsic.phase_duration_ms` — histogram; labels: `name`, `phase` (`prepare` / `activate` / `generate` / `parse` / `deactivate`) + - `mellea.intrinsic.parse_failures` — counter; labels: `name`, `revision`. This is the **schema-drift detector**: a climbing counter against a specific `(name, revision)` pair means a breaking schema change was pushed upstream without revision-pinning catching it. Each increment corresponds to an `AdapterSchemaMismatchError` at the call site. Auto-wired via the existing `TokenMetricsPlugin` registration pattern. +- New dev doc: `docs/dev/adapter_observability.md` — documents the span tree, metric labels, `parse_failures` schema-drift detector pattern, and `MELLEA_TRACE_CONTENT` content-capture gate. Phase 2.2/2.3 will add span emission details; this issue writes the structure document. 
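**Precedence sketch (illustrative only).** The rule stated in the agreed design (explicit caller-passed `model_options=` > helper defaults > backend defaults) amounts to a layered merge. The parameter names and the shallow-merge behaviour below are assumptions for this example, not the agreed signature of `resolve_model_options`.

```python
from __future__ import annotations

from typing import Any, Mapping


def resolve_model_options(
    caller: Mapping[str, Any] | None = None,
    helper_defaults: Mapping[str, Any] | None = None,
    backend_defaults: Mapping[str, Any] | None = None,
) -> dict[str, Any]:
    """Merge option layers; hypothetical signature for illustration."""
    resolved: dict[str, Any] = {}
    # Apply the lowest-precedence layer first so later layers overwrite it.
    for layer in (backend_defaults, helper_defaults, caller):
        if layer:
            resolved.update(layer)
    return resolved


options = resolve_model_options(
    caller={"temperature": 0.0},
    helper_defaults={"temperature": 0.2, "max_new_tokens": 64},
    backend_defaults={"temperature": 0.7, "seed": 1},
)
```

Here the caller's `temperature` survives, while `max_new_tokens` and `seed` are filled in from the lower layers. A shallow merge replaces nested dictionaries wholesale; whether the real utility should merge more deeply is left open by this sketch.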
+ +### Out of scope + +- Binding implementations (2.2, 2.3) +- Span instrumentation (lives in 2.2/2.3 where the verbs that emit them are implemented) +- HF backend embedded-adapter support (#1018) + +### Acceptance criteria + +- [ ] `AdapterMixin` exposes exactly the four verbs documented above; old verbs removed or relocated with justification +- [ ] Both adapter-supporting backends implement the new verb set +- [ ] `resolve_model_options` exists, documented, tested +- [ ] Both adapter-supporting backends call `resolve_model_options` instead of `_simplify_and_merge` on the adapter call path +- [ ] `IntrinsicMetricsPlugin` exists and can be registered like existing plugins +- [ ] All three metrics are registered with correct label sets +- [ ] `mellea.intrinsic.parse_failures` increments on each `AdapterSchemaMismatchError` (wired via the same hook as `schema_error` outcome on `invocations`) +- [ ] `docs/dev/adapter_observability.md` written, covers span tree structure, metric labels, `parse_failures` pattern, `MELLEA_TRACE_CONTENT` gate; passes markdownlint +- [ ] Existing tests pass (precedence behaviour unchanged; behavioural neutrality) +- [ ] New unit tests cover precedence: caller > helper > backend +- [ ] New test asserts `AdapterMixin` exposes exactly the four verbs (catches accidental re-additions) +- [ ] Unit tests for `IntrinsicMetricsPlugin`: assert each metric emits with the correct label on a synthetic call; assert `parse_failures` increments on `AdapterSchemaMismatchError` +- [ ] `ruff format`, `ruff check`, `mypy` clean + +### Test plan + +- Existing backend tests pass +- Unit tests for `resolve_model_options` covering each precedence pair (caller-vs-helper, caller-vs-backend, helper-vs-backend) +- New tests asserting the verb set on `AdapterMixin` +- Verb-rename: each old verb's test is renamed/relocated to the new verb name; coverage stays intact +- Unit tests for `IntrinsicMetricsPlugin` using a synthetic OTel exporter (not real infra): 
`test_invocations_counter_emits_on_success`, `test_invocations_counter_emits_schema_error_outcome`, `test_parse_failures_counter_increments`, `test_phase_duration_histogram_emits_for_each_phase` + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Verb rename misses a subclass implementation, leaving an abstract method unimplemented at instantiation | `mypy` catches abstract-method-not-implemented; CI gate | +| External implementations of `AdapterMixin` (downstream forks) break | Surface as breaking change in changelog with migration table (old → new verb names) | +| `resolve_model_options` introduces a precedence regression vs `_simplify_and_merge` | Existing tests validate precedence; new explicit precedence tests as belt-and-braces | +| Combining (a) and (b) widens the diff vs splitting | Both touch the same files (backend adapter surface); splitting would cause same-file rebase pain | +| `base_model_name` / `list_adapters` removal breaks introspection helpers | Search before removal; relocate if any internal helper depends on them | + +### Breaking Changes + +- **`AdapterMixin` verb rename** — downstream backends extending `AdapterMixin` must update to the new verb names. Migration table in changelog. +- **`_simplify_and_merge` removal from adapter call path** — internal API change; not part of the public surface, but flagged for any unusual downstream code. 
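**Verb-set guard sketch (illustrative only).** The example below shows how the "exactly four verbs" check and the abstract-method failure mode could look. It assumes the post-2.1 `AdapterMixin` declares its verbs as abstract methods; the class here is a stand-in, and the argument lists and the `StaleBackend` class are hypothetical.

```python
from abc import ABC, abstractmethod


class AdapterMixin(ABC):
    """Stand-in for the post-2.1 mixin; argument lists are placeholders."""

    @abstractmethod
    def load_peft_adapter(self, name: str) -> None: ...

    @abstractmethod
    def unload_peft_adapter(self, name: str) -> None: ...

    @abstractmethod
    def render_controls(self, name: str) -> str: ...

    @abstractmethod
    def set_request_adapter(self, name: str) -> None: ...


EXPECTED_VERBS = {
    "load_peft_adapter",
    "unload_peft_adapter",
    "render_controls",
    "set_request_adapter",
}


def public_verbs(cls: type) -> set:
    """Public callables defined directly on the class."""
    return {
        name
        for name, value in vars(cls).items()
        if callable(value) and not name.startswith("_")
    }


class StaleBackend(AdapterMixin):
    """Hypothetical downstream backend still using an old verb name."""

    def load_adapter(self, name: str) -> None:  # old name, not in the mixin
        pass

    def unload_peft_adapter(self, name: str) -> None:
        pass

    def render_controls(self, name: str) -> str:
        return ""

    def set_request_adapter(self, name: str) -> None:
        pass


# Guard: fails if a verb is re-added to or dropped from the mixin.
verbs_ok = public_verbs(AdapterMixin) == EXPECTED_VERBS

# A backend that missed the rename cannot be instantiated.
stale_error = None
try:
    StaleBackend()
except TypeError as exc:
    stale_error = exc
```

The instantiation error names the missing verb, which gives downstream maintainers a direct pointer to the rename they missed.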
+ +### Impinging Issues / PRs + +- PR #972 — historic precedence bug, evidence the centralisation matters; reference in PR description +- #1018 — HF backend embedded adapters; will use `render_controls` post-2.1 +- PR #881 — Reality B added to OpenAI backend; uses what becomes `render_controls` + +### References + +- PR #1080 §13 (what users see — detailed), §16 Phase 2 final step +- PR #972 — precedence bug +- Verified state on main: `mellea/backends/adapters/adapter.py:240` (AdapterMixin) + +--- + +## 2.2 — `LocalFileBinding` implements verbs (PEFT/aLoRA path) + +**Parent:** #929 · **Depends on:** 0.1, 0.2, 2.1 · **Blocks:** 4.1 · **Phase:** 2 + +### Problem + +`LocalFileBinding` is currently a stub from issue 0.1, raising `NotImplementedError` on each verb. Reality A (today's `IntrinsicAdapter`) needs the four verbs working before the binding is usable. + +### Agreed design + +Implement the four verbs: + +- `prepare()` — resolves the configured HF revision (uses `revision` field from issue 0.2; defaults to the catalogue-pinned SHA if not specified). Downloads the PEFT weights via the existing HF download path. Registers with the backend via `AdapterMixin.load_peft_adapter`. +- `activate(ctx)` — switches the adapter on in the backend (PEFT layer enabled for the next generation). Uses backend's existing PEFT activation primitives. +- `deactivate(ctx)` — switches the adapter off. Auto-called after each generation by `adapter_scope` (set up in 1.A). +- `release()` — removes the PEFT adapter (`AdapterMixin.unload_peft_adapter`). Called at session teardown. + +`prepare` is session-scoped — called once per session (or per explicit `release()`+`prepare()` cycle for refresh). `activate`/`deactivate` are call-scoped. 
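**Lifecycle sketch (illustrative only).** The split between session-scoped and call-scoped verbs can be shown with a toy binding that records verb calls instead of touching a backend. `RecordingBinding` is hypothetical, the `ctx` argument is omitted for brevity, and this `adapter_scope` is a simplified stand-in for the context manager set up in 1.A, not its real signature.

```python
from contextlib import contextmanager
from typing import Iterator, List


class RecordingBinding:
    """Toy binding: records verb calls, performs no downloads."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self._prepared = False

    def prepare(self) -> None:
        self._prepared = True
        self.calls.append("prepare")

    def activate(self) -> None:
        self.calls.append("activate")

    def deactivate(self) -> None:
        self.calls.append("deactivate")

    def release(self) -> None:
        if not self._prepared:
            return  # second release is a no-op
        self._prepared = False
        self.calls.append("release")


@contextmanager
def adapter_scope(binding: RecordingBinding) -> Iterator[RecordingBinding]:
    """Activate for one generation; always deactivate afterwards."""
    binding.activate()
    try:
        yield binding
    finally:
        binding.deactivate()  # runs even if generation raises


binding = RecordingBinding()
binding.prepare()  # session-scoped: once per session

for _ in range(2):  # call-scoped: once per generation
    with adapter_scope(binding):
        pass  # generation would happen here

try:
    with adapter_scope(binding):
        raise RuntimeError("generation failed")
except RuntimeError:
    pass  # deactivate has already run at this point

binding.release()
binding.release()  # idempotent
```

In this sketch `prepare` and `release` each appear once in `binding.calls`, while every `activate` is paired with a `deactivate`, including the call that raised.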
+ +### Scope + +- `LocalFileBinding` class — verb implementations +- Backend integration via existing `AdapterMixin.load_peft_adapter` / `unload_peft_adapter` (renamed in 2.1) +- `LocalFileBinding.from_catalog(name: str) -> LocalFileBinding` classmethod — convenience constructor that looks up the catalogue entry by name (post-0.2 canonical entry, with pinned revision) and returns a fully configured binding. This is the user-facing "standard path" shown in the design proposal's §13 examples: `Adapter(name="answerability", weights=LocalFileBinding.from_catalog("answerability"))`. +- **Span instrumentation.** `adapter_scope` (from 1.A) is now wrapped in an `intrinsic.call` OTel span. `LocalFileBinding` emits child spans: + - `intrinsic.prepare` — attributes: `intrinsic.name`, `intrinsic.revision` (resolved SHA, not "main"), `intrinsic.binding_type="local_file"`, `intrinsic.source` (HF repo ID), download duration + - `intrinsic.activate` — attribute: `intrinsic.peft_name` + - `intrinsic.deactivate` + - `intrinsic.parse` is emitted by `io_contract.parse()` — attributes: `intrinsic.revision`, `intrinsic.parse_ok`, `intrinsic.raw_len` +- **Content capture** (gated on `MELLEA_TRACE_CONTENT` env var, consistent with #1035 / PR #1036): `intrinsic.input.kwargs`, `intrinsic.output.raw`, `intrinsic.output.parsed` emitted as span events +- Update `docs/dev/adapter_observability.md` (created in 2.1) with LocalFile-specific span attributes and the resolved-revision attribute +- Update `docs/docs/advanced/intrinsics.md` to reflect the new `Adapter(weights=LocalFileBinding.from_catalog(...))` construction pattern alongside the existing helper-only examples; the old `IntrinsicAdapter(...)` construction is shown as deprecated with migration note +- Update `docs/examples/intrinsics/` examples: at least one example must show the new Adapter construction; existing helper-call examples gain `model_options=` where relevant + +### Out of scope + +- `EmbeddedBinding` (2.3) +- 
`ServerMediatedBinding` (3.1) +- Long-running session refresh policy (deferred — PR #1080 §17 Q5) +- Full rewrite of `docs/dev/intrinsics_and_adapters.md` (deferred to 4.1 when shims are gone and the final API shape is stable) + +### Acceptance criteria + +- [ ] All four verbs implemented for `LocalFileBinding` +- [ ] `prepare()` downloads from HF using the pinned revision; explicit `revision="main"` opts into tracking-latest +- [ ] `activate` / `deactivate` toggle the backend's PEFT layer correctly +- [ ] `release()` cleanly unregisters from the backend; second call is a no-op +- [ ] `LocalFileBinding.from_catalog("answerability")` returns a correctly configured binding with the catalogue's pinned revision +- [ ] `intrinsic.call` parent span emitted for every adapter call; child spans `intrinsic.prepare`, `intrinsic.activate`, `intrinsic.deactivate`, `intrinsic.parse` emitted with required attributes +- [ ] `intrinsic.prepare` span records the resolved HF SHA (not `"main"`) as `intrinsic.revision` +- [ ] `MELLEA_TRACE_CONTENT=1`: content events (`intrinsic.input.kwargs`, `intrinsic.output.raw`, `intrinsic.output.parsed`) present; absent otherwise +- [ ] `IntrinsicMetricsPlugin` (from 2.1): `invocations` counter increments; `parse_failures` increments on `AdapterSchemaMismatchError`; `phase_duration_ms` histogram records prepare and activate durations +- [ ] `docs/docs/advanced/intrinsics.md` updated: new construction pattern present, deprecated old pattern noted with migration path +- [ ] `docs/examples/intrinsics/` examples updated: at least one shows new construction; all examples pass `uv run pytest docs/examples/intrinsics/` (or backend-gated with correct marker) +- [ ] `docs/dev/adapter_observability.md` updated with LocalFile-specific attributes +- [ ] Behavioural tests for an end-to-end adapter call (prepare → activate → generate → deactivate → release) pass +- [ ] `ruff format`, `ruff check`, `mypy` clean + +### Test plan + +Unit tests 
(`test/backends/adapters/test_local_file_binding.py`), with mocked HF download and mocked backend: +- `test_prepare_uses_pinned_revision` — mocked HF download confirms the catalogue SHA is requested, not "main" +- `test_prepare_allows_main_override` — `revision="main"` passes "main" to the download +- `test_release_is_idempotent` — second `release()` call is a no-op +- `test_from_catalog_returns_binding_with_correct_revision` +- `test_activate_deactivate_call_correct_mixin_verbs` +- Span assertion tests using a synthetic OTel exporter: `test_call_span_emitted`, `test_prepare_span_has_revision_attribute`, `test_content_events_absent_by_default`, `test_content_events_present_with_gate_set` +- `test_metrics_invocation_counter_increments` — `IntrinsicMetricsPlugin` wired; assert counter increments on a successful call +- `test_metrics_parse_failures_increments` — inject a synthetic schema-mismatch; assert `parse_failures` increments + +Integration tests (`test/backends/adapters/test_local_file_integration.py`, mark `@pytest.mark.integration`, `@pytest.mark.hf`, `@pytest.mark.slow`): +- Full integration matrix: `LocalHFBackend × LocalFileBinding × {lora, alora} × {check_answerability, requirement_check}` — prepare → activate → generate → deactivate → release; assert expected output shape +- Per-version parse round-trip: inject the pre-#1008 `{"requirement_likelihood": 0.9}` output shape; assert `AdapterSchemaMismatchError` (regression test for the silent-failure case) + +Qualitative test (optional, `@pytest.mark.qualitative`, kept out of fast loop): +- `test_check_answerability_quality` — small canonical dataset, accuracy floor on actual adapter output + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| HF download timeout / network failure during `prepare` | Bubble the underlying `huggingface_hub` exception cleanly; document in helper docstring that `prepare` may raise on network failure | +| `release()` called twice | Make idempotent — second call is a 
no-op | +| `activate`/`deactivate` race if generation throws mid-call | `adapter_scope` (from 1.A) is a context manager — `__exit__` calls `deactivate` even on exception | +| PEFT layer enable/disable cost on hot path | Document expected overhead in PR; not a regression vs current behaviour | +| Pinned SHA differs from cached SHA from a previous `revision="main"` run | `huggingface_hub` cache key includes revision; correct behaviour by default | + +### Breaking Changes + +None for end users — the binding replaces internal stubs. Behaviour matches the existing `IntrinsicAdapter` runtime path post-1.A. + +### Impinging Issues / PRs + +- 0.2 — catalogue revision pinning; this binding is the consumer +- PR #1080 §17 Q5 — long-running session refresh policy (deferred) +- #1018 — HF backend embedded adapters; orthogonal to this binding (Embedded uses 2.3) + +### References + +- PR #1080 §8.1 (Reality A), §9.2 (verbs per reality), §9.3 (lifecycle sequence) + +--- + +## 2.3 — `EmbeddedBinding` implements verbs (Granite Switch path) + +**Parent:** #929 · **Depends on:** 0.1, 2.1 · **Blocks:** 4.1 · **Phase:** 2 + +### Problem + +`EmbeddedBinding` is a stub from issue 0.1. Reality B (Granite Switch's embedded adapters; today's `EmbeddedIntrinsicAdapter`, used by the OpenAI backend per PR #881) needs the four verbs working. + +### Agreed design + +Implement the four verbs for Reality B (embedded — adapters are part of the base-model weights, not a separate file): + +- `prepare()` — verifies the base model exposes the required adapter (e.g. controls registry on Granite Switch). No file download. 
+- `activate(ctx)` — calls `AdapterMixin.render_controls` to render the adapter's control prompt +- `deactivate(ctx)` — clears the rendered controls +- `release()` — no-op (nothing to unload; the adapter is part of the base model) + +### Scope + +- `EmbeddedBinding` class — verb implementations +- Uses `AdapterMixin.render_controls` for activation +- Currently only the OpenAI backend supports Reality B (per PR #881); the HF backend will gain support via #1018, sequenced after this refactor +- **Span instrumentation.** `EmbeddedBinding` emits child spans under the `intrinsic.call` parent (set up in 2.2): + - `intrinsic.prepare` — attributes: `intrinsic.name`, `intrinsic.binding_type="embedded"`, `intrinsic.source` (base model identifier). No download — span records "no-op prepare" outcome. + - `intrinsic.activate` — attribute: `intrinsic.controls_key` (name of the controls field rendered into the chat template) + - `intrinsic.deactivate` +- `release()` is a no-op; no span emitted for release +- Update `docs/dev/adapter_observability.md` (from 2.1) to add Embedded-specific span attributes and note the no-op prepare behaviour +- Update `docs/docs/advanced/intrinsics.md` to cover the Embedded reality: `Adapter(weights=EmbeddedBinding.from_base_model(backend))` construction example; note which backends support which bindings +- Update `AGENTS.md §13` (post-parse shapes are now stable for both bindings) with the normalised post-parse shape reference table + +### Out of scope + +- HF backend support for embedded adapters (#1018) +- Other realities (2.2, 3.1) + +### Acceptance criteria + +- [ ] All four verbs implemented for `EmbeddedBinding` +- [ ] OpenAI backend continues to support embedded adapters via the new binding +- [ ] Existing Granite Switch tests pass through the new binding +- [ ] `release()` is a no-op (asserted explicitly in test) +- [ ] `intrinsic.prepare`, `intrinsic.activate`, `intrinsic.deactivate` spans emitted with `binding_type="embedded"` attribute; 
prepare span records no-op outcome +- [ ] `IntrinsicMetricsPlugin` counters increment correctly for Embedded calls +- [ ] `docs/dev/adapter_observability.md` updated with Embedded-specific span attributes +- [ ] `docs/docs/advanced/intrinsics.md` updated: Embedded construction example present; backend × reality matrix visible to users +- [ ] `AGENTS.md §13` updated with normalised post-parse shape reference +- [ ] `ruff format`, `ruff check`, `mypy` clean + +### Test plan + +Unit tests (`test/backends/adapters/test_embedded_binding.py`), with mocked OpenAI backend: +- `test_prepare_is_noop` — no download; no backend call; span records no-op +- `test_activate_calls_render_controls` +- `test_deactivate_clears_controls` +- `test_release_is_noop` +- `test_multi_call_isolation` — controls from call N do not leak into call N+1 +- Span assertion tests: `test_prepare_span_binding_type_is_embedded`, `test_activate_span_has_controls_key` +- `test_metrics_invocation_counter_increments_for_embedded` + +Integration test (`test/backends/adapters/test_embedded_integration.py`, mark `@pytest.mark.integration`, `@pytest.mark.openai`): +- `OpenAIBackend × EmbeddedBinding × lora/alora` against Granite Switch — prepare → activate → generate → deactivate → release cycle; assert existing Granite Switch tests still pass + +Docs verification: +- `npx markdownlint-cli2 "docs/docs/advanced/intrinsics.md" "docs/dev/adapter_observability.md" "AGENTS.md"` clean + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| `prepare()` verification probe fails for older base models that lack the controls registry | Bubble a clear error naming the missing capability; document required base-model version in helper docstring | +| `activate`/`deactivate` controls leak between calls | `adapter_scope` ensures `deactivate` runs; explicit test for multi-call isolation | +| OpenAI backend uses different control-rendering API in newer versions | Verify against current OpenAI backend implementation 
before writing; PR description documents the version pinned against | + +### Breaking Changes + +None for end users — the binding replaces internal stubs. OpenAI-backend Granite Switch behaviour is preserved. + +### Impinging Issues / PRs + +- PR #881 — Reality B added to OpenAI backend; this binding is the structural home for that work +- #1018 — HF backend embedded adapters; sequenced after this issue; will reuse `EmbeddedBinding` against `LocalHFBackend` + +### References + +- PR #1080 §8.2 (Reality B), §9.2, §10 (backend × reality matrix) +- PR #881 (Reality B added to OpenAI backend) + +--- + +# Phase 3 — Reality C ships (blocked on upstream) + +## 3.1 — `ServerMediatedBinding` implementation + +**Parent:** #929 · **Depends on:** Phase 2 complete + #27 unblocked · **Phase:** 3 + +### Status: blocked + +Filed for traceability. Sits blocked until vLLM (or another OpenAI-compatible server) supports aLoRA adapters at the API level (#27). Per PR #1080 Part I §5 Q3 (Resolved, Paul): we design the slot but don't invest in stubs while upstream is blocked. + +### Problem + +OpenAI-compatible servers (notably vLLM) do not currently support aLoRA adapters at the API layer. The OpenAI backend cannot serve adapter-based helpers without falling back to embedded (Reality B) or removing adapter support entirely (as PR #543 did in 2026-04). When upstream gains support, Mellea needs `ServerMediatedBinding` ready to plug in. + +### Agreed design + +Implement the four verbs for Reality C — server-mediated adapter where the *server* (not Mellea) owns the weights: + +- `prepare()` — verifies the server exposes the named adapter (probe API) +- `activate(ctx)` — calls `AdapterMixin.set_request_adapter` to set the per-request adapter header/parameter +- `deactivate(ctx)` — clears the per-request adapter +- `release()` — typically a no-op (server owns the weights) + +Specific server-API integration (vLLM, others) is implementer's call when this issue is unblocked. 
+ +### Scope + +- `ServerMediatedBinding` verb implementations +- `OpenAIBackend` integration: drop the `_uses_embedded_adapters` hard-code; use `ServerMediatedBinding` for non-embedded adapters when supported + +### Out of scope + +- Driving vLLM upstream to accept aLoRA support (separate effort, #27) +- Other servers beyond vLLM-compatible (revisit if a customer asks) + +### Acceptance criteria + +- [ ] All four verbs implemented for `ServerMediatedBinding` +- [ ] OpenAI backend uses it for the appropriate adapter type +- [ ] E2E tests against a real vLLM-compatible server with aLoRA support +- [ ] Existing OpenAI backend tests still pass + +### Test plan + +Implementer's call when unblocked. Likely requires a vLLM test fixture or service-level mock. + +### Risks & Mitigations + +| Risk | Mitigation | +|---|---| +| Issue sits blocked for many months and design assumptions go stale | Tracking issue with quarterly check-in; revisit assumptions when #27 has a roadmap | +| Multiple OpenAI-compat servers diverge on aLoRA API shape | Implement against the first one that lands; abstract per-server quirks behind the binding's internals; surface as follow-up issues | +| Hard-coded `_uses_embedded_adapters` removal causes regression on Granite Switch path | Existing Reality B tests gate the change; `EmbeddedBinding` (2.3) is the runtime path for that case | + +### Breaking Changes + +None until unblocked. When implemented, `_uses_embedded_adapters` removal is internal. 
+ +### Impinging Issues / PRs + +- #27 — vLLM aLoRA support; the actual blocker +- PR #543 — historic context (adapter support removed from OpenAI backend when vLLM declined aLoRA) +- PR #881 — embedded adapters added back later + +### References + +- PR #1080 §8.3 (Reality C), Part I §5 Q3 +- #27 — vLLM aLoRA support +- PR #543, PR #881 + +--- + +# Phase 4 — shim removal + +## 4.1 — Remove deprecation shims for old `IntrinsicAdapter` classes + +**Parent:** #929 · **Depends on:** 1.A + Phase 2 complete + 1 minor release elapsed · **Phase:** 4 + +### Status: deferred — file as tracking issue + +### Problem + +The deprecation shims from 1.A still exist after one minor release of warnings. Mellea wants to remove them to finish the structural refactor cleanly. + +Per Part I §5 Q4 (Resolved by Paul/Jake), the deprecation window is at least one minor release (≈ 4–6 weeks), extendable if user impact warrants. + +### Agreed design + +Delete the shim classes: + +- `IntrinsicAdapter` +- `EmbeddedIntrinsicAdapter` +- `CustomIntrinsicAdapter` + +Update the changelog. If `AdapterBasedComponent` placeholder (introduced in 0.1) has been replaced with IBM's final name by this point, fold that rename in here too — otherwise that's a separate issue when the name lands. + +This is also the natural point for a full dev-doc rewrite: the shims are gone, the API shape is final, and the old documentation (which describes the class hierarchy being deleted) is now actively misleading. + +### Scope + +- Delete the three shim classes from `mellea/backends/adapters/__init__.py` and definitions +- Update changelog +- **Rewrite `docs/dev/intrinsics_and_adapters.md`** — the current doc describes the old four-class hierarchy (`IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, etc.). Rewrite (not edit) to document the final `Adapter` + `WeightsBinding` + `IOContract` model. The design proposal §15 explicitly flags this as "rewrite, not edit." 
+- **Three tutorials (§15)** — write alongside or immediately after shim removal, when the full API is stable:
+  - "Adding a custom intrinsic in 20 lines" — replaces the `CustomIntrinsicAdapter` monkey-patch story
+  - "Handling a breaking schema change without breaking users" — `requirement-check` v1→v2 worked example; HF revision pinning and `AdapterSchemaMismatchError`
+  - "Reading intrinsic telemetry" — short dashboard-building guide referencing `parse_failures`, `phase_duration_ms`, and the span tree
+
+### Out of scope
+
+- Any new functionality
+- The post-"Intrinsic" name rename (sequenced when IBM confirms — separate issue)
+
+### Acceptance criteria
+
+- [ ] Shim classes removed
+- [ ] No internal references to the old class names remain (`grep` gate in test plan)
+- [ ] Changelog entry recording the removal with migration note (import path change)
+- [ ] `docs/dev/intrinsics_and_adapters.md` rewritten; describes `Adapter` + `WeightsBinding` + `IOContract`; no references to deleted class names
+- [ ] At least two of the three tutorials written and linked from `docs/docs/advanced/intrinsics.md`
+- [ ] All tutorials' code examples validated against current source (pass `uv run pytest` on any embedded examples)
+- [ ] markdownlint clean on all touched docs
+- [ ] Existing tests pass (callers should already be on the new types)
+
+### Test plan
+
+- `grep` for old class names — should return no hits in `mellea/` (gate in CI)
+- `grep` for old class names in `docs/` — no hits in the rewritten docs
+- Existing test suite passes
+- Docs validation: `npx markdownlint-cli2 "docs/dev/intrinsics_and_adapters.md"` clean
+- Tutorial code examples: run via `uv run pytest` (mark examples with `# pytest: e2e, hf` or appropriate markers)
+
+### Risks & Mitigations
+
+| Risk | Mitigation |
+|---|---|
+| External users still depending on old class names break on update | Deprecation window honoured (1 minor release minimum); changelog entry; window extendable per Q4 |
+| Removal coincides with IBM's "Intrinsic" rename and conflates two changes | Decision called out in PR description: fold rename in only if IBM has confirmed by then; otherwise keep separate |
+| Tests still reference old class names somewhere | `grep` gate in acceptance criteria catches this |
+
+### Breaking Changes
+
+- **Removal of `IntrinsicAdapter`, `EmbeddedIntrinsicAdapter`, `CustomIntrinsicAdapter`.** Anyone still importing these breaks. By design, after the deprecation window.
+
+### Impinging Issues / PRs
+
+- All Phase 1 issues — must be merged for callers to be off the old classes
+- All Phase 2 issues — must be merged for backend surface to be on the new types
+- IBM "Intrinsic" rename decision — orthogonal, may or may not coincide
+
+### References
+
+- PR #1080 §16 Phase 4; Part I §5 Q4
+
+---
+
+# Cross-references summary
+
+After filing, every issue body should include a "Related" section linking the parent (`#929`) and any blocking/blocked-by siblings explicitly. The dependency table at the top of this document is the source of truth — when filing, copy the relevant row's edges into each issue body.
+
+GitHub sub-issues note: per memory `reference_github_subissues`, REST POST is broken; use the GraphQL `addSubIssue` mutation to wire each child to #929 as a formal sub-issue.
+
+# Open issues / PRs to coordinate before filing
+
+These are active open issues or PRs in the main repo that overlap with epic #929 work. They must be resolved or coordinated before the corresponding sub-issues are started to avoid conflicts.
+
+| Item | Overlap | Action |
+|---|---|---|
+| **PR #935** — docs: migrate Guardian documentation from deprecated GuardianCheck to Intrinsics API | Large open docs PR touching `docs/docs/advanced/intrinsics.md`, AGENTS.md, guardian examples — same files as 1.D/2.2/2.3. Rebased onto main 2026-05-19. | **Merge before starting 1.D.** Our subsequent issues build on top of this docs baseline. If it won't merge before work starts, coordinate to avoid conflicts. |
+| **PR #1078** — fix: intrinsic tests and add safeguards for future adapter changes (fixes #1029) | Adds canned I/O tests + `last_validated_commit` for `requirement-check` and `uncertainty` formatters (under `test/formatters/granite/`). | **Merge before starting 0.2/1.C.** Our revision-pinning (0.2) supersedes the `last_validated_commit` approach; 1.C builds on the formatter test data this PR adds. Close #1029 when #1078 merges. |
+| **#1094** — Migrate creating_a_new_type_of_session.py example off deprecated GuardianCheck | Session example imports deprecated `GuardianCheck`; same problem space as 1.D. | **Close with 1.D** — include the example migration in 1.D's scope. Add to 1.D impinging issues. |
+| **#1071** — feat: intrinsic-backed Requirement subclass for Guardian safety validation | New feature that would add code in the old guardian style | **Gate on 1.D** — do not start #1071 before 1.D merges. When #1071 is picked up, it should use the new `Adapter` types from Phase 0/1. Reference in 1.D's impinging issues. |
+| **#1029** — update intrinsic tests | Addressed by PR #1078; 0.2+1.C supersede the `last_validated_commit` approach with revision-pinning. | Close when #1078 merges. No separate action needed. |
+
+# Filing checklist
+
+- [ ] **Pre-flight:** PR #935 merged (or conflict-plan in place)
+- [ ] **Pre-flight:** PR #1078 merged; #1029 closed
+- [ ] Wave 1: 0.1, 0.2 — file simultaneously
+- [ ] Wave 2: 1.A — file after 0.1 has a draft PR
+- [ ] Wave 3: 1.B, 1.C, 1.D + 2.1 — file after 1.A merges; 1.D must note #1094 (close with this issue) and #1071 (gate on this issue)
+- [ ] Wave 4: 2.2, 2.3 — file after 2.1 merges
+- [ ] Tracking: 3.1 and 4.1 — file at any point, mark blocked
+- [ ] Wire #1111 as sub-issue of #929 (already filed)
+- [ ] After every issue is filed, add as sub-issue of #929 via GraphQL `addSubIssue`
+- [ ] Update the proposal branch's appendix to reference filed issue numbers (or skip — once PR #1080 closes, the branch is historical)
+
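The `addSubIssue` wiring step in the checklist above could be sketched roughly as follows. This is a minimal illustration, not a tested filing script: it only builds the HTTP request and does not send it. The helper name `build_add_sub_issue_request` and the placeholder IDs are hypothetical. The input fields `issueId` and `subIssueId` reflect GitHub's GraphQL schema as understood at the time of writing and should be checked against the current schema. Both values are GraphQL node IDs, not issue numbers. GitHub has at times required a `GraphQL-Features: sub_issues` header for this mutation; whether it is still needed is not confirmed here.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Mutation that attaches an existing issue as a sub-issue of a parent issue.
# Field names are an assumption to verify against GitHub's published schema.
ADD_SUB_ISSUE = """
mutation($parentId: ID!, $childId: ID!) {
  addSubIssue(input: {issueId: $parentId, subIssueId: $childId}) {
    issue { number }
    subIssue { number }
  }
}
"""


def build_add_sub_issue_request(
    parent_id: str, child_id: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request for one addSubIssue call."""
    payload = {
        "query": ADD_SUB_ISSUE,
        "variables": {"parentId": parent_id, "childId": child_id},
    }
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder node IDs and token; replace with real values before sending.
    req = build_add_sub_issue_request("PARENT_NODE_ID", "CHILD_NODE_ID", "TOKEN")
    print(req.get_method(), req.full_url)
    # To actually send the request: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to loop over each filed child issue and inspect the payloads before anything is written to the repository.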