diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md new file mode 100644 index 000000000..8eeec9cd3 --- /dev/null +++ b/docs/dev/hook_system.md @@ -0,0 +1,1848 @@ +# Mellea Plugin Hook System Design Document + +Mellea's hook system provides extension points for deployed generative AI applications that need policy enforcement, observability, and customization without modifying core library code. Hooks enable plugins to register and respond to events throughout the framework's execution lifecycle — from session initialization through generation, validation, and cleanup. + + +## 1. Overview + +### Design Principles + +1. **Consistent Interface**: All hooks follow the same async pattern with payload and context parameters +2. **Composable**: Multiple plugins can register for the same hook, executing in priority order +3. **Fail-safe**: Hook failures can be handled gracefully without breaking core execution +4. **Minimal Intrusion**: Plugins are opt-in; default Mellea behavior remains unchanged without plugins. Plugins work identically whether invoked through a session (`m.instruct(...)`) or via the functional API (`instruct(backend, context, ...)`) +5. **Architecturally Aligned**: Hook categories reflect Mellea's true abstraction boundaries — Session lifecycle, Component lifecycle, and the (Backend, Context) generation pipeline +6. **Code-First**: Plugins are defined and composed in Python. Decorators are the primary registration mechanism; YAML configuration is a secondary option for deployment-time overrides +7. **Functions-First**: The simplest plugin is a plain async function decorated with `@hook`. 
Class-based plugins exist for stateful, multi-hook scenarios but are not required + +### Hook Method Signature + +All hooks follow this consistent async pattern: + +```python +# Standalone function hook (primary) +@hook("hook_name", mode="enforce", priority=50) +async def my_hook( + payload: PluginPayload, + context: PluginContext +) -> PluginResult | None: + ... # return a PluginResult, or None to continue unchanged + +# Class-based method hook +class MyPlugin(MelleaPlugin): + @hook("hook_name") + async def my_hook( + self, + payload: PluginPayload, + context: PluginContext + ) -> PluginResult | None: + ... +``` + +- **`payload`**: Immutable (frozen), strongly-typed data specific to the hook point. Plugins use `model_copy(update={...})` to propose modifications +- **`context`**: Read-only shared context with session metadata and utilities +- **`mode`**: `"enforce"` (default), `"permissive"`, or `"fire_and_forget"`; controls execution behavior (see Execution Mode below) +- **`priority`**: Lower numbers execute first (default: 50) +- **Returns**: A `PluginResult` with continuation flag, modified payload, and violation/explanation, or `None` to continue unchanged + +### Concurrency Model + +Hooks use Python's `async`/`await` cooperative multitasking. Because Python's event loop only switches execution at `await` points, hook code won't be interrupted mid-logic. This means: + +- **Sequential when awaited**: Calling `await hook(...)` keeps control flow deterministic: the hook completes before the caller continues. +- **Race conditions only at `await` points**: Shared state is safe to read and write between `await` calls within a single hook. Races only arise if multiple hooks modify the same shared state and are dispatched concurrently. +- **No preemptive interruption**: Unlike threads, a hook handler runs uninterrupted until it yields control via `await`. 
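The cooperative-scheduling points above can be checked with a small standalone sketch. It uses only `asyncio` and does not touch Mellea; `Counter`, `safe_increment`, `racy_increment`, and `run` are illustrative names, not part of the hook system.

```python
import asyncio


class Counter:
    """Shared state touched by several concurrently dispatched handlers."""

    def __init__(self) -> None:
        self.value = 0


async def safe_increment(counter: Counter) -> None:
    # Read-modify-write with no await in between: no other task can run
    # during these two statements, so no update is lost.
    current = counter.value
    counter.value = current + 1
    await asyncio.sleep(0)  # yield only after the update is complete


async def racy_increment(counter: Counter) -> None:
    # The await between the read and the write lets other tasks run and
    # read the same stale value, so updates can be lost.
    current = counter.value
    await asyncio.sleep(0)
    counter.value = current + 1


async def run(handler, n: int = 10) -> int:
    """Dispatch n handlers concurrently against one shared counter."""
    counter = Counter()
    await asyncio.gather(*(handler(counter) for _ in range(n)))
    return counter.value


safe_total = asyncio.run(run(safe_increment))
racy_total = asyncio.run(run(racy_increment))
print(safe_total, racy_total)
```

The safe variant counts all ten updates, while the racy variant loses updates because every handler reads the counter before any of them writes it back. A hook that must update shared state should therefore finish the update before its next `await`.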
+ +### Execution Mode + +Hooks support three execution modes, configurable per-registration via the `mode` parameter on the `@hook` decorator: + +| Mode | Behavior | +|------|----------| +| **`enforce`** (default) | Awaited inline. If the hook returns `PluginResult(continue_processing=False)`, execution is blocked. Use for policy enforcement, budget controls, and authorization. | +| **`permissive`** | Awaited inline. Violations are logged but do not block execution. Use for monitoring, auditing, and gradual rollout of policies. | +| **`fire_and_forget`** | Dispatched via `asyncio.create_task()` and runs in the background. The `PluginResult` is ignored — cannot modify payloads or block execution. Use for logging, telemetry, and non-critical side effects where latency matters more than ordering guarantees. | + +Fire-and-forget hooks receive the payload snapshot as it existed at dispatch time; `enforce`/`permissive` hooks in the same chain that execute earlier (higher priority) can modify the payload before fire-and-forget hooks see it. Any exceptions in fire-and-forget hooks are logged but do not propagate. + +> **Note**: All three modes (`enforce`, `permissive`, `fire_and_forget`) are supported by the ContextForge Plugin Framework's `PluginMode` enum. The additional modes `enforce_ignore_error` and `disabled` remain available in the `PluginMode` enum and YAML configuration for deployment-time control, but are not exposed as `@hook` decorator values. They are deployment concerns, not definition-time concerns. + +### Plugin Framework + +The hook system is backed by a lightweight plugin framework built as a Mellea dependency (not a separate user-facing package). 
This framework: + +- Provides the `@hook` decorator for registering standalone async functions as hook handlers +- Provides the `@plugin` decorator for marking plain classes as multi-hook plugins +- Provides the `MelleaPlugin` base class for stateful plugins that need lifecycle hooks (`initialize`/`shutdown`) and typed context accessors +- Exposes `PluginSet` for grouping related hooks/plugins into composable, reusable units +- Exposes `register()` for global plugin registration and `block()` as a convenience for returning blocking `PluginResult`s +- Implements a plugin manager that loads, registers, and governs the execution of plugins +- Enforces per-hook-type payload policies via `HookPayloadPolicy`, accepting only writable-field changes from plugins + +The public API surface: + +```python +from mellea.plugins import hook, plugin, block, PluginSet, register, MelleaPlugin +``` + +### Global vs Session-Scoped Plugins + +Plugins can be registered at two scopes: + +- **Global**: Registered via `register()` at module or application startup. Global plugins fire for every hook invocation — both session-based (`m.instruct(...)`) and functional (`instruct(backend, context, ...)`). +- **Session-scoped**: Passed via the `plugins` parameter to `start_session()`. Session-scoped plugins fire only for hook invocations within that session. + +Both scopes coexist. When a hook fires within a session, both global plugins and that session's plugins execute, ordered by priority. When a hook fires via the functional API outside a session, only global plugins execute. + +**Implementation**: A single `PluginManager` instance manages all plugins. Plugins are tagged with an optional `session_id`. At dispatch time, the manager filters: global plugins (no session tag) always run; session-tagged plugins run only when the dispatch context matches their session ID. + +**Functional API support**: The functional API (`instruct(backend, context, ...)`) does not require a session. 
Hooks still fire at the same execution points. If global plugins are registered, they execute. If no plugins are registered, hooks are no-ops with zero overhead. + +### With-Block-Scoped Context Managers + +In addition to global and session-scoped registration, plugins can be activated for a specific block of code using the context manager protocol. Plugins are registered on entry and deregistered on exit, even if the block raises an exception. + +This is useful for: +- Feature flags: enable a plugin only during a specific operation +- Testing: activate a mock or spy plugin around a single call +- Middleware injection: wrap a third-party call with additional hooks without polluting global state +- Composing scopes: stack independent scopes that each clean up after themselves + +Four equivalent forms are supported: + +**1. `plugin_scope(*items)` factory** + +Accepts standalone `@hook` functions, `@plugin`-decorated instances, `PluginSet`s, or any mix: + +```python +from mellea.plugins import plugin_scope + +with plugin_scope(log_request, log_response): + result = m.instruct("Name the planets.") +# log_request and log_response are deregistered here + +with plugin_scope(pii_redactor, observability_set, enforce_budget): + result = m.instruct("Summarize the customer record.") +``` + +**2. `@plugin`-decorated class instance as context manager** + +Any instance of a `@plugin`-decorated class can be entered directly as a context manager: + +```python +guard = ContentGuard() +with guard: + result = m.instruct("What is the boiling point of water?") +# ContentGuard hooks are deregistered here +``` + +**3. `PluginSet` as context manager** + +A `PluginSet` can be entered directly, activating all of its contained hooks and plugins: + +```python +with observability: # observability is a PluginSet + result = m.instruct("What is the capital of France?") +# All observability hooks are deregistered here +``` + +**4. 
`MelleaPlugin` subclass instance as context manager** + +Any `MelleaPlugin` instance supports the same `with` syntax: + +```python +profiler = SlotProfiler() +with profiler: + result = m.instruct("Generate a report.") +``` + +**Async variants** + +All four forms also support `async with` for use in async code: + +```python +async with plugin_scope(log_request, ContentGuard()): + result = await m.ainstruct("Describe the solar system.") + +async with observability: + result = await m.ainstruct("What is the capital of France?") +``` + +**Nesting and mixing** + +Scopes stack cleanly, i.e., each exit deregisters only its own plugins. Nesting is independent of form: + +```python +with plugin_scope(log_request): # outer scope + with ContentGuard() as guard: # inner scope: @plugin instance + result = m.instruct("...") # log_request + ContentGuard active + result = m.instruct("...") # only log_request active +# no plugins active +``` + +**Cleanup guarantee** + +Plugins are always deregistered on scope exit, even if the block raises an exception. There is no resource leak on error. + +**Re-entrant restriction** + +The same instance cannot be active in two overlapping scopes simultaneously. Attempting to re-enter an already-active instance raises `RuntimeError`. To run the same plugin logic in parallel or in nested scopes, create separate instances: + +```python +guard1 = ContentGuard() +guard2 = ContentGuard() # separate instance + +with guard1: + with guard2: # OK — different instances + ... + +with guard1: + with guard1: # RuntimeError — same instance already active + ... +``` + +**Implementation note**: With-block scopes use the same `session_id` tagging mechanism as session-scoped plugins. Each `with` block gets a unique UUID scope ID; the plugin manager filters plugins by scope ID at dispatch time and deregisters them by scope ID on exit. This means with-block plugins coexist with global and session-scoped plugins: all three layers execute together, ordered by priority. 
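The scope-ID mechanism in the implementation note can be sketched with a toy registry. This is a minimal illustration under stated assumptions, not Mellea's plugin manager: `ToyPluginManager`, `scope`, and `active_names` are hypothetical names, handlers are plain functions, and scoped handlers all use the default priority of 50.

```python
from __future__ import annotations

import uuid
from contextlib import contextmanager
from typing import Callable, Iterator


class ToyPluginManager:
    """Hypothetical stand-in for the plugin manager; not Mellea's real API."""

    def __init__(self) -> None:
        # Each entry is (scope_id, priority, handler); scope_id None means global.
        self._entries: list[tuple[str | None, int, Callable]] = []
        self._scoped: set[int] = set()  # ids of handlers currently inside a scope

    def register(self, handler: Callable, priority: int = 50,
                 scope_id: str | None = None) -> None:
        self._entries.append((scope_id, priority, handler))

    def active_names(self) -> list[str]:
        # Global and scoped handlers run together, lowest priority number first.
        ordered = sorted(self._entries, key=lambda entry: entry[1])
        return [handler.__name__ for _, _, handler in ordered]

    @contextmanager
    def scope(self, *handlers: Callable) -> Iterator[str]:
        if any(id(h) in self._scoped for h in handlers):
            raise RuntimeError("plugin is already active in another scope")
        scope_id = str(uuid.uuid4())  # unique tag for this with-block
        for h in handlers:
            self._scoped.add(id(h))
            self.register(h, scope_id=scope_id)
        try:
            yield scope_id
        finally:
            # Runs on normal exit and on exceptions: drop only this scope's entries.
            self._entries = [e for e in self._entries if e[0] != scope_id]
            self._scoped.difference_update(id(h) for h in handlers)


def audit(): ...
def log_request(): ...
def content_guard(): ...


manager = ToyPluginManager()
manager.register(audit, priority=10)  # global: active everywhere

with manager.scope(log_request):
    with manager.scope(content_guard):
        inner = manager.active_names()
    outer = manager.active_names()
after = manager.active_names()

try:
    with manager.scope(log_request):
        with manager.scope(log_request):  # same object entered twice
            pass
except RuntimeError as exc:
    reentry_error = str(exc)

print(inner, outer, after, reentry_error)
```

Tagging each entry with its scope ID means exit only has to filter on that ID, which is why nested scopes clean up independently and why the failed re-entry above still leaves only the global handler registered.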
+ +### Hook Invocation Responsibilities + +Hooks are called from Mellea's base classes (`Component.aact()`, `Backend.generate()`, `SamplingStrategy.run()`, etc.). This means hook invocation is a framework-level concern, and authors of new backends, sampling strategies, or components do not need to manually insert hook calls. + +The calling convention is a single async call at each hook site: + +```python +result = await plugin_manager.invoke_hook(hook_type, payload, context) +``` + +The caller (the base class method) is responsible for both invoking the hook and processing the result. Processing means checking the result for one of three possible outcomes: + +1. **Continue with original payload**: `PluginResult(continue_processing=True)` with no `modified_payload`. The caller proceeds unchanged. +2. **Continue with modified payload**: `PluginResult(continue_processing=True, modified_payload=...)`. The plugin manager applies the hook's payload policy, accepting only changes to writable fields and discarding unauthorized modifications. The caller uses the policy-filtered payload in place of the original. +3. **Block execution**: `PluginResult(continue_processing=False, violation=...)`. The caller raises or returns early with structured error information. + +Hooks cannot redirect control flow, jump to arbitrary code, or alter the calling method's logic beyond these outcomes. This is enforced by the `PluginResult` type. + +### Payload Design Principles + +Hook payloads follow six design principles: + +1. **Strongly typed** — Each hook has a dedicated payload class (not a generic dict). This enables IDE autocompletion, static analysis, and clear documentation of what each hook receives. +2. **Sufficient (maximize-at-boundary)** — Each payload includes everything available at that point in time. Post-hooks include the pre-hook fields plus results. This avoids forcing plugins to maintain their own state across pre/post pairs. +3. 
**Frozen (immutable)** — Payloads are frozen Pydantic models (`model_config = ConfigDict(frozen=True)`). Plugins cannot mutate payload attributes in place. To propose changes, plugins must call `payload.model_copy(update={...})` and return the copy via `PluginResult.modified_payload`. This ensures every modification is explicit and flows through the policy system. +4. **Policy-controlled** — Each hook type declares a `HookPayloadPolicy` specifying which fields are writable. The plugin manager applies the policy after each plugin returns, accepting only changes to writable fields and silently discarding unauthorized modifications. This separates "what the plugin can observe" from "what the plugin can change" — and enforces it at the framework level. See [Hook Payload Policies](#hook-payload-policies) for the full policy table. +5. **Serializable** — Payloads should be serializable for external (MCP-based) plugins that run out-of-process. All payload fields use types that can round-trip through JSON or similar formats. +6. **Versioned** — Payload schemas carry a `payload_version` so plugins can detect incompatible changes at registration time rather than at runtime. + +## 2. Common Payload Fields + +All hook payloads inherit these base fields: + +```python +class BasePayload(PluginPayload): + """Frozen base — all payloads are immutable by design.""" + model_config = ConfigDict(frozen=True, arbitrary_types_allowed=True) + + session_id: str | None = None # Session identifier (None for functional API calls) + request_id: str # Unique ID for this execution chain + timestamp: datetime # When the event fired + hook: str # Name of the hook (e.g., "generation_pre_call") + user_metadata: dict[str, Any] # Custom metadata carried by user code +``` + +## 3. 
Hook Summary Table + +| Prio | Hook Point | Category | Domain | Description | +|--|------------|----------|--------|-------------| +|3| `session_pre_init` | Session Lifecycle | Session | Before session initialization | +|3| `session_post_init` | Session Lifecycle | Session | After session is fully initialized | +|3| `session_reset` | Session Lifecycle | Session | When session context is reset | +|3| `session_cleanup` | Session Lifecycle | Session | Before session cleanup/teardown | +|7| `component_pre_create` | Component Lifecycle | Component / (Backend, Context) | Before component creation | +|7| `component_post_create` | Component Lifecycle | Component / (Backend, Context) | After component created, before execution | +|7| `component_pre_execute` | Component Lifecycle | Component / (Backend, Context) | Before component execution via `aact()` | +|7| `component_post_success` | Component Lifecycle | Component / (Backend, Context) | After successful component execution | +|7| `component_post_error` | Component Lifecycle | Component / (Backend, Context) | After component execution fails | +|1| `generation_pre_call` | Generation Pipeline | (Backend, Context) | Before LLM backend call | +|1| `generation_post_call` | Generation Pipeline | (Backend, Context) | After LLM response received | +|1| `generation_stream_chunk` | Generation Pipeline | (Backend, Context) | For each streaming chunk | +|1| `validation_pre_check` | Validation | (Backend, Context) | Before requirement validation | +|1| `validation_post_check` | Validation | (Backend, Context) | After validation completes | +|3| `sampling_loop_start` | Sampling Pipeline | (Backend, Context) | When sampling strategy begins | +|3| `sampling_iteration` | Sampling Pipeline | (Backend, Context) | After each sampling attempt | +|3| `sampling_repair` | Sampling Pipeline | (Backend, Context) | When repair is invoked | +|3| `sampling_loop_end` | Sampling Pipeline | (Backend, Context) | When sampling completes | +|1| 
`tool_pre_invoke` | Tool Execution | (Backend, Context) | Before tool/function invocation | +|1| `tool_post_invoke` | Tool Execution | (Backend, Context) | After tool execution | +|5| `adapter_pre_load` | Backend Adapter Ops | Backend | Before `backend.load_adapter()` | +|5| `adapter_post_load` | Backend Adapter Ops | Backend | After adapter loaded | +|5| `adapter_pre_unload` | Backend Adapter Ops | Backend | Before `backend.unload_adapter()` | +|5| `adapter_post_unload` | Backend Adapter Ops | Backend | After adapter unloaded | +|???| `context_update` | Context Operations | Context | When context changes | +|???| `context_prune` | Context Operations | Context | When context is trimmed | +|???| `error_occurred` | Error Handling | Cross-cutting | When an unrecoverable error occurs | + +## 3b. Hook Payload Policies + +Each hook type declares a `HookPayloadPolicy` that specifies which payload fields plugins are allowed to modify. The plugin manager enforces these policies after each plugin returns: only changes to writable fields are accepted; all other modifications are silently discarded. + +Hooks not listed in the policy table are **observe-only** — plugins can read the payload but cannot modify any fields. 
+ +### Policy Types + +```python +from dataclasses import dataclass +from enum import Enum + +class DefaultHookPolicy(str, Enum): + """Controls behavior for hooks without an explicit policy.""" + ALLOW = "allow" # Accept all modifications (backwards-compatible) + DENY = "deny" # Reject all modifications (strict mode, default for Mellea) + +@dataclass(frozen=True) +class HookPayloadPolicy: + """Defines which payload fields plugins may modify.""" + writable_fields: frozenset[str] +``` + +### Policy Enforcement + +When a plugin returns `PluginResult(modified_payload=...)`, the plugin manager calls `apply_policy()`: + +```python +from typing import Any + +from pydantic import BaseModel + +_SENTINEL = object() # marks a field that is missing from a payload + +def apply_policy( + original: BaseModel, + modified: BaseModel, + policy: HookPayloadPolicy, +) -> BaseModel | None: + """Accept only changes to writable fields; discard all others. + + Returns an updated payload via model_copy(update=...), or None + if the plugin made no effective (allowed) changes. + """ + updates: dict[str, Any] = {} + for field in policy.writable_fields: + old_val = getattr(original, field, _SENTINEL) + new_val = getattr(modified, field, _SENTINEL) + if new_val is not _SENTINEL and new_val != old_val: + updates[field] = new_val + return original.model_copy(update=updates) if updates else None +``` + +### Policy Table + +| Hook Point | Writable Fields | +|------------|----------------| +| **Session Lifecycle** | | +| `session_pre_init` | `backend_name`, `model_id`, `model_options`, `backend_kwargs` | +| `session_post_init` | *(observe-only)* | +| `session_reset` | *(observe-only)* | +| `session_cleanup` | *(observe-only)* | +| **Component Lifecycle** | | +| `component_pre_create` | `description`, `images`, `requirements`, `icl_examples`, `grounding_context`, `user_variables`, `prefix`, `template_id` | +| `component_post_create` | `component` | +| `component_pre_execute` | `action`, `context`, `context_view`, `requirements`, `model_options`, `format`, `strategy`, `tool_calls_enabled` | +| `component_post_success` | 
`result` | +| `component_post_error` | *(observe-only)* | +| **Generation Pipeline** | | +| `generation_pre_call` | `model_options`, `tools`, `format`, `formatted_prompt` | +| `generation_post_call` | `processed_output`, `model_output` | +| `generation_stream_chunk` | `chunk`, `accumulated` | +| **Validation** | | +| `validation_pre_check` | `requirements`, `model_options` | +| `validation_post_check` | `results`, `all_passed` | +| **Sampling Pipeline** | | +| `sampling_loop_start` | `loop_budget` | +| `sampling_iteration` | *(observe-only)* | +| `sampling_repair` | `repair_action`, `repair_context` | +| `sampling_loop_end` | `final_result` | +| **Tool Execution** | | +| `tool_pre_invoke` | `tool_args` | +| `tool_post_invoke` | `tool_output` | +| **Backend Adapter Ops** | | +| `adapter_pre_load` | *(observe-only)* | +| `adapter_post_load` | *(observe-only)* | +| `adapter_pre_unload` | *(observe-only)* | +| `adapter_post_unload` | *(observe-only)* | +| **Context Operations** | | +| `context_update` | *(observe-only)* | +| `context_prune` | *(observe-only)* | +| **Error Handling** | | +| `error_occurred` | *(observe-only)* | + +### Default Policy + +Mellea uses `DefaultHookPolicy.DENY` as the default for hooks without an explicit policy. This means: + +- **Hooks with an explicit policy**: Only writable fields are accepted; other changes are discarded. +- **Hooks without a policy** (observe-only): All modifications are rejected with a warning log. +- **Custom hooks**: Custom hooks registered by users default to `DENY`. To allow modifications, pass a `HookPayloadPolicy` when registering the custom hook type. 
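The filtering and default-deny behavior described in this section can be exercised end to end with a stdlib-only sketch. It is an illustration under assumptions, not the framework's code: frozen dataclasses and `dataclasses.replace` stand in for frozen Pydantic models and `model_copy(update=...)`, `ToyPreCallPayload` is a hypothetical cut-down payload, and passing `None` as the policy is used here to model `DefaultHookPolicy.DENY`.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Any

_SENTINEL = object()  # marks a field that is missing from a payload


@dataclass(frozen=True)
class HookPayloadPolicy:
    writable_fields: frozenset[str]


@dataclass(frozen=True)
class ToyPreCallPayload:
    """Hypothetical stand-in for a generation_pre_call payload."""
    formatted_prompt: str
    model_options: dict[str, Any] = field(default_factory=dict)
    estimated_tokens: int | None = None


def apply_policy(original, modified, policy: HookPayloadPolicy | None):
    """Keep only writable-field changes; a None policy accepts nothing (DENY)."""
    if policy is None:
        return None
    updates: dict[str, Any] = {}
    for name in policy.writable_fields:
        old_val = getattr(original, name, _SENTINEL)
        new_val = getattr(modified, name, _SENTINEL)
        if new_val is not _SENTINEL and new_val != old_val:
            updates[name] = new_val
    # dataclasses.replace plays the role of model_copy(update=...)
    return replace(original, **updates) if updates else None


# Writable fields for generation_pre_call, taken from the policy table.
policy = HookPayloadPolicy(
    frozenset({"model_options", "tools", "format", "formatted_prompt"})
)

original = ToyPreCallPayload(
    formatted_prompt="Name the planets.",
    model_options={"temperature": 0.7},
    estimated_tokens=12,
)
# A plugin proposes one allowed change and one change to a read-only field.
proposed = replace(original, model_options={"temperature": 0.0}, estimated_tokens=1)

accepted = apply_policy(original, proposed, policy)
denied = apply_policy(original, proposed, None)
print(accepted, denied)
```

The accepted payload carries the new `model_options` but keeps the original `estimated_tokens`, and the deny case returns `None`, so the caller continues with the original payload.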
+ +### Modification Pattern + +Because payloads are frozen, plugins must use `model_copy(update={...})` to create a modified copy: + +```python +@hook("generation_pre_call", mode="enforce", priority=10) +async def enforce_budget(payload, ctx): + if (payload.estimated_tokens or 0) > 4000: + return block("Token budget exceeded") + + # Modify a writable field: use model_copy, not direct assignment + modified = payload.model_copy(update={"model_options": {**payload.model_options, "max_tokens": 4000}}) + return PluginResult(continue_processing=True, modified_payload=modified) +``` + +Attempting to set attributes directly (e.g., `payload.model_options = {...}`) raises an error: for frozen Pydantic v2 models this is a `ValidationError` with error type `frozen_instance`. + +### Chaining + +When multiple plugins modify the same hook's payload, modifications are chained: + +1. Plugin A receives the original payload, returns a modified copy. +2. The policy filters Plugin A's changes to writable fields only. +3. Plugin B receives the policy-filtered result from Plugin A. +4. The policy filters Plugin B's changes. +5. The final policy-filtered payload is returned to the caller. + +This ensures each plugin sees the cumulative effect of prior plugins, and all modifications pass through the policy filter. + +## 4. Hook Definitions + +### A. Session Lifecycle Hooks + +Hooks that manage session boundaries, useful for initialization, state setup, and resource cleanup. + +#### `session_pre_init` + +- **Trigger**: Called immediately when `mellea.start_session()` is invoked, before backend initialization. 
+- **Use Cases**: + - Loading user-specific policies + - Validating backend/model combinations + - Enforcing model usage policies + - Routing to alternative backends +- **Payload**: + ```python + class SessionPreInitPayload(BasePayload): + backend_name: str # Requested backend identifier + model_id: str | ModelIdentifier # Target model + model_options: dict | None # Generation parameters + backend_kwargs: dict # Additional backend configuration + context_type: type[Context] # Context class to use + ``` +- **Context**: + - `environment`: dict - Environment variables snapshot + - `cwd`: str - Current working directory + + +#### `session_post_init` + +- **Trigger**: Called after session is fully initialized, before any operations. +- **Use Cases**: + - Initializing plugin-specific session state + - Setting up telemetry/observability + - Registering session-scoped resources + - Remote logging setup +- **Payload**: + ```python + class SessionPostInitPayload(BasePayload): + backend: Backend # Initialized backend instance + context: Context # Initial context + logger: FancyLogger # Session logger + ``` +- **Context**: + - `backend_info`: dict - Backend capabilities and metadata + - `model_info`: dict - Model details (context window, etc.) + + +#### `session_reset` + +- **Trigger**: Called when `session.reset()` is invoked to clear context. +- **Use Cases**: + - Resetting plugin state + - Logging context transitions + - Preserving audit trails before reset +- **Payload**: + ```python + class SessionResetPayload(BasePayload): + previous_context: Context # Context before reset + new_context: Context # Fresh context after reset + ``` +- **Context**: + - `session`: MelleaSession + - `reset_reason`: str | None - Optional reason for reset + + +#### `session_cleanup` + +- **Trigger**: Called when `session.close()`, `cleanup()`, or context manager exit occurs. 
+- **Use Cases**: + - Flushing telemetry buffers + - Persisting audit trails + - Aggregating session metrics + - Cleaning up temporary resources +- **Payload**: + ```python + class SessionCleanupPayload(BasePayload): + context: Context # Final context state + total_generations: int # Count of generations performed + total_tokens_used: int | None # Aggregate token usage + interaction_count: int # Total number of turns + ``` +- **Context**: + - `generate_logs`: list[GenerateLog] - All logs from session + - `duration_ms`: int - Session duration + - `session`: MelleaSession - Final session state + + +### B. Component Lifecycle Hooks + +Hooks around Component creation and execution. All Mellea primitives — Instruction, Message, Query, Transform, GenerativeSlot — are Components. These hooks cover the full Component lifecycle; there are no separate hooks per component type. + +All component payloads include a `component_type: str` field (e.g., `"Instruction"`, `"Message"`, `"GenerativeSlot"`, `"Query"`, `"Transform"`) so plugins can filter by type. For example, a plugin targeting only generative slots would check `component_type == "GenerativeSlot"`. + +Not all `ComponentPreCreatePayload` fields are populated for every component type. The table below shows which fields are available per type (`✓` = populated, `—` = `None` or empty): + +| Field | Instruction | Message | Query | Transform | GenerativeSlot | +|-------|:-----------:|:-------:|:-----:|:---------:|:--------------:| +| `description` | ✓ | ✓ | ✓ | ✓ | ✓ | +| `images` | ✓ | ✓ | — | — | ✓ | +| `requirements` | ✓ | — | — | — | ✓ | +| `icl_examples` | ✓ | — | — | — | ✓ | +| `grounding_context` | ✓ | — | — | — | ✓ | +| `user_variables` | ✓ | — | — | — | ✓ | +| `prefix` | ✓ | — | — | — | ✓ | +| `template_id` | ✓ | — | — | — | ✓ | + +Plugins should check for `None`/empty values rather than assuming all fields are present for all component types. 
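A plugin can follow this advice by reading optional fields defensively. The sketch below is only illustrative: `ToyPreCreatePayload` is a hypothetical stand-in carrying a few of the `ComponentPreCreatePayload` fields, and `summarize` is a plain helper rather than a registered hook.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ToyPreCreatePayload:
    """Hypothetical stand-in with a subset of ComponentPreCreatePayload fields."""
    component_type: str
    description: str
    requirements: list[str] = field(default_factory=list)
    images: list[str] | None = None


def summarize(payload: ToyPreCreatePayload) -> dict:
    """Read optional fields without assuming they are populated."""
    return {
        "type": payload.component_type,
        # `or []` covers both None and empty values.
        "n_requirements": len(payload.requirements or []),
        "n_images": len(payload.images or []),
        # Filter by component type instead of probing for fields.
        "is_slot": payload.component_type == "GenerativeSlot",
    }


instruction = ToyPreCreatePayload(
    component_type="Instruction",
    description="Summarize the report.",
    requirements=["Be concise"],
)
message = ToyPreCreatePayload(component_type="Message", description="Hello")

instruction_summary = summarize(instruction)
message_summary = summarize(message)
print(instruction_summary, message_summary)
```

The same helper works for both payloads because it treats `None` and empty lists alike.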
+ + +#### `component_pre_create` + +- **Trigger**: Called when `instruct()`, `chat()`, or a generative slot is invoked, before the prompt is constructed. +- **Use Cases**: + - PII redaction on user input + - Prompt injection detection + - Input validation and sanitization + - Injecting mandatory requirements + - Enforcing content policies +- **Payload**: + ```python + class ComponentPreCreatePayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. + description: str # Main instruction text + images: list[ImageBlock] | None # Attached images + requirements: list[Requirement | str] # Validation requirements + icl_examples: list[str | CBlock] # In-context learning examples + grounding_context: dict[str, str] # Grounding variables + user_variables: dict[str, str] | None # Template variables + prefix: str | CBlock | None # Output prefix + template_id: str | None # Identifier of prompt template + ``` +- **Context**: + - `backend`: Backend + - `context`: Context - Context the component will be added to + - `history_snapshot`: ContextSnapshot - Conversation history + + +#### `component_post_create` + +- **Trigger**: After component is created and formatted, before backend call. +- **Use Cases**: + - Appending system prompts + - Context stuffing (RAG injection) + - Logging component patterns + - Validating final component structure +- **Payload**: + ```python + class ComponentPostCreatePayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. + component: Component # The created component + template_repr: TemplateRepresentation # Formatted representation + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + + +#### `component_pre_execute` + +- **Trigger**: Before any component is executed via `aact()`. 
+- **Use Cases**: + - Policy enforcement on generation requests + - Injecting/modifying model options + - Routing to different strategies + - Authorization checks + - Logging execution patterns +- **Payload**: + ```python + class ComponentPreExecutePayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. + action: Component | CBlock # The component to execute + context: Context # Current context + context_view: list[Component | CBlock] | None # Linearized context + requirements: list[Requirement] # Attached requirements + model_options: dict # Generation parameters + format: type | None # Structured output format + strategy: SamplingStrategy | None # Sampling strategy + tool_calls_enabled: bool # Whether tools are available + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + + +#### `component_post_success` + +- **Trigger**: After component execution completes successfully. +- **Use Cases**: + - Logging generation results + - Output validation (hallucination check) + - PII scrubbing from response + - Applying output transformations + - Audit logging + - Collecting metrics +- **Payload**: + ```python + class ComponentPostSuccessPayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. 
+ action: Component | CBlock # Executed component + result: ModelOutputThunk # Generation result + context_before: Context # Context before execution + context_after: Context # Context after execution + generate_log: GenerateLog # Detailed execution log + sampling_results: list[SamplingResult] | None # If sampling was used + latency_ms: int # Execution time + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `token_usage`: dict | None + - `original_input`: dict - Input that triggered generation + +> **Design Decision: Separate Success/Error Hooks** +> +> `component_post_success` and `component_post_error` are separate hooks rather than a single `component_post` with a sum type over success/failure. The reasons are: +> +> 1. **Registration granularity**: Plugins subscribe to only what they need. An audit logger may only care about errors; a metrics collector may only care about successes. +> 2. **Distinct payload shapes**: Success payloads carry `result`, `generate_log`, and `sampling_results`; error payloads carry `error`, `error_type`, and `stack_trace`. A sum type would force nullable fields or tagged unions, adding complexity for every consumer. +> 3. **Different execution modes**: Error hooks may be fire-and-forget (for alerting); success hooks may be blocking (for output transformation). Separate hooks allow the execution mode to be configured per hook. + + +#### `component_post_error` + +- **Trigger**: When component execution fails with an exception. +- **Use Cases**: + - Error logging and alerting + - Custom error recovery + - Retry logic + - Graceful degradation +- **Payload**: + ```python + class ComponentPostErrorPayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. 
+ action: Component | CBlock # Component that failed + error: Exception # The exception raised + error_type: str # Exception class name + stack_trace: str # Full stack trace + context: Context # Context at time of error + model_options: dict # Options used + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `recoverable`: bool - Can execution continue + + +### C. Generation Pipeline Hooks + +Low-level hooks between the component abstraction and raw LLM API calls. These operate on the (Backend, Context) tuple — they do not require a session. + +> **Context Modification Sequencing** +> +> Modifications to `Context` at `component_pre_execute` are reflected in the subsequent `generation_pre_call`, because context linearization happens after the component-level hook. Modifications to `Context` after `generation_pre_call` (e.g., in `generation_post_call`) do not affect the current generation — the prompt has already been sent. This ordering is by design: `component_pre_execute` is the last point where context changes influence what the LLM sees. + + +#### `generation_pre_call` + +- **Trigger**: Just before the backend transmits data to the LLM API. 
+- **Use Cases**: + - Tool selection filtering and requirements + - Prompt injection detection + - Content filtering + - Token budget enforcement + - Cost estimation + - Prompt caching/deduplication + - Rate limiting + - Last-mile formatting +- **Payload**: + ```python + class GenerationPreCallPayload(BasePayload): + action: Component | CBlock # Source action + context: Context # Current context + linearized_context: list[Component | CBlock] # Context as list + formatted_prompt: str | list[dict] # Final prompt to send + model_options: dict[str, Any] # Generation parameters + tools: dict[str, Callable] | None # Available tools + format: type | None # Structured output format + estimated_tokens: int | None # Token estimate + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `backend_name`: str + - `model_id`: str + - `provider`: str - Provider name (e.g., "ibm/granite") + + +#### `generation_post_call` + +- **Trigger**: Immediately after receiving the raw response from the LLM API, before parsing. +- **Use Cases**: + - Output filtering/sanitization + - PII detection and redaction + - Response caching + - Quality metrics collection + - Hallucination detection + - Raw trace logging + - Error interception (API limits/retries) +- **Payload**: + ```python + class GenerationPostCallPayload(BasePayload): + prompt: str | list[dict] # Sent prompt + raw_response: dict # Full JSON response from provider + processed_output: str # Processed output text + model_output: ModelOutputThunk # Output thunk + token_usage: dict | None # Token counts + latency_ms: int # Generation time + finish_reason: str # Why generation stopped + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `backend_name`: str + - `model_id`: str + - `status_code`: int | None - HTTP status from provider + - `stream_chunks`: int | None - Number of chunks if streaming + + +#### `generation_stream_chunk` + +- **Trigger**: For each streaming chunk received from the LLM. 
+- **Use Cases**: + - Real-time content filtering + - Progressive output display + - Early termination on policy violation + - Streaming analytics +- **Payload**: + ```python + class GenerationStreamChunkPayload(BasePayload): + chunk: str # Current chunk text + accumulated: str # All text so far + chunk_index: int # Chunk sequence number + is_final: bool # Is this the last chunk + ``` +- **Context**: + - `thunk_id`: str + - `backend`: Backend + - `context`: Context + - `backend_name`: str + - `model_id`: str + + +### D. Validation Hooks + +Hooks around requirement verification and output validation. These operate on the (Backend, Context) tuple. + + +#### `validation_pre_check` + +- **Trigger**: Before running validation/requirements check. +- **Use Cases**: + - Injecting additional requirements + - Filtering requirements based on context + - Overriding validation strategy + - Custom validation logic +- **Payload**: + ```python + class ValidationPreCheckPayload(BasePayload): + requirements: list[Requirement] # Requirements to check + target: CBlock | None # Target to validate + context: Context # Current context + model_options: dict # Options for LLM-as-judge + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `validation_type`: str - "python" | "llm_as_judge" + + +#### `validation_post_check` + +- **Trigger**: After all validations complete. +- **Use Cases**: + - Logging validation outcomes + - Triggering alerts on failures + - Collecting requirement effectiveness metrics + - Overriding validation results + - Monitoring sampling attempts +- **Payload**: + ```python + class ValidationPostCheckPayload(BasePayload): + requirements: list[Requirement] + results: list[ValidationResult] + all_passed: bool + passed_count: int + failed_count: int + generate_logs: list[GenerateLog | None] # Logs from LLM-as-judge + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `validation_duration_ms`: int + + +### E. 
Sampling & Repair Hooks + +Hooks around sampling strategies and failure recovery. These operate on the (Backend, Context) tuple — sampling strategies take explicit `(action, context, backend)` arguments and do not require a session. + + +#### `sampling_loop_start` + +- **Trigger**: When a sampling strategy begins execution. +- **Use Cases**: + - Logging sampling attempts + - Adjusting loop budget dynamically + - Initializing sampling-specific state +- **Payload**: + ```python + class SamplingLoopStartPayload(BasePayload): + strategy_name: str # Strategy class name + action: Component # Initial action + context: Context # Initial context + requirements: list[Requirement] # All requirements + loop_budget: int # Maximum iterations + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `strategy_name`: str + - `strategy_config`: dict + + +#### `sampling_iteration` + +- **Trigger**: After each sampling attempt, including validation results. +- **Use Cases**: + - Iteration-level metrics + - Early termination decisions + - Debug sampling behavior + - Adaptive strategy adjustment +- **Payload**: + ```python + class SamplingIterationPayload(BasePayload): + iteration: int # Current iteration number + action: Component # Action used this iteration + result: ModelOutputThunk # Generation result + validation_results: list[tuple[Requirement, ValidationResult]] + all_valid: bool # Did all requirements pass + valid_count: int + total_count: int + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `strategy_name`: str + - `remaining_budget`: int + - `elapsed_ms`: int + + +#### `sampling_repair` + +- **Trigger**: When a repair strategy is invoked after validation failure. Behavior varies by sampling strategy. +- **Strategy-Specific Behavior**: + - **RejectionSamplingStrategy**: Identity retry — same action, original context. No actual repair; simply regenerates. 
(`repair_type: "identity"`) + - **RepairTemplateStrategy**: Appends failure descriptions via `copy_and_repair()`, producing a modified context that includes what went wrong. (`repair_type: "template_repair"`) + - **MultiTurnStrategy**: Adds a Message describing failures to the conversation context, treating repair as a new conversational turn. (`repair_type: "multi_turn_message"`) + - **SOFAISamplingStrategy**: Two-solver approach with targeted feedback between attempts. (`repair_type: "sofai_feedback"`) +- **Use Cases**: + - Logging repair patterns + - Injecting custom repair strategies + - Analyzing failure modes + - Adjusting repair approach +- **Payload**: + ```python + class SamplingRepairPayload(BasePayload): + repair_type: str # "identity" | "template_repair" | "multi_turn_message" | "sofai_feedback" | "custom" + failed_action: Component # Action that failed + failed_result: ModelOutputThunk # Failed output + failed_validations: list[tuple[Requirement, ValidationResult]] + old_context: Context # Context without failure + new_context: Context # Context with failure + repair_action: Component # New action for retry + repair_context: Context # Context for retry + repair_iteration: int # Which repair attempt + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `strategy_name`: str + - `past_failures`: list[str] + + +#### `sampling_loop_end` + +- **Trigger**: When sampling completes (success or failure). 
+- **Use Cases**: + - Sampling effectiveness metrics + - Failure analysis + - Cost tracking + - Selecting best failed attempt +- **Payload**: + ```python + class SamplingLoopEndPayload(BasePayload): + success: bool # Did sampling succeed + iterations_used: int # Total iterations performed + final_result: ModelOutputThunk | None # Best result + final_action: Component | None + final_context: Context | None + failure_reason: str | None # If failed, why + all_results: list[ModelOutputThunk] + all_validations: list[list[tuple[Requirement, ValidationResult]]] + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `strategy_name`: str + - `total_duration_ms`: int + - `tokens_used`: int | None + + +### F. Tool Calling Hooks + +Hooks around tool/function execution. These operate on the (Backend, Context) tuple. + + +#### `tool_pre_invoke` + +- **Trigger**: Before invoking a tool/function from LLM output. +- **Use Cases**: + - Tool authorization + - Argument validation/sanitization + - Tool routing/redirection + - Rate limiting per tool +- **Payload**: + ```python + class ToolPreInvokePayload(BasePayload): + tool_name: str # Name of tool to call + tool_args: dict[str, Any] # Arguments to pass + tool_callable: Callable # The actual function + model_tool_call: ModelToolCall # Raw model output + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `available_tools`: list[str] + - `invocation_source`: str - "transform" | "action" | etc. + + +#### `tool_post_invoke` + +- **Trigger**: After tool execution completes. 
+- **Use Cases**: + - Output transformation + - Error handling/recovery + - Tool usage metrics + - Result caching +- **Payload**: + ```python + class ToolPostInvokePayload(BasePayload): + tool_name: str + tool_args: dict[str, Any] + tool_output: Any # Raw tool output + tool_message: ToolMessage # Formatted message + execution_time_ms: int + success: bool # Did tool execute without error + error: Exception | None # Error if any + ``` +- **Context**: + - `backend`: Backend + - `context`: Context + - `invocation_source`: str + + +### G. Backend Adapter Operations + +Hooks around LoRA/aLoRA adapter loading and unloading on backends. Based on the `AdapterMixin` protocol in `mellea/backends/adapters/adapter.py`. + +> **Future Work: Backend Switching** +> +> These hooks cover adapter load/unload on a single backend. Hooks for switching the entire backend on a session (e.g., from Ollama to OpenAI mid-session) are a potential future extension and are distinct from adapter management. + + +#### `adapter_pre_load` + +- **Trigger**: Before `backend.load_adapter()` is called. +- **Use Cases**: + - Validating adapter compatibility + - Enforcing adapter usage policies + - Logging adapter load attempts +- **Payload**: + ```python + class AdapterPreLoadPayload(BasePayload): + adapter_name: str # Name/path of adapter + adapter_config: dict # Adapter configuration + backend_name: str # Backend being adapted + ``` +- **Context**: + - `backend`: Backend + + +#### `adapter_post_load` + +- **Trigger**: After adapter has been successfully loaded. +- **Use Cases**: + - Confirming adapter activation + - Updating metrics/state + - Triggering downstream reconfiguration +- **Payload**: + ```python + class AdapterPostLoadPayload(BasePayload): + adapter_name: str + adapter_config: dict + backend_name: str + load_duration_ms: int # Time to load adapter + ``` +- **Context**: + - `backend`: Backend + + +#### `adapter_pre_unload` + +- **Trigger**: Before `backend.unload_adapter()` is called. 
+- **Use Cases**: + - Flushing adapter-specific state + - Logging adapter lifecycle + - Preventing unload during active generation +- **Payload**: + ```python + class AdapterPreUnloadPayload(BasePayload): + adapter_name: str + backend_name: str + ``` +- **Context**: + - `backend`: Backend + + +#### `adapter_post_unload` + +- **Trigger**: After adapter has been unloaded. +- **Use Cases**: + - Confirming adapter deactivation + - Cleaning up adapter-specific resources + - Updating metrics +- **Payload**: + ```python + class AdapterPostUnloadPayload(BasePayload): + adapter_name: str + backend_name: str + unload_duration_ms: int # Time to unload adapter + ``` +- **Context**: + - `backend`: Backend + + +### H. Context Operations Hooks + +Hooks around context changes and management. These operate on the Context directly. + + +#### `context_update` + +- **Trigger**: When a component or CBlock is explicitly appended to a session's context (e.g., after a successful generation or a user-initiated addition). Does not fire on internal framework reads or context linearization. +- **Use Cases**: + - Context audit trail + - Memory management policies + - Sensitive data detection + - Token usage monitoring +- **Payload**: + ```python + class ContextUpdatePayload(BasePayload): + previous_context: Context # Context before change + new_data: Component | CBlock # Data being added + resulting_context: Context # Context after change + context_type: str # "simple" | "chat" + change_type: str # "append" | "reset" + ``` +- **Context**: + - `context`: Context + - `history_length`: int + + +#### `context_prune` + +- **Trigger**: When `view_for_generation` is called and context exceeds token limits, or when a dedicated prune API is invoked. This is the point where context is linearized and token budget enforcement becomes relevant. 
+- **Use Cases**: + - Token budget management + - Recording pruning events + - Custom pruning strategies + - Archiving pruned content +- **Payload**: + ```python + class ContextPrunePayload(BasePayload): + context_before: Context # Context before pruning + context_after: Context # Context after pruning + pruned_items: list[Component | CBlock] # Items removed + reason: str # Why pruning occurred + tokens_freed: int | None # Token estimate freed + ``` +- **Context**: + - `context`: Context + - `token_limit`: int | None + + +### I. Error Handling Hooks + +Cross-cutting hook for error conditions. + + +#### `error_occurred` + +- **Trigger**: When an unrecoverable error occurs during any operation. +- **Fires for**: + - `ComponentParseError` — structured output parsing failures + - Backend communication errors — connection failures, API errors, timeouts + - Assertion violations — internal invariant failures + - Any unhandled `Exception` during component execution, validation, or tool invocation +- **Does NOT fire for**: + - Validation failures within sampling loops — these are handled by `sampling_iteration` and `sampling_repair` + - Controlled plugin violations via `PluginResult(continue_processing=False)` — these are policy decisions, not errors +- **Use Cases**: + - Error logging/alerting + - Custom error recovery + - Error metrics + - Graceful degradation + - Notification systems +- **Payload**: + ```python + class ErrorOccurredPayload(BasePayload): + error: Exception # The exception + error_type: str # Exception class name + error_location: str # Where error occurred + recoverable: bool # Can execution continue + context: Context | None # Context at time of error + action: Component | None # Action being performed + stack_trace: str # Full stack trace + ``` +- **Context**: + - `session`: MelleaSession | None + - `backend`: Backend | None + - `context`: Context | None + - `operation`: str - What operation was being performed + + +## 5. 
PluginContext by Domain + +The `PluginContext` passed to hooks varies by domain, providing only the references relevant to that category: + +### Session Hooks + +```python +# session_* hooks +session: MelleaSession +session_id: str + +# Environment +environment: dict[str, str] +cwd: str + +# Plugin State +shared_state: dict[str, Any] # Shared across plugins + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +parent_request_id: str | None +timestamp: datetime + +# User Information +user_id: str | None +user_metadata: dict[str, Any] +``` + +### Component, Generation, Validation, Sampling, and Tool Hooks + +```python +# component_*, generation_*, validation_*, sampling_*, tool_* hooks +backend: Backend +context: Context + +# Backend Information (generation hooks) +backend_name: str # Available on generation_* hooks +model_id: str # Available on generation_* hooks + +# Strategy Information (sampling hooks) +strategy_name: str # Available on sampling_* hooks + +# Plugin State +shared_state: dict[str, Any] + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +parent_request_id: str | None +timestamp: datetime + +# User Information +user_id: str | None +user_metadata: dict[str, Any] +``` + +### Adapter Hooks + +```python +# adapter_* hooks +backend: Backend + +# Plugin State +shared_state: dict[str, Any] + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +timestamp: datetime +``` + +### Context Hooks + +```python +# context_* hooks +context: Context + +# Plugin State +shared_state: dict[str, Any] + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +timestamp: datetime +``` + +### Error Hook + +```python +# error_occurred +session: MelleaSession | None +backend: Backend | None +context: Context | None + +# Plugin State +shared_state: dict[str, Any] + +# Utilities +logger: Logger +metrics: 
MetricsCollector + +# Request Metadata +request_id: str +timestamp: datetime +operation: str +``` + +### Context Snapshot + +When conversation history is relevant, the plugin context object may include a `ContextSnapshot`: + +```python +@dataclass +class ContextSnapshot: + history_length: int # Number of turns + last_turn: dict | None # Last user/assistant exchange + token_estimate: int | None # Estimated token count +``` + +## 6. Hook Results + +A hook returns a `PluginResult` (or `None`) that selects one of three outcomes: + +1. **Continue (no-op)**: `PluginResult(continue_processing=True)` with no `modified_payload`, or simply `None`. Execution proceeds with the original payload unchanged. +2. **Continue with modification**: `PluginResult(continue_processing=True, modified_payload=...)`. The plugin manager applies the hook's `HookPayloadPolicy`, accepting only changes to writable fields. Execution proceeds with the policy-filtered payload. +3. **Block execution**: `PluginResult(continue_processing=False, violation=...)`. Execution halts with structured error information via `PluginViolation`. + +These three outcomes are exhaustive. Hooks cannot redirect control flow, raise arbitrary exceptions, or alter the calling method's logic beyond these outcomes. This is enforced by the `PluginResult` type; there is no escape hatch. The `violation` field provides structured error information but does not influence which code path runs next. + +Because payloads are frozen, the `modified_payload` in option 2 must be a new object created via `payload.model_copy(update={...})`, not a mutated version of the original. + +### Modify Payload + +```python +# Create an immutable copy with only the desired changes +modified = payload.model_copy(update={"model_options": new_options}) +return PluginResult( + continue_processing=True, + modified_payload=modified, +) +``` + +> **Note**: Only changes to fields listed in the hook's `HookPayloadPolicy.writable_fields` will be accepted.
Changes to other fields are silently discarded by the policy enforcement layer. + +### Block Execution + +```python +violation = PluginViolation( + reason="Policy violation", + description="Detailed explanation", + code="POLICY_001", + details={"field": "value"}, + severity="error" # "error" | "warning" +) + +return PluginResult( + continue_processing=False, + violation=violation +) +``` + +## 7. Registration & Configuration + +### Public API + +All plugin registration APIs are available from `mellea.plugins`: + +```python +from mellea.plugins import hook, plugin, block, PluginSet, register, MelleaPlugin +``` + +### Standalone Function Hooks + +The simplest way to define a hook handler is with the `@hook` decorator on a plain async function: + +```python +from mellea.plugins import hook, block + +@hook("generation_pre_call", mode="enforce", priority=10) +async def enforce_budget(payload, ctx): + if (payload.estimated_tokens or 0) > 4000: + return block("Token budget exceeded") + +@hook("component_post_success", mode="fire_and_forget") +async def log_result(payload, ctx): + print(f"[{payload.component_type}] {payload.latency_ms}ms") +``` + +**Parameters**: +- `hook_type: str` — the hook point name (required, first positional argument) +- `mode: str` — `"enforce"` (default), `"permissive"`, or `"fire_and_forget"` +- `priority: int` — lower numbers execute first (default: 50) + +The `block()` helper is shorthand for returning `PluginResult(continue_processing=False, violation=PluginViolation(reason=...))`. It accepts an optional `code`, `description`, and `details` for structured violation information. 
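
As a rough sketch of what the `block()` helper expands to, the snippet below uses stand-in dataclasses in place of the real `PluginResult` and `PluginViolation` types. The field names follow the examples in this document; the keyword-only signature and the defaults are assumptions, not the shipped implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class PluginViolation:
    """Stand-in for the real violation type (assumed shape)."""

    reason: str
    description: str | None = None
    code: str | None = None
    details: dict[str, Any] = field(default_factory=dict)
    severity: str = "error"  # "error" | "warning"


@dataclass(frozen=True)
class PluginResult:
    """Stand-in for the real result type (assumed shape)."""

    continue_processing: bool = True
    modified_payload: Any = None
    violation: PluginViolation | None = None


def block(
    reason: str,
    *,
    code: str | None = None,
    description: str | None = None,
    details: dict[str, Any] | None = None,
) -> PluginResult:
    """Hypothetical equivalent of the `block()` shorthand described above."""
    violation = PluginViolation(
        reason=reason,
        description=description,
        code=code,
        details=dict(details or {}),  # copy so callers cannot mutate it later
    )
    return PluginResult(continue_processing=False, violation=violation)


result = block("Token budget exceeded", code="TOKEN_BUDGET_001")
```

The real types may well be Pydantic models rather than dataclasses; the point is only that `block()` never carries a modified payload and always sets `continue_processing=False`.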
+ +### Class-Based Plugins + +For plugins that need shared state across multiple hooks, use the `@plugin` decorator on a class or subclass `MelleaPlugin`: + +**`@plugin` decorator** — marks a plain class as a multi-hook plugin: + +```python +from mellea.plugins import plugin, hook + +@plugin("pii-redactor", priority=5) +class PIIRedactor: + def __init__(self, patterns: list[str] | None = None): + self.patterns = patterns or [] + + @hook("component_pre_create") + async def redact_input(self, payload, ctx): + ... + + @hook("generation_post_call") + async def redact_output(self, payload, ctx): + ... +``` + +The `@plugin` decorator accepts: +- `name: str` — plugin name (required, first positional argument) +- `priority: int` — default priority for all hooks in this plugin (default: 50). Individual `@hook` decorators on methods can override. + +**`MelleaPlugin` subclass** — for plugins that need lifecycle hooks (`initialize`/`shutdown`) or typed context accessors: + +```python +from mellea.plugins import MelleaPlugin, hook + +class MetricsPlugin(MelleaPlugin): + def __init__(self, endpoint: str): + super().__init__() + self.endpoint = endpoint + self._buffer = [] + + async def initialize(self): + self._client = await connect(self.endpoint) + + async def shutdown(self): + await self._client.flush(self._buffer) + await self._client.close() + + @hook("component_post_success") + async def collect(self, payload, ctx): + backend = self.get_backend(ctx) # typed accessor + self._buffer.append({"latency": payload.latency_ms}) +``` + +Convention-based registration (methods named `on_`) remains supported for `MelleaPlugin` subclasses. 
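
One way decorator-based discovery could work is for `@hook` to attach metadata to the function and for the plugin manager to scan an instance for tagged methods. The sketch below is a simplified stand-in, not the `mellea.plugins` implementation: the `_hook_meta` and `_plugin_meta` attribute names and the `collect_handlers` helper are hypothetical. It also shows a per-method priority overriding the `@plugin` default.

```python
import inspect

_HOOK_ATTR = "_hook_meta"  # hypothetical attribute name


def hook(hook_type, *, mode="enforce", priority=None):
    """Tag a function as a handler; priority=None means 'inherit the plugin default'."""

    def decorate(fn):
        setattr(fn, _HOOK_ATTR, {"hook_type": hook_type, "mode": mode, "priority": priority})
        return fn

    return decorate


def plugin(name, *, priority=50):
    """Tag a class with a plugin name and a default priority for its handlers."""

    def decorate(cls):
        cls._plugin_meta = {"name": name, "priority": priority}
        return cls

    return decorate


def collect_handlers(instance):
    """Return (hook_type, priority, bound_method) tuples, lowest priority first."""
    default = getattr(type(instance), "_plugin_meta", {"priority": 50})["priority"]
    handlers = []
    for _, method in inspect.getmembers(instance, predicate=inspect.ismethod):
        meta = getattr(method, _HOOK_ATTR, None)
        if meta is None:
            continue  # ordinary method, not a hook handler
        priority = meta["priority"] if meta["priority"] is not None else default
        handlers.append((meta["hook_type"], priority, method))
    return sorted(handlers, key=lambda h: (h[1], h[0]))


@plugin("pii-redactor", priority=5)
class PIIRedactor:
    @hook("component_pre_create")  # inherits priority 5 from @plugin
    async def redact_input(self, payload, ctx):
        return None

    @hook("generation_post_call", priority=20)  # per-handler override
    async def redact_output(self, payload, ctx):
        return None


handlers = collect_handlers(PIIRedactor())
```

Storing metadata on the function keeps the decorators free of global state, so the same class can be instantiated several times with different configuration.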
+ +### Composing Plugins with PluginSet + +`PluginSet` groups related hooks and plugins for reuse across sessions: + +```python +from mellea.plugins import PluginSet + +security = PluginSet("security", [ + enforce_budget, + PIIRedactor(patterns=[r"\d{3}-\d{2}-\d{4}"]), +]) + +observability = PluginSet("observability", [ + log_result, + MetricsPlugin(endpoint="https://..."), +]) +``` + +`PluginSet` accepts standalone hook functions, `@plugin`-decorated class instances, and `MelleaPlugin` instances. PluginSets can be nested. + +### Global Registration + +Register plugins globally so they fire for all hook invocations — both session-based and functional API: + +```python +from mellea.plugins import register + +register(security) # single item +register([security, observability]) # list +register(enforce_budget) # standalone function +``` + +`register()` accepts a single item or a list. Items can be standalone hook functions, plugin instances, or `PluginSet`s. + +### Session-Scoped Registration + +Pass plugins to `start_session()` to scope them to that session: + +```python +m = mellea.start_session( + backend_name="openai", + model_id="gpt-4", + plugins=[security, observability], +) +``` + +The `plugins` parameter accepts the same types as `register()`: standalone hook functions, plugin instances, and `PluginSet`s. These plugins fire only within this session, in addition to any globally registered plugins. They are automatically deregistered when the session is cleaned up. + +### Functional API (No Session) + +When using the functional API directly: + +```python +from mellea.stdlib.functional import instruct + +result = instruct(backend, context, "Extract the user's age") +``` + +Only globally registered plugins fire. If no global plugins are registered, hooks are no-ops with zero overhead. Session-scoped plugins do not apply because there is no session. 
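
The scoping rules above can be sketched with a small registry that keeps global and per-session plugin lists separate. `Registry`, `flatten`, and the `session_id` keying are hypothetical names for illustration; `PluginSet` here is a stand-in dataclass rather than the real class.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Iterator


@dataclass
class PluginSet:
    """Stand-in: a named group of hooks, plugins, or nested sets."""

    name: str
    items: list


def flatten(item: Any) -> Iterator[Any]:
    """Expand nested PluginSets and lists into individual plugins, in order."""
    if isinstance(item, PluginSet):
        for child in item.items:
            yield from flatten(child)
    elif isinstance(item, (list, tuple)):
        for child in item:
            yield from flatten(child)
    else:
        yield item


class Registry:
    def __init__(self) -> None:
        self._global: list = []
        self._by_session: dict[str, list] = {}

    def register(self, items: Any, session_id: str | None = None) -> None:
        """Register globally, or scope to one session when session_id is given."""
        if session_id is None:
            target = self._global
        else:
            target = self._by_session.setdefault(session_id, [])
        target.extend(flatten(items))

    def deregister_session(self, session_id: str) -> None:
        """Drop session-scoped plugins, as happens on session cleanup."""
        self._by_session.pop(session_id, None)

    def active(self, session_id: str | None = None) -> list:
        """Global plugins always apply; session plugins only inside that session."""
        scoped = self._by_session.get(session_id, []) if session_id is not None else []
        return [*self._global, *scoped]


def log_result(payload, ctx): ...
def enforce_budget(payload, ctx): ...


registry = Registry()
registry.register(PluginSet("observability", [log_result]))
registry.register([enforce_budget], session_id="session-1")
```

With this layout, a functional-API call corresponds to `registry.active()` with no session, so only `log_result` would fire, while calls inside `session-1` would see both handlers.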
+ +### Priority + +- Lower numbers execute first +- Within the same priority, execution order is deterministic but unspecified +- Default priority: 50 +- Priority can be set on `@hook` (per-handler), `@plugin` (per-plugin default), or `PluginSet` (per-set default). Most specific wins: per-handler > per-plugin > per-set. + +### YAML Configuration (Secondary) + +For deployment-time configuration, plugins can also be loaded from YAML. This is useful for enabling/disabling plugins or changing priorities without code changes: + +```yaml +plugins: + - name: content-policy + kind: mellea.plugins.ContentPolicyPlugin + hooks: + - component_pre_create + - generation_post_call + mode: enforce + priority: 10 + config: + blocked_terms: ["term1", "term2"] + + - name: telemetry + kind: mellea.plugins.TelemetryPlugin + hooks: + - component_post_success + - validation_post_check + - sampling_loop_end + mode: fire_and_forget + priority: 100 + config: + endpoint: "https://telemetry.example.com" +``` + +### Execution Modes (YAML / PluginMode Enum) + +The following modes are available in the ContextForge `PluginMode` enum and YAML configuration: + +- **`enforce`** (`PluginMode.ENFORCE`): Awaited inline, block execution on violation +- **`permissive`** (`PluginMode.PERMISSIVE`): Awaited inline, log violations without blocking +- **`fire_and_forget`** (`PluginMode.FIRE_AND_FORGET`): Background task, result ignored +- **`enforce_ignore_error`** (`PluginMode.ENFORCE_IGNORE_ERROR`): Like `enforce`, but tolerate plugin errors +- **`disabled`** (`PluginMode.DISABLED`): Skip hook execution + +The `@hook` decorator exposes `enforce`, `permissive`, and `fire_and_forget` — all backed by ContextForge's `PluginMode` enum. The others (`enforce_ignore_error`, `disabled`) are deployment-time concerns configured via YAML or programmatic `PluginConfig`. + +### Custom Hook Types + +The plugin framework supports custom hook types for domain-specific extension points beyond the built-in lifecycle hooks. 
This is particularly relevant for agentic patterns (ReAct, tool-use loops, etc.) where the execution flow is application-defined. + +Custom hooks use the same `@hook` decorator: + +```python +@hook("react_pre_reasoning") +async def before_reasoning(payload, ctx): + ... +``` + +Custom hooks follow the same calling convention, payload chaining, and result semantics as built-in hooks. The plugin manager discovers them via the decorator metadata at registration time. As agentic patterns stabilize in Mellea, frequently-used custom hooks may be promoted to built-in hooks. + +## 8. Example Implementations + +### Token Budget Enforcement (Standalone Function) + +```python +from mellea.plugins import hook, block + +@hook("generation_pre_call", mode="enforce", priority=10) +async def enforce_token_budget(payload, ctx): + budget = 4000 + estimated = payload.estimated_tokens or 0 + if estimated > budget: + return block( + f"Estimated {estimated} tokens exceeds budget of {budget}", + code="TOKEN_BUDGET_001", + details={"estimated": estimated, "budget": budget}, + ) +``` + +### Content Policy (Standalone Function) + +```python +from mellea.plugins import hook, block + +BLOCKED_TERMS = ["term1", "term2"] + +@hook("component_pre_create", mode="enforce", priority=10) +async def enforce_content_policy(payload, ctx): + # Only enforce on Instructions and GenerativeSlots + if payload.component_type not in ("Instruction", "GenerativeSlot"): + return None + + for term in BLOCKED_TERMS: + if term.lower() in payload.description.lower(): + return block( + f"Component contains blocked term: {term}", + code="CONTENT_POLICY_001", + ) +``` + +### Audit Logger (Fire-and-Forget) + +```python +from mellea.plugins import hook + +@hook("component_post_success", mode="fire_and_forget") +async def audit_log_success(payload, ctx): + await send_to_audit_service({ + "event": "generation_success", + "session_id": payload.session_id, + "component_type": payload.component_type, + "latency_ms": 
payload.latency_ms, + "timestamp": payload.timestamp.isoformat(), + }) + +@hook("component_post_error", mode="fire_and_forget") +async def audit_log_error(payload, ctx): + await send_to_audit_service({ + "event": "generation_error", + "session_id": payload.session_id, + "error_type": payload.error_type, + "timestamp": payload.timestamp.isoformat(), + }) +``` + +### PII Redaction Plugin (Class-Based with `@plugin`) + +```python +import re +from mellea.plugins import plugin, hook, PluginResult + +@plugin("pii-redactor", priority=5) +class PIIRedactor: + def __init__(self, patterns: list[str] | None = None): + self.patterns = patterns or [r"\d{3}-\d{2}-\d{4}"] + + @hook("component_pre_create") + async def redact_input(self, payload, ctx): + redacted = self._redact(payload.description) + if redacted != payload.description: + modified = payload.model_copy(update={"description": redacted}) + return PluginResult(continue_processing=True, modified_payload=modified) + + @hook("generation_post_call") + async def redact_output(self, payload, ctx): + redacted = self._redact(payload.processed_output) + if redacted != payload.processed_output: + modified = payload.model_copy(update={"processed_output": redacted}) + return PluginResult(continue_processing=True, modified_payload=modified) + + def _redact(self, text: str) -> str: + for pattern in self.patterns: + text = re.sub(pattern, "[REDACTED]", text) + return text +``` + +### Generative Slot Profiler (`MelleaPlugin` Subclass) + +```python +from collections import defaultdict +from mellea.plugins import MelleaPlugin, hook + +class SlotProfiler(MelleaPlugin): + """Uses MelleaPlugin for lifecycle hooks and typed context accessors.""" + + def __init__(self): + super().__init__() + self._stats = defaultdict(lambda: {"calls": 0, "total_ms": 0}) + + async def initialize(self): + # Called once when the plugin manager starts + self._stats.clear() + + @hook("component_post_success") + async def profile(self, payload, ctx): + if 
payload.component_type != "GenerativeSlot": + return None + stats = self._stats[payload.action.__name__] + stats["calls"] += 1 + stats["total_ms"] += payload.latency_ms +``` + +### Composition Example + +```python +from mellea.plugins import PluginSet, register +import mellea + +# Group by concern +security = PluginSet("security", [ + enforce_token_budget, + enforce_content_policy, + PIIRedactor(patterns=[r"\d{3}-\d{2}-\d{4}"]), +]) + +observability = PluginSet("observability", [ + audit_log_success, + audit_log_error, + SlotProfiler(), +]) + +# Global: fires for all invocations (session and functional API) +register(observability) + +# Session-scoped: security only for this session +m = mellea.start_session( + backend_name="openai", + model_id="gpt-4", + plugins=[security], +) + +# Functional API: only global plugins (observability) fire +from mellea.stdlib.functional import instruct +result = instruct(backend, context, "Extract the user's age") +``` + + +## 9. Hook Execution Flow + +### Simplified Main Flow + +```mermaid +flowchart LR + A([Start]) --> B[Session Init] + B --> C[Component] + C --> D[Generation] + D --> E[Validation] + E --> F{OK?} + F --> |Yes| G[Success] + F --> |No| H[Error/Retry] + G --> I[Cleanup] + H --> I + I --> J([End]) + + style A fill:#e1f5fe + style J fill:#e1f5fe + style F fill:#fce4ec +``` + +### Detailed Flow + +```mermaid +flowchart TD + A([User Request]) --> B + + subgraph Session["Session Lifecycle"] + B[session_pre_init] + B --> C[session_post_init] + end + + C --> D + + subgraph CompLifecycle["Component Lifecycle"] + D[component_pre_create] + D --> E[component_post_create] + E --> F[component_pre_execute] + end + + F --> G + + subgraph Sampling["Sampling Loop (if strategy)"] + SL1[sampling_loop_start] + SL1 --> SL2[sampling_iteration] + SL2 --> SL3{Valid?} + SL3 --> |No| SL4[sampling_repair] + SL4 --> SL2 + SL3 --> |Yes| SL5[sampling_loop_end] + end + + subgraph Generation["LLM Generation"] + G[generation_pre_call] + G --> H{{LLM 
Call}} + H --> H2[generation_stream_chunk] + H2 --> I[generation_post_call] + end + + I --> J + + subgraph Tools["Tool Calling (if tools)"] + T1[tool_pre_invoke] + T1 --> T2{{Tool Execution}} + T2 --> T3[tool_post_invoke] + end + + subgraph Validation["Validation"] + J[validation_pre_check] + J --> K{{Requirements Check}} + K --> L[validation_post_check] + end + + L --> M{Success?} + + M --> |Yes| N[component_post_success] + M --> |No| O[component_post_error] + + N --> P[context_update] + O --> Q[error_occurred] + + subgraph ContextOps["Context Operations"] + P[context_update] + P -.-> P2[context_prune] + end + + subgraph Adapter["Backend Adapter Operations"] + AD1[adapter_pre_load] + AD1 --> AD2[adapter_post_load] + AD3[adapter_pre_unload] + AD3 --> AD4[adapter_post_unload] + end + + subgraph Cleanup["Session End"] + R2[session_cleanup] + R[session_reset] + end + + P --> R2 + Q --> R2 + R2 --> Z([End]) + + %% Styling + style A fill:#e1f5fe + style Z fill:#e1f5fe + style H fill:#fff3e0 + style K fill:#fff3e0 + style T2 fill:#fff3e0 + style M fill:#fce4ec + style SL3 fill:#fce4ec +``` + +## 10. Observability Integration + +### Shallow Logging and OTel + +"Shallow logging" refers to OTel-instrumenting the HTTP transport layer of LLM client libraries (openai, ollama, litellm). This captures request/response spans at the network level without awareness of Mellea's semantic concepts (components, sampling strategies, validation). 
+ +The hook system provides natural integration points for enriching these shallow spans with Mellea-level context: + +- **`generation_pre_call`**: Inject span attributes such as `component_type`, `strategy_name`, and `request_id` into the active OTel context before the HTTP call fires +- **`generation_post_call`**: Attach result metadata — `finish_reason`, `token_usage`, validation outcome — to the span after the call completes +- **`request_id` from `BasePayload`**: Serves as a correlation ID linking Mellea-level hook events to transport-level OTel spans + +> **Forward-looking**: Mellea does not currently include OTel integration. This section describes the intended design for how hooks and shallow logging would compose when OTel support is added. + +## 11. Error Handling, Security & Isolation + +### Error Handling + +- **Isolation**: Plugin exceptions should not crash Mellea sessions; wrap each handler in try/except +- **Logging**: All plugin errors are logged with full context +- **Timeouts**: Support configurable timeouts for plugin execution +- **Circuit Breaker**: Disable failing plugins after repeated errors + +### Security Considerations + +- **Data Privacy**: Payloads may include user content; plugins must respect privacy policies +- **Redaction**: Consider masking sensitive fields for plugins that should not see them +- **Sandboxing**: Provide options to run plugins in restricted environments +- **Validation**: Validate plugin inputs and outputs to prevent injection attacks + +### Isolation Options + +This is a proposal for supporting compartmentalized execution of plugins. + +```yaml +plugins: + - name: untrusted-plugin + kind: external.UntrustedPlugin + isolation: + sandbox: true + timeout_ms: 5000 + max_memory_mb: 256 + allowed_operations: ["read_payload", "emit_metric"] +``` + +## 12. 
Backward Compatibility & Migration + +### Versioning + +- Hook payload contracts are versioned (e.g., `payload_version: "1.0"`) +- Breaking changes increment major version +- Deprecated fields marked and maintained for one major version +- Hook payload versions are independent of Mellea release versions. Payload versions change only when the payload schema changes, which may or may not coincide with a Mellea release + +### Default Behavior + +- Without plugins registered, Mellea behavior is unchanged +- Default "no-op" plugin manager if no configuration provided diff --git a/docs/dev/hook_system_implementation_plan.md b/docs/dev/hook_system_implementation_plan.md new file mode 100644 index 000000000..992e5bbba --- /dev/null +++ b/docs/dev/hook_system_implementation_plan.md @@ -0,0 +1,1279 @@ +# Mellea Hook System Implementation Plan + +This document describes the implementation plan for the extensibility hook system specified in [`docs/dev/hook_system.md`](hook_system.md). The implementation uses the [ContextForge plugin framework](https://github.com/IBM/mcp-context-forge) (`mcpgateway.plugins.framework`) as an optional external dependency for core plumbing, while all Mellea-specific types — hook enums, payload models, and the plugin base class — are owned by Mellea under a new `mellea/plugins/` subpackage. + +The primary developer-facing API is Python decorators (`@hook`, `@plugin`) and programmatic registration (`register()`, `PluginSet`). YAML configuration is supported as a secondary mechanism for deployment-time overrides. Plugins work identically whether invoked through a session or via the functional API (`instruct(backend, context, ...)`). + +**Note**: The plugin framework is in the process of being extracted as a standalone Python package. Once completed, the package import path prefix will look like `cpex.framework`. + + +## 1. 
Package Structure + +``` +mellea/plugins/ +├── __init__.py # Public API: hook, plugin, block, PluginSet, register, MelleaPlugin +├── _manager.py # Singleton wrapper + session-tag filtering +├── _base.py # MelleaBasePayload, MelleaPlugin base class +├── _types.py # MelleaHookType enum + hook registration +├── _policies.py # HookPayloadPolicy table + DefaultHookPolicy for Mellea hooks +├── _context.py # Plugin context factory helper +├── _decorators.py # @hook and @plugin decorator implementations +├── _pluginset.py # PluginSet class +├── _registry.py # register(), block() helpers + global/session dispatch logic +└── hooks/ + ├── __init__.py # Re-exports all payload classes + ├── session.py # session lifecycle payloads + ├── component.py # component lifecycle payloads + ├── generation.py # generation pipeline payloads + ├── validation.py # validation payloads + ├── sampling.py # sampling pipeline payloads + ├── tool.py # tool execution payloads + ├── adapter.py # adapter operation payloads + ├── context_ops.py # context operation payloads + └── error.py # error handling payload +``` + +## 2. ContextForge Plugin Framework (Key Interfaces Used) + +The following types from `mcpgateway.plugins.framework` form the plumbing layer. Mellea uses these but does **not** import any ContextForge-specific hook types (prompts, tools, resources, agents, http). + +| Type | Role | +|------|------| +| `Plugin` | ABC base class. `__init__(config: PluginConfig)`, `initialize()`, `shutdown()`. Hook methods discovered by convention (method name = hook type) or `@hook()` decorator. Signature: `async def hook_name(self, payload, context) -> PluginResult`. | +| `PluginManager` | Borg singleton. `__init__(config_path, timeout, observability, hook_policies)`. Key methods: `invoke_hook(hook_type, payload, global_context, ...) -> (PluginResult, PluginContextTable)`, `has_hooks_for(hook_type) -> bool`, `initialize()`, `shutdown()`. 
The `hook_policies` parameter accepts a `dict[str, HookPayloadPolicy]` mapping hook types to their writable-field policies. | +| `PluginPayload` | Base type for all hook payloads. Frozen Pydantic `BaseModel` (`ConfigDict(frozen=True)`). Plugins use `model_copy(update={...})` to propose modifications. | +| `PluginResult[T]` | Generic result: `continue_processing: bool`, `modified_payload: T | None`, `violation: PluginViolation | None`, `metadata: dict`. | +| `PluginViolation` | `reason`, `description`, `code`, `details`. | +| `PluginConfig` | `name`, `kind`, `hooks`, `mode`, `priority`, `conditions`, `config`, ... | +| `PluginMode` | `ENFORCE`, `ENFORCE_IGNORE_ERROR`, `PERMISSIVE`, `FIRE_AND_FORGET`, `DISABLED`. | +| `PluginContext` | `state: dict`, `global_context: GlobalContext`, `metadata: dict`. | +| `HookPayloadPolicy` | Frozen dataclass with `writable_fields: frozenset[str]`. Defines which payload fields plugins may modify for a given hook type. | +| `DefaultHookPolicy` | Enum: `ALLOW` (accept all modifications), `DENY` (reject all modifications). Controls behavior for hooks without an explicit policy. | +| `apply_policy()` | `apply_policy(original, modified, policy) -> BaseModel \| None`. Accepts only changes to writable fields via `model_copy(update=...)`, discarding unauthorized changes. Returns `None` if no effective changes. | +| `HookRegistry` | `get_hook_registry()`, `register_hook(hook_type, payload_class, result_class)`, `is_registered(hook_type)`. | +| `@hook` decorator | `@hook("hook_type")` or `@hook("hook_type", PayloadType, ResultType)` for custom method names. 
| + +### Class Diagram + +```mermaid +classDiagram + direction TB + + %% Core Plugin Classes + class Plugin { + <<abstract>> + +__init__(config: PluginConfig) + +initialize()* async + +shutdown()* async + +hook_name(payload, context)* async PluginResult + } + + class PluginManager { + <<singleton>> + -config_path: str + -timeout: int + -observability: Any + -hook_registry: HookRegistry + -hook_policies: dict~str, HookPayloadPolicy~ + +__init__(config_path, timeout, observability, hook_policies) + +invoke_hook(hook_type, payload, global_context, ...) tuple~PluginResult, PluginContextTable~ + +has_hooks_for(hook_type: str) bool + +initialize() async + +shutdown() async + } + + %% Configuration + class PluginConfig { + +name: str + +kind: str + +hooks: list~str~ + +mode: PluginMode + +priority: int + +conditions: dict + +config: dict + } + + class PluginMode { + <<enumeration>> + ENFORCE + ENFORCE_IGNORE_ERROR + PERMISSIVE + FIRE_AND_FORGET + DISABLED + } + + %% Policy + class HookPayloadPolicy { + <<dataclass>> + +writable_fields: frozenset~str~ + } + + class DefaultHookPolicy { + <<enumeration>> + ALLOW + DENY + } + + %% Payload & Result + class PluginPayload { + <<frozen>> + pydantic.BaseModel + model_config: frozen=True + } + + class PluginResult~T~ { + +continue_processing: bool + +modified_payload: T | None + +violation: PluginViolation | None + +metadata: dict + } + + class PluginViolation { + +reason: str + +description: str + +code: str + +details: dict + } + + %% Context + class PluginContext { + +state: dict + +global_context: GlobalContext + +metadata: dict + } + + class hook { + <<decorator>> + +__call__(hook_type: str) + +__call__(hook_type, PayloadType, ResultType) + } + + %% Relationships + Plugin --> PluginConfig : configured by + Plugin ..> PluginPayload : receives + Plugin ..> PluginResult : returns + Plugin ..> PluginContext : receives + Plugin ..> hook : decorated by + + PluginManager --> Plugin : manages 0..* + PluginManager --> HookPayloadPolicy : enforces per hook + + PluginConfig --> PluginMode : has + + PluginResult 
--> PluginViolation : may contain + PluginResult --> PluginPayload : wraps modified +``` + +### YAML Plugin Configuration (reference) + +Plugins can also be configured via YAML as a secondary mechanism. Programmatic registration via `@hook`, `@plugin`, and `register()` is the primary approach. + +```yaml +plugins: + - name: content-policy + kind: mellea.plugins.examples.ContentPolicyPlugin + hooks: + - component_pre_create + - generation_post_call + mode: enforce + priority: 10 + config: + blocked_terms: ["term1", "term2"] + + - name: telemetry + kind: mellea.plugins.examples.TelemetryPlugin + hooks: + - component_post_success + - sampling_loop_end + mode: fire_and_forget + priority: 100 + config: + endpoint: "https://telemetry.example.com" +``` + +### Mellea Wrapper Layer + +Mellea exposes its own `@hook` and `@plugin` decorators that translate to ContextForge registrations internally. This serves two purposes: + +1. **Mellea-aligned API**: The `@hook` decorator accepts a `mode` parameter with three string values (`"enforce"`, `"permissive"`, `"fire_and_forget"`) that map directly to ContextForge's `PluginMode` enum (`ENFORCE`, `PERMISSIVE`, `FIRE_AND_FORGET`), matching Mellea's code-first ergonomics without requiring users to import the enum. +2. **Session tagging**: Mellea's wrapper adds session-scoping metadata that ContextForge's `PluginManager` does not natively support. The `_manager.py` layer filters hooks at dispatch time based on session tags. + +Users never import from `mcpgateway.plugins.framework` directly. + +## 3. Core Types + +### 3.1 `MelleaHookType` enum (`mellea/plugins/_types.py`) + +A single `MelleaHookType(str, Enum)` containing all 27 hook types. String-based values for compatibility with ContextForge's `invoke_hook(hook_type: str, ...)`. 
+ +```python +class MelleaHookType(str, Enum): + # Session Lifecycle + SESSION_PRE_INIT = "session_pre_init" + SESSION_POST_INIT = "session_post_init" + SESSION_RESET = "session_reset" + SESSION_CLEANUP = "session_cleanup" + + # Component Lifecycle + COMPONENT_PRE_CREATE = "component_pre_create" + COMPONENT_POST_CREATE = "component_post_create" + COMPONENT_PRE_EXECUTE = "component_pre_execute" + COMPONENT_POST_SUCCESS = "component_post_success" + COMPONENT_POST_ERROR = "component_post_error" + + # Generation Pipeline + GENERATION_PRE_CALL = "generation_pre_call" + GENERATION_POST_CALL = "generation_post_call" + GENERATION_STREAM_CHUNK = "generation_stream_chunk" + + # Validation + VALIDATION_PRE_CHECK = "validation_pre_check" + VALIDATION_POST_CHECK = "validation_post_check" + + # Sampling Pipeline + SAMPLING_LOOP_START = "sampling_loop_start" + SAMPLING_ITERATION = "sampling_iteration" + SAMPLING_REPAIR = "sampling_repair" + SAMPLING_LOOP_END = "sampling_loop_end" + + # Tool Execution + TOOL_PRE_INVOKE = "tool_pre_invoke" + TOOL_POST_INVOKE = "tool_post_invoke" + + # Backend Adapter Ops + ADAPTER_PRE_LOAD = "adapter_pre_load" + ADAPTER_POST_LOAD = "adapter_post_load" + ADAPTER_PRE_UNLOAD = "adapter_pre_unload" + ADAPTER_POST_UNLOAD = "adapter_post_unload" + + # Context Operations + CONTEXT_UPDATE = "context_update" + CONTEXT_PRUNE = "context_prune" + + # Error Handling + ERROR_OCCURRED = "error_occurred" +``` + +### 3.2 `MelleaBasePayload` (`mellea/plugins/_base.py`) + +All Mellea hook payloads inherit from this base, which extends `PluginPayload` with the common fields from the hook system spec (Section 2): + +```python +class MelleaBasePayload(PluginPayload): + """Frozen base — all payloads are immutable by design. + + Plugins must use ``model_copy(update={...})`` to propose modifications + and return the copy via ``PluginResult.modified_payload``. The plugin + manager applies the hook's ``HookPayloadPolicy`` to filter changes to + writable fields only. 
+ """ + model_config = ConfigDict(frozen=True, arbitrary_types_allowed=True) + + session_id: str | None = None + request_id: str + timestamp: datetime = Field(default_factory=datetime.utcnow) + hook: str + user_metadata: dict[str, Any] = Field(default_factory=dict) +``` + +`frozen=True` prevents in-place mutation: attribute assignment on a payload instance raises a Pydantic `ValidationError` (error type `frozen_instance`). `arbitrary_types_allowed=True` is required because payloads include non-serializable Mellea objects (`Backend`, `Context`, `Component`, `ModelOutputThunk`). This means external plugins cannot receive these payloads directly; they are designed for native in-process plugins. + +### 3.3 Hook Registration (`mellea/plugins/_types.py`) + +A `_register_mellea_hooks()` function registers all hook types with the ContextForge `HookRegistry`. It is called once during plugin initialization, is idempotent thanks to the `is_registered()` check, and follows the same pattern used by ContextForge's own hook modules (e.g., `mcpgateway/plugins/framework/hooks/tools.py`). + +```python +def _register_mellea_hooks() -> None: + registry = get_hook_registry() + for hook_type, (payload_cls, result_cls) in _HOOK_REGISTRY.items(): + if not registry.is_registered(hook_type): + registry.register_hook(hook_type, payload_cls, result_cls) +``` + +### 3.4 Context Mapping (`mellea/plugins/_context.py`) + +The hook system spec defines domain-specific `PluginContext` fields (`session`, `backend`, `context`) that vary by hook category. ContextForge provides a generic `GlobalContext` with a `state: dict`. 
The mapping uses `GlobalContext.state` as the carrier for Mellea-specific context: + +```python +def build_global_context( + *, + session: MelleaSession | None = None, + backend: Backend | None = None, + context: Context | None = None, + request_id: str = "", + **extra_fields, +) -> GlobalContext: + state: dict[str, Any] = {} + if session is not None: + state["session"] = session + if backend is not None: + state["backend"] = backend + state["backend_name"] = getattr(backend, "model_id", "unknown") + if context is not None: + state["context"] = context + state.update(extra_fields) + return GlobalContext(request_id=request_id, state=state) +``` + +### 3.5 `MelleaPlugin` Base Class (`mellea/plugins/_base.py`) + +`MelleaPlugin` is one of three ways to define plugins, alongside `@hook` on standalone functions (primary) and `@plugin` on plain classes. Use `MelleaPlugin` when you need lifecycle hooks (`initialize`/`shutdown`) or typed context accessors. + +Extends ContextForge `Plugin` with typed context accessor helpers so plugin authors don't need to know about the `GlobalContext.state` mapping: + +```python +class MelleaPlugin(Plugin): + """Base class for Mellea plugins with lifecycle hooks and typed accessors.""" + + def get_backend(self, context: PluginContext) -> Backend | None: + return context.global_context.state.get("backend") + + def get_mellea_context(self, context: PluginContext) -> Context | None: + return context.global_context.state.get("context") + + def get_session(self, context: PluginContext) -> MelleaSession | None: + return context.global_context.state.get("session") + + @property + def plugin_config(self) -> dict[str, Any]: + return self._config.config or {} +``` + +No new abstract methods. ContextForge's `initialize()` and `shutdown()` suffice. 
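
The mapping in 3.4 and the accessors in 3.5 form one round trip: the call site packs Mellea objects into `GlobalContext.state`, and the plugin reads them back by key. The following is a minimal sketch of that round trip, not the actual implementation. It uses stand-in dataclasses in place of ContextForge's `GlobalContext` and `PluginContext` (the real constructors may differ), a hypothetical `FakeBackend`, and a `build_global_context()` trimmed to the `backend` argument.

```python
from dataclasses import dataclass, field
from typing import Any


# Stand-ins for the ContextForge types; the real classes may carry more fields.
@dataclass
class GlobalContext:
    request_id: str
    state: dict[str, Any] = field(default_factory=dict)


@dataclass
class PluginContext:
    global_context: GlobalContext
    state: dict[str, Any] = field(default_factory=dict)
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class FakeBackend:
    """Hypothetical placeholder for a Mellea Backend."""
    model_id: str


def build_global_context(*, backend=None, request_id="", **extra_fields) -> GlobalContext:
    """Pack Mellea objects into GlobalContext.state (trimmed version of 3.4)."""
    state: dict[str, Any] = {}
    if backend is not None:
        state["backend"] = backend
        state["backend_name"] = getattr(backend, "model_id", "unknown")
    state.update(extra_fields)
    return GlobalContext(request_id=request_id, state=state)


def get_backend(context: PluginContext):
    """Free-function equivalent of MelleaPlugin.get_backend()."""
    return context.global_context.state.get("backend")


backend = FakeBackend(model_id="example-model")
ctx = PluginContext(global_context=build_global_context(backend=backend, request_id="req-1"))
empty_ctx = PluginContext(global_context=build_global_context(request_id="req-2"))

found = get_backend(ctx)          # the same FakeBackend object that was packed
missing = get_backend(empty_ctx)  # None: no backend was supplied at the call site
```

Because the accessors use `dict.get`, a hook that fires outside a session or without a backend sees `None` rather than a `KeyError`, which is why the accessor return types are optional.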
+ +### 3.6 `@hook` Decorator (`mellea/plugins/_decorators.py`) + +The `@hook` decorator works on both standalone async functions and class methods: + +```python +@dataclass(frozen=True) +class HookMeta: + hook_type: str + mode: Literal["enforce", "permissive", "fire_and_forget"] = "enforce" + priority: int = 50 + +def hook( + hook_type: str, + *, + mode: Literal["enforce", "permissive", "fire_and_forget"] = "enforce", + priority: int = 50, +) -> Callable: + """Register an async function or method as a hook handler.""" + def decorator(fn): + fn._mellea_hook_meta = HookMeta( + hook_type=hook_type, + mode=mode, + priority=priority, + ) + return fn + return decorator +``` + +The `mode` parameter controls both execution strategy and result handling. These map directly to ContextForge's `PluginMode` enum: +- `"enforce"` → `PluginMode.ENFORCE` / `"permissive"` → `PluginMode.PERMISSIVE`: Hook is awaited inline (blocking). Difference is whether violations halt execution or are logged only. +- `"fire_and_forget"` → `PluginMode.FIRE_AND_FORGET`: Hook is dispatched as a background `asyncio.create_task()`. Result is ignored. This is handled by ContextForge's `PluginManager` dispatch logic. + +When used on a standalone function, the metadata is read at `register()` time or when passed to `start_session(plugins=[...])`. When used on a class method, it is discovered during class registration (either via `@plugin` or `MelleaPlugin` introspection). 
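
To make the "metadata now, registration later" split concrete, here is a small self-contained sketch. It repeats the `HookMeta` and `hook` definitions from above so that it runs on its own. The `discover_hooks()` helper and the `AuditPlugin` class are hypothetical and only illustrate how class registration could find decorated methods; the actual discovery code may look different.

```python
import inspect
from dataclasses import dataclass
from typing import Callable, Literal

Mode = Literal["enforce", "permissive", "fire_and_forget"]


@dataclass(frozen=True)
class HookMeta:
    hook_type: str
    mode: Mode = "enforce"
    priority: int = 50


def hook(hook_type: str, *, mode: Mode = "enforce", priority: int = 50) -> Callable:
    """Attach HookMeta to the function; nothing is registered at this point."""
    def decorator(fn):
        fn._mellea_hook_meta = HookMeta(hook_type=hook_type, mode=mode, priority=priority)
        return fn
    return decorator


def discover_hooks(obj) -> list:
    """Hypothetical helper: collect (method_name, HookMeta) for decorated methods."""
    found = []
    for name, member in inspect.getmembers(obj, predicate=inspect.ismethod):
        # Attribute lookups on a bound method fall through to the function.
        meta = getattr(member, "_mellea_hook_meta", None)
        if meta is not None:
            found.append((name, meta))
    return found


@hook("generation_pre_call", priority=10)
async def enforce_budget(payload, context):
    return None  # None means "continue unchanged"


class AuditPlugin:  # hypothetical example class
    @hook("component_post_success", mode="fire_and_forget")
    async def log_success(self, payload, context):
        return None

    def helper(self):  # no @hook, so discovery skips it
        return "ignored"


function_meta = enforce_budget._mellea_hook_meta
method_hooks = discover_hooks(AuditPlugin())
```

Decorating a function has no side effects beyond setting one attribute, so importing a module full of `@hook` functions does not touch the plugin manager until `register()` or `start_session(plugins=[...])` is called.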
+ +### 3.7 `@plugin` Decorator (`mellea/plugins/_decorators.py`) + +The `@plugin` decorator marks a plain class as a multi-hook plugin: + +```python +@dataclass(frozen=True) +class PluginMeta: + name: str + priority: int = 50 + +def plugin( + name: str, + *, + priority: int = 50, +) -> Callable: + """Mark a class as a Mellea plugin.""" + def decorator(cls): + cls._mellea_plugin_meta = PluginMeta( + name=name, + priority=priority, + ) + return cls + return decorator +``` + +On registration, all methods with `_mellea_hook_meta` are discovered and registered as hook handlers bound to the instance. Methods without `@hook` are ignored. + +### 3.8 `PluginSet` (`mellea/plugins/_pluginset.py`) + +A named, composable group of hook functions and plugin instances: + +```python +class PluginSet: + def __init__( + self, + name: str, + items: list[Callable | Any | "PluginSet"], + *, + priority: int | None = None, + ): + self.name = name + self.items = items + self.priority = priority + + def flatten(self) -> list[tuple[Callable | Any, int | None]]: + """Recursively flatten nested PluginSets into (item, priority_override) pairs.""" + result = [] + for item in self.items: + if isinstance(item, PluginSet): + result.extend(item.flatten()) + else: + result.append((item, self.priority)) + return result +``` + +PluginSets are inert containers — they do not register anything themselves. Registration happens when they are passed to `register()` or `start_session(plugins=[...])`. 
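
The behavior of `flatten()` on nested sets is easiest to see with a small example. The sketch below copies the `PluginSet` class from above (with simplified type hints) and uses three placeholder functions. It shows one consequence of the code as written: each item receives the priority of the set that directly contains it, so an outer set's priority is not inherited by the items of a nested set.

```python
from typing import Optional


class PluginSet:
    """Copy of the PluginSet sketch above, trimmed to what flatten() needs."""

    def __init__(self, name: str, items: list, *, priority: Optional[int] = None):
        self.name = name
        self.items = items
        self.priority = priority

    def flatten(self) -> list:
        """Recursively flatten nested sets into (item, priority_override) pairs."""
        result = []
        for item in self.items:
            if isinstance(item, PluginSet):
                result.extend(item.flatten())
            else:
                result.append((item, self.priority))
        return result


# Placeholder hook functions; real ones would be async and carry @hook metadata.
def hook_a(): ...
def hook_b(): ...
def hook_c(): ...


inner = PluginSet("inner", [hook_b])  # no priority of its own
outer = PluginSet("outer", [hook_a, inner, hook_c], priority=5)

pairs = [(fn.__name__, prio) for fn, prio in outer.flatten()]
# pairs == [("hook_a", 5), ("hook_b", None), ("hook_c", 5)]
```

If inheriting the outer priority is the desired behavior for nested sets, `flatten()` would need to pass the parent priority down; the version shown here does not.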
+ +### 3.9 `block()` Helper (`mellea/plugins/_registry.py`) + +Convenience function for returning a blocking result from a hook: + +```python +def block( + reason: str, + *, + code: str = "", + description: str = "", + details: dict[str, Any] | None = None, +) -> PluginResult: + return PluginResult( + continue_processing=False, + violation=PluginViolation( + reason=reason, + description=description or reason, + code=code, + details=details or {}, + ), + ) +``` + + +### 3.10 Hook Payload Policies (`mellea/plugins/_policies.py`) + +Defines the concrete per-hook-type policies for Mellea hooks. These are injected into the `PluginManager` at initialization time via the `hook_policies` parameter. + +```python +from mcpgateway.plugins.framework.hooks.policies import HookPayloadPolicy + +MELLEA_HOOK_PAYLOAD_POLICIES: dict[str, HookPayloadPolicy] = { + # Session Lifecycle + "session_pre_init": HookPayloadPolicy( + writable_fields=frozenset({"backend_name", "model_id", "model_options", "backend_kwargs"}), + ), + # session_post_init, session_reset, session_cleanup: observe-only (no entry) + + # Component Lifecycle + "component_pre_create": HookPayloadPolicy( + writable_fields=frozenset({ + "description", "images", "requirements", "icl_examples", + "grounding_context", "user_variables", "prefix", "template_id", + }), + ), + "component_post_create": HookPayloadPolicy( + writable_fields=frozenset({"component"}), + ), + "component_pre_execute": HookPayloadPolicy( + writable_fields=frozenset({ + "action", "context", "context_view", "requirements", + "model_options", "format", "strategy", "tool_calls_enabled", + }), + ), + "component_post_success": HookPayloadPolicy( + writable_fields=frozenset({"result"}), + ), + # component_post_error: observe-only + + # Generation Pipeline + "generation_pre_call": HookPayloadPolicy( + writable_fields=frozenset({"model_options", "tools", "format", "formatted_prompt"}), + ), + "generation_post_call": HookPayloadPolicy( + 
writable_fields=frozenset({"processed_output", "model_output"}), + ), + "generation_stream_chunk": HookPayloadPolicy( + writable_fields=frozenset({"chunk", "accumulated"}), + ), + + # Validation + "validation_pre_check": HookPayloadPolicy( + writable_fields=frozenset({"requirements", "model_options"}), + ), + "validation_post_check": HookPayloadPolicy( + writable_fields=frozenset({"results", "all_passed"}), + ), + + # Sampling Pipeline + "sampling_loop_start": HookPayloadPolicy( + writable_fields=frozenset({"loop_budget"}), + ), + # sampling_iteration: observe-only + "sampling_repair": HookPayloadPolicy( + writable_fields=frozenset({"repair_action", "repair_context"}), + ), + "sampling_loop_end": HookPayloadPolicy( + writable_fields=frozenset({"final_result"}), + ), + + # Tool Execution + "tool_pre_invoke": HookPayloadPolicy( + writable_fields=frozenset({"tool_args"}), + ), + "tool_post_invoke": HookPayloadPolicy( + writable_fields=frozenset({"tool_output"}), + ), + + # adapter_*, context_*, error_occurred: observe-only (no entry) +} +``` + +Hooks absent from this table are observe-only. With `DefaultHookPolicy.DENY` (the Mellea default), any modification attempt on an observe-only hook is rejected with a warning log. + +## 4. Plugin Manager Integration (`mellea/plugins/_manager.py`) + +### 4.1 Lazy Singleton Wrapper + +The `PluginManager` is lazily initialized on first use (either via `register()` or `start_session(plugins=[...])`). A config path is no longer required — code-first registration may be the only path. 
+ +```python +_plugin_manager: PluginManager | None = None +_plugins_enabled: bool = False +_session_tags: dict[str, set[str]] = {} # session_id -> set of plugin keys + +def has_plugins() -> bool: + """Fast check: are plugins configured and available?""" + return _plugins_enabled + +def get_plugin_manager() -> PluginManager | None: + """Returns the initialized PluginManager, or None if plugins are not configured.""" + return _plugin_manager + +def _ensure_plugin_manager() -> PluginManager: + """Lazily initialize the PluginManager if not already created.""" + global _plugin_manager, _plugins_enabled + if _plugin_manager is None: + _register_mellea_hooks() + pm = PluginManager( + "", + timeout=5, + hook_policies=MELLEA_HOOK_PAYLOAD_POLICIES, + ) + _run_async_in_thread(pm.initialize()) + _plugin_manager = pm + _plugins_enabled = True + return _plugin_manager + +async def initialize_plugins( + config_path: str | None = None, *, timeout: float = 5.0 +) -> PluginManager: + """Initialize the PluginManager with Mellea hook registrations and optional YAML config.""" + global _plugin_manager, _plugins_enabled + _register_mellea_hooks() + pm = PluginManager( + config_path or "", + timeout=int(timeout), + hook_policies=MELLEA_HOOK_PAYLOAD_POLICIES, + ) + await pm.initialize() + _plugin_manager = pm + _plugins_enabled = True + return pm + +async def shutdown_plugins() -> None: + """Shut down the PluginManager.""" + global _plugin_manager, _plugins_enabled, _session_tags + if _plugin_manager is not None: + await _plugin_manager.shutdown() + _plugin_manager = None + _plugins_enabled = False + _session_tags.clear() +``` + +### 4.2 `invoke_hook()` Central Helper + +All hook call sites use this single function. Two no-op guards and an early return keep overhead negligible when plugins are not configured: + +1. **`_plugins_enabled` boolean**: a module-level flag, checked with a single global lookup +2. **`has_hooks_for(hook_type)`**: skips invocation when no plugin subscribes to this hook +3. 
**Returns `(None, original_payload)` immediately** when either guard fails + +When `session_id` is provided, the manager invokes both global plugins (those registered without a session tag) and session-scoped plugins matching that session ID. When `session_id` is `None` (functional API path), only global plugins are invoked. + +```python +async def invoke_hook( + hook_type: MelleaHookType, + payload: MelleaBasePayload, + *, + session_id: str | None = None, + session: MelleaSession | None = None, + backend: Backend | None = None, + context: Context | None = None, + request_id: str = "", + violations_as_exceptions: bool = True, + **context_fields, +) -> tuple[PluginResult | None, MelleaBasePayload]: + """Invoke a hook if plugins are configured. + + Returns (result, possibly-modified-payload). + If plugins are not configured, returns (None, original_payload) immediately. + + When session_id is provided, both global plugins and session-scoped + plugins matching that session ID are invoked. When session_id is None + (functional API path), only global plugins are invoked. 
+ """ + if not _plugins_enabled or _plugin_manager is None: + return None, payload + + if not _plugin_manager.has_hooks_for(hook_type.value): + return None, payload + + # Payloads are frozen — use model_copy to set dispatch-time fields + updates: dict[str, Any] = {"hook": hook_type.value, "session_id": session_id} + if not payload.request_id: + updates["request_id"] = request_id + payload = payload.model_copy(update=updates) + + global_ctx = build_global_context( + session=session, backend=backend, context=context, + request_id=request_id, session_id=session_id, **context_fields, + ) + + result, _ = await _plugin_manager.invoke_hook( + hook_type=hook_type.value, + payload=payload, + global_context=global_ctx, + violations_as_exceptions=violations_as_exceptions, + ) + + modified = result.modified_payload if result and result.modified_payload else payload + return result, modified +``` + +### 4.3 Session-Scoped Registration + +`start_session()` in `mellea/stdlib/session.py` gains an optional `plugins` keyword parameter: + +```python +def start_session( + ..., + plugins: list[Callable | Any | PluginSet] | None = None, +) -> MelleaSession: +``` + +When `plugins` is provided, `start_session()` registers each item with the session's ID via `register(items, session_id=session.id)`. These plugins fire only within this session, in addition to any globally registered plugins. They are automatically deregistered when the session is cleaned up (at `session_cleanup`). + +### 4.4 With-Block-Scoped Registration (Context Managers) + +All three plugin forms — standalone `@hook` functions, `@plugin`-decorated class instances, and `MelleaPlugin` subclass instances — plus `PluginSet` support the Python context manager protocol for block-scoped activation. This is a fourth registration scope complementing global, session-scoped, and YAML-configured plugins. + +#### Mechanism + +With-block scopes reuse the existing `session_id` tagging infrastructure from section 4.3. 
Each `with` entry generates a fresh UUID scope ID, registers plugins with that scope ID, and deregisters them by scope ID on exit. The `_session_tags` dict in `_manager.py` tracks these scope IDs alongside session IDs — the manager makes no distinction between them at dispatch time. + +#### `plugin_scope()` factory (`mellea/plugins/_registry.py`) + +A `_PluginScope` internal class and `plugin_scope()` public factory serve as the universal entry point, accepting any mix of standalone functions, `@plugin` instances, and `PluginSet`s: + +```python +class _PluginScope: + """Context manager that activates a set of plugins for a block of code.""" + + def __init__(self, items: list[Callable | Any | PluginSet]) -> None: + self._items = items + self._scope_id: str | None = None + + def _activate(self) -> None: + self._scope_id = str(uuid.uuid4()) + register(self._items, session_id=self._scope_id) + + def _deactivate(self) -> None: + if self._scope_id is not None: + deregister_session_plugins(self._scope_id) + self._scope_id = None + + def __enter__(self) -> _PluginScope: + self._activate() + return self + + def __exit__(self, exc_type, exc_val, exc_tb) -> None: + self._deactivate() + + async def __aenter__(self) -> _PluginScope: + self._activate() + return self + + async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: + self._deactivate() + + +def plugin_scope(*items: Callable | Any | PluginSet) -> _PluginScope: + """Create a context manager that activates the given plugins for a block of code.""" + return _PluginScope(list(items)) +``` + +#### `@plugin`-decorated class instances (`mellea/plugins/_decorators.py`) + +The `@plugin` decorator injects `__enter__`, `__exit__`, `__aenter__`, `__aexit__` into every decorated class. 
The methods are defined as module-level helpers (not lambdas) so they work correctly as unbound methods: + +```python +def _plugin_cm_enter(self: Any) -> Any: + if getattr(self, "_scope_id", None) is not None: + meta = getattr(type(self), "_mellea_plugin_meta", None) + plugin_name = meta.name if meta else type(self).__name__ + raise RuntimeError( + f"Plugin {plugin_name!r} is already active as a context manager. " + "Concurrent or nested reuse of the same instance is not supported; " + "create a new instance instead." + ) + self._scope_id = str(uuid.uuid4()) + register(self, session_id=self._scope_id) + return self + + +def _plugin_cm_exit(self: Any, exc_type: Any, exc_val: Any, exc_tb: Any) -> None: + scope_id = getattr(self, "_scope_id", None) + if scope_id is not None: + deregister_session_plugins(scope_id) + self._scope_id = None + + +async def _plugin_cm_aenter(self: Any) -> Any: + return self.__enter__() + + +async def _plugin_cm_aexit(self: Any, exc_type: Any, exc_val: Any, exc_tb: Any) -> None: + self.__exit__(exc_type, exc_val, exc_tb) + + +def plugin(name: str, *, priority: int = 50) -> Callable: + def decorator(cls: Any) -> Any: + cls._mellea_plugin_meta = PluginMeta(name=name, priority=priority) + cls.__enter__ = _plugin_cm_enter # injected here + cls.__exit__ = _plugin_cm_exit + cls.__aenter__ = _plugin_cm_aenter + cls.__aexit__ = _plugin_cm_aexit + return cls + return decorator +``` + +#### `PluginSet` (`mellea/plugins/_pluginset.py`) + +`PluginSet` gains the same context manager protocol using the same UUID scope ID pattern: + +```python +def __enter__(self) -> PluginSet: + if self._scope_id is not None: + raise RuntimeError( + f"PluginSet {self.name!r} is already active as a context manager. " + "Create a new instance to use in a separate scope." 
+ ) + self._scope_id = str(uuid.uuid4()) + register(self, session_id=self._scope_id) + return self + +def __exit__(self, exc_type, exc_val, exc_tb) -> None: + if self._scope_id is not None: + deregister_session_plugins(self._scope_id) + self._scope_id = None + +async def __aenter__(self) -> PluginSet: + return self.__enter__() + +async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: + self.__exit__(exc_type, exc_val, exc_tb) +``` + +#### `MelleaPlugin` (`mellea/plugins/_base.py`) + +`MelleaPlugin` gains the same protocol. Because `MelleaPlugin` subclasses ContextForge `Plugin` (which owns `__init__`), the scope ID is stored as an instance attribute accessed via `getattr` with a default rather than declared in `__init__`: + +```python +def __enter__(self) -> MelleaPlugin: + if getattr(self, "_scope_id", None) is not None: + raise RuntimeError( + f"MelleaPlugin {self.name!r} is already active as a context manager. " + "Create a new instance to use in a separate scope." + ) + self._scope_id = str(uuid.uuid4()) + register(self, session_id=self._scope_id) + return self + +def __exit__(self, exc_type, exc_val, exc_tb) -> None: + scope_id = getattr(self, "_scope_id", None) + if scope_id is not None: + deregister_session_plugins(scope_id) + self._scope_id = None + +async def __aenter__(self) -> MelleaPlugin: + return self.__enter__() + +async def __aexit__(self, exc_type, exc_val, exc_tb) -> None: + self.__exit__(exc_type, exc_val, exc_tb) +``` + +#### Deregistration helper (`mellea/plugins/_manager.py`) + +A `deregister_session_plugins(scope_id)` function removes all plugins tagged with a given scope ID from the `PluginManager` and cleans up the `_session_tags` entry. 
This is the same function used by `session_cleanup` to deregister session-scoped plugins: + +```python +def deregister_session_plugins(session_id: str) -> None: + """Deregister all plugins associated with a given session or scope ID.""" + pm = _plugin_manager + if pm is None: + return + plugin_keys = _session_tags.pop(session_id, set()) + for key in plugin_keys: + pm.deregister(key) +``` + +#### Public API + +`plugin_scope` is exported from `mellea.plugins`: + +```python +from mellea.plugins import plugin_scope +``` + +All four forms (`plugin_scope`, `@plugin` instance, `PluginSet`, `MelleaPlugin` instance) support both `with` and `async with`. The same-instance re-entrant restriction applies to all forms: attempting to re-enter an already-active instance raises `RuntimeError`. Create separate instances to activate the same plugin logic in nested or concurrent scopes. + +### 4.5 Dependency Management + +Add to `pyproject.toml` under `[project.optional-dependencies]`: + +```toml +plugins = ["contextforge-plugin-framework>=0.1.0"] +``` + +All imports in `mellea/plugins/` are guarded with `try/except ImportError`. + +### 4.6 Global Registration (`mellea/plugins/_registry.py`) + +Global registration happens via `register()` at application startup: + +```python +def register( + items: Callable | Any | PluginSet | list[Callable | Any | PluginSet], + *, + session_id: str | None = None, +) -> None: + """Register plugins globally or for a specific session. + + When session_id is None, plugins are global (fire for all invocations). + When session_id is provided, plugins fire only within that session. + + Accepts standalone @hook functions, @plugin-decorated class instances, + MelleaPlugin instances, PluginSets, or lists thereof. 
+ """ + pm = _ensure_plugin_manager() + + if not isinstance(items, list): + items = [items] + + for item in items: + if isinstance(item, PluginSet): + for flattened_item, priority_override in item.flatten(): + _register_single(pm, flattened_item, session_id, priority_override) + else: + _register_single(pm, item, session_id, None) + + +def _register_single( + pm: PluginManager, + item: Callable | Any, + session_id: str | None, + priority_override: int | None, +) -> None: + """Register a single hook function or plugin instance. + + - Standalone functions with _mellea_hook_meta: wrapped in _FunctionHookAdapter + - @plugin-decorated class instances: methods with _mellea_hook_meta discovered and registered + - MelleaPlugin instances: registered directly with ContextForge + """ + ... +``` + +A `_FunctionHookAdapter` internal class wraps a standalone `@hook`-decorated function into a ContextForge `Plugin` for the `PluginManager`: + +```python +class _FunctionHookAdapter(Plugin): + """Adapts a standalone @hook-decorated function into a ContextForge Plugin.""" + + def __init__(self, fn: Callable, session_id: str | None = None): + meta = fn._mellea_hook_meta + config = PluginConfig( + name=fn.__qualname__, + kind=fn.__module__ + "." + fn.__qualname__, + hooks=[meta.hook_type], + mode=_map_mode(meta.mode), + priority=meta.priority, + ) + super().__init__(config) + self._fn = fn + self._session_id = session_id + + async def initialize(self): + pass + + async def shutdown(self): + pass +``` + +## 5. Hook Call Sites + +**Session context threading**: All `invoke_hook` calls pass `session_id` when operating within a session. For the functional API path, `session_id` is `None` and only globally registered plugins are dispatched. Session-scoped plugins (registered via `start_session(plugins=[...])`) fire only when the dispatch context matches their session ID. 
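
The dispatch rule described above (global plugins always fire, session-scoped plugins fire only when the session ID matches) can be illustrated with a small toy model. This is a sketch under stated assumptions, not Mellea's `PluginManager`: `ToyDispatcher`, its `register`/`deregister_session`/`invoke_hook` methods, and the `audit`/`redact` handlers are all hypothetical names introduced for illustration.

```python
from __future__ import annotations

import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

Handler = Callable[[dict], Awaitable[None]]


class ToyDispatcher:
    """Hypothetical stand-in for the plugin manager; not Mellea's real class."""

    def __init__(self) -> None:
        # hook name -> [(handler, session_id)]; session_id=None means "global"
        self._handlers: dict[str, list[tuple[Handler, str | None]]] = defaultdict(list)

    def register(self, hook_name: str, handler: Handler, *, session_id: str | None = None) -> None:
        self._handlers[hook_name].append((handler, session_id))

    def deregister_session(self, session_id: str) -> None:
        # Drop every handler tagged with this session; global handlers stay.
        for hook_name in list(self._handlers):
            self._handlers[hook_name] = [
                entry for entry in self._handlers[hook_name] if entry[1] != session_id
            ]

    async def invoke_hook(
        self, hook_name: str, payload: dict, *, session_id: str | None = None
    ) -> list[str]:
        fired = []
        for handler, scope in self._handlers.get(hook_name, []):
            # Global handlers always fire; scoped ones need a matching session ID.
            if scope is None or scope == session_id:
                await handler(payload)
                fired.append(handler.__name__)
        return fired


async def audit(payload: dict) -> None:
    return None  # observability-only handler (toy)


async def redact(payload: dict) -> None:
    return None  # session-specific handler (toy)


async def demo() -> dict[str, list[str]]:
    dispatcher = ToyDispatcher()
    dispatcher.register("generation_pre_call", audit)                    # global
    dispatcher.register("generation_pre_call", redact, session_id="s1")  # session-scoped
    results = {
        # Functional API path: no session ID, so only global handlers run.
        "functional_api": await dispatcher.invoke_hook("generation_pre_call", {}),
        "session_s1": await dispatcher.invoke_hook("generation_pre_call", {}, session_id="s1"),
        "session_s2": await dispatcher.invoke_hook("generation_pre_call", {}, session_id="s2"),
    }
    dispatcher.deregister_session("s1")  # analogous to cleanup at session end
    results["s1_after_cleanup"] = await dispatcher.invoke_hook(
        "generation_pre_call", {}, session_id="s1"
    )
    return results


results = asyncio.run(demo())
```

In this toy model a session-scoped handler never leaks into another session or into the functional API path, and it disappears once its session is deregistered.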
+ +### 5.1 Session Lifecycle + +**File**: `mellea/stdlib/session.py` + +`start_session()` gains the `plugins` parameter for session-scoped registration: + +```python +def start_session( + backend_name: ... = "ollama", + model_id: ... = IBM_GRANITE_4_MICRO_3B, + ctx: Context | None = None, + *, + model_options: dict | None = None, + plugins: list[Callable | Any | PluginSet] | None = None, + **backend_kwargs, +) -> MelleaSession: +``` + +Session-scoped plugins passed via `plugins=[...]` are registered with this session's ID and deregistered at `session_cleanup`. + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `session_pre_init` | `start_session()`, before `backend_class(model_id, ...)` (~L163) | Before backend instantiation | Supports payload modification: updated `model_options`, `backend_name`. Violation blocks session creation. | +| `session_post_init` | `start_session()`, after `MelleaSession(backend, ctx)` (~L191) | Session fully created | Observability-only. | +| `session_reset` | `MelleaSession.reset()`, before `self.ctx.reset_to_new()` (~L269) | Context about to reset | Observability-only. | +| `session_cleanup` | `MelleaSession.cleanup()`, at top of method (~L272) | Before teardown | Observability-only. Must not raise. Deregisters session-scoped plugins. | + +**Sync/async bridge**: These are sync methods. Use `_run_async_in_thread(invoke_hook(...))` from `mellea/helpers/__init__.py`. 
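
To make the sync/async bridge concrete, the following sketch shows one common way to run a coroutine to completion from synchronous code: a worker thread with its own event loop. It is not the implementation of Mellea's `_run_async_in_thread`; `run_coro_blocking` and `fake_invoke_hook` are hypothetical names used only for illustration.

```python
import asyncio
import threading
from typing import Any, Coroutine, TypeVar

T = TypeVar("T")


def run_coro_blocking(coro: Coroutine[Any, Any, T]) -> T:
    """Run `coro` on a fresh event loop in a worker thread and wait for it.

    Hypothetical helper: works whether or not the calling thread already
    has a running event loop, at the cost of blocking the caller.
    """
    outcome: dict[str, Any] = {}

    def worker() -> None:
        try:
            outcome["value"] = asyncio.run(coro)
        except BaseException as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join()
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


async def fake_invoke_hook(hook_name: str) -> str:
    """Stand-in for an async hook dispatch."""
    await asyncio.sleep(0)
    return f"{hook_name}: dispatched"


def sync_cleanup() -> str:
    # A synchronous method (like a session cleanup) firing an async hook.
    return run_coro_blocking(fake_invoke_hook("session_cleanup"))


async def caller_with_running_loop() -> str:
    # Calling the sync method from inside a running loop still works,
    # because the hook runs on a separate loop in the worker thread.
    return sync_cleanup()


plain = sync_cleanup()
nested = asyncio.run(caller_with_running_loop())
```

The trade-off is that the calling thread blocks until the hook finishes, so a slow hook on a synchronous path delays the caller, including any event loop running in that thread.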
+ +**Payload examples**: + +```python +# session_pre_init +SessionPreInitPayload( + backend_name=backend_name, + model_id=str(model_id), + model_options=model_options, + backend_kwargs=backend_kwargs, + context_type=type(ctx).__name__ if ctx else "SimpleContext", +) + +# session_post_init +SessionPostInitPayload(session=session) + +# session_cleanup +SessionCleanupPayload( + context=self.ctx, + interaction_count=len(self.ctx.as_list()), +) +``` + +### 5.2 Component Lifecycle + +**File**: `mellea/stdlib/functional.py` + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `component_pre_create` | `instruct()` before `Instruction(...)` (~L200), `chat()` before `Message(...)` (~L244), `query()` (~L321), `transform()` (~L363), and async variants | Before component constructor | Supports payload modification: updated `description`, `requirements`. Violation blocks creation. | +| `component_post_create` | Same functions, after Component constructor, before `act()`/`aact()` | Component created | Supports `component` replacement. Primarily observability. | +| `component_pre_execute` | `aact()`, at top before strategy branch (~L492) | Before generation begins | Supports `action`, `model_options`, `requirements`, `strategy` modification. Violation blocks execution. | +| `component_post_success` | `aact()`, after result in both branches (~L506, ~L534) | Successful execution | Supports `result` modification (output transformation). Primarily observability. | +| `component_post_error` | `aact()`, in new `try/except Exception` wrapping the body | Exception during execution | Observability-only. Always re-raises after hook. 
| + +**Key changes to `aact()`**: +- Add `time.monotonic()` at entry for latency measurement +- Wrap body (lines ~492–546) in `try/except Exception` +- `except` handler: fire `component_post_error` then `error_occurred`, then re-raise +- Insert `component_post_success` before each `return` path + +**Payload examples**: + +```python +# component_pre_create (Instruction case) +ComponentPreCreatePayload( + component_type="Instruction", + description=description, + images=images, + requirements=requirements, + icl_examples=icl_examples, + grounding_context=grounding_context, +) + +# component_pre_execute +ComponentPreExecutePayload( + component_type=type(action).__name__, + action=action, + context=context, + requirements=requirements or [], + model_options=model_options or {}, + format=format, + strategy_name=type(strategy).__name__ if strategy else None, + tool_calls_enabled=tool_calls, +) + +# component_post_success +ComponentPostSuccessPayload( + component_type=type(action).__name__, + action=action, + result=result, + context_before=context, + context_after=new_ctx, + generate_log=result._generate_log, + sampling_results=sampling_result if strategy else None, + latency_ms=int((time.monotonic() - t0) * 1000), +) +``` + +### 5.3 Generation Pipeline + +**Approach**: Add a non-abstract `generate_from_context_with_hooks()` method to the `Backend` ABC in `mellea/core/backend.py`. This wraps the abstract `generate_from_context()` with pre/post hooks, avoiding modifications to all 6 backend implementations (Ollama, OpenAI, HuggingFace, vLLM, Watsonx, LiteLLM). 
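
The reason this approach leaves concrete backends untouched can be seen in a miniature, runnable version of the same pattern: a non-abstract wrapper on the base class calls the abstract method between pre and post steps. This is a toy sketch, not the `Backend` ABC itself; `ToyBackend`, `EchoBackend`, `generate_with_hooks`, and `cap_tokens` are hypothetical names.

```python
from __future__ import annotations

import asyncio
from abc import ABC, abstractmethod
from typing import Awaitable, Callable

PreHook = Callable[[dict], Awaitable[dict]]


class ToyBackend(ABC):
    """Hypothetical miniature of a backend base class, for illustration only."""

    def __init__(self, pre_hooks: list[PreHook] | None = None) -> None:
        self.pre_hooks = pre_hooks or []
        self.post_log: list[str] = []

    @abstractmethod
    async def generate(self, prompt: str, *, model_options: dict) -> str:
        """Backend-specific generation; subclasses implement only this."""

    async def generate_with_hooks(
        self, prompt: str, *, model_options: dict | None = None
    ) -> str:
        # Pre step: each hook receives a copy of the options and returns new ones.
        options = dict(model_options or {})
        for pre_hook in self.pre_hooks:
            options = await pre_hook(options)
        output = await self.generate(prompt, model_options=options)
        # Post step: observability only in this toy version.
        self.post_log.append(output)
        return output


class EchoBackend(ToyBackend):
    # Note: no hook-related code here; the subclass is unaware of hooks.
    async def generate(self, prompt: str, *, model_options: dict) -> str:
        return f"{prompt} (max_tokens={model_options['max_tokens']})"


async def cap_tokens(options: dict) -> dict:
    """Toy policy hook: cap max_tokens at 128 without mutating the input."""
    return {**options, "max_tokens": min(options.get("max_tokens", 128), 128)}


backend = EchoBackend(pre_hooks=[cap_tokens])
output = asyncio.run(
    backend.generate_with_hooks("hello", model_options={"max_tokens": 4096})
)
```

Only call sites switch to the wrapper; every subclass keeps implementing the abstract method exactly as before.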
+ +**New method on `Backend`** (`mellea/core/backend.py`): + +```python +async def generate_from_context_with_hooks( + self, + action: Component | CBlock, + ctx: Context, + *, + format=None, + model_options=None, + tool_calls=False, +) -> tuple[ModelOutputThunk, Context]: + """Wraps generate_from_context with generation_pre_call / generation_post_call hooks.""" + from mellea.plugins._manager import invoke_hook, has_plugins + from mellea.plugins._types import MelleaHookType + from mellea.plugins.hooks.generation import GenerationPreCallPayload, GenerationPostCallPayload + + if has_plugins(): + pre_payload = GenerationPreCallPayload( + action=action, context=ctx, + formatted_prompt="", # Populated after linearization; writable by plugins + model_options=model_options or {}, format=format, tools=None, + ) + result, pre_payload = await invoke_hook( + MelleaHookType.GENERATION_PRE_CALL, pre_payload, + backend=self, context=ctx, + ) + # pre_payload is the policy-filtered result; extract all writable fields + model_options = pre_payload.model_options + format = pre_payload.format + + t0 = time.monotonic() + out_result, new_ctx = await self.generate_from_context( + action, ctx, format=format, model_options=model_options, tool_calls=tool_calls, + ) + + if has_plugins(): + post_payload = GenerationPostCallPayload( + prompt=..., # Sent prompt (from linearization) + raw_response=..., # Full JSON response from provider + processed_output=..., # Extracted text from response + model_output=out_result, + token_usage=..., # From backend response metadata + latency_ms=int((time.monotonic() - t0) * 1000), + finish_reason=..., # From backend response metadata + ) + await invoke_hook( + MelleaHookType.GENERATION_POST_CALL, post_payload, + backend=self, context=new_ctx, + ) + + return out_result, new_ctx +``` + +**Call site changes**: +- `mellea/stdlib/functional.py:aact()` line 499: `backend.generate_from_context(...)` → `backend.generate_from_context_with_hooks(...)` +- 
`mellea/stdlib/sampling/base.py:sample()` line ~163: same substitution + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `generation_pre_call` | `Backend.generate_from_context_with_hooks()`, before delegate | Before LLM API call | Supports `model_options` modification. Violation blocks (e.g., token budget exceeded). | +| `generation_post_call` | Same method, after delegate returns | After LLM response | Supports output modification (redaction). Primarily observability. | +| `generation_stream_chunk` | **Deferred to Phase 7** — requires hooks in `ModelOutputThunk.astream()` streaming path | Per streaming chunk | Fire-and-forget to avoid slowing streaming. | + +### 5.4 Validation + +**File**: `mellea/stdlib/functional.py`, in `avalidate()` (~L699–753) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `validation_pre_check` | After `reqs` prepared (~L713), before validation loop | Before validation | Supports `requirements` list modification (inject/filter). | +| `validation_post_check` | After all validations, before `return rvs` (~L753) | After validation | Supports `results` override. Primarily observability. | + +**Payload examples**: + +```python +# validation_pre_check +ValidationPreCheckPayload( + requirements=reqs, + target=output, + context=context, + model_options=model_options or {}, +) + +# validation_post_check +ValidationPostCheckPayload( + requirements=reqs, + results=rvs, + all_passed=all(bool(r) for r in rvs), + passed_count=sum(1 for r in rvs if bool(r)), + failed_count=sum(1 for r in rvs if not bool(r)), +) +``` + +### 5.5 Sampling Pipeline + +**File**: `mellea/stdlib/sampling/base.py`, in `BaseSamplingStrategy.sample()` (~L94–256) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `sampling_loop_start` | Before `for` loop (~L157) | Loop begins | Supports `loop_budget` modification. 
| +| `sampling_iteration` | Inside loop, after validation (~L192) | Each iteration | Observability. Violation can force early termination. | +| `sampling_repair` | After `self.repair()` call (~L224) | Repair invoked | Supports `repair_action`/`repair_context` modification. | +| `sampling_loop_end` | Before return in success (~L209) and failure (~L249) paths | Loop ends | Observability. Supports `final_result` override. | + +**Additional change**: Add `_get_repair_type() -> str` method to each sampling strategy subclass: + +| Strategy Class | Repair Type | +|---|---| +| `RejectionSamplingStrategy` | `"identity"` | +| `RepairTemplateStrategy` | `"template_repair"` | +| `MultiTurnStrategy` | `"multi_turn_message"` | +| `SOFAISamplingStrategy` | `"sofai_feedback"` | + +**Payload examples**: + +```python +# sampling_loop_start +SamplingLoopStartPayload( + strategy_name=type(self).__name__, + action=action, + context=context, + requirements=reqs, + loop_budget=self.loop_budget, +) + +# sampling_repair +SamplingRepairPayload( + repair_type=self._get_repair_type(), + failed_action=sampled_actions[-1], + failed_result=sampled_results[-1], + failed_validations=sampled_scores[-1], + repair_action=next_action, + repair_context=next_context, + repair_iteration=loop_count, +) +``` + +### 5.6 Tool Execution + +**File**: `mellea/stdlib/functional.py`, in the `_call_tools()` helper (~L904) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `tool_pre_invoke` | Before `tool.call_func()` (~L917) | Before tool call | Supports `tool_args` modification. Violation blocks tool call. | +| `tool_post_invoke` | After `tool.call_func()` (~L919) | After tool call | Supports `tool_output` modification. Primarily observability. 
| + +### 5.7 Backend Adapter Operations + +**Files**: `mellea/backends/openai.py` (`load_adapter` ~L907, `unload_adapter` ~L944), `mellea/backends/huggingface.py` (`load_adapter` ~L1192, `unload_adapter` ~L1224) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `adapter_pre_load` | Start of `load_adapter()` | Before adapter load | Violation prevents loading. | +| `adapter_post_load` | End of `load_adapter()` | After adapter loaded | Observability. | +| `adapter_pre_unload` | Start of `unload_adapter()` | Before adapter unload | Violation prevents unloading. | +| `adapter_post_unload` | End of `unload_adapter()` | After adapter unloaded | Observability. | + +**Sync/async bridge**: Adapter methods are synchronous. Use `_run_async_in_thread(invoke_hook(...))`. + +### 5.8 Context Operations + +**Files**: `mellea/stdlib/context.py` (`ChatContext.add()` ~L17, `SimpleContext.add()` ~L31) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `context_update` | After `from_previous()` in `add()` | Context appended | Observability-only (context is immutable). | +| `context_prune` | `ChatContext.view_for_generation()` when window truncates | Context windowed | Observability-only. | + +**Performance note**: `context_update` fires on every context addition, which is frequent. The `has_hooks_for()` guard is critical — when no plugin subscribes to `context_update`, the overhead is a single boolean check. + +### 5.9 Error Handling + +**File**: `mellea/stdlib/functional.py` (utility function callable from any error path) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `error_occurred` | `aact()` except block + utility `fire_error_hook()` | Unrecoverable error | Observability-only. Must never raise from own execution. 
| + +**Fires for**: `ComponentParseError`, backend communication errors, assertion violations, unhandled exceptions during component execution, validation, or tool invocation. + +**Does NOT fire for**: Validation failures within sampling loops (handled by `sampling_iteration`/`sampling_repair`), controlled `PluginViolation` blocks (those are policy decisions, not errors). + +**Utility function**: + +```python +async def fire_error_hook( + error: Exception, + location: str, + *, + session=None, backend=None, context=None, action=None, +) -> None: + """Fire the error_occurred hook. Never raises.""" + try: + payload = ErrorOccurredPayload( + error=error, + error_type=type(error).__name__, + error_location=location, + stack_trace=traceback.format_exc(), + recoverable=False, + action=action, + ) + await invoke_hook( + MelleaHookType.ERROR_OCCURRED, payload, + session=session, backend=backend, context=context, + violations_as_exceptions=False, + ) + except Exception: + pass # Never propagate errors from error hook +``` + + +## 8. 
Modifications Summary + +| File | Changes | +|------|---------| +| `mellea/stdlib/functional.py` | ~12 hook insertions (component lifecycle, validation, tools, error) | +| `mellea/stdlib/session.py` | 4 session hooks + `plugins` param on `start_session()` + session-scoped plugin registration/deregistration | +| `mellea/stdlib/sampling/base.py` | 4 sampling hooks + `generate_from_context` → `generate_from_context_with_hooks` | +| `mellea/core/backend.py` | Add `generate_from_context_with_hooks()` wrapper method to `Backend` ABC | +| `mellea/stdlib/context.py` | 2 context operation hooks in `ChatContext.add()`, `SimpleContext.add()` | +| `mellea/backends/openai.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` | +| `mellea/backends/huggingface.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` | +| `pyproject.toml` | Add `plugins` optional dependency + `plugins` test marker | +| `mellea/plugins/__init__.py` (new) | Public API: `hook`, `plugin`, `block`, `PluginSet`, `register`, `MelleaPlugin`, `plugin_scope` | +| `mellea/plugins/_decorators.py` (new) | `@hook` and `@plugin` decorator implementations, `HookMeta`, `PluginMeta`; `@plugin` injects `__enter__`/`__exit__`/`__aenter__`/`__aexit__` into decorated classes | +| `mellea/plugins/_pluginset.py` (new) | `PluginSet` class with `flatten()` for recursive expansion; context manager protocol (`__enter__`/`__exit__`/`__aenter__`/`__aexit__`) for with-block scoping | +| `mellea/plugins/_registry.py` (new) | `register()`, `block()`, `_FunctionHookAdapter`, `_register_single()`; `_PluginScope` class and `plugin_scope()` factory for with-block scoping | +| `mellea/plugins/_manager.py` (new) | Singleton wrapper, `invoke_hook()` with session-tag filtering, `_ensure_plugin_manager()`; `deregister_session_plugins()` for scope cleanup (used by both session and with-block exit) | +| `mellea/plugins/_base.py` (new) | `MelleaBasePayload` (frozen), `MelleaPlugin` base class with context manager protocol for 
with-block scoping | +| `mellea/plugins/_types.py` (new) | `MelleaHookType` enum, `_register_mellea_hooks()` | +| `mellea/plugins/_policies.py` (new) | `MELLEA_HOOK_PAYLOAD_POLICIES` table, injected into `PluginManager` at init | +| `mellea/plugins/_context.py` (new) | `build_global_context()` factory | +| `mellea/plugins/hooks/` (new) | Hook payload dataclasses (session, component, generation, etc.) | +| `test/plugins/` (new) | Tests for plugins subpackage | + +> Note: + update docs and add examples.
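
As one possible example for the with-block scoping listed in the summary table, the sketch below models the scope-ID pattern: entering generates a UUID scope, exiting removes everything registered under it, and re-entering an active instance raises `RuntimeError`. It is a toy model under stated assumptions; `ToyScopedPlugin`, `_ACTIVE`, and `active_names` are hypothetical and do not touch Mellea's real registry.

```python
from __future__ import annotations

import uuid

# Toy stand-in for a scope-tag table: scope_id -> names registered in that scope.
_ACTIVE: dict[str, set[str]] = {}


class ToyScopedPlugin:
    """Hypothetical plugin that registers itself for the duration of a with-block."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._scope_id: str | None = None

    def __enter__(self) -> ToyScopedPlugin:
        if self._scope_id is not None:
            raise RuntimeError(
                f"Plugin {self.name!r} is already active as a context manager."
            )
        self._scope_id = str(uuid.uuid4())
        _ACTIVE[self._scope_id] = {self.name}  # "register" under the new scope
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        if self._scope_id is not None:
            _ACTIVE.pop(self._scope_id, None)  # "deregister" the whole scope
            self._scope_id = None

    async def __aenter__(self) -> ToyScopedPlugin:
        return self.__enter__()

    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
        self.__exit__(exc_type, exc_val, exc_tb)


def active_names() -> set[str]:
    """Names of all plugins currently registered in any scope."""
    return set().union(*_ACTIVE.values())


plugin = ToyScopedPlugin("audit")
with plugin:
    inside = active_names()
    try:
        with plugin:  # same instance, already active
            reentered = True
    except RuntimeError:
        reentered = False
    # The failed re-entry must not disturb the outer scope.
    still_inside = active_names()
after = active_names()
```

Because `__enter__` raises before a second scope is created, the failed nested entry leaves the outer registration intact; a separate instance would be needed to activate the same logic in a nested scope.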