From 3fd88b17faa76894755f82e021d42f65b177d3d1 Mon Sep 17 00:00:00 2001
From: Frederico Araujo
Date: Tue, 3 Feb 2026 21:44:51 -0500
Subject: [PATCH 01/10] docs: add initial specification for extensibility hooks in Mellea

Signed-off-by: Frederico Araujo
---
 docs/dev/hook_system.md | 1157 +++++++++++++++++++++++++++++++++++++++
 1 file changed, 1157 insertions(+)
 create mode 100644 docs/dev/hook_system.md

diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md
new file mode 100644
index 000000000..037f6196b
--- /dev/null
+++ b/docs/dev/hook_system.md
@@ -0,0 +1,1157 @@
+# Mellea Plugin Hook Points Design Document
+
+This document defines the hook system for Mellea, enabling plugins to register and respond to events throughout the framework's execution lifecycle. Hooks provide extensibility points for policy enforcement, data transformation, observability, and custom behavior injection.
+
+
+## 1. Overview
+
+### Design Principles
+
+1. **Consistent Interface**: All hooks follow the same async pattern with payload and context parameters
+2. **Composable**: Multiple plugins can register for the same hook, executing in priority order
+3. **Fail-safe**: Hook failures can be handled gracefully without breaking core execution
+4. **Minimal Intrusion**: Plugins are opt-in; default Mellea behavior remains unchanged without plugins
+
+### Hook Method Signature
+
+All hooks follow this consistent async pattern:
+
+```python
+async def hook_name(
+    self,
+    payload: PluginPayload,
+    context: PluginContext
+) -> PluginResult
+```
+
+- **`payload`**: Mutable, strongly-typed data specific to the hook point
+- **`context`**: Read-only shared context with session metadata and utilities
+- **Returns**: A result object with continuation flag, modified payload, and violation/explanation
+
+### Plugin Framework
+
+To enable this extension system, we plan to leverage a lightweight, standalone plugin framework that:
+- Installs as a Python package dependency with a minimal footprint
+- Exposes APIs to define hook invocation points and base data objects for plugin payload, context, and result
+- Exposes a base class and decorator to implement concrete plugins and register hook functions
+- Implements a plugin manager that loads, registers, and governs the execution of plugins
+
+## 2. Common Payload Fields
+
+All hook payloads inherit these base fields:
+
+```python
+class BasePayload(PluginPayload):
+    session_id: str  # Unique session identifier
+    request_id: str  # Unique ID for this execution chain
+    timestamp: datetime  # When the event fired
+    hook: str  # Name of the hook (e.g., "generation_pre_call")
+    user_metadata: dict[str, Any]  # Custom metadata carried by user code
+```
+
+## 3. Hook Summary Table
+
+| Hook Point | Category | Description |
+|------------|----------|-------------|
+| `session_pre_init` | Session | Before session initialization |
+| `session_post_init` | Session | After session is fully initialized |
+| `session_reset` | Session | When session context is reset |
+| `session_cleanup` | Session | Before session cleanup/teardown |
+| `instruction_pre_create` | Instruction | Before Instruction component creation |
+| `instruction_post_create` | Instruction | After Instruction created, before execution |
+| `action_pre_execute` | Action | Before any action execution |
+| `action_post_success` | Action | After successful action completion |
+| `action_post_error` | Action | After action fails with error |
+| `generation_pre_call` | Generation | Before LLM backend call |
+| `generation_post_call` | Generation | After LLM response received |
+| `generation_stream_chunk` | Generation | For each streaming chunk |
+| `validation_pre_check` | Validation | Before requirement validation |
+| `validation_post_check` | Validation | After validation completes |
+| `sampling_loop_start` | Sampling | When sampling strategy begins |
+| `sampling_iteration` | Sampling | After each sampling attempt |
+| `sampling_repair` | Sampling | When repair is invoked |
+| `sampling_loop_end` | Sampling | When sampling completes |
+| `tool_pre_invoke` | Tool | Before tool/function invocation |
+| `tool_post_invoke` | Tool | After tool execution |
+| `slot_pre_call` | Generative Slot | Before generative slot invocation |
+| `slot_post_call` | Generative Slot | After generative slot returns |
+| `context_update` | Context | When context changes |
+| `context_prune` | Context | When context is trimmed |
+| `error_occurred` | Error | When an error occurs |
+
+## 4. Hook Definitions
+
+### A. Session Lifecycle Hooks
+
+Hooks that manage session boundaries, useful for initialization, state setup, and resource cleanup.
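
The sketch below shows one way the session hooks could bracket a session's lifetime. It uses a hypothetical `SessionHooks` registry and a `managed_session` async context manager, and neither is part of Mellea's API. Payloads are plain dicts rather than the typed payload classes. The one design point it illustrates is that `session_cleanup` fires from a `finally` block, so it runs both on normal exit and when user code raises.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Awaitable, Callable

# Hypothetical handler type: takes a payload dict, returns nothing.
Handler = Callable[[dict[str, Any]], Awaitable[None]]


class SessionHooks:
    """Hypothetical registry mapping hook names to async handlers."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def register(self, hook: str, handler: Handler) -> None:
        self._handlers.setdefault(hook, []).append(handler)

    async def fire(self, hook: str, payload: dict[str, Any]) -> None:
        # Await each handler in registration order (blocking semantics).
        for handler in self._handlers.get(hook, []):
            await handler(payload)


@asynccontextmanager
async def managed_session(hooks: SessionHooks, backend_name: str) -> AsyncIterator[dict[str, Any]]:
    # session_pre_init: before any backend object exists.
    await hooks.fire("session_pre_init", {"backend_name": backend_name})
    session: dict[str, Any] = {"backend_name": backend_name, "generations": 0}  # stand-in session
    # session_post_init: the session is ready, but no operations have run yet.
    await hooks.fire("session_post_init", {"session": session})
    try:
        yield session
    finally:
        # session_cleanup: runs on normal exit and when the body raises.
        await hooks.fire("session_cleanup", {"total_generations": session["generations"]})


async def main() -> list[str]:
    fired: list[str] = []
    hooks = SessionHooks()
    for name in ("session_pre_init", "session_post_init", "session_cleanup"):
        async def record(payload: dict[str, Any], name: str = name) -> None:
            fired.append(name)
        hooks.register(name, record)
    async with managed_session(hooks, "example-backend") as session:
        session["generations"] += 1
    return fired


if __name__ == "__main__":
    print(asyncio.run(main()))  # ['session_pre_init', 'session_post_init', 'session_cleanup']
```

Placing cleanup in `finally` lets audit and telemetry plugins flush their buffers even when the session ends with an exception.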
+
+#### `session_pre_init`
+
+- **Trigger**: Called immediately when `mellea.start_session()` is invoked, before backend initialization.
+- **Use Cases**:
+  - Loading user-specific policies
+  - Validating backend/model combinations
+  - Enforcing model usage policies
+  - Routing to alternative backends
+- **Payload**:
+  ```python
+  class SessionPreInitPayload(BasePayload):
+      backend_name: str  # Requested backend identifier
+      model_id: str | ModelIdentifier  # Target model
+      model_options: dict | None  # Generation parameters
+      backend_kwargs: dict  # Additional backend configuration
+      context_type: type[Context]  # Context class to use
+  ```
+- **Context**:
+  - `environment`: dict - Environment variables snapshot
+  - `cwd`: str - Current working directory
+
+
+#### `session_post_init`
+
+- **Trigger**: Called after session is fully initialized, before any operations.
+- **Use Cases**:
+  - Initializing plugin-specific session state
+  - Setting up telemetry/observability
+  - Registering session-scoped resources
+  - Remote logging setup
+- **Payload**:
+  ```python
+  class SessionPostInitPayload(BasePayload):
+      backend: Backend  # Initialized backend instance
+      context: Context  # Initial context
+      logger: FancyLogger  # Session logger
+  ```
+- **Context**:
+  - `backend_info`: dict - Backend capabilities and metadata
+  - `model_info`: dict - Model details (context window, etc.)
+
+
+#### `session_reset`
+
+- **Trigger**: Called when `session.reset()` is invoked to clear context.
+- **Use Cases**:
+  - Resetting plugin state
+  - Logging context transitions
+  - Preserving audit trails before reset
+- **Payload**:
+  ```python
+  class SessionResetPayload(BasePayload):
+      previous_context: Context  # Context before reset
+      new_context: Context  # Fresh context after reset
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `reset_reason`: str | None - Optional reason for reset
+
+
+#### `session_cleanup`
+
+- **Trigger**: Called when `session.close()`, `cleanup()`, or context manager exit occurs.
+- **Use Cases**:
+  - Flushing telemetry buffers
+  - Persisting audit trails
+  - Aggregating session metrics
+  - Cleaning up temporary resources
+- **Payload**:
+  ```python
+  class SessionCleanupPayload(BasePayload):
+      context: Context  # Final context state
+      total_generations: int  # Count of generations performed
+      total_tokens_used: int | None  # Aggregate token usage
+      interaction_count: int  # Total number of turns
+  ```
+- **Context**:
+  - `generate_logs`: list[GenerateLog] - All logs from session
+  - `duration_ms`: int - Session duration
+  - `session`: MelleaSession - Final session state
+
+
+### B. Instruction & Action Hooks
+
+Hooks around the high-level primitives like `instruct()`, `chat()`, and action execution.
+
+
+#### `instruction_pre_create`
+
+- **Trigger**: Called when `instruct()`, `chat()`, or a generative slot is invoked, before the prompt is constructed.
+- **Use Cases**:
+  - PII redaction on user input
+  - Prompt injection detection
+  - Input validation and sanitization
+  - Injecting mandatory requirements
+  - Enforcing content policies
+- **Payload**:
+  ```python
+  class InstructionPreCreatePayload(BasePayload):
+      description: str  # Main instruction text
+      images: list[ImageBlock] | None  # Attached images
+      requirements: list[Requirement | str]  # Validation requirements
+      icl_examples: list[str | CBlock]  # In-context learning examples
+      grounding_context: dict[str, str]  # Grounding variables
+      user_variables: dict[str, str] | None  # Template variables
+      prefix: str | CBlock | None  # Output prefix
+      template_id: str | None  # Identifier of prompt template
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `parent_context`: Context - Context the instruction will be added to
+  - `history_snapshot`: ContextSnapshot - Conversation history
+
+
+#### `instruction_post_create`
+
+- **Trigger**: After the Instruction component is created and formatted, before the backend call.
+- **Use Cases**:
+  - Appending system prompts
+  - Context stuffing (RAG injection)
+  - Logging instruction patterns
+  - Validating final instruction structure
+- **Payload**:
+  ```python
+  class InstructionPostCreatePayload(BasePayload):
+      instruction: Instruction  # Created instruction component
+      template_repr: TemplateRepresentation  # Formatted representation
+      component: Component  # The structured prompt object
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `parent_context`: Context
+
+
+#### `action_pre_execute`
+
+- **Trigger**: Before any action (Instruction, Query, Transform) is executed via `act()`.
+- **Use Cases**:
+  - Policy enforcement on generation requests
+  - Injecting/modifying model options
+  - Routing to different strategies
+  - Authorization checks
+  - Logging action patterns
+- **Payload**:
+  ```python
+  class ActionPreExecutePayload(BasePayload):
+      action: Component | CBlock  # The action to execute
+      context: Context  # Current context
+      context_view: list[Component | CBlock] | None  # Linearized context
+      requirements: list[Requirement]  # Attached requirements
+      model_options: dict  # Generation parameters
+      format: type | None  # Structured output format
+      strategy: SamplingStrategy | None  # Sampling strategy
+      tool_calls_enabled: bool  # Whether tools are available
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `backend`: Backend
+  - `action_type`: str - Type of action being executed
+
+
+#### `action_post_success`
+
+- **Trigger**: After action execution completes successfully.
+- **Use Cases**:
+  - Logging generation results
+  - Output validation (hallucination check)
+  - PII scrubbing from response
+  - Applying output transformations
+  - Audit logging
+  - Collecting metrics
+- **Payload**:
+  ```python
+  class ActionPostSuccessPayload(BasePayload):
+      action: Component | CBlock  # Executed action
+      result: ModelOutputThunk  # Generation result
+      context_before: Context  # Context before action
+      context_after: Context  # Context after action
+      generate_log: GenerateLog  # Detailed execution log
+      sampling_results: list[SamplingResult] | None  # If sampling was used
+      latency_ms: int  # Execution time
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `token_usage`: dict | None
+  - `original_input`: dict - Input that triggered generation
+
+
+#### `action_post_error`
+
+- **Trigger**: When action execution fails with an exception.
+- **Use Cases**:
+  - Error logging and alerting
+  - Custom error recovery
+  - Retry logic
+  - Graceful degradation
+- **Payload**:
+  ```python
+  class ActionPostErrorPayload(BasePayload):
+      action: Component | CBlock  # Action that failed
+      error: Exception  # The exception raised
+      error_type: str  # Exception class name
+      stack_trace: str  # Full stack trace
+      context: Context  # Context at time of error
+      model_options: dict  # Options used
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `recoverable`: bool - Whether execution can continue
+
+
+### C. Generation Hooks (Backend Execution)
+
+Low-level hooks between the component abstraction and raw LLM API calls.
+
+
+#### `generation_pre_call`
+
+- **Trigger**: Just before the backend transmits data to the LLM API.
+- **Use Cases**:
+  - Tool selection filtering and requirements
+  - Prompt injection detection
+  - Content filtering
+  - Token budget enforcement
+  - Cost estimation
+  - Prompt caching/deduplication
+  - Rate limiting
+  - Last-mile formatting
+- **Payload**:
+  ```python
+  class GenerationPreCallPayload(BasePayload):
+      action: Component | CBlock  # Source action
+      context: Context  # Current context
+      linearized_context: list[Component | CBlock]  # Context as list
+      formatted_prompt: str | list[dict]  # Final prompt to send
+      model_options: dict[str, Any]  # Generation parameters
+      tools: dict[str, Callable] | None  # Available tools
+      format: type | None  # Structured output format
+      estimated_tokens: int | None  # Token estimate
+  ```
+- **Context**:
+  - `backend_name`: str
+  - `model_id`: str
+  - `provider`: str - Provider name (e.g., "ibm/granite")
+
+
+#### `generation_post_call`
+
+- **Trigger**: Immediately after receiving the raw response from the LLM API, before parsing.
+- **Use Cases**:
+  - Output filtering/sanitization
+  - PII detection and redaction
+  - Response caching
+  - Quality metrics collection
+  - Hallucination detection
+  - Raw trace logging
+  - Error interception (API limits/retries)
+- **Payload**:
+  ```python
+  class GenerationPostCallPayload(BasePayload):
+      prompt: str | list[dict]  # Sent prompt
+      raw_response: dict  # Full JSON response from provider
+      processed_output: str  # Processed output text
+      model_output: ModelOutputThunk  # Output thunk
+      token_usage: dict | None  # Token counts
+      latency_ms: int  # Generation time
+      finish_reason: str  # Why generation stopped
+  ```
+- **Context**:
+  - `backend_name`: str
+  - `model_id`: str
+  - `status_code`: int | None - HTTP status from provider
+  - `stream_chunks`: int | None - Number of chunks if streaming
+
+
+#### `generation_stream_chunk`
+
+- **Trigger**: For each streaming chunk received from the LLM.
+- **Use Cases**:
+  - Real-time content filtering
+  - Progressive output display
+  - Early termination on policy violation
+  - Streaming analytics
+- **Payload**:
+  ```python
+  class GenerationStreamChunkPayload(BasePayload):
+      chunk: str  # Current chunk text
+      accumulated: str  # All text so far
+      chunk_index: int  # Chunk sequence number
+      is_final: bool  # Is this the last chunk
+  ```
+- **Context**:
+  - `thunk_id`: str
+  - `backend_name`: str
+  - `model_id`: str
+
+
+### D. Validation Hooks
+
+Hooks around requirement verification and output validation.
+
+
+#### `validation_pre_check`
+
+- **Trigger**: Before running validation/requirements check.
+- **Use Cases**:
+  - Injecting additional requirements
+  - Filtering requirements based on context
+  - Overriding validation strategy
+  - Custom validation logic
+- **Payload**:
+  ```python
+  class ValidationPreCheckPayload(BasePayload):
+      requirements: list[Requirement]  # Requirements to check
+      target: CBlock | None  # Target to validate
+      context: Context  # Current context
+      model_options: dict  # Options for LLM-as-judge
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `validation_type`: str - "python" | "llm_as_judge"
+
+
+#### `validation_post_check`
+
+- **Trigger**: After all validations complete.
+- **Use Cases**:
+  - Logging validation outcomes
+  - Triggering alerts on failures
+  - Collecting requirement effectiveness metrics
+  - Overriding validation results
+  - Monitoring sampling attempts
+- **Payload**:
+  ```python
+  class ValidationPostCheckPayload(BasePayload):
+      requirements: list[Requirement]
+      results: list[ValidationResult]
+      all_passed: bool
+      passed_count: int
+      failed_count: int
+      generate_logs: list[GenerateLog | None]  # Logs from LLM-as-judge
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `validation_duration_ms`: int
+
+
+### E. Sampling & Repair Hooks
+
+Hooks around sampling strategies and failure recovery.
+
+
+#### `sampling_loop_start`
+
+- **Trigger**: When a sampling strategy begins execution.
+- **Use Cases**:
+  - Logging sampling attempts
+  - Adjusting loop budget dynamically
+  - Initializing sampling-specific state
+- **Payload**:
+  ```python
+  class SamplingLoopStartPayload(BasePayload):
+      strategy_name: str  # Strategy class name
+      action: Component  # Initial action
+      context: Context  # Initial context
+      requirements: list[Requirement]  # All requirements
+      loop_budget: int  # Maximum iterations
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `strategy_config`: dict
+
+
+#### `sampling_iteration`
+
+- **Trigger**: After each sampling attempt, including validation results.
+- **Use Cases**:
+  - Iteration-level metrics
+  - Early termination decisions
+  - Debugging sampling behavior
+  - Adaptive strategy adjustment
+- **Payload**:
+  ```python
+  class SamplingIterationPayload(BasePayload):
+      iteration: int  # Current iteration number
+      action: Component  # Action used this iteration
+      result: ModelOutputThunk  # Generation result
+      validation_results: list[tuple[Requirement, ValidationResult]]
+      all_valid: bool  # Did all requirements pass
+      valid_count: int
+      total_count: int
+  ```
+- **Context**:
+  - `strategy_name`: str
+  - `remaining_budget`: int
+  - `elapsed_ms`: int
+
+
+#### `sampling_repair`
+
+- **Trigger**: When the repair strategy is invoked after a validation failure.
+- **Use Cases**:
+  - Logging repair patterns
+  - Injecting custom repair strategies
+  - Analyzing failure modes
+  - Adjusting repair approach
+- **Payload**:
+  ```python
+  class SamplingRepairPayload(BasePayload):
+      failed_action: Component  # Action that failed
+      failed_result: ModelOutputThunk  # Failed output
+      failed_validations: list[tuple[Requirement, ValidationResult]]
+      old_context: Context  # Context without failure
+      new_context: Context  # Context with failure
+      repair_action: Component  # New action for retry
+      repair_context: Context  # Context for retry
+      repair_iteration: int  # Which repair attempt
+  ```
+- **Context**:
+  - `strategy_name`: str
+  - `past_failures`: list[str]
+
+
+#### `sampling_loop_end`
+
+- **Trigger**: When sampling completes (success or failure).
+- **Use Cases**:
+  - Sampling effectiveness metrics
+  - Failure analysis
+  - Cost tracking
+  - Selecting best failed attempt
+- **Payload**:
+  ```python
+  class SamplingLoopEndPayload(BasePayload):
+      success: bool  # Did sampling succeed
+      iterations_used: int  # Total iterations performed
+      final_result: ModelOutputThunk | None  # Best result
+      final_action: Component | None
+      final_context: Context | None
+      failure_reason: str | None  # If failed, why
+      all_results: list[ModelOutputThunk]
+      all_validations: list[list[tuple[Requirement, ValidationResult]]]
+  ```
+- **Context**:
+  - `strategy_name`: str
+  - `total_duration_ms`: int
+  - `tokens_used`: int | None
+
+
+### F. Tool Calling Hooks
+
+Hooks around tool/function execution.
+
+
+#### `tool_pre_invoke`
+
+- **Trigger**: Before invoking a tool/function from LLM output.
+- **Use Cases**:
+  - Tool authorization
+  - Argument validation/sanitization
+  - Tool routing/redirection
+  - Rate limiting per tool
+- **Payload**:
+  ```python
+  class ToolPreInvokePayload(BasePayload):
+      tool_name: str  # Name of tool to call
+      tool_args: dict[str, Any]  # Arguments to pass
+      tool_callable: Callable  # The actual function
+      model_tool_call: ModelToolCall  # Raw model output
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `available_tools`: list[str]
+  - `invocation_source`: str - "transform" | "action" | etc.
+
+
+#### `tool_post_invoke`
+
+- **Trigger**: After tool execution completes.
+- **Use Cases**:
+  - Output transformation
+  - Error handling/recovery
+  - Tool usage metrics
+  - Result caching
+- **Payload**:
+  ```python
+  class ToolPostInvokePayload(BasePayload):
+      tool_name: str
+      tool_args: dict[str, Any]
+      tool_output: Any  # Raw tool output
+      tool_message: ToolMessage  # Formatted message
+      execution_time_ms: int
+      success: bool  # Did tool execute without error
+      error: Exception | None  # Error if any
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `invocation_source`: str
+
+
+### G. Generative Slot Hooks
+
+Hooks specific to `@generative`-decorated functions.
+
+
+#### `slot_pre_call`
+
+- **Trigger**: When a generative slot function is about to be invoked.
+- **Use Cases**:
+  - Slot-level authorization
+  - Argument validation
+  - Profiling setup
+  - Logging slot invocations
+- **Payload**:
+  ```python
+  class SlotPreCallPayload(BasePayload):
+      slot_name: str  # Name of the generative slot function
+      slot_signature: str  # Typed signature
+      args: dict[str, Any]  # Arguments passed to the slot
+      kwargs: dict[str, Any]  # Keyword arguments
+      docstring: str | None  # Docstring / prompt template
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `slot_module`: str - Module containing the slot
+
+
+#### `slot_post_call`
+
+- **Trigger**: Once the slot returns (or fails).
+- **Use Cases**:
+  - Measuring slot latency
+  - Recording success/failure
+  - Profiling statistics per slot
+  - Output transformation
+- **Payload**:
+  ```python
+  class SlotPostCallPayload(BasePayload):
+      slot_name: str
+      args: dict[str, Any]
+      kwargs: dict[str, Any]
+      result: Any  # The returned value
+      duration_ms: int
+      success: bool
+      error: Exception | None
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `slot_module`: str
+
+
+### H. Context Manipulation Hooks
+
+Hooks around context changes and management.
+
+
+#### `context_update`
+
+- **Trigger**: When data is added to the context or the context otherwise changes.
+- **Use Cases**:
+  - Context audit trail
+  - Memory management policies
+  - Sensitive data detection
+  - Token usage monitoring
+- **Payload**:
+  ```python
+  class ContextUpdatePayload(BasePayload):
+      previous_context: Context  # Context before change
+      new_data: Component | CBlock  # Data being added
+      resulting_context: Context  # Context after change
+      context_type: str  # "simple" | "chat"
+      change_type: str  # "append" | "reset"
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `history_length`: int
+
+
+#### `context_prune`
+
+- **Trigger**: When context trimming or pruning logic runs.
+- **Use Cases**:
+  - Token budget management
+  - Recording pruning events
+  - Custom pruning strategies
+  - Archiving pruned content
+- **Payload**:
+  ```python
+  class ContextPrunePayload(BasePayload):
+      context_before: Context  # Context before pruning
+      context_after: Context  # Context after pruning
+      pruned_items: list[Component | CBlock]  # Items removed
+      reason: str  # Why pruning occurred
+      tokens_freed: int | None  # Token estimate freed
+  ```
+- **Context**:
+  - `session`: MelleaSession
+  - `token_limit`: int | None
+
+
+### I. Error Handling Hooks
+
+Hooks for error conditions and recovery.
+
+
+#### `error_occurred`
+
+- **Trigger**: When an error occurs during any operation.
+- **Use Cases**:
+  - Error logging/alerting
+  - Custom error recovery
+  - Error metrics
+  - Graceful degradation
+  - Notification systems
+- **Payload**:
+  ```python
+  class ErrorOccurredPayload(BasePayload):
+      error: Exception  # The exception
+      error_type: str  # Exception class name
+      error_location: str  # Where error occurred
+      recoverable: bool  # Whether execution can continue
+      context: Context | None  # Context at time of error
+      action: Component | None  # Action being performed
+      stack_trace: str  # Full stack trace
+  ```
+- **Context**:
+  - `session`: MelleaSession | None
+  - `operation`: str - What operation was being performed
+
+
+## 5. Plugin Context Object
+
+The `PluginContext` passed to all hooks provides shared state and utilities:
+
+```python
+# Session Information
+session_id: str
+session: MelleaSession | None
+
+# Backend Information
+backend_name: str
+model_id: str
+backend_capabilities: dict
+
+# Environment
+environment: dict[str, str]
+cwd: str
+
+# Plugin State
+shared_state: dict[str, Any]  # Shared across plugins
+
+# Utilities
+logger: Logger
+metrics: MetricsCollector
+
+# Request Metadata
+request_id: str  # Unique ID for this execution chain
+parent_request_id: str | None  # For nested calls
+timestamp: datetime
+
+# User Information
+user_id: str | None  # User reference if available
+user_metadata: dict[str, Any]  # User-specific data
+```
+
+> Note: this is a suggestion and requires discussion.
+
+### Context Snapshot
+
+When conversation history is relevant, the plugin context object may include a `ContextSnapshot`:
+
+```python
+@dataclass
+class ContextSnapshot:
+    history_length: int  # Number of turns
+    last_turn: dict | None  # Last user/assistant exchange
+    token_estimate: int | None  # Estimated token count
+```
+
+## 6. Hook Results
+
+Hooks can return different result types to control execution:
+
+### Modify Payload
+
+```python
+return PluginResult(
+    continue_processing=True,
+    modified_payload=modified_payload,
+)
+```
+
+### Block Execution
+
+```python
+violation = PluginViolation(
+    reason="Policy violation",
+    description="Detailed explanation",
+    code="POLICY_001",
+    details={"field": "value"},
+    severity="error"  # "error" | "warning"
+)
+
+return PluginResult(
+    continue_processing=False,
+    violation=violation
+)
+```
+
+## 7. Registration & Configuration
+
+### Plugin Registration
+
+Plugins register programmatically or via YAML configuration:
+
+```yaml
+plugins:
+  - name: content-policy
+    kind: mellea.plugins.ContentPolicyPlugin
+    hooks:
+      - instruction_pre_create
+      - generation_post_call
+    mode: enforce
+    priority: 10
+    config:
+      blocked_terms: ["term1", "term2"]
+
+  - name: telemetry
+    kind: mellea.plugins.TelemetryPlugin
+    hooks:
+      - action_post_success
+      - validation_post_check
+      - sampling_loop_end
+    mode: permissive
+    priority: 100
+    config:
+      endpoint: "https://telemetry.example.com"
+```
+
+### Execution Modes
+
+- **`enforce`**: Block execution on violation
+- **`enforce_ignore_error`**: Block on violation, but tolerate plugin errors
+- **`permissive`**: Log violations without blocking
+- **`disabled`**: Skip hook execution
+
+### Priority
+
+- Lower numbers execute first
+- Hooks with the same priority may execute concurrently
+- Default priority: 50
+
+### Convention-Based Registration
+
+Plugins can use method naming conventions:
+
+```python
+class MyPlugin(MelleaPlugin):
+    async def on_generation_pre_call(self, payload, context):
+        # Automatically registered for generation_pre_call
+        ...
+
+    async def on_validation_post_check(self, payload, context):
+        # Automatically registered for validation_post_check
+        ...
+```
+
+### Programmatic Registration
+
+```python
+class PIIRedactionPlugin(Plugin):
+    def name(self):
+        return "PII_Redactor"
+
+    def register(self, hooks):
+        hooks.register("instruction_pre_create", self.redact_input)
+        hooks.register("generation_post_call", self.redact_output)
+
+    async def redact_input(self, payload, context):
+        # Redact PII from input
+        ...
+```
+
+### Session-Level Configuration
+
+```python
+m = mellea.start_session(
+    ...,
+    plugin_manager=pm,
+    hooks_enabled=["instruction_pre_create", "generation_post_call"]
+)
+```
+
+## 8. Example Implementations
+
+### Content Policy Plugin
+
+```python
+class ContentPolicyPlugin(MelleaPlugin):
+    async def on_instruction_pre_create(
+        self,
+        payload: InstructionPreCreatePayload,
+        context: PluginContext
+    ) -> PluginResult | None:
+        blocked_terms = self.config.get("blocked_terms", [])
+
+        for term in blocked_terms:
+            if term.lower() in payload.description.lower():
+                return PluginResult(
+                    continue_processing=False,
+                    violation=PluginViolation(
+                        reason="Blocked content detected",
+                        description=f"Instruction contains blocked term: {term}",
+                        code="CONTENT_POLICY_001"
+                    )
+                )
+
+    async def on_generation_post_call(
+        self,
+        payload: GenerationPostCallPayload,
+        context: PluginContext
+    ) -> PluginResult | None:
+        # Redact sensitive patterns from output
+        redacted = self._redact_pii(payload.processed_output)
+
+        if redacted != payload.processed_output:
+            payload.processed_output = redacted
+            return PluginResult(modified_payload=payload)
+```
+
+### Audit Logging Plugin
+
+```python
+class AuditLoggingPlugin(MelleaPlugin):
+    async def on_action_post_success(
+        self,
+        payload: ActionPostSuccessPayload,
+        context: PluginContext
+    ) -> PluginResult | None:
+        self._log_audit_event({
+            "event": "generation_success",
+            "session_id": payload.session_id,
+            "action_type": type(payload.action).__name__,
+            "latency_ms": payload.latency_ms,
+            "token_usage": context.get("token_usage"),
+            "timestamp": payload.timestamp.isoformat()
+        })
+
+    async def on_action_post_error(
+        self,
+        payload: ActionPostErrorPayload,
+        context: PluginContext
+    ) -> PluginResult | None:
+        self._log_audit_event({
+            "event": "generation_error",
+            "session_id": payload.session_id,
+            "error_type": payload.error_type,
+            "stack_trace": payload.stack_trace,
+            "timestamp": payload.timestamp.isoformat()
+        })
+```
+
+### Token Budget Plugin
+
+```python
+class TokenBudgetPlugin(MelleaPlugin):
+    async def on_generation_pre_call(
+        self,
+        payload: GenerationPreCallPayload,
+        context: PluginContext
+    ) -> PluginResult | None:
+        budget = self.config.get("max_tokens_per_request", 4000)
+        estimated = payload.estimated_tokens or 0
+
+        if estimated > budget:
+            return PluginResult(
+                continue_processing=False,
+                violation=PluginViolation(
+                    reason="Token budget exceeded",
+                    description=f"Estimated {estimated} tokens exceed the budget of {budget}",
+                    code="TOKEN_BUDGET_001",
+                    details={"estimated": estimated, "budget": budget}
+                )
+            )
+```
+
+### Generative Slot Profiler
+
+```python
+from collections import defaultdict
+
+
+class SlotProfilerPlugin(MelleaPlugin):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)  # keep base-class setup such as self.config
+        self._stats = defaultdict(lambda: {"calls": 0, "total_ms": 0, "errors": 0})
+
+    async def on_slot_post_call(
+        self,
+        payload: SlotPostCallPayload,
+        context: PluginContext
+    ) -> PluginResult | None:
+        stats = self._stats[payload.slot_name]
+        stats["calls"] += 1
+        stats["total_ms"] += payload.duration_ms
+        if not payload.success:
+            stats["errors"] += 1
+
+        context.emit_metric(
+            "slot_latency_ms",
+            payload.duration_ms,
+            tags={"slot": payload.slot_name, "success": payload.success}
+        )
+```
+
+
+## 9. Hook Execution Flow
+
+### Simplified Main Flow
+
+```mermaid
+flowchart LR
+    A([Start]) --> B[Session Init]
+    B --> C[Instruction]
+    C --> D[Action]
+    D --> E[Generation]
+    E --> F[Validation]
+    F --> G{OK?}
+    G --> |Yes| H[Success]
+    G --> |No| I[Error/Retry]
+    H --> J[Cleanup]
+    I --> J
+    J --> K([End])
+
+    style A fill:#e1f5fe
+    style K fill:#e1f5fe
+    style G fill:#fce4ec
+```
+
+### Detailed Flows
+
+```mermaid
+flowchart TD
+    A([User Request]) --> B
+
+    subgraph Session["Session Lifecycle"]
+        B[session_pre_init]
+        B --> C[session_post_init]
+    end
+
+    C --> D
+
+    subgraph Slot["Generative Slot (optional)"]
+        S1[slot_pre_call]
+        S1 --> S2[slot_post_call]
+    end
+
+    subgraph Instruction["Instruction Creation"]
+        D[instruction_pre_create]
+        D --> E[instruction_post_create]
+    end
+
+    E --> F
+
+    subgraph Action["Action Execution"]
+        F[action_pre_execute]
+    end
+
+    F --> G
+
+    subgraph Sampling["Sampling Loop (if strategy)"]
+        SL1[sampling_loop_start]
+        SL1 --> SL2[sampling_iteration]
+        SL2 --> SL3{Valid?}
+        SL3 --> |No| SL4[sampling_repair]
+        SL4 --> SL2
+        SL3 --> |Yes| SL5[sampling_loop_end]
+    end
+
+    subgraph Generation["LLM Generation"]
+        G[generation_pre_call]
+        G --> H{{LLM Call}}
+        H --> H2[generation_stream_chunk]
+        H2 --> I[generation_post_call]
+    end
+
+    I --> J
+
+    subgraph Tools["Tool Calling (if tools)"]
+        T1[tool_pre_invoke]
+        T1 --> T2{{Tool Execution}}
+        T2 --> T3[tool_post_invoke]
+    end
+
+    subgraph Validation["Validation"]
+        J[validation_pre_check]
+        J --> K{{Requirements Check}}
+        K --> L[validation_post_check]
+    end
+
+    L --> M{Success?}
+
+    M --> |Yes| N[action_post_success]
+    M --> |No| O[action_post_error]
+
+    N --> P[context_update]
+    O --> Q[error_occurred]
+
+    subgraph Context["Context Operations"]
+        P[context_update]
+        P -.-> P2[context_prune]
+    end
+
+    subgraph Cleanup["Session End"]
+        R[session_reset]
+        R2[session_cleanup]
+    end
+
+    P --> R2
+    Q --> R2
+    R2 --> Z([End])
+
+    %% Styling
+    style A fill:#e1f5fe
+    style Z fill:#e1f5fe
+    style H fill:#fff3e0
+    style K fill:#fff3e0
+    style T2 fill:#fff3e0
+    style M fill:#fce4ec
+    style SL3 fill:#fce4ec
+```
+
+## 10. Error Handling, Security & Isolation
+
+### Error Handling
+
+- **Isolation**: Plugin exceptions should not crash Mellea sessions; wrap each handler in try/except
+- **Logging**: All plugin errors are logged with full context
+- **Timeouts**: Support configurable timeouts for plugin execution
+- **Circuit Breaker**: Disable failing plugins after repeated errors
+
+### Security Considerations
+
+- **Data Privacy**: Payloads may include user content; plugins must respect privacy policies
+- **Redaction**: Consider masking sensitive fields for plugins that should not see them
+- **Sandboxing**: Provide options to run plugins in restricted environments
+- **Validation**: Validate plugin inputs and outputs to prevent injection attacks
+
+### Isolation Options
+
+This is a proposal for supporting compartmentalized execution of plugins.
+
+```yaml
+plugins:
+  - name: untrusted-plugin
+    kind: external.UntrustedPlugin
+    isolation:
+      sandbox: true
+      timeout_ms: 5000
+      max_memory_mb: 256
+      allowed_operations: ["read_payload", "emit_metric"]
+```
+
+## 11. Backward Compatibility & Migration
+
+### Versioning
+
+- Hook payload contracts are versioned (e.g., `payload_version: "1.0"`)
+- Breaking changes increment the major version
+- Deprecated fields are marked and maintained for one major version
+
+### Default Behavior
+
+- Without plugins registered, Mellea behavior is unchanged
+- A default no-op plugin manager is used if no configuration is provided

From 7098bb515c53d8a1f26c7ede96db621b6c281bab Mon Sep 17 00:00:00 2001
From: Frederico Araujo
Date: Tue, 3 Feb 2026 23:24:58 -0500
Subject: [PATCH 02/10] docs: update hook system spec to factor component hooks and address design drifts

Signed-off-by: Frederico Araujo
---
 docs/dev/hook_system.md | 598 +++++++++++++++++++++++++++-------------
 1 file changed, 402 insertions(+), 196 deletions(-)

diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md
index 037f6196b..165d7e25a 100644
--- a/docs/dev/hook_system.md
+++ b/docs/dev/hook_system.md
@@ -1,6 +1,6 @@
-# Mellea Plugin Hook Points Design Document
+# Mellea Plugin Hook System Design Document
 
-This document defines the hook system for Mellea, enabling plugins to register and respond to events throughout the framework's execution lifecycle. Hooks provide extensibility points for policy enforcement, data transformation, observability, and custom behavior injection.
+Mellea's hook system provides extension points for deployed generative AI applications that need policy enforcement, observability, and customization without modifying core library code. Hooks enable plugins to register and respond to events throughout the framework's execution lifecycle — from session initialization through generation, validation, and cleanup.
 
 
 ## 1. Overview
@@ -11,6 +11,7 @@ This document defines the hook system for Mellea, enabling plugins to register a
 2. **Composable**: Multiple plugins can register for the same hook, executing in priority order
 3. **Fail-safe**: Hook failures can be handled gracefully without breaking core execution
 4. **Minimal Intrusion**: Plugins are opt-in; default Mellea behavior remains unchanged without plugins
+5. **Architecturally Aligned**: Hook categories reflect Mellea's true abstraction boundaries — Session lifecycle, Component lifecycle, and the (Backend, Context) generation pipeline
 
 ### Hook Method Signature
 
@@ -28,12 +29,29 @@ async def hook_name(
 - **`context`**: Read-only shared context with session metadata and utilities
 - **Returns**: A result object with continuation flag, modified payload, and violation/explanation
 
+### Concurrency Model
+
+Hooks use Python's `async`/`await` cooperative multitasking. Because Python's event loop only switches execution at `await` points, hook code won't be interrupted mid-logic. This means:
+
+- **Sequential when awaited**: Calling `await hook(...)` keeps control flow deterministic — the hook completes before the caller continues.
+- **Race conditions only at `await` points**: Shared state is safe to read and write between `await` calls within a single hook. Races only arise if multiple hooks modify the same shared state and are dispatched concurrently.
+- **No preemptive interruption**: Unlike threads, a hook handler runs uninterrupted until it yields control via `await`.
+
+### Execution Timing
+
+Hooks support two execution timing modes, configurable per registration:
+
+- **Blocking** (default): The hook is awaited inline. Use for policy enforcement, payload transformation, and any hook that must complete before execution continues.
+- **Fire-and-forget**: The hook is dispatched via `asyncio.create_task()` and runs in the background. Use for logging, telemetry, and non-critical side effects where latency matters more than ordering guarantees.
+
+Fire-and-forget hooks cannot modify payloads or block execution — their `PluginResult` is ignored. Any exceptions in fire-and-forget hooks are logged but do not propagate.
+ ### Plugin Framework -To enable this extension system, we plan to leverage a lightweight, standalone plugin framework that: -- Installs as a Python package dependency with minimum footprint -- Exposes APIs to define hook invocation points, and base data objects for plugin payload, context, and result. -- Exposes a base calss and decorator to implement concrete plugins and register hook functions +The hook system is backed by a lightweight plugin framework built as a Mellea dependency (not a separate user-facing package). This framework: + +- Provides APIs to define hook invocation points and base data objects for plugin payload, context, and result +- Exposes a base class and decorator to implement concrete plugins and register hook functions - Implements a plugin manager that loads, registers, and governs the execution of plugins ## 2. Common Payload Fields @@ -51,33 +69,35 @@ class BasePayload(PluginPayload): ## 3. Hook Summary Table -| Hook Point | Category | Description | -|------------|----------|-------------| -| `session_pre_init` | Session | Before session initialization | -| `session_post_init` | Session | After session is fully initialized | -| `session_reset` | Session | When session context is reset | -| `session_cleanup` | Session | Before session cleanup/teardown | -| `instruction_pre_create` | Instruction | Before Instruction component creation | -| `instruction_post_create` | Instruction | After Instruction created, before execution | -| `action_pre_execute` | Action | Before any action execution | -| `action_post_success` | Action | After successful action completion | -| `action_post_error` | Action | After action fails with error | -| `generation_pre_call` | Generation | Before LLM backend call | -| `generation_post_call` | Generation | After LLM response received | -| `generation_stream_chunk` | Generation | For each streaming chunk | -| `validation_pre_check` | Validation | Before requirement validation | -| `validation_post_check` | 
Validation | After validation completes | -| `sampling_loop_start` | Sampling | When sampling strategy begins | -| `sampling_iteration` | Sampling | After each sampling attempt | -| `sampling_repair` | Sampling | When repair is invoked | -| `sampling_loop_end` | Sampling | When sampling completes | -| `tool_pre_invoke` | Tool | Before tool/function invocation | -| `tool_post_invoke` | Tool | After tool execution | -| `slot_pre_call` | Generative Slot | Before generative slot invocation | -| `slot_post_call` | Generative Slot | After generative slot returns | -| `context_update` | Context | When context changes | -| `context_prune` | Context | When context is trimmed | -| `error_occurred` | Error | When an error occurs | +| Hook Point | Category | Domain | Description | +|------------|----------|--------|-------------| +| `session_pre_init` | Session Lifecycle | Session | Before session initialization | +| `session_post_init` | Session Lifecycle | Session | After session is fully initialized | +| `session_reset` | Session Lifecycle | Session | When session context is reset | +| `session_cleanup` | Session Lifecycle | Session | Before session cleanup/teardown | +| `component_pre_create` | Component Lifecycle | Component / (Backend, Context) | Before component creation | +| `component_post_create` | Component Lifecycle | Component / (Backend, Context) | After component created, before execution | +| `component_pre_execute` | Component Lifecycle | Component / (Backend, Context) | Before component execution via `aact()` | +| `component_post_success` | Component Lifecycle | Component / (Backend, Context) | After successful component execution | +| `component_post_error` | Component Lifecycle | Component / (Backend, Context) | After component execution fails | +| `generation_pre_call` | Generation Pipeline | (Backend, Context) | Before LLM backend call | +| `generation_post_call` | Generation Pipeline | (Backend, Context) | After LLM response received | +| 
`generation_stream_chunk` | Generation Pipeline | (Backend, Context) | For each streaming chunk | +| `validation_pre_check` | Validation | (Backend, Context) | Before requirement validation | +| `validation_post_check` | Validation | (Backend, Context) | After validation completes | +| `sampling_loop_start` | Sampling Pipeline | (Backend, Context) | When sampling strategy begins | +| `sampling_iteration` | Sampling Pipeline | (Backend, Context) | After each sampling attempt | +| `sampling_repair` | Sampling Pipeline | (Backend, Context) | When repair is invoked | +| `sampling_loop_end` | Sampling Pipeline | (Backend, Context) | When sampling completes | +| `tool_pre_invoke` | Tool Execution | (Backend, Context) | Before tool/function invocation | +| `tool_post_invoke` | Tool Execution | (Backend, Context) | After tool execution | +| `adapter_pre_load` | Backend Adapter Ops | Backend | Before `backend.load_adapter()` | +| `adapter_post_load` | Backend Adapter Ops | Backend | After adapter loaded | +| `adapter_pre_unload` | Backend Adapter Ops | Backend | Before `backend.unload_adapter()` | +| `adapter_post_unload` | Backend Adapter Ops | Backend | After adapter unloaded | +| `context_update` | Context Operations | Context | When context changes | +| `context_prune` | Context Operations | Context | When context is trimmed | +| `error_occurred` | Error Handling | Cross-cutting | When an unrecoverable error occurs | ## 4. Hook Definitions @@ -167,12 +187,14 @@ Hooks that manage session boundaries, useful for initialization, state setup, an - `session`: MelleaSession - Final session state -### B. Instruction & Action Hooks +### B. Component Lifecycle Hooks -Hooks around the high-level primitives like `instruct()`, `chat()`, and action execution. +Hooks around Component creation and execution. All Mellea primitives — Instruction, Message, Query, Transform, GenerativeSlot — are Components. 
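Because one set of hooks serves every component type, a handler that targets only some types has to filter on the payload. The sketch below assumes each payload carries a `component_type` string naming the component class. `ComponentPayloadStub` and `only_for` are hypothetical names used for illustration, and returning `None` stands for "no change, continue processing".

```python
import asyncio
import functools
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable, Optional


@dataclass
class ComponentPayloadStub:
    # Hypothetical minimal payload; real component payloads carry many more fields.
    component_type: str
    description: str = ""
    seen_by: list[str] = field(default_factory=list)


Handler = Callable[[ComponentPayloadStub, Any], Awaitable[Optional[Any]]]


def only_for(*component_types: str) -> Callable[[Handler], Handler]:
    """Skip the wrapped handler unless payload.component_type is listed."""
    allowed = frozenset(component_types)

    def decorate(handler: Handler) -> Handler:
        @functools.wraps(handler)
        async def wrapper(payload, context):
            if payload.component_type not in allowed:
                return None  # not a targeted type: no-op, continue processing
            return await handler(payload, context)

        return wrapper

    return decorate


@only_for("Instruction", "GenerativeSlot")
async def tag_prompt(payload, context):
    payload.seen_by.append("tag_prompt")
    return None


async def main() -> None:
    for kind in ("Instruction", "Message", "GenerativeSlot", "Query"):
        p = ComponentPayloadStub(component_type=kind)
        await tag_prompt(p, context=None)
        print(kind, p.seen_by)


if __name__ == "__main__":
    asyncio.run(main())
```

Only the `Instruction` and `GenerativeSlot` payloads are tagged here. Filtering in a wrapper keeps the type check out of each handler body; the `ContentPolicyPlugin` example performs the same check inline at the top of its handler.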
These hooks cover the full Component lifecycle; there are no separate hooks per component type. +All component payloads include a `component_type: str` field (e.g., `"Instruction"`, `"Message"`, `"GenerativeSlot"`, `"Query"`, `"Transform"`) so plugins can filter by type. For example, a plugin targeting only generative slots would check `component_type == "GenerativeSlot"`. -#### `instruction_pre_create` + +#### `component_pre_create` - **Trigger**: Called when `instruct()`, `chat()`, or a generative slot is invoked, before the prompt is constructed. - **Use Cases**: @@ -183,7 +205,8 @@ Hooks around the high-level primitives like `instruct()`, `chat()`, and action e - Enforcing content policies - **Payload**: ```python - class InstructionPreCreatePayload(BasePayload): + class ComponentPreCreatePayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. description: str # Main instruction text images: list[ImageBlock] | None # Attached images requirements: list[Requirement | str] # Validation requirements @@ -194,44 +217,45 @@ Hooks around the high-level primitives like `instruct()`, `chat()`, and action e template_id: str | None # Identifier of prompt template ``` - **Context**: - - `session`: MelleaSession - - `parent_context`: Context - Context instruction will be added to + - `backend`: Backend + - `context`: Context - Context the component will be added to - `history_snapshot`: ContextSnapshot - Conversation history -#### `instruction_post_create` +#### `component_post_create` -- **Trigger**: After Instruction component is created and formatted, before backend call. +- **Trigger**: After component is created and formatted, before backend call. 
- **Use Cases**: - Appending system prompts - Context stuffing (RAG injection) - - Logging instruction patterns - - Validating final instruction structure + - Logging component patterns + - Validating final component structure - **Payload**: ```python - class InstructionPostCreatePayload(BasePayload): - instruction: Instruction # Created instruction component + class ComponentPostCreatePayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. + component: Component # The created component template_repr: TemplateRepresentation # Formatted representation - component: Component # The structured prompt object ``` - **Context**: - - `session`: MelleaSession - - `parent_context`: Context + - `backend`: Backend + - `context`: Context -#### `action_pre_execute` +#### `component_pre_execute` -- **Trigger**: Before any action (Instruction, Query, Transform) is executed via `act()`. +- **Trigger**: Before any component is executed via `aact()`. - **Use Cases**: - Policy enforcement on generation requests - Injecting/modifying model options - Routing to different strategies - Authorization checks - - Logging action patterns + - Logging execution patterns - **Payload**: ```python - class ActionPreExecutePayload(BasePayload): - action: Component | CBlock # The action to execute + class ComponentPreExecutePayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. 
+ action: Component | CBlock # The component to execute context: Context # Current context context_view: list[Component | CBlock] | None # Linearized context requirements: list[Requirement] # Attached requirements @@ -241,14 +265,13 @@ Hooks around the high-level primitives like `instruct()`, `chat()`, and action e tool_calls_enabled: bool # Whether tools are available ``` - **Context**: - - `session`: MelleaSession - `backend`: Backend - - `action_type`: str - Type of action being executed + - `context`: Context -#### `action_post_success` +#### `component_post_success` -- **Trigger**: After action execution completes successfully. +- **Trigger**: After component execution completes successfully. - **Use Cases**: - Logging generation results - Output validation (hallucination check) @@ -258,24 +281,34 @@ Hooks around the high-level primitives like `instruct()`, `chat()`, and action e - Collecting metrics - **Payload**: ```python - class ActionPostSuccessPayload(BasePayload): - action: Component | CBlock # Executed action + class ComponentPostSuccessPayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. + action: Component | CBlock # Executed component result: ModelOutputThunk # Generation result - context_before: Context # Context before action - context_after: Context # Context after action + context_before: Context # Context before execution + context_after: Context # Context after execution generate_log: GenerateLog # Detailed execution log sampling_results: list[SamplingResult] | None # If sampling was used latency_ms: int # Execution time ``` - **Context**: - - `session`: MelleaSession + - `backend`: Backend + - `context`: Context - `token_usage`: dict | None - `original_input`: dict - Input that triggered generation +> **Design Decision: Separate Success/Error Hooks** +> +> `component_post_success` and `component_post_error` are separate hooks rather than a single `component_post` with a sum type over success/failure. 
The reasons are: +> +> 1. **Registration granularity** — Plugins subscribe to only what they need. An audit logger may only care about errors; a metrics collector may only care about successes. +> 2. **Distinct payload shapes** — Success payloads carry `result`, `generate_log`, and `sampling_results`; error payloads carry `exception`, `error_type`, and `stack_trace`. A sum type would force nullable fields or tagged unions, adding complexity for every consumer. +> 3. **Different execution modes** — Error hooks may be fire-and-forget (for alerting); success hooks may be blocking (for output transformation). Separate hooks allow per-hook execution timing configuration. + -#### `action_post_error` +#### `component_post_error` -- **Trigger**: When action execution fails with an exception. +- **Trigger**: When component execution fails with an exception. - **Use Cases**: - Error logging and alerting - Custom error recovery @@ -283,8 +316,9 @@ Hooks around the high-level primitives like `instruct()`, `chat()`, and action e - Graceful degradation - **Payload**: ```python - class ActionPostErrorPayload(BasePayload): - action: Component | CBlock # Action that failed + class ComponentPostErrorPayload(BasePayload): + component_type: str # "Instruction", "GenerativeSlot", etc. + action: Component | CBlock # Component that failed error: Exception # The exception raised error_type: str # Exception class name stack_trace: str # Full stack trace @@ -292,13 +326,14 @@ Hooks around the high-level primitives like `instruct()`, `chat()`, and action e model_options: dict # Options used ``` - **Context**: - - `session`: MelleaSession + - `backend`: Backend + - `context`: Context - `recoverable`: bool - Can execution continue -### C. Generation Hooks (Backend Execution) +### C. Generation Pipeline Hooks -Low-level hooks between the component abstraction and raw LLM API calls. +Low-level hooks between the component abstraction and raw LLM API calls. 
These operate on the (Backend, Context) tuple — they do not require a session. #### `generation_pre_call` @@ -326,6 +361,8 @@ Low-level hooks between the component abstraction and raw LLM API calls. estimated_tokens: int | None # Token estimate ``` - **Context**: + - `backend`: Backend + - `context`: Context - `backend_name`: str - `model_id`: str - `provider`: str - Provider name (e.g., "ibm/granite") @@ -354,6 +391,8 @@ Low-level hooks between the component abstraction and raw LLM API calls. finish_reason: str # Why generation stopped ``` - **Context**: + - `backend`: Backend + - `context`: Context - `backend_name`: str - `model_id`: str - `status_code`: int | None - HTTP status from provider @@ -378,13 +417,15 @@ Low-level hooks between the component abstraction and raw LLM API calls. ``` - **Context**: - `thunk_id`: str + - `backend`: Backend + - `context`: Context - `backend_name`: str - `model_id`: str ### D. Validation Hooks -Hooks around requirement verification and output validation. +Hooks around requirement verification and output validation. These operate on the (Backend, Context) tuple. #### `validation_pre_check` @@ -404,7 +445,8 @@ Hooks around requirement verification and output validation. model_options: dict # Options for LLM-as-judge ``` - **Context**: - - `session`: MelleaSession + - `backend`: Backend + - `context`: Context - `validation_type`: str - "python" | "llm_as_judge" @@ -428,13 +470,14 @@ Hooks around requirement verification and output validation. generate_logs: list[GenerateLog | None] # Logs from LLM-as-judge ``` - **Context**: - - `session`: MelleaSession + - `backend`: Backend + - `context`: Context - `validation_duration_ms`: int ### E. Sampling & Repair Hooks -Hooks around sampling strategies and failure recovery. +Hooks around sampling strategies and failure recovery. These operate on the (Backend, Context) tuple — sampling strategies take explicit `(action, context, backend)` arguments and do not require a session. 
#### `sampling_loop_start` @@ -454,7 +497,9 @@ Hooks around sampling strategies and failure recovery. loop_budget: int # Maximum iterations ``` - **Context**: - - `session`: MelleaSession + - `backend`: Backend + - `context`: Context + - `strategy_name`: str - `strategy_config`: dict @@ -478,6 +523,8 @@ Hooks around sampling strategies and failure recovery. total_count: int ``` - **Context**: + - `backend`: Backend + - `context`: Context - `strategy_name`: str - `remaining_budget`: int - `elapsed_ms`: int @@ -485,7 +532,12 @@ Hooks around sampling strategies and failure recovery. #### `sampling_repair` -- **Trigger**: When repair strategy is invoked after validation failure. +- **Trigger**: When a repair strategy is invoked after validation failure. Behavior varies by sampling strategy. +- **Strategy-Specific Behavior**: + - **RejectionSamplingStrategy**: Identity retry — same action, original context. No actual repair; simply regenerates. (`repair_type: "identity"`) + - **RepairTemplateStrategy**: Appends failure descriptions via `copy_and_repair()`, producing a modified context that includes what went wrong. (`repair_type: "template_repair"`) + - **MultiTurnStrategy**: Adds a Message describing failures to the conversation context, treating repair as a new conversational turn. (`repair_type: "multi_turn_message"`) + - **SOFAISamplingStrategy**: Two-solver approach with targeted feedback between attempts. (`repair_type: "sofai_feedback"`) - **Use Cases**: - Logging repair patterns - Injecting custom repair strategies @@ -494,6 +546,7 @@ Hooks around sampling strategies and failure recovery. 
- **Payload**: ```python class SamplingRepairPayload(BasePayload): + repair_type: str # "identity" | "template_repair" | "multi_turn_message" | "sofai_feedback" | "custom" failed_action: Component # Action that failed failed_result: ModelOutputThunk # Failed output failed_validations: list[tuple[Requirement, ValidationResult]] @@ -504,6 +557,8 @@ Hooks around sampling strategies and failure recovery. repair_iteration: int # Which repair attempt ``` - **Context**: + - `backend`: Backend + - `context`: Context - `strategy_name`: str - `past_failures`: list[str] @@ -529,6 +584,8 @@ Hooks around sampling strategies and failure recovery. all_validations: list[list[tuple[Requirement, ValidationResult]]] ``` - **Context**: + - `backend`: Backend + - `context`: Context - `strategy_name`: str - `total_duration_ms`: int - `tokens_used`: int | None @@ -536,7 +593,7 @@ Hooks around sampling strategies and failure recovery. ### F. Tool Calling Hooks -Hooks around tool/function execution. +Hooks around tool/function execution. These operate on the (Backend, Context) tuple. #### `tool_pre_invoke` @@ -556,7 +613,8 @@ Hooks around tool/function execution. model_tool_call: ModelToolCall # Raw model output ``` - **Context**: - - `session`: MelleaSession + - `backend`: Backend + - `context`: Context - `available_tools`: list[str] - `invocation_source`: str - "transform" | "action" | etc. @@ -581,64 +639,95 @@ Hooks around tool/function execution. error: Exception | None # Error if any ``` - **Context**: - - `session`: MelleaSession + - `backend`: Backend + - `context`: Context - `invocation_source`: str -### G. Generative Slot Hooks +### G. Backend Adapter Operations + +Hooks around LoRA/aLoRA adapter loading and unloading on backends. Based on the `AdapterMixin` protocol in `mellea/backends/adapters/adapter.py`. -Hooks specific to `@generative` decorated functions. +> **Future Work: Backend Switching** +> +> These hooks cover adapter load/unload on a single backend. 
Hooks for switching the entire backend on a session (e.g., from Ollama to OpenAI mid-session) are a potential future extension and are distinct from adapter management. -#### `slot_pre_call` +#### `adapter_pre_load` -- **Trigger**: When a generative slot function is about to be invoked. +- **Trigger**: Before `backend.load_adapter()` is called. - **Use Cases**: - - Slot-level authorization - - Argument validation - - Profiling setup - - Logging slot invocations + - Validating adapter compatibility + - Enforcing adapter usage policies + - Logging adapter load attempts - **Payload**: ```python - class SlotPreCallPayload(BasePayload): - slot_name: str # Name of the generative slot function - slot_signature: str # Typed signature - args: dict[str, Any] # Arguments passed to the slot - kwargs: dict[str, Any] # Keyword arguments - docstring: str | None # Docstring / prompt template + class AdapterPreLoadPayload(BasePayload): + adapter_name: str # Name/path of adapter + adapter_config: dict # Adapter configuration + backend_name: str # Backend being adapted ``` - **Context**: - - `session`: MelleaSession - - `slot_module`: str - Module containing the slot + - `backend`: Backend -#### `slot_post_call` +#### `adapter_post_load` -- **Trigger**: Once the slot returns (or fails). +- **Trigger**: After adapter has been successfully loaded. 
- **Use Cases**: - - Measuring slot latency - - Recording success/failure - - Profiling statistics per slot - - Output transformation + - Confirming adapter activation + - Updating metrics/state + - Triggering downstream reconfiguration - **Payload**: ```python - class SlotPostCallPayload(BasePayload): - slot_name: str - args: dict[str, Any] - kwargs: dict[str, Any] - result: Any # The returned value - duration_ms: int - success: bool - error: Exception | None + class AdapterPostLoadPayload(BasePayload): + adapter_name: str + adapter_config: dict + backend_name: str + load_duration_ms: int # Time to load adapter ``` - **Context**: - - `session`: MelleaSession - - `slot_module`: str + - `backend`: Backend + + +#### `adapter_pre_unload` + +- **Trigger**: Before `backend.unload_adapter()` is called. +- **Use Cases**: + - Flushing adapter-specific state + - Logging adapter lifecycle + - Preventing unload during active generation +- **Payload**: + ```python + class AdapterPreUnloadPayload(BasePayload): + adapter_name: str + backend_name: str + ``` +- **Context**: + - `backend`: Backend + + +#### `adapter_post_unload` + +- **Trigger**: After adapter has been unloaded. +- **Use Cases**: + - Confirming adapter deactivation + - Cleaning up adapter-specific resources + - Updating metrics +- **Payload**: + ```python + class AdapterPostUnloadPayload(BasePayload): + adapter_name: str + backend_name: str + unload_duration_ms: int # Time to unload adapter + ``` +- **Context**: + - `backend`: Backend -### H. Context Manipulation Hooks +### H. Context Operations Hooks -Hooks around context changes and management. +Hooks around context changes and management. These operate on the Context directly. #### `context_update` @@ -659,7 +748,7 @@ Hooks around context changes and management. change_type: str # "append" | "reset" ``` - **Context**: - - `session`: MelleaSession + - `context`: Context - `history_length`: int @@ -681,18 +770,26 @@ Hooks around context changes and management. 
tokens_freed: int | None # Token estimate freed ``` - **Context**: - - `session`: MelleaSession + - `context`: Context - `token_limit`: int | None ### I. Error Handling Hooks -Hooks for error conditions and recovery. +Cross-cutting hook for error conditions. #### `error_occurred` -- **Trigger**: When an error occurs during any operation. +- **Trigger**: When an unrecoverable error occurs during any operation. +- **Fires for**: + - `ComponentParseError` — structured output parsing failures + - Backend communication errors — connection failures, API errors, timeouts + - Assertion violations — internal invariant failures + - Any unhandled `Exception` during component execution, validation, or tool invocation +- **Does NOT fire for**: + - Validation failures within sampling loops — these are handled by `sampling_iteration` and `sampling_repair` + - Controlled plugin violations via `PluginResult(continue_processing=False)` — these are policy decisions, not errors - **Use Cases**: - Error logging/alerting - Custom error recovery @@ -712,45 +809,130 @@ Hooks for error conditions and recovery. ``` - **Context**: - `session`: MelleaSession | None + - `backend`: Backend | None + - `context`: Context | None - `operation`: str - What operation was being performed -## 5. Plugin Context Object +## 5. 
PluginContext by Domain -The `PluginContext` passed to all hooks provides shared state and utilities: +The `PluginContext` passed to hooks varies by domain, providing only the references relevant to that category: -```python -# Session Information -session_id: str -session: MelleaSession | None +### Session Hooks -# Backend Information -backend_name: str -model_id: str -backend_capabilities: dict +```python +# session_* hooks +session: MelleaSession +session_id: str # Environment environment: dict[str, str] cwd: str # Plugin State -shared_state: dict[str, Any] # Shared across plugins +shared_state: dict[str, Any] # Shared across plugins + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +parent_request_id: str | None +timestamp: datetime + +# User Information +user_id: str | None +user_metadata: dict[str, Any] +``` + +### Component, Generation, Validation, Sampling, and Tool Hooks + +```python +# component_*, generation_*, validation_*, sampling_*, tool_* hooks +backend: Backend +context: Context + +# Backend Information (generation hooks) +backend_name: str # Available on generation_* hooks +model_id: str # Available on generation_* hooks + +# Strategy Information (sampling hooks) +strategy_name: str # Available on sampling_* hooks + +# Plugin State +shared_state: dict[str, Any] # Utilities logger: Logger metrics: MetricsCollector # Request Metadata -request_id: str # Unique ID for this execution chain -parent_request_id: str | None # For nested calls +request_id: str +parent_request_id: str | None timestamp: datetime # User Information -user_id: str | None # User reference if available -user_metadata: dict[str, Any] # User-specific data +user_id: str | None +user_metadata: dict[str, Any] ``` -> Note: this is a suggestion. Requires discussion. 
+### Adapter Hooks + +```python +# adapter_* hooks +backend: Backend + +# Plugin State +shared_state: dict[str, Any] + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +timestamp: datetime +``` + +### Context Hooks + +```python +# context_* hooks +context: Context + +# Plugin State +shared_state: dict[str, Any] + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +timestamp: datetime +``` + +### Error Hook + +```python +# error_occurred +session: MelleaSession | None +backend: Backend | None +context: Context | None + +# Plugin State +shared_state: dict[str, Any] + +# Utilities +logger: Logger +metrics: MetricsCollector + +# Request Metadata +request_id: str +timestamp: datetime +operation: str +``` ### Context Snapshot @@ -758,10 +940,10 @@ When conversation history is relevant, the plugin context object may include a ` ```python @dataclass -class ContextSnapshot: +class ContextSnapshot: history_length: int # Number of turns last_turn: dict | None # Last user/assistant exchange - token_estimate: int | None # Estimated token count + token_estimate: int | None # Estimated token count ``` ## 6. 
Hook Results @@ -773,7 +955,7 @@ Hooks can return different result types to control execution: ```python return PluginResult( - continue_processing=True + continue_processing=True, - modified_payload=modified_payload, + modified_payload=modified_payload, ) ``` @@ -788,7 +970,7 @@ violation = PluginViolation( severity="error" # "error" | "warning" ) -return PluginResult( +return PluginResult( continue_processing=False, violation=violation ) @@ -805,9 +987,10 @@ plugins: - name: content-policy kind: mellea.plugins.ContentPolicyPlugin hooks: - - instruction_pre_create + - component_pre_create - generation_post_call mode: enforce + execution: blocking priority: 10 config: blocked_terms: ["term1", "term2"] @@ -815,10 +998,11 @@ plugins: - name: telemetry kind: mellea.plugins.TelemetryPlugin hooks: - - action_post_success + - component_post_success - validation_post_check - sampling_loop_end mode: permissive + execution: fire_and_forget priority: 100 config: endpoint: "https://telemetry.example.com" @@ -831,6 +1015,11 @@ plugins: - **`permissive`**: Log violations without blocking - **`disabled`**: Skip hook execution +### Execution Timing + +- **`blocking`** (default): Hook is awaited inline before continuing +- **`fire_and_forget`**: Hook is dispatched via `asyncio.create_task()` and cannot modify payloads or block execution + ### Priority - Lower numbers execute first @@ -860,7 +1049,7 @@ class PIIRedactionPlugin(Plugin): return "PII_Redactor" def register(self, hooks): - hooks.register("instruction_pre_create", self.redact_input) + hooks.register("component_pre_create", self.redact_input) hooks.register("generation_post_call", self.redact_output) async def redact_input(self, payload, context): @@ -874,7 +1063,7 @@ class PIIRedactionPlugin(Plugin): m = mellea.start_session( ..., plugin_manager=pm, - hooks_enabled=["instruction_pre_create", "generation_post_call"] + hooks_enabled=["component_pre_create", "generation_post_call"] ) ``` @@ -884,22 +1073,26 @@ m = mellea.start_session( ```python class
ContentPolicyPlugin(MelleaPlugin): - async def on_instruction_pre_create( + async def on_component_pre_create( self, - payload: InstructionPreCreatePayload, + payload: ComponentPreCreatePayload, context: PluginContext ) -> PluginResult | None: + # Only enforce on Instructions and GenerativeSlots + if payload.component_type not in ("Instruction", "GenerativeSlot"): + return None + blocked_terms = self.config.get("blocked_terms", []) for term in blocked_terms: if term.lower() in payload.description.lower(): return PluginResult( - continue_processing=False, + continue_processing=False, violation=PluginViolation( reason="Blocked content detected", - description=f"Instruction contains blocked term: {term}", + description=f"Component contains blocked term: {term}", code="CONTENT_POLICY_001" - ) + ) async def on_generation_post_call( self, @@ -918,33 +1111,34 @@ class ContentPolicyPlugin(MelleaPlugin): ```python class AuditLoggingPlugin(MelleaPlugin): - async def on_action_post_success( + async def on_component_post_success( self, - payload: ActionPostSuccessPayload, + payload: ComponentPostSuccessPayload, context: PluginContext ) -> PluginResult | None: self._log_audit_event({ "event": "generation_success", "session_id": payload.session_id, - "action_type": type(payload.action).__name__, + "component_type": payload.component_type, "latency_ms": payload.latency_ms, "token_usage": context.get("token_usage"), "timestamp": payload.timestamp.isoformat() }) - - async def on_action_post_error( + + async def on_component_post_error( self, - payload: ActionPostErrorPayload, + payload: ComponentPostErrorPayload, context: PluginContext ) -> PluginResult | None: self._log_audit_event({ "event": "generation_error", "session_id": payload.session_id, + "component_type": payload.component_type, "error_type": payload.error_type, "stack_trace": payload.stack_trace, "timestamp": payload.timestamp.isoformat() - }) + }) ``` ### Token Budget Plugin @@ -961,13 +1155,13 @@ class 
TokenBudgetPlugin(MelleaPlugin): if estimated > budget: return PluginResult( - continue_processing=False, - violation=PluginViolation( + continue_processing=False, + violation=PluginViolation( reason="Token budget exceeded", description=f"Estimated {estimated} tokens exceeds budget of {budget}", code="TOKEN_BUDGET_001", details={"estimated": estimated, "budget": budget} - ) + ) ``` ### Generative Slot Profiler @@ -977,21 +1171,23 @@ class SlotProfilerPlugin(MelleaPlugin): def __init__(self): self._stats = defaultdict(lambda: {"calls": 0, "total_ms": 0, "errors": 0}) - async def on_slot_post_call( + async def on_component_post_success( self, - payload: SlotPostCallPayload, + payload: ComponentPostSuccessPayload, context: PluginContext ) -> PluginResult | None: - stats = self._stats[payload.slot_name] + # Only profile GenerativeSlot components + if payload.component_type != "GenerativeSlot": + return None + + stats = self._stats[payload.action.__name__] stats["calls"] += 1 - stats["total_ms"] += payload.duration_ms - if not payload.success: - stats["errors"] += 1 + stats["total_ms"] += payload.latency_ms context.emit_metric( "slot_latency_ms", - payload.duration_ms, - tags={"slot": payload.slot_name, "success": payload.success} + payload.latency_ms, + tags={"slot": payload.action.__name__, "success": True} ) ``` @@ -1003,23 +1199,22 @@ class SlotProfilerPlugin(MelleaPlugin): ```mermaid flowchart LR A([Start]) --> B[Session Init] - B --> C[Instruction] - C --> D[Action] - D --> E[Generation] - E --> F[Validation] - F --> G{OK?} - G --> |Yes| H[Success] - G --> |No| I[Error/Retry] - H --> J[Cleanup] - I --> J - J --> K([End]) + B --> C[Component] + C --> D[Generation] + D --> E[Validation] + E --> F{OK?} + F --> |Yes| G[Success] + F --> |No| H[Error/Retry] + G --> I[Cleanup] + H --> I + I --> J([End]) style A fill:#e1f5fe - style K fill:#e1f5fe - style G fill:#fce4ec + style J fill:#e1f5fe + style F fill:#fce4ec ``` -### Detailed Flows +### Detailed Flow ```mermaid 
flowchart TD @@ -1032,20 +1227,10 @@ flowchart TD C --> D - subgraph Slot["Generative Slot (optional)"] - S1[slot_pre_call] - S1 --> S2[slot_post_call] - end - - subgraph Instruction["Instruction Creation"] - D[instruction_pre_create] - D --> E[instruction_post_create] - end - - E --> F - - subgraph Action["Action Execution"] - F[action_pre_execute] + subgraph CompLifecycle["Component Lifecycle"] + D[component_pre_create] + D --> E[component_post_create] + E --> F[component_pre_execute] end F --> G @@ -1082,20 +1267,27 @@ flowchart TD L --> M{Success?} - M --> |Yes| N[action_post_success] - M --> |No| O[action_post_error] + M --> |Yes| N[component_post_success] + M --> |No| O[component_post_error] N --> P[context_update] O --> Q[error_occurred] - subgraph Context["Context Operations"] + subgraph ContextOps["Context Operations"] P[context_update] P -.-> P2[context_prune] end + subgraph Adapter["Backend Adapter Operations"] + AD1[adapter_pre_load] + AD1 --> AD2[adapter_post_load] + AD3[adapter_pre_unload] + AD3 --> AD4[adapter_post_unload] + end + subgraph Cleanup["Session End"] - R[session_reset] R2[session_cleanup] + R[session_reset] end P --> R2 @@ -1112,7 +1304,21 @@ flowchart TD style SL3 fill:#fce4ec ``` -## 10. Error Handling, Security & Isolation +## 10. Observability Integration + +### Shallow Logging and OTel + +"Shallow logging" refers to OTel-instrumenting the HTTP transport layer of LLM client libraries (openai, ollama, litellm). This captures request/response spans at the network level without awareness of Mellea's semantic concepts (components, sampling strategies, validation). 
+ +The hook system provides natural integration points for enriching these shallow spans with Mellea-level context: + +- **`generation_pre_call`**: Inject span attributes such as `component_type`, `strategy_name`, and `request_id` into the active OTel context before the HTTP call fires +- **`generation_post_call`**: Attach result metadata — `finish_reason`, `token_usage`, validation outcome — to the span after the call completes +- **`request_id` from `BasePayload`**: Serves as a correlation ID linking Mellea-level hook events to transport-level OTel spans + +> **Forward-looking**: Mellea does not currently include OTel integration. This section describes the intended design for how hooks and shallow logging would compose when OTel support is added. + +## 11. Error Handling, Security & Isolation ### Error Handling @@ -1143,7 +1349,7 @@ plugins: allowed_operations: ["read_payload", "emit_metric"] ``` -## 11. Backward Compatibility & Migration +## 12. Backward Compatibility & Migration ### Versioning From 355146fffadf89103f4b130d4a4102e0e713230e Mon Sep 17 00:00:00 2001 From: Frederico Araujo Date: Wed, 4 Feb 2026 23:52:08 -0500 Subject: [PATCH 03/10] docs: add clarifications for component hook payload fields and additional suggestions by maintainers Signed-off-by: Frederico Araujo --- docs/dev/hook_system.md | 80 +++++++++++++++++++++++++++++++++++++++-- 1 file changed, 77 insertions(+), 3 deletions(-) diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md index 165d7e25a..4ddec0c83 100644 --- a/docs/dev/hook_system.md +++ b/docs/dev/hook_system.md @@ -44,7 +44,7 @@ Hooks support two execution timing modes, configurable per-registration: - **Blocking** (default): The hook is awaited inline. Use for policy enforcement, payload transformation, and any hook that must complete before execution continues. - **Fire-and-forget**: The hook is dispatched via `asyncio.create_task()` and runs in the background. 
Use for logging, telemetry, and non-critical side effects where latency matters more than ordering guarantees. -Fire-and-forget hooks cannot modify payloads or block execution — their `PluginResult` is ignored. Any exceptions in fire-and-forget hooks are logged but do not propagate. +Fire-and-forget hooks cannot modify payloads or block execution — their `PluginResult` is ignored. Any exceptions in fire-and-forget hooks are logged but do not propagate. Fire-and-forget hooks receive the payload snapshot as it existed at dispatch time; blocking hooks in the same chain that execute earlier (lower `priority` number) can modify the payload before fire-and-forget hooks see it. ### Plugin Framework @@ -54,6 +54,34 @@ The hook system is backed by a lightweight plugin framework built as a Mellea de - Exposes a base class and decorator to implement concrete plugins and register hook functions - Implements a plugin manager that loads, registers, and governs the execution of plugins +### Hook Invocation Responsibilities + +Hooks are called from Mellea's base classes (`Component.aact()`, `Backend.generate()`, `SamplingStrategy.run()`, etc.). This means hook invocation is a framework-level concern, and authors of new backends, sampling strategies, or components do not need to manually insert hook calls. + +The calling convention is a single async call at each hook site: + +```python +result = await plugin_manager.invoke_hook(hook_type, payload, context) +``` + +The caller (the base class method) is responsible for both invoking the hook and processing the result. Processing means checking the result for one of three possible outcomes: + +1. **Continue with original payload** — `PluginResult(continue_processing=True)` with no `modified_payload`. The caller proceeds unchanged. +2. **Continue with modified payload** — `PluginResult(continue_processing=True, modified_payload=...)`. The caller uses the modified payload fields in place of the originals. +3.
**Block execution** — `PluginResult(continue_processing=False, violation=...)`. The caller raises or returns early with structured error information. + +Hooks cannot redirect control flow, jump to arbitrary code, or alter the calling method's logic beyond these outcomes. This is enforced by the `PluginResult` type. + +### Payload Design Principles + +Hook payloads follow five design principles: + +1. **Strongly typed** — Each hook has a dedicated payload dataclass (not a generic dict). This enables IDE autocompletion, static analysis, and clear documentation of what each hook receives. +2. **Sufficient (maximize-at-boundary)** — Each payload includes everything available at that point in time. Post-hooks include the pre-hook fields plus results. This avoids forcing plugins to maintain their own state across pre/post pairs. +3. **Immutable context** — `PluginContext` fields are read-only; only the `payload` is mutable. This separates "what the plugin can observe" from "what the plugin can change." +4. **Serializable** — Payloads should be serializable for external (MCP-based) plugins that run out-of-process. All payload fields use types that can round-trip through JSON or similar formats. +5. **Versioned** — Payload schemas carry a `payload_version` so plugins can detect incompatible changes at registration time rather than at runtime. + ## 2. Common Payload Fields All hook payloads inherit these base fields: @@ -193,6 +221,21 @@ Hooks around Component creation and execution. All Mellea primitives — Instruc All component payloads include a `component_type: str` field (e.g., `"Instruction"`, `"Message"`, `"GenerativeSlot"`, `"Query"`, `"Transform"`) so plugins can filter by type. For example, a plugin targeting only generative slots would check `component_type == "GenerativeSlot"`. +Not all `ComponentPreCreatePayload` fields are populated for every component type. 
The table below shows which fields are available per type (`✓` = populated, `—` = `None` or empty): + +| Field | Instruction | Message | Query | Transform | GenerativeSlot | +|-------|:-----------:|:-------:|:-----:|:---------:|:--------------:| +| `description` | ✓ | ✓ | ✓ | ✓ | ✓ | +| `images` | ✓ | ✓ | — | — | ✓ | +| `requirements` | ✓ | — | — | — | ✓ | +| `icl_examples` | ✓ | — | — | — | ✓ | +| `grounding_context` | ✓ | — | — | — | ✓ | +| `user_variables` | ✓ | — | — | — | ✓ | +| `prefix` | ✓ | — | — | — | ✓ | +| `template_id` | ✓ | — | — | — | ✓ | + +Plugins should check for `None`/empty values rather than assuming all fields are present for all component types. + #### `component_pre_create` @@ -335,6 +378,10 @@ All component payloads include a `component_type: str` field (e.g., `"Instructio Low-level hooks between the component abstraction and raw LLM API calls. These operate on the (Backend, Context) tuple — they do not require a session. +> **Context Modification Sequencing** +> +> Modifications to `Context` at `component_pre_execute` are reflected in the subsequent `generation_pre_call`, because context linearization happens after the component-level hook. Modifications to `Context` after `generation_pre_call` (e.g., in `generation_post_call`) do not affect the current generation — the prompt has already been sent. This ordering is by design: `component_pre_execute` is the last point where context changes influence what the LLM sees. + #### `generation_pre_call` @@ -732,7 +779,7 @@ Hooks around context changes and management. These operate on the Context direct #### `context_update` -- **Trigger**: When data is added to context or context changes. +- **Trigger**: When a component or CBlock is explicitly appended to a session's context (e.g., after a successful generation or a user-initiated addition). Does not fire on internal framework reads or context linearization. 
- **Use Cases**: - Context audit trail - Memory management policies @@ -754,7 +801,7 @@ Hooks around context changes and management. These operate on the Context direct #### `context_prune` -- **Trigger**: When context trimming or pruning logic runs. +- **Trigger**: When `view_for_generation` is called and context exceeds token limits, or when a dedicated prune API is invoked. This is the point where context is linearized and token budget enforcement becomes relevant. - **Use Cases**: - Token budget management - Recording pruning events @@ -950,6 +997,12 @@ class ContextSnapshot: Hooks can return different result types to control execution: +1. **Continue (no-op)** — `PluginResult(continue_processing=True)` with no `modified_payload`. Execution proceeds with the original payload unchanged. +2. **Continue with modification** — `PluginResult(continue_processing=True, modified_payload=...)`. Execution proceeds with the modified payload fields in place of the originals. +3. **Block execution** — `PluginResult(continue_processing=False, violation=...)`. Execution halts with structured error information via `PluginViolation`. + +These three outcomes are exhaustive. Hooks cannot redirect control flow, throw arbitrary exceptions, or alter the calling method's logic beyond these outcomes. This is enforced by the `PluginResult` type — there is no escape hatch. The `violation` field provides structured error information but does not influence which code path runs next. + ### Modify Payload ```python @@ -1067,6 +1120,26 @@ m = mellea.start_session( ) ``` +### Global PluginManager + +The hook system uses a **singleton PluginManager** that is initialized once (typically at application startup via YAML config) and shared globally. Session-level configuration (e.g., `hooks_enabled`) is for scoped overrides — selectively enabling or disabling specific hooks for a particular session — not for owning or replacing the plugin manager. 
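One way to picture this split is a process-wide manager that sessions borrow rather than own. The sketch below is a minimal, self-contained illustration under stated assumptions. `SketchPluginManager`, `get_global_manager()`, and `SketchSession` are hypothetical names, not Mellea or ContextForge APIs. Hooks here are plain async callables that take a dict payload, not `PluginPayload`/`PluginResult` objects.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional

# Hypothetical sketch: every name below is illustrative, not a Mellea API.
HookFn = Callable[[dict], Awaitable[Any]]


class SketchPluginManager:
    """Hypothetical stand-in for the shared manager (not the ContextForge class)."""

    def __init__(self) -> None:
        self._hooks: dict[str, list[HookFn]] = {}

    def register(self, hook_type: str, fn: HookFn) -> None:
        self._hooks.setdefault(hook_type, []).append(fn)

    async def invoke_hook(
        self, hook_type: str, payload: dict, enabled: Optional[set[str]] = None
    ) -> list[Any]:
        # A session-scoped allow-list filters hooks; None means "no override".
        if enabled is not None and hook_type not in enabled:
            return []
        return [await fn(payload) for fn in self._hooks.get(hook_type, [])]


_GLOBAL_MANAGER: Optional[SketchPluginManager] = None


def get_global_manager() -> SketchPluginManager:
    """Create the process-wide manager on first use, then always return it."""
    global _GLOBAL_MANAGER
    if _GLOBAL_MANAGER is None:
        _GLOBAL_MANAGER = SketchPluginManager()
    return _GLOBAL_MANAGER


class SketchSession:
    """Borrows the global manager; `hooks_enabled` only narrows which hooks fire."""

    def __init__(self, hooks_enabled: Optional[set[str]] = None) -> None:
        self.hooks_enabled = hooks_enabled
        self.manager = get_global_manager()  # shared, never owned or replaced

    async def fire(self, hook_type: str, payload: dict) -> list[Any]:
        return await self.manager.invoke_hook(hook_type, payload, self.hooks_enabled)
```

Because the allow-list belongs to the session rather than to the manager, two sessions with different `hooks_enabled` sets can share one set of loaded plugins. Neither session can swap out the manager that other code depends on.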
+ +For the functional (non-session) path (e.g., calling `instruct()` or `generate()` directly without a `MelleaSession`), the PluginManager is accessed directly. Hooks still fire at the same points in the execution lifecycle; the only difference is that session-scoped overrides do not apply. + +### Custom Hook Types + +The plugin framework supports custom hook types for domain-specific extension points beyond the built-in lifecycle hooks. This is particularly relevant for agentic patterns (ReAct, tool-use loops, etc.) where the execution flow is application-defined. + +Custom hooks are registered using the `@hook` decorator: + +```python +@hook("react_pre_reasoning", ReactReasoningPayload, ReactReasoningResult) +async def before_reasoning(self, payload, context): + ... +``` + +Custom hooks follow the same calling convention, payload chaining, and result semantics as built-in hooks. The plugin manager discovers them via the decorator metadata at registration time. As agentic patterns stabilize in Mellea, frequently-used custom hooks may be promoted to built-in hooks. + ## 8. Example Implementations ### Content Policy Plugin @@ -1356,6 +1429,7 @@ plugins: - Hook payload contracts are versioned (e.g., `payload_version: "1.0"`) - Breaking changes increment major version - Deprecated fields marked and maintained for one major version +- Hook payload versions are independent of Mellea release versions. 
Payload versions change only when the payload schema changes, which may or may not coincide with a Mellea release ### Default Behavior From b667eebf861e3eb6a9183ddab5ecbcaf90b8d827 Mon Sep 17 00:00:00 2001 From: Frederico Araujo Date: Fri, 6 Feb 2026 00:57:49 -0500 Subject: [PATCH 04/10] docs: add implementation plan Signed-off-by: Frederico Araujo --- docs/dev/hook_system_implementation_plan.md | 729 ++++++++++++++++++++ 1 file changed, 729 insertions(+) create mode 100644 docs/dev/hook_system_implementation_plan.md diff --git a/docs/dev/hook_system_implementation_plan.md b/docs/dev/hook_system_implementation_plan.md new file mode 100644 index 000000000..c871b9b57 --- /dev/null +++ b/docs/dev/hook_system_implementation_plan.md @@ -0,0 +1,729 @@ +# Mellea Hook System — Implementation Plan + +This document describes the implementation plan for the extensibility hook system specified in [`docs/dev/hook_system.md`](hook_system.md). The implementation uses the [ContextForge plugin framework](https://github.com/IBM/mcp-context-forge) (`mcpgateway.plugins.framework`) as an optional external dependency for core plumbing, while all Mellea-specific types — hook enums, payload models, and the plugin base class — are owned by Mellea under a new `mellea/plugins/` subpackage. + + +## 1. 
Package Structure + +``` +mellea/plugins/ +├── __init__.py # Public API with try/except ImportError guard +├── _manager.py # Lazy singleton wrapper around PluginManager +├── _base.py # MelleaBasePayload, MelleaPlugin base class +├── _types.py # MelleaHookType enum + hook registration +├── _context.py # Plugins context factory helper +└── hooks/ + ├── __init__.py # Re-exports all payload classes + ├── session.py # session lifecycle payloads + ├── component.py # component lifecycle payloads + ├── generation.py # generation pipeline payloads + ├── validation.py # validation payloads + ├── sampling.py # sampling pipeline payloads + ├── tool.py # tool execution payloads + ├── adapter.py # adapter operation payloads + ├── context_ops.py # context operation payloads + └── error.py # error handling payload +``` + +## 2. ContextForge Plugin Framework — Key Interfaces Used + +The following types from `mcpgateway.plugins.framework` form the plumbing layer. Mellea uses these but does **not** import any ContextForge-specific hook types (prompts, tools, resources, agents, http). + +| Type | Role | +|------|------| +| `Plugin` | ABC base class. `__init__(config: PluginConfig)`, `initialize()`, `shutdown()`. Hook methods discovered by convention (method name = hook type) or `@hook()` decorator. Signature: `async def hook_name(self, payload, context) -> PluginResult`. | +| `PluginManager` | Borg singleton. `__init__(config_path, timeout, observability)`. Key methods: `invoke_hook(hook_type, payload, global_context, ...) -> (PluginResult, PluginContextTable)`, `has_hooks_for(hook_type) -> bool`, `initialize()`, `shutdown()`. | +| `PluginPayload` | Type alias for `pydantic.BaseModel`. Base type for all hook payloads. | +| `PluginResult[T]` | Generic result: `continue_processing: bool`, `modified_payload: T | None`, `violation: PluginViolation | None`, `metadata: dict`. | +| `PluginViolation` | `reason`, `description`, `code`, `details`. 
| +| `PluginConfig` | `name`, `kind`, `hooks`, `mode`, `priority`, `conditions`, `config`, ... | +| `PluginMode` | `ENFORCE`, `ENFORCE_IGNORE_ERROR`, `PERMISSIVE`, `DISABLED`. | +| `PluginContext` | `state: dict`, `global_context: GlobalContext`, `metadata: dict`. | +| `HookRegistry` | `get_hook_registry()`, `register_hook(hook_type, payload_class, result_class)`, `is_registered(hook_type)`. | +| `@hook` decorator | `@hook("hook_type")` or `@hook("hook_type", PayloadType, ResultType)` for custom method names. | + +### Class Diagram + +```mermaid +classDiagram + direction TB + + %% Core Plugin Classes + class Plugin { + <> + +__init__(config: PluginConfig) + +initialize()* async + +shutdown()* async + +hook_name(payload, context)* async PluginResult + } + + class PluginManager { + <> + -config_path: str + -timeout: int + -observability: Any + -hook_registry: HookRegistry + +__init__(config_path, timeout, observability) + +invoke_hook(hook_type, payload, global_context, ...) tuple~PluginResult, PluginContextTable~ + +has_hooks_for(hook_type: str) bool + +initialize() async + +shutdown() async + } + + %% Configuration + class PluginConfig { + +name: str + +kind: str + +hooks: list~str~ + +mode: PluginMode + +priority: int + +conditions: dict + +config: dict + } + + class PluginMode { + <> + ENFORCE + ENFORCE_IGNORE_ERROR + PERMISSIVE + DISABLED + } + + %% Payload & Result + class PluginPayload { + <> + pydantic.BaseModel + } + + class PluginResult~T~ { + +continue_processing: bool + +modified_payload: T | None + +violation: PluginViolation | None + +metadata: dict + } + + class PluginViolation { + +reason: str + +description: str + +code: str + +details: dict + } + + %% Context + class PluginContext { + +state: dict + +global_context: GlobalContext + +metadata: dict + } + + class hook { + <> + +__call__(hook_type: str) + +__call__(hook_type, PayloadType, ResultType) + } + + %% Relationships + Plugin --> PluginConfig : configured by + Plugin ..> PluginPayload : 
receives + Plugin ..> PluginResult : returns + Plugin ..> PluginContext : receives + Plugin ..> hook : decorated by + + PluginManager --> Plugin : manages 0..* + + PluginConfig --> PluginMode : has + + PluginResult --> PluginViolation : may contain + PluginResult --> PluginPayload : wraps modified +``` + +### YAML Plugin Configuration (reference) + +Plugins can also be configured programmatically without a YAML file. + +```yaml +plugins: + - name: content-policy + kind: mellea.plugins.examples.ContentPolicyPlugin + hooks: + - component_pre_create + - generation_post_call + mode: enforce + priority: 10 + config: + blocked_terms: ["term1", "term2"] + + - name: telemetry + kind: mellea.plugins.examples.TelemetryPlugin + hooks: + - component_post_success + - sampling_loop_end + mode: permissive + priority: 100 + config: + endpoint: "https://telemetry.example.com" +``` + +## 3. Core Types + +### 3.1 `MelleaHookType` enum (`mellea/plugins/_types.py`) + +A single `MelleaHookType(str, Enum)` containing all 27 hook types. String-based values for compatibility with ContextForge's `invoke_hook(hook_type: str, ...)`. 
+ +```python +class MelleaHookType(str, Enum): + # Session Lifecycle + SESSION_PRE_INIT = "session_pre_init" + SESSION_POST_INIT = "session_post_init" + SESSION_RESET = "session_reset" + SESSION_CLEANUP = "session_cleanup" + + # Component Lifecycle + COMPONENT_PRE_CREATE = "component_pre_create" + COMPONENT_POST_CREATE = "component_post_create" + COMPONENT_PRE_EXECUTE = "component_pre_execute" + COMPONENT_POST_SUCCESS = "component_post_success" + COMPONENT_POST_ERROR = "component_post_error" + + # Generation Pipeline + GENERATION_PRE_CALL = "generation_pre_call" + GENERATION_POST_CALL = "generation_post_call" + GENERATION_STREAM_CHUNK = "generation_stream_chunk" + + # Validation + VALIDATION_PRE_CHECK = "validation_pre_check" + VALIDATION_POST_CHECK = "validation_post_check" + + # Sampling Pipeline + SAMPLING_LOOP_START = "sampling_loop_start" + SAMPLING_ITERATION = "sampling_iteration" + SAMPLING_REPAIR = "sampling_repair" + SAMPLING_LOOP_END = "sampling_loop_end" + + # Tool Execution + TOOL_PRE_INVOKE = "tool_pre_invoke" + TOOL_POST_INVOKE = "tool_post_invoke" + + # Backend Adapter Ops + ADAPTER_PRE_LOAD = "adapter_pre_load" + ADAPTER_POST_LOAD = "adapter_post_load" + ADAPTER_PRE_UNLOAD = "adapter_pre_unload" + ADAPTER_POST_UNLOAD = "adapter_post_unload" + + # Context Operations + CONTEXT_UPDATE = "context_update" + CONTEXT_PRUNE = "context_prune" + + # Error Handling + ERROR_OCCURRED = "error_occurred" +``` + +### 3.2 `MelleaBasePayload` (`mellea/plugins/_base.py`) + +All Mellea hook payloads inherit from this base, which extends `PluginPayload` with the common fields from the hook system spec (Section 2): + +```python +class MelleaBasePayload(PluginPayload): + model_config = ConfigDict(arbitrary_types_allowed=True) + + session_id: str = "" # may be empty before a session exists + request_id: str = "" # invoke_hook() fills this in if empty + timestamp: datetime = Field(default_factory=datetime.utcnow) + hook: str = "" # set by invoke_hook() before dispatch + user_metadata: dict[str, Any] = Field(default_factory=dict) +``` + +`arbitrary_types_allowed=True` is required because payloads include
non-serializable Mellea objects (`Backend`, `Context`, `Component`, `ModelOutputThunk`). This means external plugins cannot receive these payloads directly; they are designed for native in-process plugins. + +### 3.3 Hook Registration (`mellea/plugins/_types.py`) + +A `_register_mellea_hooks()` function registers all hook types with the ContextForge `HookRegistry`. Called once during plugin initialization. Idempotent via `is_registered()` check. Follows the same pattern used by ContextForge's own hook modules (e.g., `mcpgateway/plugins/framework/hooks/tools.py`). + +```python +def _register_mellea_hooks() -> None: + registry = get_hook_registry() + for hook_type, (payload_cls, result_cls) in _HOOK_REGISTRY.items(): + if not registry.is_registered(hook_type): + registry.register_hook(hook_type, payload_cls, result_cls) +``` + +### 3.4 Context Mapping (`mellea/plugins/_context.py`) + +The hook system spec defines domain-specific `PluginContext` fields (`session`, `backend`, `context`) that vary by hook category. ContextForge provides a generic `GlobalContext` with a `state: dict`. 
The mapping uses `GlobalContext.state` as the carrier for Mellea-specific context: + +```python +def build_global_context( + *, + session: MelleaSession | None = None, + backend: Backend | None = None, + context: Context | None = None, + request_id: str = "", + **extra_fields, +) -> GlobalContext: + state: dict[str, Any] = {} + if session is not None: + state["session"] = session + if backend is not None: + state["backend"] = backend + state["backend_name"] = getattr(backend, "model_id", "unknown") + if context is not None: + state["context"] = context + state.update(extra_fields) + return GlobalContext(request_id=request_id, state=state) +``` + +### 3.5 `MelleaPlugin` Base Class (`mellea/plugins/_base.py`) + +Extends ContextForge `Plugin` with typed context accessor helpers so plugin authors don't need to know about the `GlobalContext.state` mapping: + +```python +class MelleaPlugin(Plugin): + """Base class for Mellea plugins.""" + + def get_backend(self, context: PluginContext) -> Backend | None: + return context.global_context.state.get("backend") + + def get_mellea_context(self, context: PluginContext) -> Context | None: + return context.global_context.state.get("context") + + def get_session(self, context: PluginContext) -> MelleaSession | None: + return context.global_context.state.get("session") + + @property + def plugin_config(self) -> dict[str, Any]: + return self._config.config or {} +``` + +No new abstract methods. ContextForge's `initialize()` and `shutdown()` suffice. + + +## 4. 
Plugin Manager Integration (`mellea/plugins/_manager.py`) + +### 4.1 Lazy Singleton Wrapper + +```python +_plugin_manager: PluginManager | None = None +_plugins_enabled: bool = False + +def has_plugins() -> bool: + """Fast check: are plugins configured and available?""" + return _plugins_enabled + +def get_plugin_manager() -> PluginManager | None: + """Returns the initialized PluginManager, or None if plugins are not configured.""" + return _plugin_manager + +async def initialize_plugins( + config_path: str | None = None, *, timeout: float = 5.0 +) -> PluginManager: + """Initialize the PluginManager with Mellea hook registrations.""" + global _plugin_manager, _plugins_enabled + _register_mellea_hooks() + pm = PluginManager(config_path or "", timeout=int(timeout)) + await pm.initialize() + _plugin_manager = pm + _plugins_enabled = True + return pm + +async def shutdown_plugins() -> None: + """Shut down the PluginManager.""" + global _plugin_manager, _plugins_enabled + if _plugin_manager is not None: + await _plugin_manager.shutdown() + _plugin_manager = None + _plugins_enabled = False +``` + +### 4.2 `invoke_hook()` Central Helper + +All hook call sites use this single function. Two no-op guards, followed by an early return, keep the overhead to a couple of cheap checks when plugins are not configured: + +1. **`_plugins_enabled` boolean** — module-level, a single pointer dereference +2. **`has_hooks_for(hook_type)`** — skips invocation when no plugin subscribes to this hook +3. **Returns `(None, original_payload)` immediately** when either guard fails + +```python +async def invoke_hook( + hook_type: MelleaHookType, + payload: MelleaBasePayload, + *, + session: MelleaSession | None = None, + backend: Backend | None = None, + context: Context | None = None, + request_id: str = "", + violations_as_exceptions: bool = True, + **context_fields, +) -> tuple[PluginResult | None, MelleaBasePayload]: + """Invoke a hook if plugins are configured. + + Returns (result, possibly-modified-payload).
+ If plugins are not configured, returns (None, original_payload) immediately. + """ + if not _plugins_enabled or _plugin_manager is None: + return None, payload + + if not _plugin_manager.has_hooks_for(hook_type.value): + return None, payload + + payload.hook = hook_type.value + if not payload.request_id: + payload.request_id = request_id + + global_ctx = build_global_context( + session=session, backend=backend, context=context, + request_id=request_id, **context_fields, + ) + + result, _ = await _plugin_manager.invoke_hook( + hook_type=hook_type.value, + payload=payload, + global_context=global_ctx, + violations_as_exceptions=violations_as_exceptions, + ) + + modified = result.modified_payload if result and result.modified_payload else payload + return result, modified +``` + +### 4.3 Session-Level Configuration + +`start_session()` in `mellea/stdlib/session.py` gains two optional keyword-only parameters: + +```python +def start_session( + ..., + plugin_config: str | None = None, # Path to plugin YAML config + plugin_manager: PluginManager | None = None, # Pre-configured manager +) -> MelleaSession: +``` + +If `plugin_manager` is provided, it is used directly. If `plugin_config` is a path, `initialize_plugins()` is called. Backward-compatible: existing code without these parameters sees no change. + +### 4.4 Dependency Management + +Add to `pyproject.toml` under `[project.optional-dependencies]`: + +```toml +plugins = ["contextforge-plugin-framework>=0.1.0"] +``` + +All imports in `mellea/plugins/` are guarded with `try/except ImportError`. + +## 5. Hook Call Sites + +### 5.1 Session Lifecycle + +**File**: `mellea/stdlib/session.py` + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `session_pre_init` | `start_session()`, before `backend_class(model_id, ...)` (~L163) | Before backend instantiation | Supports payload modification: updated `model_options`, `backend_name`. Violation blocks session creation. 
| +| `session_post_init` | `start_session()`, after `MelleaSession(backend, ctx)` (~L191) | Session fully created | Observability-only. | +| `session_reset` | `MelleaSession.reset()`, before `self.ctx.reset_to_new()` (~L269) | Context about to reset | Observability-only. | +| `session_cleanup` | `MelleaSession.cleanup()`, at top of method (~L272) | Before teardown | Observability-only. Must not raise. | + +**Sync/async bridge**: These are sync methods. Use `_run_async_in_thread(invoke_hook(...))` from `mellea/helpers/__init__.py`. + +**Payload examples**: + +```python +# session_pre_init +SessionPreInitPayload( + backend_name=backend_name, + model_id=str(model_id), + model_options=model_options, + backend_kwargs=backend_kwargs, + context_type=type(ctx).__name__ if ctx else "SimpleContext", +) + +# session_post_init +SessionPostInitPayload(session=session) + +# session_cleanup +SessionCleanupPayload( + context=self.ctx, + interaction_count=len(self.ctx.as_list()), +) +``` + +### 5.2 Component Lifecycle + +**File**: `mellea/stdlib/functional.py` + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `component_pre_create` | `instruct()` before `Instruction(...)` (~L200), `chat()` before `Message(...)` (~L244), `query()` (~L321), `transform()` (~L363), and async variants | Before component constructor | Supports payload modification: updated `description`, `requirements`. Violation blocks creation. | +| `component_post_create` | Same functions, after Component constructor, before `act()`/`aact()` | Component created | Supports `component` replacement. Primarily observability. | +| `component_pre_execute` | `aact()`, at top before strategy branch (~L492) | Before generation begins | Supports `action`, `model_options`, `requirements`, `strategy` modification. Violation blocks execution. 
| +| `component_post_success` | `aact()`, after result in both branches (~L506, ~L534) | Successful execution | Supports `result` modification (output transformation). Primarily observability. | +| `component_post_error` | `aact()`, in new `try/except Exception` wrapping the body | Exception during execution | Observability-only. Always re-raises after hook. | + +**Key changes to `aact()`**: +- Add `time.monotonic()` at entry for latency measurement +- Wrap body (lines ~492–546) in `try/except Exception` +- `except` handler: fire `component_post_error` then `error_occurred`, then re-raise +- Insert `component_post_success` before each `return` path + +**Payload examples**: + +```python +# component_pre_create (Instruction case) +ComponentPreCreatePayload( + component_type="Instruction", + description=description, + images=images, + requirements=requirements, + icl_examples=icl_examples, + grounding_context=grounding_context, +) + +# component_pre_execute +ComponentPreExecutePayload( + component_type=type(action).__name__, + action=action, + context=context, + requirements=requirements or [], + model_options=model_options or {}, + format=format, + strategy_name=type(strategy).__name__ if strategy else None, + tool_calls_enabled=tool_calls, +) + +# component_post_success +ComponentPostSuccessPayload( + component_type=type(action).__name__, + action=action, + result=result, + context_before=context, + context_after=new_ctx, + generate_log=result._generate_log, + sampling_results=sampling_result if strategy else None, + latency_ms=int((time.monotonic() - t0) * 1000), +) +``` + +### 5.3 Generation Pipeline + +**Approach**: Add a non-abstract `generate_from_context_with_hooks()` method to the `Backend` ABC in `mellea/core/backend.py`. This wraps the abstract `generate_from_context()` with pre/post hooks, avoiding modifications to all 6 backend implementations (Ollama, OpenAI, HuggingFace, vLLM, Watsonx, LiteLLM). 
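Putting a non-abstract wrapper on the ABC is the template-method pattern. The base class owns the hook plumbing, and each subclass implements only the abstract generation call. The sketch below shows that shape in isolation. `ToyBackend`, `generate`, `generate_with_hooks`, and the plain-callable hook lists are hypothetical stand-ins. They are not Mellea's `Backend` API and not the plugin framework's dispatch.

```python
import asyncio
import time
from abc import ABC, abstractmethod
from typing import Awaitable, Callable, Optional

# Hypothetical hook signatures, for this sketch only.
PreHook = Callable[[dict], Awaitable[dict]]       # may rewrite model options
PostHook = Callable[[str, int], Awaitable[None]]  # sees output and latency in ms


class ToyBackend(ABC):
    def __init__(self) -> None:
        self.pre_hooks: list[PreHook] = []
        self.post_hooks: list[PostHook] = []

    @abstractmethod
    async def generate(self, prompt: str, model_options: dict) -> str:
        """The only method a concrete backend has to implement."""

    async def generate_with_hooks(
        self, prompt: str, model_options: Optional[dict] = None
    ) -> str:
        # Concrete wrapper defined once on the ABC; every subclass inherits it.
        options = dict(model_options or {})
        for pre in self.pre_hooks:
            options = await pre(options)
        t0 = time.monotonic()
        output = await self.generate(prompt, options)
        latency_ms = int((time.monotonic() - t0) * 1000)
        for post in self.post_hooks:
            await post(output, latency_ms)
        return output


class EchoBackend(ToyBackend):
    # A concrete backend that knows nothing about hooks.
    async def generate(self, prompt: str, model_options: dict) -> str:
        return f"{prompt} (temperature={model_options.get('temperature')})"


async def force_greedy(options: dict) -> dict:
    return {**options, "temperature": 0.0}


outputs: list[str] = []


async def record(output: str, latency_ms: int) -> None:
    outputs.append(output)


backend = EchoBackend()
backend.pre_hooks.append(force_greedy)
backend.post_hooks.append(record)
result = asyncio.run(backend.generate_with_hooks("hi", {"temperature": 0.7}))
# result == "hi (temperature=0.0)"
```

`EchoBackend` never references hooks, so adding another backend requires no hook-specific code. That is the property the wrapper approach relies on.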
+ +**New method on `Backend`** (`mellea/core/backend.py`): + +```python +async def generate_from_context_with_hooks( + self, + action: Component | CBlock, + ctx: Context, + *, + format=None, + model_options=None, + tool_calls=False, +) -> tuple[ModelOutputThunk, Context]: + """Wraps generate_from_context with generation_pre_call / generation_post_call hooks.""" + from mellea.plugins._manager import invoke_hook, has_plugins + from mellea.plugins._types import MelleaHookType + from mellea.plugins.hooks.generation import GenerationPreCallPayload, GenerationPostCallPayload + + if has_plugins(): + pre_payload = GenerationPreCallPayload( + action=action, context=ctx, + model_options=model_options or {}, format=format, tools=None, + ) + result, pre_payload = await invoke_hook( + MelleaHookType.GENERATION_PRE_CALL, pre_payload, + backend=self, context=ctx, + ) + if result and result.modified_payload: + model_options = result.modified_payload.model_options + + t0 = time.monotonic() + out_result, new_ctx = await self.generate_from_context( + action, ctx, format=format, model_options=model_options, tool_calls=tool_calls, + ) + + if has_plugins(): + post_payload = GenerationPostCallPayload( + model_output=out_result, + latency_ms=int((time.monotonic() - t0) * 1000), + ) + await invoke_hook( + MelleaHookType.GENERATION_POST_CALL, post_payload, + backend=self, context=new_ctx, + ) + + return out_result, new_ctx +``` + +**Call site changes**: +- `mellea/stdlib/functional.py:aact()` line 499: `backend.generate_from_context(...)` → `backend.generate_from_context_with_hooks(...)` +- `mellea/stdlib/sampling/base.py:sample()` line ~163: same substitution + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `generation_pre_call` | `Backend.generate_from_context_with_hooks()`, before delegate | Before LLM API call | Supports `model_options` modification. Violation blocks (e.g., token budget exceeded).
| +| `generation_post_call` | Same method, after delegate returns | After LLM response | Supports output modification (redaction). Primarily observability. | +| `generation_stream_chunk` | **Deferred to Phase 7** — requires hooks in `ModelOutputThunk.astream()` streaming path | Per streaming chunk | Fire-and-forget to avoid slowing streaming. | + +### 5.4 Validation + +**File**: `mellea/stdlib/functional.py`, in `avalidate()` (~L699–753) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `validation_pre_check` | After `reqs` prepared (~L713), before validation loop | Before validation | Supports `requirements` list modification (inject/filter). | +| `validation_post_check` | After all validations, before `return rvs` (~L753) | After validation | Supports `results` override. Primarily observability. | + +**Payload examples**: + +```python +# validation_pre_check +ValidationPreCheckPayload( + requirements=reqs, + target=output, + context=context, + model_options=model_options or {}, +) + +# validation_post_check +ValidationPostCheckPayload( + requirements=reqs, + results=rvs, + all_passed=all(bool(r) for r in rvs), + passed_count=sum(1 for r in rvs if bool(r)), + failed_count=sum(1 for r in rvs if not bool(r)), +) +``` + +### 5.5 Sampling Pipeline + +**File**: `mellea/stdlib/sampling/base.py`, in `BaseSamplingStrategy.sample()` (~L94–256) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `sampling_loop_start` | Before `for` loop (~L157) | Loop begins | Supports `loop_budget` modification. | +| `sampling_iteration` | Inside loop, after validation (~L192) | Each iteration | Observability. Violation can force early termination. | +| `sampling_repair` | After `self.repair()` call (~L224) | Repair invoked | Supports `repair_action`/`repair_context` modification. 
| +| `sampling_loop_end` | Before return in success (~L209) and failure (~L249) paths | Loop ends | Observability. Supports `final_result` override. | + +**Additional change**: Add `_get_repair_type() -> str` method to each sampling strategy subclass: + +| Strategy Class | Repair Type | +|---|---| +| `RejectionSamplingStrategy` | `"identity"` | +| `RepairTemplateStrategy` | `"template_repair"` | +| `MultiTurnStrategy` | `"multi_turn_message"` | +| `SOFAISamplingStrategy` | `"sofai_feedback"` | + +**Payload examples**: + +```python +# sampling_loop_start +SamplingLoopStartPayload( + strategy_name=type(self).__name__, + action=action, + context=context, + requirements=reqs, + loop_budget=self.loop_budget, +) + +# sampling_repair +SamplingRepairPayload( + repair_type=self._get_repair_type(), + failed_action=sampled_actions[-1], + failed_result=sampled_results[-1], + failed_validations=sampled_scores[-1], + repair_action=next_action, + repair_context=next_context, + repair_iteration=loop_count, +) +``` + +### 5.6 Tool Execution + +**File**: `mellea/stdlib/functional.py`, in the `_call_tools()` helper (~L904) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `tool_pre_invoke` | Before `tool.call_func()` (~L917) | Before tool call | Supports `tool_args` modification. Violation blocks tool call. | +| `tool_post_invoke` | After `tool.call_func()` (~L919) | After tool call | Supports `tool_output` modification. Primarily observability. | + +### 5.7 Backend Adapter Operations + +**Files**: `mellea/backends/openai.py` (`load_adapter` ~L907, `unload_adapter` ~L944), `mellea/backends/huggingface.py` (`load_adapter` ~L1192, `unload_adapter` ~L1224) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `adapter_pre_load` | Start of `load_adapter()` | Before adapter load | Violation prevents loading. 
| +| `adapter_post_load` | End of `load_adapter()` | After adapter loaded | Observability. | +| `adapter_pre_unload` | Start of `unload_adapter()` | Before adapter unload | Violation prevents unloading. | +| `adapter_post_unload` | End of `unload_adapter()` | After adapter unloaded | Observability. | + +**Sync/async bridge**: Adapter methods are synchronous. Use `_run_async_in_thread(invoke_hook(...))`. + +### 5.8 Context Operations + +**Files**: `mellea/stdlib/context.py` (`ChatContext.add()` ~L17, `SimpleContext.add()` ~L31) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `context_update` | After `from_previous()` in `add()` | Context appended | Observability-only (context is immutable). | +| `context_prune` | `ChatContext.view_for_generation()` when window truncates | Context windowed | Observability-only. | + +**Performance note**: `context_update` fires on every context addition, which is frequent. The `has_hooks_for()` guard is critical — when no plugin subscribes to `context_update`, the overhead is a single boolean check. + +### 5.9 Error Handling + +**File**: `mellea/stdlib/functional.py` (utility function callable from any error path) + +| Hook | Location | Trigger | Result Handling | +|------|----------|---------|-----------------| +| `error_occurred` | `aact()` except block + utility `fire_error_hook()` | Unrecoverable error | Observability-only. Must never raise from own execution. | + +**Fires for**: `ComponentParseError`, backend communication errors, assertion violations, unhandled exceptions during component execution, validation, or tool invocation. + +**Does NOT fire for**: Validation failures within sampling loops (handled by `sampling_iteration`/`sampling_repair`), controlled `PluginViolation` blocks (those are policy decisions, not errors). 
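The "never raises" contract can be sketched as two parts. A small notifier isolates each error observer, and the caller re-raises the original exception afterwards. All names here (`notify_error_hooks`, `run_action`, the dict payload) are hypothetical. The sketch departs from a `traceback.format_exc()` call in one deliberate way: it formats the stack trace from the exception object itself, so the payload is still correct if the notifier is called outside an `except` block.

```python
import asyncio
import traceback
from typing import Awaitable, Callable

# Hypothetical error-hook signature for this sketch.
ErrorHook = Callable[[dict], Awaitable[None]]


async def notify_error_hooks(hooks: list[ErrorHook], error: Exception, location: str) -> None:
    """Report an error to every hook. Never raises."""
    payload = {
        "error_type": type(error).__name__,
        "error_location": location,
        "stack_trace": "".join(
            traceback.format_exception(type(error), error, error.__traceback__)
        ),
        "recoverable": False,
    }
    for handler in hooks:
        try:
            await handler(payload)
        except Exception:
            pass  # a broken observer must not mask the original error


async def run_action(hooks: list[ErrorHook], action: Callable[[], Awaitable[str]]) -> str:
    try:
        return await action()
    except Exception as exc:
        await notify_error_hooks(hooks, exc, "run_action")
        raise  # observability only: the caller still sees the original exception


seen: list[str] = []


async def record(payload: dict) -> None:
    seen.append(payload["error_type"])


async def broken(payload: dict) -> None:
    raise RuntimeError("bug inside an error hook")


async def parse_output() -> str:
    raise ValueError("could not parse model output")


caught = None
try:
    asyncio.run(run_action([broken, record], parse_output))
except ValueError as exc:
    caught = exc
# caught is the original ValueError; seen == ["ValueError"]
```

Each observer gets its own `try` rather than one `try` around the whole loop. As a result, a single faulty hook does not stop later hooks from seeing the error.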
+ +**Utility function**: + +```python +async def fire_error_hook( + error: Exception, + location: str, + *, + session=None, backend=None, context=None, action=None, +) -> None: + """Fire the error_occurred hook. Never raises.""" + try: + payload = ErrorOccurredPayload( + error=error, + error_type=type(error).__name__, + error_location=location, + stack_trace=traceback.format_exc(), + recoverable=False, + action=action, + ) + await invoke_hook( + MelleaHookType.ERROR_OCCURRED, payload, + session=session, backend=backend, context=context, + violations_as_exceptions=False, + ) + except Exception: + pass # Never propagate errors from error hook +``` + + +## 8. Critical Files Summary + +| File | Changes | +|------|---------| +| `mellea/stdlib/functional.py` | ~12 hook insertions (component lifecycle, validation, tools, error) | +| `mellea/stdlib/session.py` | 4 session hooks + `plugin_config`/`plugin_manager` params on `start_session()` | +| `mellea/stdlib/sampling/base.py` | 4 sampling hooks + `generate_from_context` → `generate_from_context_with_hooks` | +| `mellea/core/backend.py` | Add `generate_from_context_with_hooks()` wrapper method to `Backend` ABC | +| `mellea/stdlib/context.py` | 2 context operation hooks in `ChatContext.add()`, `SimpleContext.add()` | +| `mellea/backends/openai.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` | +| `mellea/backends/huggingface.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` | +| `pyproject.toml` | Add `plugins` optional dependency + `plugins` test marker | +| `mellea/plugins/` (new) | Plugin subpackage | +| `test/plugins/` (new) | Tests for plugins subpackage | + +> Note: + update docs and add examples. 
From 211abcfeea8f3406ce43f59e22ce896e357e31e7 Mon Sep 17 00:00:00 2001 From: Frederico Araujo Date: Fri, 6 Feb 2026 01:11:43 -0500 Subject: [PATCH 05/10] docs: update implementation plan Signed-off-by: Frederico Araujo --- docs/dev/hook_system_implementation_plan.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/dev/hook_system_implementation_plan.md b/docs/dev/hook_system_implementation_plan.md index c871b9b57..e0b04c87c 100644 --- a/docs/dev/hook_system_implementation_plan.md +++ b/docs/dev/hook_system_implementation_plan.md @@ -1,4 +1,4 @@ -# Mellea Hook System — Implementation Plan +# Mellea Hook System Implementation Plan This document describes the implementation plan for the extensibility hook system specified in [`docs/dev/hook_system.md`](hook_system.md). The implementation uses the [ContextForge plugin framework](https://github.com/IBM/mcp-context-forge) (`mcpgateway.plugins.framework`) as an optional external dependency for core plumbing, while all Mellea-specific types — hook enums, payload models, and the plugin base class — are owned by Mellea under a new `mellea/plugins/` subpackage. From aac39081552dd41d318bcfd9c5b1539b76e32e95 Mon Sep 17 00:00:00 2001 From: Frederico Araujo Date: Fri, 6 Feb 2026 01:17:39 -0500 Subject: [PATCH 06/10] docs: minor cleanups to implementation plan Signed-off-by: Frederico Araujo --- docs/dev/hook_system_implementation_plan.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/dev/hook_system_implementation_plan.md b/docs/dev/hook_system_implementation_plan.md index e0b04c87c..c3d2ae500 100644 --- a/docs/dev/hook_system_implementation_plan.md +++ b/docs/dev/hook_system_implementation_plan.md @@ -25,7 +25,7 @@ mellea/plugins/ └── error.py # error handling payload ``` -## 2. ContextForge Plugin Framework — Key Interfaces Used +## 2. ContextForge Plugin Framework (Key Interfaces Used) The following types from `mcpgateway.plugins.framework` form the plumbing layer. 
Mellea uses these but does **not** import any ContextForge-specific hook types (prompts, tools, resources, agents, http). @@ -711,7 +711,7 @@ async def fire_error_hook( ``` -## 8. Critical Files Summary +## 8. Modifications Summary | File | Changes | |------|---------| From 53cd027f0c5c1589b3513f53156c4464af676371 Mon Sep 17 00:00:00 2001 From: Frederico Araujo Date: Sun, 15 Feb 2026 10:50:10 -0500 Subject: [PATCH 07/10] feat: update to reflect programmatic and functional-first design Signed-off-by: Frederico Araujo --- docs/dev/hook_system.md | 514 +++++++++++++------- docs/dev/hook_system_implementation_plan.md | 292 ++++++++++- 2 files changed, 615 insertions(+), 191 deletions(-) diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md index 4ddec0c83..8fa78a607 100644 --- a/docs/dev/hook_system.md +++ b/docs/dev/hook_system.md @@ -10,24 +10,38 @@ Mellea's hook system provides extension points for deployed generative AI applic 1. **Consistent Interface**: All hooks follow the same async pattern with payload and context parameters 2. **Composable**: Multiple plugins can register for the same hook, executing in priority order 3. **Fail-safe**: Hook failures can be handled gracefully without breaking core execution -4. **Minimal Intrusion**: Plugins are opt-in; default Mellea behavior remains unchanged without plugins +4. **Minimal Intrusion**: Plugins are opt-in; default Mellea behavior remains unchanged without plugins. Plugins work identically whether invoked through a session (`m.instruct(...)`) or via the functional API (`instruct(backend, context, ...)`) 5. **Architecturally Aligned**: Hook categories reflect Mellea's true abstraction boundaries — Session lifecycle, Component lifecycle, and the (Backend, Context) generation pipeline +6. **Code-First**: Plugins are defined and composed in Python. Decorators are the primary registration mechanism; YAML configuration is a secondary option for deployment-time overrides +7. 
**Functions-First**: The simplest plugin is a plain async function decorated with `@hook`. Class-based plugins exist for stateful, multi-hook scenarios but are not required. ### Hook Method Signature All hooks follow this consistent async pattern: ```python -async def hook_name( - self, +# Standalone function hook (primary) +@hook("hook_name", mode="enforce", priority=50) +async def my_hook( payload: PluginPayload, context: PluginContext -) -> PluginResult +) -> PluginResult | None: ... + +# Class-based method hook +class MyPlugin(MelleaPlugin): + @hook("hook_name") + async def my_hook( + self, + payload: PluginPayload, + context: PluginContext + ) -> PluginResult | None: ... ``` - **`payload`**: Mutable, strongly-typed data specific to the hook point - **`context`**: Read-only shared context with session metadata and utilities -- **Returns**: A result object with continuation flag, modified payload, and violation/explanation +- **`mode`**: `"enforce"` (default), `"permissive"`, or `"fire_and_forget"` — controls execution behavior (see Execution Mode below) +- **`priority`**: Lower numbers execute first (default: 50) +- **Returns**: A `PluginResult` with continuation flag, modified payload, and violation/explanation — or `None` to continue unchanged ### Concurrency Model @@ -37,23 +51,50 @@ Hooks use Python's `async`/`await` cooperative multitasking. Because Python's ev - **Race conditions only at `await` points**: Shared state is safe to read and write between `await` calls within a single hook. Races only arise if multiple hooks modify the same shared state and are dispatched concurrently. - **No preemptive interruption**: Unlike threads, a hook handler runs uninterrupted until it yields control via `await`.
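The concurrency rules can be demonstrated with plain `asyncio`, without any plugin framework. In this sketch, two hypothetical handlers update a shared counter. `atomic_increment` reads and writes with no `await` in between. `racy_increment` yields with `await asyncio.sleep(0)` between the read and the write, which models a hook that awaits I/O mid-update.

```python
import asyncio


async def atomic_increment(state: dict) -> None:
    # No await between read and write, so no other coroutine can interleave.
    state["count"] = state["count"] + 1


async def racy_increment(state: dict) -> None:
    current = state["count"]
    await asyncio.sleep(0)  # yields to the event loop; other handlers run here
    state["count"] = current + 1


async def run_concurrently(handler, n: int = 10) -> int:
    state = {"count": 0}
    await asyncio.gather(*(handler(state) for _ in range(n)))
    return state["count"]


atomic_total = asyncio.run(run_concurrently(atomic_increment))  # always 10
racy_total = asyncio.run(run_concurrently(racy_increment))      # fewer than 10: lost updates
```

With asyncio's FIFO scheduling, every `racy_increment` reads `0` before any of them writes back, so most updates are lost. Moving the `await` outside the read-modify-write section removes the race.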
-### Execution Timing +### Execution Mode + +Hooks support three execution modes, configurable per-registration via the `mode` parameter on the `@hook` decorator: -Hooks support two execution timing modes, configurable per-registration: +| Mode | Behavior | +|------|----------| +| **`enforce`** (default) | Awaited inline. If the hook returns `PluginResult(continue_processing=False)`, execution is blocked. Use for policy enforcement, budget controls, and authorization. | +| **`permissive`** | Awaited inline. Violations are logged but do not block execution. Use for monitoring, auditing, and gradual rollout of policies. | +| **`fire_and_forget`** | Dispatched via `asyncio.create_task()` and runs in the background. The `PluginResult` is ignored — cannot modify payloads or block execution. Use for logging, telemetry, and non-critical side effects where latency matters more than ordering guarantees. | -- **Blocking** (default): The hook is awaited inline. Use for policy enforcement, payload transformation, and any hook that must complete before execution continues. -- **Fire-and-forget**: The hook is dispatched via `asyncio.create_task()` and runs in the background. Use for logging, telemetry, and non-critical side effects where latency matters more than ordering guarantees. +Fire-and-forget hooks receive the payload snapshot as it existed at dispatch time; `enforce`/`permissive` hooks in the same chain that execute earlier (higher priority) can modify the payload before fire-and-forget hooks see it. Any exceptions in fire-and-forget hooks are logged but do not propagate. -Fire-and-forget hooks cannot modify payloads or block execution — their `PluginResult` is ignored. Any exceptions in fire-and-forget hooks are logged but do not propagate. Fire-and-forget hooks receive the payload snapshot as it existed at dispatch time; blocking hooks in the same chain that execute earlier (higher priority) can modify the payload before fire-and-forget hooks see it. 
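A toy dispatcher makes the three modes, priority ordering, and the fire-and-forget snapshot rule concrete. This is a minimal sketch under stated assumptions. `Result`, `Registration`, and `dispatch` are hypothetical names, payloads are plain dicts, and the code does not reflect how the ContextForge plugin manager is actually implemented.

```python
import asyncio
import logging
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass
class Result:
    """Hypothetical stand-in for PluginResult."""
    continue_processing: bool = True
    modified_payload: Optional[dict] = None
    reason: str = ""


Handler = Callable[[dict], Awaitable[Optional[Result]]]


@dataclass
class Registration:
    handler: Handler
    mode: str = "enforce"  # "enforce" | "permissive" | "fire_and_forget"
    priority: int = 50     # lower values run first


async def _quietly(handler: Handler, payload: dict) -> None:
    try:
        await handler(payload)
    except Exception:
        logging.exception("fire-and-forget hook failed")  # logged, never propagated


async def dispatch(regs: list[Registration], payload: dict) -> tuple[bool, dict, list]:
    """Return (allowed, final_payload, background_tasks)."""
    background = []
    for reg in sorted(regs, key=lambda r: r.priority):
        if reg.mode == "fire_and_forget":
            # Copy of the payload as it stands now; the task's result is ignored.
            background.append(asyncio.create_task(_quietly(reg.handler, dict(payload))))
            continue
        result = await reg.handler(payload)
        if result is None:
            continue  # None means "continue unchanged"
        if result.modified_payload is not None:
            payload = result.modified_payload
        if not result.continue_processing:
            if reg.mode == "enforce":
                return False, payload, background
            logging.warning("permissive violation: %s", result.reason)
    return True, payload, background


async def clamp_tokens(p: dict) -> Result:
    return Result(modified_payload={**p, "tokens": min(p["tokens"], 4000)})


async def budget(p: dict) -> Optional[Result]:
    if p["tokens"] > 4000:
        return Result(continue_processing=False, reason="token budget exceeded")
    return None


seen: list[int] = []


async def telemetry(p: dict) -> None:
    seen.append(p["tokens"])


async def main(regs: list[Registration]) -> tuple[bool, dict]:
    allowed, final, tasks = await dispatch(regs, {"tokens": 9000})
    await asyncio.gather(*tasks)  # only so the demo can inspect `seen`
    return allowed, final


with_clamp = asyncio.run(main([
    Registration(budget, "enforce", priority=20),
    Registration(clamp_tokens, "enforce", priority=10),
    Registration(telemetry, "fire_and_forget", priority=100),
]))
without_clamp = asyncio.run(main([Registration(budget, "enforce", priority=20)]))
# with_clamp == (True, {"tokens": 4000}); without_clamp == (False, {"tokens": 9000})
```

`clamp_tokens` runs before `budget` because its priority value is lower, so the budget check sees the rewritten payload. `telemetry` then receives a copy of that rewritten payload. Without the clamp, the `enforce` handler blocks and `dispatch` returns before any later handler would run.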
+> **Note**: All three modes (`enforce`, `permissive`, `fire_and_forget`) are supported by the ContextForge Plugin Framework's `PluginMode` enum. The additional modes `enforce_ignore_error` and `disabled` remain available in the `PluginMode` enum and YAML configuration for deployment-time control, but are not exposed as `@hook` decorator values. They are deployment concerns, not definition-time concerns. ### Plugin Framework The hook system is backed by a lightweight plugin framework built as a Mellea dependency (not a separate user-facing package). This framework: -- Provides APIs to define hook invocation points and base data objects for plugin payload, context, and result -- Exposes a base class and decorator to implement concrete plugins and register hook functions +- Provides the `@hook` decorator for registering standalone async functions as hook handlers +- Provides the `@plugin` decorator for marking plain classes as multi-hook plugins +- Provides the `MelleaPlugin` base class for stateful plugins that need lifecycle hooks (`initialize`/`shutdown`) and typed context accessors +- Exposes `PluginSet` for grouping related hooks/plugins into composable, reusable units +- Exposes `register()` for global plugin registration and `block()` as a convenience for returning blocking `PluginResult`s - Implements a plugin manager that loads, registers, and governs the execution of plugins +The public API surface: + +```python +from mellea.plugins import hook, plugin, block, PluginSet, register, MelleaPlugin +``` + +### Global vs Session-Scoped Plugins + +Plugins can be registered at two scopes: + +- **Global**: Registered via `register()` at module or application startup. Global plugins fire for every hook invocation — both session-based (`m.instruct(...)`) and functional (`instruct(backend, context, ...)`). +- **Session-scoped**: Passed via the `plugins` parameter to `start_session()`. Session-scoped plugins fire only for hook invocations within that session. 
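One way to implement the two scopes is to tag each registration with an optional session ID and filter at dispatch time. The sketch below assumes that design. `Tagged`, `select`, and `end_session` are hypothetical helpers that operate on handler names only. They are not the plugin manager's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tagged:
    """A registered handler; session_id=None marks it as global."""
    name: str
    priority: int = 50
    session_id: Optional[str] = None


def select(registry: list[Tagged], session_id: Optional[str]) -> list[str]:
    """Handlers to run for one dispatch, lowest priority value first."""
    active = [
        r for r in registry
        # Global handlers always run; session handlers only inside their own session.
        if r.session_id is None or r.session_id == session_id
    ]
    return [r.name for r in sorted(active, key=lambda r: r.priority)]


def end_session(registry: list[Tagged], session_id: str) -> list[Tagged]:
    """Drop a session's handlers at cleanup; global handlers are kept."""
    return [r for r in registry if r.session_id != session_id]


registry = [
    Tagged("audit", priority=100),                   # global, e.g. via register()
    Tagged("budget", priority=10, session_id="s1"),
    Tagged("pii", priority=5, session_id="s2"),
]
in_s1 = select(registry, "s1")        # ["budget", "audit"]
functional = select(registry, None)   # ["audit"]: functional API call, no session
after_cleanup = select(end_session(registry, "s1"), "s1")  # ["audit"]
```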
+ +Both scopes coexist. When a hook fires within a session, both global plugins and that session's plugins execute, ordered by priority. When a hook fires via the functional API outside a session, only global plugins execute. + +**Implementation**: A single `PluginManager` instance manages all plugins. Plugins are tagged with an optional `session_id`. At dispatch time, the manager filters: global plugins (no session tag) always run; session-tagged plugins run only when the dispatch context matches their session ID. + +**Functional API support**: The functional API (`instruct(backend, context, ...)`) does not require a session. Hooks still fire at the same execution points. If global plugins are registered, they execute. If no plugins are registered, hooks are no-ops with zero overhead. + ### Hook Invocation Responsibilities Hooks are called from Mellea's base classes (`Component.aact()`, `Backend.generate()`, `SamplingStrategy.run()`, etc.). This means hook invocation is a framework-level concern, and authors of new backends, sampling strategies, or components do not need to manually insert hook calls. @@ -88,7 +129,7 @@ All hook payloads inherit these base fields: ```python class BasePayload(PluginPayload): - session_id: str # Unique session identifier + session_id: str | None = None # Session identifier (None for functional API calls) request_id: str # Unique ID for this execution chain timestamp: datetime # When the event fired hook: str # Name of the hook (e.g., "generation_pre_call") @@ -1031,110 +1072,207 @@ return PluginResult( ## 7. 
Registration & Configuration -### Plugin Registration - -Plugins register programmatically or via YAML configuration: +### Public API -```yaml -plugins: - - name: content-policy - kind: mellea.plugins.ContentPolicyPlugin - hooks: - - component_pre_create - - generation_post_call - mode: enforce - execution: blocking - priority: 10 - config: - blocked_terms: ["term1", "term2"] +All plugin registration APIs are available from `mellea.plugins`: - - name: telemetry - kind: mellea.plugins.TelemetryPlugin - hooks: - - component_post_success - - validation_post_check - - sampling_loop_end - mode: permissive - execution: fire_and_forget - priority: 100 - config: - endpoint: "https://telemetry.example.com" +```python +from mellea.plugins import hook, plugin, block, PluginSet, register, MelleaPlugin ``` -### Execution Modes +### Standalone Function Hooks -- **`enforce`**: Block execution on violation -- **`enforce_ignore_error`**: Block on violation, but tolerate plugin errors -- **`permissive`**: Log violations without blocking -- **`disabled`**: Skip hook execution +The simplest way to define a hook handler is with the `@hook` decorator on a plain async function: -### Execution Timing +```python +from mellea.plugins import hook, block -- **`blocking`** (default): Hook is awaited inline before continuing -- **`fire_and_forget`**: Hook is dispatched as an `asyncio.create_task()` — cannot modify payloads or block execution +@hook("generation_pre_call", mode="enforce", priority=10) +async def enforce_budget(payload, ctx): + if (payload.estimated_tokens or 0) > 4000: + return block("Token budget exceeded") -### Priority +@hook("component_post_success", mode="fire_and_forget") +async def log_result(payload, ctx): + print(f"[{payload.component_type}] {payload.latency_ms}ms") +``` -- Lower numbers execute first -- Hooks with same priority may execute in parallel -- Default priority: 50 +**Parameters**: +- `hook_type: str` — the hook point name (required, first positional argument) 
+- `mode: str` — `"enforce"` (default), `"permissive"`, or `"fire_and_forget"` +- `priority: int` — lower numbers execute first (default: 50) + +The `block()` helper is shorthand for returning `PluginResult(continue_processing=False, violation=PluginViolation(reason=...))`. It accepts an optional `code`, `description`, and `details` for structured violation information. -### Convention-Based Registration +### Class-Based Plugins -Plugins can use method naming conventions: +For plugins that need shared state across multiple hooks, use the `@plugin` decorator on a class or subclass `MelleaPlugin`: + +**`@plugin` decorator** — marks a plain class as a multi-hook plugin: ```python -class MyPlugin(MelleaPlugin): - async def on_generation_pre_call(self, payload, context): - # Automatically registered for generation_pre_call +from mellea.plugins import plugin, hook + +@plugin("pii-redactor", priority=5) +class PIIRedactor: + def __init__(self, patterns: list[str] | None = None): + self.patterns = patterns or [] + + @hook("component_pre_create") + async def redact_input(self, payload, ctx): ... - async def on_validation_post_check(self, payload, context): - # Automatically registered for validation_post_check + @hook("generation_post_call") + async def redact_output(self, payload, ctx): ... ``` -### Programmatic Registration +The `@plugin` decorator accepts: +- `name: str` — plugin name (required, first positional argument) +- `priority: int` — default priority for all hooks in this plugin (default: 50). Individual `@hook` decorators on methods can override. 
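The decorator pair can be reduced to metadata that is attached at definition time and resolved when an instance is registered. This sketch uses hypothetical `toy_hook`/`toy_plugin` decorators, not `mellea.plugins.hook`/`plugin`. It assumes the method-level priority defaults to `None`, meaning "inherit the plugin default", so that an unset value can be told apart from an explicit 50.

```python
import inspect
from typing import Any, Callable, Optional


def toy_hook(hook_type: str, *, mode: str = "enforce", priority: Optional[int] = None):
    """Record hook metadata on the function; priority=None means 'inherit'."""
    def wrap(fn: Callable) -> Callable:
        fn._hook_meta = {"hook_type": hook_type, "mode": mode, "priority": priority}
        return fn
    return wrap


def toy_plugin(name: str, *, priority: int = 50):
    """Mark a plain class as a multi-hook plugin with a default priority."""
    def wrap(cls: type) -> type:
        cls._plugin_meta = {"name": name, "priority": priority}
        return cls
    return wrap


def collect(instance: Any) -> list[tuple[str, str, int]]:
    """List (hook_type, method_name, effective_priority) for a plugin instance."""
    default = type(instance)._plugin_meta["priority"]
    found = []
    for attr, method in inspect.getmembers(instance, inspect.ismethod):
        meta = getattr(method, "_hook_meta", None)  # bound methods expose function attributes
        if meta is None:
            continue
        # A priority set on the method wins over the plugin-level default.
        prio = meta["priority"] if meta["priority"] is not None else default
        found.append((meta["hook_type"], attr, prio))
    return sorted(found, key=lambda t: t[2])


@toy_plugin("pii-redactor", priority=5)
class PIIRedactor:
    @toy_hook("component_pre_create")
    async def redact_input(self, payload, ctx): ...

    @toy_hook("generation_post_call", priority=80)
    async def redact_output(self, payload, ctx): ...


registrations = collect(PIIRedactor())
# [("component_pre_create", "redact_input", 5), ("generation_post_call", "redact_output", 80)]
```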
+ +**`MelleaPlugin` subclass** — for plugins that need lifecycle hooks (`initialize`/`shutdown`) or typed context accessors: ```python -class PIIRedactionPlugin(Plugin): - def name(self): - return "PII_Redactor" +from mellea.plugins import MelleaPlugin, hook - def register(self, hooks): - hooks.register("component_pre_create", self.redact_input) - hooks.register("generation_post_call", self.redact_output) +class MetricsPlugin(MelleaPlugin): + def __init__(self, endpoint: str): + super().__init__() + self.endpoint = endpoint + self._buffer = [] - async def redact_input(self, payload, context): - # Redact PII from input - ... + async def initialize(self): + self._client = await connect(self.endpoint) + + async def shutdown(self): + await self._client.flush(self._buffer) + await self._client.close() + + @hook("component_post_success") + async def collect(self, payload, ctx): + backend = self.get_backend(ctx) # typed accessor + self._buffer.append({"latency": payload.latency_ms}) ``` -### Session-Level Configuration +Convention-based registration (methods named `on_` followed by the hook name, e.g. `on_generation_pre_call`) remains supported for `MelleaPlugin` subclasses. + +### Composing Plugins with PluginSet + +`PluginSet` groups related hooks and plugins for reuse across sessions: + +```python +from mellea.plugins import PluginSet + +security = PluginSet("security", [ + enforce_budget, + PIIRedactor(patterns=[r"\d{3}-\d{2}-\d{4}"]), +]) + +observability = PluginSet("observability", [ + log_result, + MetricsPlugin(endpoint="https://..."), +]) +``` + +`PluginSet` accepts standalone hook functions, `@plugin`-decorated class instances, and `MelleaPlugin` instances. PluginSets can be nested.
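Nested sets can be handled by expanding them depth-first into their leaf hooks and plugins. This sketch assumes each set may carry an optional default priority that applies to its members unless a more deeply nested set declares its own. `ToyPluginSet`, `flatten`, and the stub hooks are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, Optional


@dataclass
class ToyPluginSet:
    """Hypothetical stand-in for PluginSet with an optional default priority."""
    name: str
    items: list[Any] = field(default_factory=list)
    priority: Optional[int] = None


def flatten(item: Any, inherited: Optional[int] = None) -> Iterator[tuple[Any, Optional[int]]]:
    """Yield (leaf, set_default_priority), expanding nested sets depth-first."""
    if isinstance(item, ToyPluginSet):
        # The innermost set that declares a priority wins.
        default = item.priority if item.priority is not None else inherited
        for child in item.items:
            yield from flatten(child, default)
    else:
        yield item, inherited


async def enforce_budget(payload, ctx): ...


async def log_result(payload, ctx): ...


class MetricsStub: ...


security = ToyPluginSet("security", [enforce_budget], priority=10)
everything = ToyPluginSet("all", [security, log_result, MetricsStub()], priority=60)

# Functions have __name__; plugin instances fall back to their class name.
leaves = [(getattr(x, "__name__", type(x).__name__), p) for x, p in flatten(everything)]
# [("enforce_budget", 10), ("log_result", 60), ("MetricsStub", 60)]
```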
+ +### Global Registration + +Register plugins globally so they fire for all hook invocations — both session-based and functional API: + +```python +from mellea.plugins import register + +register(security) # single item +register([security, observability]) # list +register(enforce_budget) # standalone function +``` + +`register()` accepts a single item or a list. Items can be standalone hook functions, plugin instances, or `PluginSet`s. + +### Session-Scoped Registration + +Pass plugins to `start_session()` to scope them to that session: ```python m = mellea.start_session( - ..., - plugin_manager=pm, - hooks_enabled=["component_pre_create", "generation_post_call"] + backend_name="openai", + model_id="gpt-4", + plugins=[security, observability], ) ``` -### Global PluginManager +The `plugins` parameter accepts the same types as `register()`: standalone hook functions, plugin instances, and `PluginSet`s. These plugins fire only within this session, in addition to any globally registered plugins. They are automatically deregistered when the session is cleaned up. + +### Functional API (No Session) + +When using the functional API directly: + +```python +from mellea.stdlib.functional import instruct + +result = instruct(backend, context, "Extract the user's age") +``` + +Only globally registered plugins fire. If no global plugins are registered, hooks are no-ops with zero overhead. Session-scoped plugins do not apply because there is no session. + +### Priority + +- Lower numbers execute first +- Within the same priority, execution order is deterministic but unspecified +- Default priority: 50 +- Priority can be set on `@hook` (per-handler), `@plugin` (per-plugin default), or `PluginSet` (per-set default). Most specific wins: per-handler > per-plugin > per-set. + +### YAML Configuration (Secondary) + +For deployment-time configuration, plugins can also be loaded from YAML. 
This is useful for enabling/disabling plugins or changing priorities without code changes: + +```yaml +plugins: + - name: content-policy + kind: mellea.plugins.ContentPolicyPlugin + hooks: + - component_pre_create + - generation_post_call + mode: enforce + priority: 10 + config: + blocked_terms: ["term1", "term2"] + + - name: telemetry + kind: mellea.plugins.TelemetryPlugin + hooks: + - component_post_success + - validation_post_check + - sampling_loop_end + mode: fire_and_forget + priority: 100 + config: + endpoint: "https://telemetry.example.com" +``` + +### Execution Modes (YAML / PluginMode Enum) + +The following modes are available in the ContextForge `PluginMode` enum and YAML configuration: -The hook system uses a **singleton PluginManager** that is initialized once (typically at application startup via YAML config) and shared globally. Session-level configuration (e.g., `hooks_enabled`) is for scoped overrides — selectively enabling or disabling specific hooks for a particular session — not for owning or replacing the plugin manager. +- **`enforce`** (`PluginMode.ENFORCE`): Awaited inline, block execution on violation +- **`permissive`** (`PluginMode.PERMISSIVE`): Awaited inline, log violations without blocking +- **`fire_and_forget`** (`PluginMode.FIRE_AND_FORGET`): Background task, result ignored +- **`enforce_ignore_error`** (`PluginMode.ENFORCE_IGNORE_ERROR`): Like `enforce`, but tolerate plugin errors +- **`disabled`** (`PluginMode.DISABLED`): Skip hook execution -For the functional (non-session) path (e.g., calling `instruct()` or `generate()` directly without a `MelleaSession`), the PluginManager is accessed directly. Hooks still fire at the same points in the execution lifecycle; the only difference is that session-scoped overrides do not apply. +The `@hook` decorator exposes `enforce`, `permissive`, and `fire_and_forget` — all backed by ContextForge's `PluginMode` enum. 
The others (`enforce_ignore_error`, `disabled`) are deployment-time concerns configured via YAML or programmatic `PluginConfig`. ### Custom Hook Types The plugin framework supports custom hook types for domain-specific extension points beyond the built-in lifecycle hooks. This is particularly relevant for agentic patterns (ReAct, tool-use loops, etc.) where the execution flow is application-defined. -Custom hooks are registered using the `@hook` decorator: +Custom hooks use the same `@hook` decorator: ```python -@hook("react_pre_reasoning", ReactReasoningPayload, ReactReasoningResult) -async def before_reasoning(self, payload, context): +@hook("react_pre_reasoning") +async def before_reasoning(payload, ctx): ... ``` @@ -1142,126 +1280,158 @@ Custom hooks follow the same calling convention, payload chaining, and result se ## 8. Example Implementations -### Content Policy Plugin +### Token Budget Enforcement (Standalone Function) ```python -class ContentPolicyPlugin(MelleaPlugin): - async def on_component_pre_create( - self, - payload: ComponentPreCreatePayload, - context: PluginContext - ) -> PluginResult | None: - # Only enforce on Instructions and GenerativeSlots - if payload.component_type not in ("Instruction", "GenerativeSlot"): - return None +from mellea.plugins import hook, block + +@hook("generation_pre_call", mode="enforce", priority=10) +async def enforce_token_budget(payload, ctx): + budget = 4000 + estimated = payload.estimated_tokens or 0 + if estimated > budget: + return block( + f"Estimated {estimated} tokens exceeds budget of {budget}", + code="TOKEN_BUDGET_001", + details={"estimated": estimated, "budget": budget}, + ) +``` - blocked_terms = self.config.get("blocked_terms", []) +### Content Policy (Standalone Function) - for term in blocked_terms: - if term.lower() in payload.description.lower(): - return PluginResult( - continue_processing=False, - violation=PluginViolation( - reason="Blocked content detected", - description=f"Component contains 
blocked term: {term}", - code="CONTENT_POLICY_001" - ) +```python +from mellea.plugins import hook, block - async def on_generation_post_call( - self, - payload: GenerationPostCallPayload, - context: PluginContext - ) -> PluginResult | None: - # Redact sensitive patterns from output - redacted = self._redact_pii(payload.processed_output) +BLOCKED_TERMS = ["term1", "term2"] - if redacted != payload.processed_output: - payload.processed_output = redacted - return PluginResult(modified_payload=payload) +@hook("component_pre_create", mode="enforce", priority=10) +async def enforce_content_policy(payload, ctx): + # Only enforce on Instructions and GenerativeSlots + if payload.component_type not in ("Instruction", "GenerativeSlot"): + return None + + for term in BLOCKED_TERMS: + if term.lower() in payload.description.lower(): + return block( + f"Component contains blocked term: {term}", + code="CONTENT_POLICY_001", + ) ``` -### Audit Logging Plugin +### Audit Logger (Fire-and-Forget) ```python -class AuditLoggingPlugin(MelleaPlugin): - async def on_component_post_success( - self, - payload: ComponentPostSuccessPayload, - context: PluginContext - ) -> PluginResult | None: - self._log_audit_event({ - "event": "generation_success", - "session_id": payload.session_id, - "component_type": payload.component_type, - "latency_ms": payload.latency_ms, - "token_usage": context.get("token_usage"), - "timestamp": payload.timestamp.isoformat() - }) - - - async def on_component_post_error( - self, - payload: ComponentPostErrorPayload, - context: PluginContext - ) -> PluginResult | None: - self._log_audit_event({ - "event": "generation_error", - "session_id": payload.session_id, - "component_type": payload.component_type, - "error_type": payload.error_type, - "stack_trace": payload.stack_trace, - "timestamp": payload.timestamp.isoformat() - }) +from mellea.plugins import hook + +@hook("component_post_success", mode="fire_and_forget") +async def audit_log_success(payload, ctx): + await 
send_to_audit_service({ + "event": "generation_success", + "session_id": payload.session_id, + "component_type": payload.component_type, + "latency_ms": payload.latency_ms, + "timestamp": payload.timestamp.isoformat(), + }) + +@hook("component_post_error", mode="fire_and_forget") +async def audit_log_error(payload, ctx): + await send_to_audit_service({ + "event": "generation_error", + "session_id": payload.session_id, + "error_type": payload.error_type, + "timestamp": payload.timestamp.isoformat(), + }) ``` -### Token Budget Plugin +### PII Redaction Plugin (Class-Based with `@plugin`) ```python -class TokenBudgetPlugin(MelleaPlugin): - async def on_generation_pre_call( - self, - payload: GenerationPreCallPayload, - context: PluginContext - ) -> PluginResult | None: - budget = self.config.get("max_tokens_per_request", 4000) - estimated = payload.estimated_tokens or 0 - - if estimated > budget: - return PluginResult( - continue_processing=False, - violation=PluginViolation( - reason="Token budget exceeded", - description=f"Estimated {estimated} tokens exceeds budget of {budget}", - code="TOKEN_BUDGET_001", - details={"estimated": estimated, "budget": budget} - ) +import re +from mellea.plugins import plugin, hook, PluginResult + +@plugin("pii-redactor", priority=5) +class PIIRedactor: + def __init__(self, patterns: list[str] | None = None): + self.patterns = patterns or [r"\d{3}-\d{2}-\d{4}"] + + @hook("component_pre_create") + async def redact_input(self, payload, ctx): + redacted = self._redact(payload.description) + if redacted != payload.description: + payload.description = redacted + return PluginResult(continue_processing=True, modified_payload=payload) + + @hook("generation_post_call") + async def redact_output(self, payload, ctx): + redacted = self._redact(payload.processed_output) + if redacted != payload.processed_output: + payload.processed_output = redacted + return PluginResult(continue_processing=True, modified_payload=payload) + + def _redact(self, 
text: str) -> str: + for pattern in self.patterns: + text = re.sub(pattern, "[REDACTED]", text) + return text ``` -### Generative Slot Profiler +### Generative Slot Profiler (`MelleaPlugin` Subclass) ```python -class SlotProfilerPlugin(MelleaPlugin): +from collections import defaultdict +from mellea.plugins import MelleaPlugin, hook + +class SlotProfiler(MelleaPlugin): + """Uses MelleaPlugin for lifecycle hooks and typed context accessors.""" + def __init__(self): - self._stats = defaultdict(lambda: {"calls": 0, "total_ms": 0, "errors": 0}) + super().__init__() + self._stats = defaultdict(lambda: {"calls": 0, "total_ms": 0}) - async def on_component_post_success( - self, - payload: ComponentPostSuccessPayload, - context: PluginContext - ) -> PluginResult | None: - # Only profile GenerativeSlot components + async def initialize(self): + # Called once when the plugin manager starts + self._stats.clear() + + @hook("component_post_success") + async def profile(self, payload, ctx): if payload.component_type != "GenerativeSlot": return None - stats = self._stats[payload.action.__name__] stats["calls"] += 1 stats["total_ms"] += payload.latency_ms +``` - context.emit_metric( - "slot_latency_ms", - payload.latency_ms, - tags={"slot": payload.action.__name__, "success": True} - ) +### Composition Example + +```python +from mellea.plugins import PluginSet, register +import mellea + +# Group by concern +security = PluginSet("security", [ + enforce_token_budget, + enforce_content_policy, + PIIRedactor(patterns=[r"\d{3}-\d{2}-\d{4}"]), +]) + +observability = PluginSet("observability", [ + audit_log_success, + audit_log_error, + SlotProfiler(), +]) + +# Global: fires for all invocations (session and functional API) +register(observability) + +# Session-scoped: security only for this session +m = mellea.start_session( + backend_name="openai", + model_id="gpt-4", + plugins=[security], +) + +# Functional API: only global plugins (observability) fire +from mellea.stdlib.functional 
import instruct +result = instruct(backend, context, "Extract the user's age") ``` diff --git a/docs/dev/hook_system_implementation_plan.md b/docs/dev/hook_system_implementation_plan.md index c3d2ae500..259d50126 100644 --- a/docs/dev/hook_system_implementation_plan.md +++ b/docs/dev/hook_system_implementation_plan.md @@ -2,16 +2,23 @@ This document describes the implementation plan for the extensibility hook system specified in [`docs/dev/hook_system.md`](hook_system.md). The implementation uses the [ContextForge plugin framework](https://github.com/IBM/mcp-context-forge) (`mcpgateway.plugins.framework`) as an optional external dependency for core plumbing, while all Mellea-specific types — hook enums, payload models, and the plugin base class — are owned by Mellea under a new `mellea/plugins/` subpackage. +The primary developer-facing API is Python decorators (`@hook`, `@plugin`) and programmatic registration (`register()`, `PluginSet`). YAML configuration is supported as a secondary mechanism for deployment-time overrides. Plugins work identically whether invoked through a session or via the functional API (`instruct(backend, context, ...)`). + +**Note**: The plugin framework is in the process of being extracted as a standalone Python package. Once completed, the package import path prefix will look like `cpex.framework`. + ## 1. 
Package Structure ``` mellea/plugins/ -├── __init__.py # Public API with try/except ImportError guard -├── _manager.py # Lazy singleton wrapper around PluginManager +├── __init__.py # Public API: hook, plugin, block, PluginSet, register, MelleaPlugin +├── _manager.py # Singleton wrapper + session-tag filtering ├── _base.py # MelleaBasePayload, MelleaPlugin base class ├── _types.py # MelleaHookType enum + hook registration -├── _context.py # Plugins context factory helper +├── _context.py # Plugin context factory helper +├── _decorators.py # @hook and @plugin decorator implementations +├── _pluginset.py # PluginSet class +├── _registry.py # register(), block() helpers + global/session dispatch logic └── hooks/ ├── __init__.py # Re-exports all payload classes ├── session.py # session lifecycle payloads @@ -37,7 +44,7 @@ The following types from `mcpgateway.plugins.framework` form the plumbing layer. | `PluginResult[T]` | Generic result: `continue_processing: bool`, `modified_payload: T | None`, `violation: PluginViolation | None`, `metadata: dict`. | | `PluginViolation` | `reason`, `description`, `code`, `details`. | | `PluginConfig` | `name`, `kind`, `hooks`, `mode`, `priority`, `conditions`, `config`, ... | -| `PluginMode` | `ENFORCE`, `ENFORCE_IGNORE_ERROR`, `PERMISSIVE`, `DISABLED`. | +| `PluginMode` | `ENFORCE`, `ENFORCE_IGNORE_ERROR`, `PERMISSIVE`, `FIRE_AND_FORGET`, `DISABLED`. | | `PluginContext` | `state: dict`, `global_context: GlobalContext`, `metadata: dict`. | | `HookRegistry` | `get_hook_registry()`, `register_hook(hook_type, payload_class, result_class)`, `is_registered(hook_type)`. | | `@hook` decorator | `@hook("hook_type")` or `@hook("hook_type", PayloadType, ResultType)` for custom method names. | @@ -86,6 +93,7 @@ classDiagram ENFORCE ENFORCE_IGNORE_ERROR PERMISSIVE + FIRE_AND_FORGET DISABLED } @@ -139,7 +147,7 @@ classDiagram ### YAML Plugin Configuration (reference) -Plugins can also be configured programmatically without a YAML file. 
+Plugins can also be configured via YAML as a secondary mechanism. Programmatic registration via `@hook`, `@plugin`, and `register()` is the primary approach. ```yaml plugins: @@ -149,6 +157,7 @@ plugins: - component_pre_create - generation_post_call mode: enforce + mode: enforce priority: 10 config: blocked_terms: ["term1", "term2"] @@ -158,12 +167,21 @@ plugins: hooks: - component_post_success - sampling_loop_end - mode: permissive + mode: fire_and_forget priority: 100 config: endpoint: "https://telemetry.example.com" ``` +### Mellea Wrapper Layer + +Mellea exposes its own `@hook` and `@plugin` decorators that translate to ContextForge registrations internally. This serves two purposes: + +1. **Mellea-aligned API**: The `@hook` decorator accepts a `mode` parameter with three string values (`"enforce"`, `"permissive"`, `"fire_and_forget"`) that map directly to ContextForge's `PluginMode` enum (`ENFORCE`, `PERMISSIVE`, `FIRE_AND_FORGET`), matching Mellea's code-first ergonomics without requiring users to import the enum. +2. **Session tagging**: Mellea's wrapper adds session-scoping metadata that ContextForge's `PluginManager` does not natively support. The `_manager.py` layer filters hooks at dispatch time based on session tags. + +Users never import from `mcpgateway.plugins.framework` directly. + ## 3. Core Types ### 3.1 `MelleaHookType` enum (`mellea/plugins/_types.py`) @@ -226,7 +244,7 @@ All Mellea hook payloads inherit from this base, which extends `PluginPayload` w class MelleaBasePayload(PluginPayload): model_config = ConfigDict(arbitrary_types_allowed=True) - session_id: str + session_id: str | None = None request_id: str timestamp: datetime = Field(default_factory=datetime.utcnow) hook: str @@ -274,11 +292,13 @@ def build_global_context( ### 3.5 `MelleaPlugin` Base Class (`mellea/plugins/_base.py`) +`MelleaPlugin` is one of three ways to define plugins, alongside `@hook` on standalone functions (primary) and `@plugin` on plain classes. 
Use `MelleaPlugin` when you need lifecycle hooks (`initialize`/`shutdown`) or typed context accessors. + Extends ContextForge `Plugin` with typed context accessor helpers so plugin authors don't need to know about the `GlobalContext.state` mapping: ```python class MelleaPlugin(Plugin): - """Base class for Mellea plugins.""" + """Base class for Mellea plugins with lifecycle hooks and typed accessors.""" def get_backend(self, context: PluginContext) -> Backend | None: return context.global_context.state.get("backend") @@ -296,14 +316,131 @@ class MelleaPlugin(Plugin): No new abstract methods. ContextForge's `initialize()` and `shutdown()` suffice. +### 3.6 `@hook` Decorator (`mellea/plugins/_decorators.py`) + +The `@hook` decorator works on both standalone async functions and class methods: + +```python +@dataclass(frozen=True) +class HookMeta: + hook_type: str + mode: Literal["enforce", "permissive", "fire_and_forget"] = "enforce" + priority: int = 50 + +def hook( + hook_type: str, + *, + mode: Literal["enforce", "permissive", "fire_and_forget"] = "enforce", + priority: int = 50, +) -> Callable: + """Register an async function or method as a hook handler.""" + def decorator(fn): + fn._mellea_hook_meta = HookMeta( + hook_type=hook_type, + mode=mode, + priority=priority, + ) + return fn + return decorator +``` + +The `mode` parameter controls both execution strategy and result handling. These map directly to ContextForge's `PluginMode` enum: +- `"enforce"` → `PluginMode.ENFORCE` / `"permissive"` → `PluginMode.PERMISSIVE`: Hook is awaited inline (blocking). Difference is whether violations halt execution or are logged only. +- `"fire_and_forget"` → `PluginMode.FIRE_AND_FORGET`: Hook is dispatched as a background `asyncio.create_task()`. Result is ignored. This is handled by ContextForge's `PluginManager` dispatch logic. + +When used on a standalone function, the metadata is read at `register()` time or when passed to `start_session(plugins=[...])`. 
When used on a class method, it is discovered during class registration (either via `@plugin` or `MelleaPlugin` introspection). + +### 3.7 `@plugin` Decorator (`mellea/plugins/_decorators.py`) + +The `@plugin` decorator marks a plain class as a multi-hook plugin: + +```python +@dataclass(frozen=True) +class PluginMeta: + name: str + priority: int = 50 + +def plugin( + name: str, + *, + priority: int = 50, +) -> Callable: + """Mark a class as a Mellea plugin.""" + def decorator(cls): + cls._mellea_plugin_meta = PluginMeta( + name=name, + priority=priority, + ) + return cls + return decorator +``` + +On registration, all methods with `_mellea_hook_meta` are discovered and registered as hook handlers bound to the instance. Methods without `@hook` are ignored. + +### 3.8 `PluginSet` (`mellea/plugins/_pluginset.py`) + +A named, composable group of hook functions and plugin instances: + +```python +class PluginSet: + def __init__( + self, + name: str, + items: list[Callable | Any | "PluginSet"], + *, + priority: int | None = None, + ): + self.name = name + self.items = items + self.priority = priority + + def flatten(self) -> list[tuple[Callable | Any, int | None]]: + """Recursively flatten nested PluginSets into (item, priority_override) pairs.""" + result = [] + for item in self.items: + if isinstance(item, PluginSet): + result.extend(item.flatten()) + else: + result.append((item, self.priority)) + return result +``` + +PluginSets are inert containers — they do not register anything themselves. Registration happens when they are passed to `register()` or `start_session(plugins=[...])`. 
+ +### 3.9 `block()` Helper (`mellea/plugins/_registry.py`) + +Convenience function for returning a blocking result from a hook: + +```python +def block( + reason: str, + *, + code: str = "", + description: str = "", + details: dict[str, Any] | None = None, +) -> PluginResult: + return PluginResult( + continue_processing=False, + violation=PluginViolation( + reason=reason, + description=description or reason, + code=code, + details=details or {}, + ), + ) +``` + ## 4. Plugin Manager Integration (`mellea/plugins/_manager.py`) ### 4.1 Lazy Singleton Wrapper +The `PluginManager` is lazily initialized on first use (either via `register()` or `start_session(plugins=[...])`). A config path is no longer required — code-first registration may be the only path. + ```python _plugin_manager: PluginManager | None = None _plugins_enabled: bool = False +_session_tags: dict[str, set[str]] = {} # session_id -> set of plugin keys def has_plugins() -> bool: """Fast check: are plugins configured and available?""" @@ -313,10 +450,21 @@ def get_plugin_manager() -> PluginManager | None: """Returns the initialized PluginManager, or None if plugins are not configured.""" return _plugin_manager +def _ensure_plugin_manager() -> PluginManager: + """Lazily initialize the PluginManager if not already created.""" + global _plugin_manager, _plugins_enabled + if _plugin_manager is None: + _register_mellea_hooks() + pm = PluginManager("", timeout=5) + _run_async_in_thread(pm.initialize()) + _plugin_manager = pm + _plugins_enabled = True + return _plugin_manager + async def initialize_plugins( config_path: str | None = None, *, timeout: float = 5.0 ) -> PluginManager: - """Initialize the PluginManager with Mellea hook registrations.""" + """Initialize the PluginManager with Mellea hook registrations and optional YAML config.""" global _plugin_manager, _plugins_enabled _register_mellea_hooks() pm = PluginManager(config_path or "", timeout=int(timeout)) @@ -327,11 +475,12 @@ async def 
initialize_plugins( async def shutdown_plugins() -> None: """Shut down the PluginManager.""" - global _plugin_manager, _plugins_enabled + global _plugin_manager, _plugins_enabled, _session_tags if _plugin_manager is not None: await _plugin_manager.shutdown() _plugin_manager = None _plugins_enabled = False + _session_tags.clear() ``` ### 4.2 `invoke_hook()` Central Helper @@ -342,11 +491,14 @@ All hook call sites use this single function. Three layers of no-op guards ensur 2. **`has_hooks_for(hook_type)`** — skips invocation when no plugin subscribes to this hook 3. **Returns `(None, original_payload)` immediately** when either guard fails +When `session_id` is provided, the manager invokes both global plugins (those registered without a session tag) and session-scoped plugins matching that session ID. When `session_id` is `None` (functional API path), only global plugins are invoked. + ```python async def invoke_hook( hook_type: MelleaHookType, payload: MelleaBasePayload, *, + session_id: str | None = None, session: MelleaSession | None = None, backend: Backend | None = None, context: Context | None = None, @@ -358,6 +510,10 @@ async def invoke_hook( Returns (result, possibly-modified-payload). If plugins are not configured, returns (None, original_payload) immediately. + + When session_id is provided, both global plugins and session-scoped + plugins matching that session ID are invoked. When session_id is None + (functional API path), only global plugins are invoked. 
""" if not _plugins_enabled or _plugin_manager is None: return None, payload @@ -366,12 +522,13 @@ async def invoke_hook( return None, payload payload.hook = hook_type.value + payload.session_id = session_id if not payload.request_id: payload.request_id = request_id global_ctx = build_global_context( session=session, backend=backend, context=context, - request_id=request_id, **context_fields, + request_id=request_id, session_id=session_id, **context_fields, ) result, _ = await _plugin_manager.invoke_hook( @@ -385,19 +542,18 @@ async def invoke_hook( return result, modified ``` -### 4.3 Session-Level Configuration +### 4.3 Session-Scoped Registration -`start_session()` in `mellea/stdlib/session.py` gains two optional keyword-only parameters: +`start_session()` in `mellea/stdlib/session.py` gains an optional `plugins` keyword parameter: ```python def start_session( ..., - plugin_config: str | None = None, # Path to plugin YAML config - plugin_manager: PluginManager | None = None, # Pre-configured manager + plugins: list[Callable | Any | PluginSet] | None = None, ) -> MelleaSession: ``` -If `plugin_manager` is provided, it is used directly. If `plugin_config` is a path, `initialize_plugins()` is called. Backward-compatible: existing code without these parameters sees no change. +When `plugins` is provided, `start_session()` registers each item with the session's ID via `register(items, session_id=session.id)`. These plugins fire only within this session, in addition to any globally registered plugins. They are automatically deregistered when the session is cleaned up (at `session_cleanup`). ### 4.4 Dependency Management @@ -409,18 +565,108 @@ plugins = ["contextforge-plugin-framework>=0.1.0"] All imports in `mellea/plugins/` are guarded with `try/except ImportError`. 
+### 4.5 Global Registration (`mellea/plugins/_registry.py`) + +Global registration happens via `register()` at application startup: + +```python +def register( + items: Callable | Any | PluginSet | list[Callable | Any | PluginSet], + *, + session_id: str | None = None, +) -> None: + """Register plugins globally or for a specific session. + + When session_id is None, plugins are global (fire for all invocations). + When session_id is provided, plugins fire only within that session. + + Accepts standalone @hook functions, @plugin-decorated class instances, + MelleaPlugin instances, PluginSets, or lists thereof. + """ + pm = _ensure_plugin_manager() + + if not isinstance(items, list): + items = [items] + + for item in items: + if isinstance(item, PluginSet): + for flattened_item, priority_override in item.flatten(): + _register_single(pm, flattened_item, session_id, priority_override) + else: + _register_single(pm, item, session_id, None) + + +def _register_single( + pm: PluginManager, + item: Callable | Any, + session_id: str | None, + priority_override: int | None, +) -> None: + """Register a single hook function or plugin instance. + + - Standalone functions with _mellea_hook_meta: wrapped in _FunctionHookAdapter + - @plugin-decorated class instances: methods with _mellea_hook_meta discovered and registered + - MelleaPlugin instances: registered directly with ContextForge + """ + ... +``` + +A `_FunctionHookAdapter` internal class wraps a standalone `@hook`-decorated function into a ContextForge `Plugin` for the `PluginManager`: + +```python +class _FunctionHookAdapter(Plugin): + """Adapts a standalone @hook-decorated function into a ContextForge Plugin.""" + + def __init__(self, fn: Callable, session_id: str | None = None): + meta = fn._mellea_hook_meta + config = PluginConfig( + name=fn.__qualname__, + kind=fn.__module__ + "." 
+ fn.__qualname__, + hooks=[meta.hook_type], + mode=_map_mode(meta.mode), + priority=meta.priority, + ) + super().__init__(config) + self._fn = fn + self._session_id = session_id + + async def initialize(self): + pass + + async def shutdown(self): + pass +``` + ## 5. Hook Call Sites +**Session context threading**: All `invoke_hook` calls pass `session_id` when operating within a session. For the functional API path, `session_id` is `None` and only globally registered plugins are dispatched. Session-scoped plugins (registered via `start_session(plugins=[...])`) fire only when the dispatch context matches their session ID. + ### 5.1 Session Lifecycle **File**: `mellea/stdlib/session.py` +`start_session()` gains the `plugins` parameter for session-scoped registration: + +```python +def start_session( + backend_name: ... = "ollama", + model_id: ... = IBM_GRANITE_4_MICRO_3B, + ctx: Context | None = None, + *, + model_options: dict | None = None, + plugins: list[Callable | Any | PluginSet] | None = None, + **backend_kwargs, +) -> MelleaSession: +``` + +Session-scoped plugins passed via `plugins=[...]` are registered with this session's ID and deregistered at `session_cleanup`. + | Hook | Location | Trigger | Result Handling | |------|----------|---------|-----------------| | `session_pre_init` | `start_session()`, before `backend_class(model_id, ...)` (~L163) | Before backend instantiation | Supports payload modification: updated `model_options`, `backend_name`. Violation blocks session creation. | | `session_post_init` | `start_session()`, after `MelleaSession(backend, ctx)` (~L191) | Session fully created | Observability-only. | | `session_reset` | `MelleaSession.reset()`, before `self.ctx.reset_to_new()` (~L269) | Context about to reset | Observability-only. | -| `session_cleanup` | `MelleaSession.cleanup()`, at top of method (~L272) | Before teardown | Observability-only. Must not raise. 
| +| `session_cleanup` | `MelleaSession.cleanup()`, at top of method (~L272) | Before teardown | Observability-only. Must not raise. Deregisters session-scoped plugins. | **Sync/async bridge**: These are sync methods. Use `_run_async_in_thread(invoke_hook(...))` from `mellea/helpers/__init__.py`. @@ -716,14 +962,22 @@ async def fire_error_hook( | File | Changes | |------|---------| | `mellea/stdlib/functional.py` | ~12 hook insertions (component lifecycle, validation, tools, error) | -| `mellea/stdlib/session.py` | 4 session hooks + `plugin_config`/`plugin_manager` params on `start_session()` | +| `mellea/stdlib/session.py` | 4 session hooks + `plugins` param on `start_session()` + session-scoped plugin registration/deregistration | | `mellea/stdlib/sampling/base.py` | 4 sampling hooks + `generate_from_context` → `generate_from_context_with_hooks` | | `mellea/core/backend.py` | Add `generate_from_context_with_hooks()` wrapper method to `Backend` ABC | | `mellea/stdlib/context.py` | 2 context operation hooks in `ChatContext.add()`, `SimpleContext.add()` | | `mellea/backends/openai.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` | | `mellea/backends/huggingface.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` | | `pyproject.toml` | Add `plugins` optional dependency + `plugins` test marker | -| `mellea/plugins/` (new) | Plugin subpackage | +| `mellea/plugins/__init__.py` (new) | Public API: `hook`, `plugin`, `block`, `PluginSet`, `register`, `MelleaPlugin` | +| `mellea/plugins/_decorators.py` (new) | `@hook` and `@plugin` decorator implementations, `HookMeta`, `PluginMeta` | +| `mellea/plugins/_pluginset.py` (new) | `PluginSet` class with `flatten()` for recursive expansion | +| `mellea/plugins/_registry.py` (new) | `register()`, `block()`, `_FunctionHookAdapter`, `_register_single()` | +| `mellea/plugins/_manager.py` (new) | Singleton wrapper, `invoke_hook()` with session-tag filtering, `_ensure_plugin_manager()` | +| 
`mellea/plugins/_base.py` (new) | `MelleaBasePayload`, `MelleaPlugin` base class | +| `mellea/plugins/_types.py` (new) | `MelleaHookType` enum, `_register_mellea_hooks()` | +| `mellea/plugins/_context.py` (new) | `build_global_context()` factory | +| `mellea/plugins/hooks/` (new) | Hook payload dataclasses (session, component, generation, etc.) | | `test/plugins/` (new) | Tests for plugins subpackage | > Note: + update docs and add examples. From 2ef5ebf584fec26b4f9ca3aed0b0b722dda36733 Mon Sep 17 00:00:00 2001 From: Frederico Araujo Date: Tue, 17 Feb 2026 07:50:09 -0500 Subject: [PATCH 08/10] feat: specify hook payload write protection Signed-off-by: Frederico Araujo --- docs/dev/hook_system.md | 162 ++++++++++++++++++-- docs/dev/hook_system_implementation_plan.md | 150 ++++++++++++++++-- 2 files changed, 285 insertions(+), 27 deletions(-) diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md index 8fa78a607..c5247d611 100644 --- a/docs/dev/hook_system.md +++ b/docs/dev/hook_system.md @@ -37,7 +37,7 @@ class MyPlugin(MelleaPlugin): ) -> PluginResult | None ``` -- **`payload`**: Mutable, strongly-typed data specific to the hook point +- **`payload`**: Immutable (frozen), strongly-typed data specific to the hook point. 
Plugins use `model_copy(update={...})` to propose modifications - **`context`**: Read-only shared context with session metadata and utilities - **`mode`**: `"enforce"` (default), `"permissive"`, or `"fire_and_forget"` — controls execution behavior (see Execution Mode below) - **`priority`**: Lower numbers execute first (default: 50) @@ -75,6 +75,7 @@ The hook system is backed by a lightweight plugin framework built as a Mellea de - Exposes `PluginSet` for grouping related hooks/plugins into composable, reusable units - Exposes `register()` for global plugin registration and `block()` as a convenience for returning blocking `PluginResult`s - Implements a plugin manager that loads, registers, and governs the execution of plugins +- Enforces per-hook-type payload policies via `HookPayloadPolicy`, accepting only writable-field changes from plugins The public API surface: @@ -108,20 +109,21 @@ result = await plugin_manager.invoke_hook(hook_type, payload, context) The caller (the base class method) is responsible for both invoking the hook and processing the result. Processing means checking the result for one of three possible outcomes: 1. **Continue with original payload**: — `PluginResult(continue_processing=True)` with no `modified_payload`. The caller proceeds unchanged. -2. **Continue with modified payload**: — `PluginResult(continue_processing=True, modified_payload=...)`. The caller uses the modified payload fields in place of the originals. +2. **Continue with modified payload**: — `PluginResult(continue_processing=True, modified_payload=...)`. The plugin manager applies the hook's payload policy, accepting only changes to writable fields and discarding unauthorized modifications. The caller uses the policy-filtered payload in place of the original. 3. **Block execution** — `PluginResult(continue_processing=False, violation=...)`. The caller raises or returns early with structured error information. 
Hooks cannot redirect control flow, jump to arbitrary code, or alter the calling method's logic beyond these outcomes. This is enforced by the `PluginResult` type. ### Payload Design Principles -Hook payloads follow five design principles: +Hook payloads follow six design principles: 1. **Strongly typed** — Each hook has a dedicated payload dataclass (not a generic dict). This enables IDE autocompletion, static analysis, and clear documentation of what each hook receives. 2. **Sufficient (maximize-at-boundary)** — Each payload includes everything available at that point in time. Post-hooks include the pre-hook fields plus results. This avoids forcing plugins to maintain their own state across pre/post pairs. -3. **Immutable context** — `PluginContext` fields are read-only; only the `payload` is mutable. This separates "what the plugin can observe" from "what the plugin can change." -4. **Serializable** — Payloads should be serializable for external (MCP-based) plugins that run out-of-process. All payload fields use types that can round-trip through JSON or similar formats. -5. **Versioned** — Payload schemas carry a `payload_version` so plugins can detect incompatible changes at registration time rather than at runtime. +3. **Frozen (immutable)** — Payloads are frozen Pydantic models (`model_config = ConfigDict(frozen=True)`). Plugins cannot mutate payload attributes in place. To propose changes, plugins must call `payload.model_copy(update={...})` and return the copy via `PluginResult.modified_payload`. This ensures every modification is explicit and flows through the policy system. +4. **Policy-controlled** — Each hook type declares a `HookPayloadPolicy` specifying which fields are writable. The plugin manager applies the policy after each plugin returns, accepting only changes to writable fields and silently discarding unauthorized modifications. 
This separates "what the plugin can observe" from "what the plugin can change" — and enforces it at the framework level. See [Hook Payload Policies](#hook-payload-policies) for the full policy table. +5. **Serializable** — Payloads should be serializable for external (MCP-based) plugins that run out-of-process. All payload fields use types that can round-trip through JSON or similar formats. +6. **Versioned** — Payload schemas carry a `payload_version` so plugins can detect incompatible changes at registration time rather than at runtime. ## 2. Common Payload Fields @@ -129,6 +131,9 @@ All hook payloads inherit these base fields: ```python class BasePayload(PluginPayload): + """Frozen base — all payloads are immutable by design.""" + model_config = ConfigDict(frozen=True, arbitrary_types_allowed=True) + session_id: str | None = None # Session identifier (None for functional API calls) request_id: str # Unique ID for this execution chain timestamp: datetime # When the event fired @@ -168,6 +173,131 @@ class BasePayload(PluginPayload): | `context_prune` | Context Operations | Context | When context is trimmed | | `error_occurred` | Error Handling | Cross-cutting | When an unrecoverable error occurs | +## 3b. Hook Payload Policies + +Each hook type declares a `HookPayloadPolicy` that specifies which payload fields plugins are allowed to modify. The plugin manager enforces these policies after each plugin returns: only changes to writable fields are accepted; all other modifications are silently discarded. + +Hooks not listed in the policy table are **observe-only** — plugins can read the payload but cannot modify any fields. 
+ +### Policy Types + +```python +from dataclasses import dataclass +from enum import Enum + +class DefaultHookPolicy(str, Enum): + """Controls behavior for hooks without an explicit policy.""" + ALLOW = "allow" # Accept all modifications (backwards-compatible) + DENY = "deny" # Reject all modifications (strict mode, default for Mellea) + +@dataclass(frozen=True) +class HookPayloadPolicy: + """Defines which payload fields plugins may modify.""" + writable_fields: frozenset[str] +``` + +### Policy Enforcement + +When a plugin returns `PluginResult(modified_payload=...)`, the plugin manager applies `apply_policy()`: + +```python +from typing import Any + +from pydantic import BaseModel + +_SENTINEL = object() # Marks a field that is absent from a payload + +def apply_policy( + original: BaseModel, + modified: BaseModel, + policy: HookPayloadPolicy, +) -> BaseModel | None: + """Accept only changes to writable fields; discard all others. + + Returns an updated payload via model_copy(update=...), or None + if the plugin made no effective (allowed) changes. + """ + updates: dict[str, Any] = {} + for field in policy.writable_fields: + old_val = getattr(original, field, _SENTINEL) + new_val = getattr(modified, field, _SENTINEL) + if new_val is not _SENTINEL and new_val != old_val: + updates[field] = new_val + return original.model_copy(update=updates) if updates else None +``` + +### Policy Table + +| Hook Point | Writable Fields | +|------------|----------------| +| **Session Lifecycle** | | +| `session_pre_init` | `backend_name`, `model_id`, `model_options`, `backend_kwargs` | +| `session_post_init` | *(observe-only)* | +| `session_reset` | *(observe-only)* | +| `session_cleanup` | *(observe-only)* | +| **Component Lifecycle** | | +| `component_pre_create` | `description`, `images`, `requirements`, `icl_examples`, `grounding_context`, `user_variables`, `prefix`, `template_id` | +| `component_post_create` | `component` | +| `component_pre_execute` | `action`, `context`, `context_view`, `requirements`, `model_options`, `format`, `strategy`, `tool_calls_enabled` | +| `component_post_success` |
`result` | +| `component_post_error` | *(observe-only)* | +| **Generation Pipeline** | | +| `generation_pre_call` | `model_options`, `tools`, `format`, `formatted_prompt` | +| `generation_post_call` | `processed_output`, `model_output` | +| `generation_stream_chunk` | `chunk`, `accumulated` | +| **Validation** | | +| `validation_pre_check` | `requirements`, `model_options` | +| `validation_post_check` | `results`, `all_passed` | +| **Sampling Pipeline** | | +| `sampling_loop_start` | `loop_budget` | +| `sampling_iteration` | *(observe-only)* | +| `sampling_repair` | `repair_action`, `repair_context` | +| `sampling_loop_end` | `final_result` | +| **Tool Execution** | | +| `tool_pre_invoke` | `tool_args` | +| `tool_post_invoke` | `tool_output` | +| **Backend Adapter Ops** | | +| `adapter_pre_load` | *(observe-only)* | +| `adapter_post_load` | *(observe-only)* | +| `adapter_pre_unload` | *(observe-only)* | +| `adapter_post_unload` | *(observe-only)* | +| **Context Operations** | | +| `context_update` | *(observe-only)* | +| `context_prune` | *(observe-only)* | +| **Error Handling** | | +| `error_occurred` | *(observe-only)* | + +### Default Policy + +Mellea uses `DefaultHookPolicy.DENY` as the default for hooks without an explicit policy. This means: + +- **Hooks with an explicit policy**: Only writable fields are accepted; other changes are discarded. +- **Hooks without a policy** (observe-only): All modifications are rejected with a warning log. +- **Custom hooks**: Custom hooks registered by users default to `DENY`. To allow modifications, pass a `HookPayloadPolicy` when registering the custom hook type. 
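To make the enforcement rules above concrete, here is a minimal, dependency-free sketch of how a manager might resolve a hook's policy and filter a plugin's proposed changes. It is not Mellea's implementation: frozen standard-library dataclasses stand in for frozen Pydantic payloads (so `dataclasses.replace` plays the role of `model_copy(update=...)`), and `ExamplePayload`, `POLICIES`, and `filter_modification` are hypothetical names chosen for illustration.

```python
import dataclasses
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Optional

logger = logging.getLogger("hook_policy_sketch")

_SENTINEL = object()  # Marks a field that is absent from a payload


class DefaultHookPolicy(str, Enum):
    ALLOW = "allow"  # Accept all modifications
    DENY = "deny"  # Reject all modifications


@dataclass(frozen=True)
class HookPayloadPolicy:
    writable_fields: frozenset[str]


@dataclass(frozen=True)
class ExamplePayload:
    """Hypothetical stand-in for a frozen hook payload."""

    hook: str
    model_options: dict[str, Any] = field(default_factory=dict)
    formatted_prompt: str = ""


# Hypothetical policy table; hooks absent from it fall back to the default policy.
POLICIES = {
    "generation_pre_call": HookPayloadPolicy(
        frozenset({"model_options", "tools", "format", "formatted_prompt"})
    ),
}


def filter_modification(
    original: ExamplePayload,
    modified: ExamplePayload,
    default: DefaultHookPolicy = DefaultHookPolicy.DENY,
) -> ExamplePayload:
    """Return the payload to use after applying the hook's policy."""
    policy: Optional[HookPayloadPolicy] = POLICIES.get(original.hook)
    if policy is None:
        if default is DefaultHookPolicy.ALLOW:
            return modified
        if modified != original:
            logger.warning("Rejected modification on observe-only hook %r", original.hook)
        return original
    updates: dict[str, Any] = {}
    for name in policy.writable_fields:
        old_val = getattr(original, name, _SENTINEL)
        new_val = getattr(modified, name, _SENTINEL)
        if new_val is not _SENTINEL and new_val != old_val:
            updates[name] = new_val
    # Copy only writable fields onto the original; every other change is dropped.
    return dataclasses.replace(original, **updates) if updates else original
```

For example, if a plugin on `generation_pre_call` returns a copy with new `model_options` and a tampered `hook` name, only the `model_options` change survives; on a hook with no policy, the original payload is returned and a warning is logged.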
+ +### Modification Pattern + +Because payloads are frozen, plugins must use `model_copy(update={...})` to create a modified copy: + +```python +@hook("generation_pre_call", mode="enforce", priority=10) +async def enforce_budget(payload, ctx): + if (payload.estimated_tokens or 0) > 4000: + return block("Token budget exceeded") + + # Modify a writable field — use model_copy, not direct assignment + modified = payload.model_copy(update={"model_options": {**payload.model_options, "max_tokens": 4000}}) + return PluginResult(continue_processing=True, modified_payload=modified) +``` + +Attempting to set attributes directly (e.g., `payload.model_options = {...}`) raises `pydantic.ValidationError` (error type `frozen_instance`). + +### Chaining + +When multiple plugins modify the same hook's payload, modifications are chained: + +1. Plugin A receives the original payload, returns a modified copy. +2. The policy filters Plugin A's changes to writable fields only. +3. Plugin B receives the policy-filtered result from Plugin A. +4. The policy filters Plugin B's changes. +5. The final policy-filtered payload is returned to the caller. + +This ensures each plugin sees the cumulative effect of prior plugins, and all modifications pass through the policy filter. + ## 4. Hook Definitions ### A. Session Lifecycle Hooks @@ -1039,20 +1169,26 @@ class ContextSnapshot: Hooks can return different result types to control execution: 1. **Continue (no-op)** — `PluginResult(continue_processing=True)` with no `modified_payload`. Execution proceeds with the original payload unchanged. -2. **Continue with modification** — `PluginResult(continue_processing=True, modified_payload=...)`. Execution proceeds with the modified payload fields in place of the originals. +2. **Continue with modification** — `PluginResult(continue_processing=True, modified_payload=...)`. The plugin manager applies the hook's `HookPayloadPolicy`, accepting only changes to writable fields. Execution proceeds with the policy-filtered payload. 3.
**Block execution** — `PluginResult(continue_processing=False, violation=...)`. Execution halts with structured error information via `PluginViolation`. These three outcomes are exhaustive. Hooks cannot redirect control flow, throw arbitrary exceptions, or alter the calling method's logic beyond these outcomes. This is enforced by the `PluginResult` type — there is no escape hatch. The `violation` field provides structured error information but does not influence which code path runs next. +Because payloads are frozen, the `modified_payload` in option 2 must be a new object created via `payload.model_copy(update={...})` — not a mutated version of the original. + ### Modify Payload ```python +# Create an immutable copy with only the desired changes +modified = payload.model_copy(update={"model_options": new_options}) return PluginResult( - continue_processing=True - modified_payload=modified_payload, + continue_processing=True, + modified_payload=modified, ) ``` +> **Note**: Only changes to fields listed in the hook's `HookPayloadPolicy.writable_fields` will be accepted. Changes to other fields are silently discarded by the policy enforcement layer. 
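The chaining rules and the three result types (continue, continue with modification, block) can be simulated end to end. The sketch below is hypothetical and dependency-free: `Result` and `Violation` are simplified stand-ins for `PluginResult` and `PluginViolation`, the `WRITABLE` set is an assumed policy, hooks are plain async functions run in ascending `priority` order (an assumption here), and frozen dataclasses with `dataclasses.replace` stand in for Pydantic payloads and `model_copy(update=...)`.

```python
import asyncio
import dataclasses
from dataclasses import dataclass
from typing import Awaitable, Callable, Optional


@dataclass(frozen=True)
class Payload:
    """Hypothetical stand-in for a frozen generation_pre_call payload."""

    prompt: str
    max_tokens: int
    request_id: str = "req-1"


@dataclass(frozen=True)
class Violation:
    reason: str


@dataclass(frozen=True)
class Result:
    """Simplified stand-in for PluginResult."""

    continue_processing: bool = True
    modified_payload: Optional[Payload] = None
    violation: Optional[Violation] = None


Hook = Callable[[Payload], Awaitable[Result]]
WRITABLE = frozenset({"max_tokens"})  # Assumed policy for this hook


def apply_policy(original: Payload, modified: Payload) -> Payload:
    updates = {
        name: getattr(modified, name)
        for name in WRITABLE
        if getattr(modified, name) != getattr(original, name)
    }
    return dataclasses.replace(original, **updates) if updates else original


async def run_chain(payload: Payload, hooks: list[tuple[int, Hook]]):
    """Run hooks in ascending priority; each sees the previous filtered payload."""
    for _priority, hook in sorted(hooks, key=lambda pair: pair[0]):
        result = await hook(payload)
        if not result.continue_processing:
            return payload, result.violation  # Block: halt and surface the violation
        if result.modified_payload is not None:
            payload = apply_policy(payload, result.modified_payload)
    return payload, None


async def observe(p: Payload) -> Result:
    return Result()  # Continue (no-op)


async def cap_tokens(p: Payload) -> Result:
    # Changes a writable field and also tries to change a non-writable one.
    proposed = dataclasses.replace(
        p, max_tokens=min(p.max_tokens, 512), request_id="hijacked"
    )
    return Result(modified_payload=proposed)


async def deny_long_prompts(p: Payload) -> Result:
    if len(p.prompt) > 20:
        return Result(continue_processing=False, violation=Violation("Prompt too long"))
    return Result()
```

With `run_chain(Payload("Hi", 4000), [(50, observe), (10, cap_tokens)])`, `cap_tokens` runs first; the final payload has `max_tokens == 512` while `request_id` stays `"req-1"` because that field is not writable. Putting `deny_long_prompts` first with a long prompt halts the chain and returns its violation before any later hook runs.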
+ ### Block Execution ```python @@ -1358,15 +1494,15 @@ class PIIRedactor: async def redact_input(self, payload, ctx): redacted = self._redact(payload.description) if redacted != payload.description: - payload.description = redacted - return PluginResult(continue_processing=True, modified_payload=payload) + modified = payload.model_copy(update={"description": redacted}) + return PluginResult(continue_processing=True, modified_payload=modified) @hook("generation_post_call") async def redact_output(self, payload, ctx): redacted = self._redact(payload.processed_output) if redacted != payload.processed_output: - payload.processed_output = redacted - return PluginResult(continue_processing=True, modified_payload=payload) + modified = payload.model_copy(update={"processed_output": redacted}) + return PluginResult(continue_processing=True, modified_payload=modified) def _redact(self, text: str) -> str: for pattern in self.patterns: diff --git a/docs/dev/hook_system_implementation_plan.md b/docs/dev/hook_system_implementation_plan.md index 259d50126..971dc932a 100644 --- a/docs/dev/hook_system_implementation_plan.md +++ b/docs/dev/hook_system_implementation_plan.md @@ -15,6 +15,7 @@ mellea/plugins/ ├── _manager.py # Singleton wrapper + session-tag filtering ├── _base.py # MelleaBasePayload, MelleaPlugin base class ├── _types.py # MelleaHookType enum + hook registration +├── _policies.py # HookPayloadPolicy table + DefaultHookPolicy for Mellea hooks ├── _context.py # Plugin context factory helper ├── _decorators.py # @hook and @plugin decorator implementations ├── _pluginset.py # PluginSet class @@ -39,13 +40,16 @@ The following types from `mcpgateway.plugins.framework` form the plumbing layer. | Type | Role | |------|------| | `Plugin` | ABC base class. `__init__(config: PluginConfig)`, `initialize()`, `shutdown()`. Hook methods discovered by convention (method name = hook type) or `@hook()` decorator. 
Signature: `async def hook_name(self, payload, context) -> PluginResult`. | -| `PluginManager` | Borg singleton. `__init__(config_path, timeout, observability)`. Key methods: `invoke_hook(hook_type, payload, global_context, ...) -> (PluginResult, PluginContextTable)`, `has_hooks_for(hook_type) -> bool`, `initialize()`, `shutdown()`. | -| `PluginPayload` | Type alias for `pydantic.BaseModel`. Base type for all hook payloads. | +| `PluginManager` | Borg singleton. `__init__(config_path, timeout, observability, hook_policies)`. Key methods: `invoke_hook(hook_type, payload, global_context, ...) -> (PluginResult, PluginContextTable)`, `has_hooks_for(hook_type) -> bool`, `initialize()`, `shutdown()`. The `hook_policies` parameter accepts a `dict[str, HookPayloadPolicy]` mapping hook types to their writable-field policies. | +| `PluginPayload` | Base type for all hook payloads. Frozen Pydantic `BaseModel` (`ConfigDict(frozen=True)`). Plugins use `model_copy(update={...})` to propose modifications. | | `PluginResult[T]` | Generic result: `continue_processing: bool`, `modified_payload: T | None`, `violation: PluginViolation | None`, `metadata: dict`. | | `PluginViolation` | `reason`, `description`, `code`, `details`. | | `PluginConfig` | `name`, `kind`, `hooks`, `mode`, `priority`, `conditions`, `config`, ... | | `PluginMode` | `ENFORCE`, `ENFORCE_IGNORE_ERROR`, `PERMISSIVE`, `FIRE_AND_FORGET`, `DISABLED`. | | `PluginContext` | `state: dict`, `global_context: GlobalContext`, `metadata: dict`. | +| `HookPayloadPolicy` | Frozen dataclass with `writable_fields: frozenset[str]`. Defines which payload fields plugins may modify for a given hook type. | +| `DefaultHookPolicy` | Enum: `ALLOW` (accept all modifications), `DENY` (reject all modifications). Controls behavior for hooks without an explicit policy. | +| `apply_policy()` | `apply_policy(original, modified, policy) -> BaseModel \| None`. 
Accepts only changes to writable fields via `model_copy(update=...)`, discarding unauthorized changes. Returns `None` if no effective changes. | | `HookRegistry` | `get_hook_registry()`, `register_hook(hook_type, payload_class, result_class)`, `is_registered(hook_type)`. | | `@hook` decorator | `@hook("hook_type")` or `@hook("hook_type", PayloadType, ResultType)` for custom method names. | @@ -70,7 +74,8 @@ classDiagram -timeout: int -observability: Any -hook_registry: HookRegistry - +__init__(config_path, timeout, observability) + -hook_policies: dict~str, HookPayloadPolicy~ + +__init__(config_path, timeout, observability, hook_policies) +invoke_hook(hook_type, payload, global_context, ...) tuple~PluginResult, PluginContextTable~ +has_hooks_for(hook_type: str) bool +initialize() async @@ -97,10 +102,23 @@ classDiagram DISABLED } + %% Policy + class HookPayloadPolicy { + <> + +writable_fields: frozenset~str~ + } + + class DefaultHookPolicy { + <> + ALLOW + DENY + } + %% Payload & Result class PluginPayload { <> pydantic.BaseModel + model_config: frozen=True } class PluginResult~T~ { @@ -138,6 +156,7 @@ classDiagram Plugin ..> hook : decorated by PluginManager --> Plugin : manages 0..* + PluginManager --> HookPayloadPolicy : enforces per hook PluginConfig --> PluginMode : has @@ -157,7 +176,6 @@ plugins: - component_pre_create - generation_post_call mode: enforce - mode: enforce priority: 10 config: blocked_terms: ["term1", "term2"] @@ -242,7 +260,14 @@ All Mellea hook payloads inherit from this base, which extends `PluginPayload` w ```python class MelleaBasePayload(PluginPayload): - model_config = ConfigDict(arbitrary_types_allowed=True) + """Frozen base — all payloads are immutable by design. + + Plugins must use ``model_copy(update={...})`` to propose modifications + and return the copy via ``PluginResult.modified_payload``. The plugin + manager applies the hook's ``HookPayloadPolicy`` to filter changes to + writable fields only. 
+ """ + model_config = ConfigDict(frozen=True, arbitrary_types_allowed=True) session_id: str | None = None request_id: str @@ -251,7 +276,7 @@ class MelleaBasePayload(PluginPayload): user_metadata: dict[str, Any] = Field(default_factory=dict) ``` -`arbitrary_types_allowed=True` is required because payloads include non-serializable Mellea objects (`Backend`, `Context`, `Component`, `ModelOutputThunk`). This means external plugins cannot receive these payloads directly; they are designed for native in-process plugins. +`frozen=True` prevents attribute reassignment: assigning to a field on a payload instance raises `pydantic.ValidationError` (error type `frozen_instance`). Freezing is shallow, however: mutable field values such as `model_options` dicts can still be changed in place, which would also alter the original payload and bypass the policy filter, so plugins should build new containers and pass them through `model_copy(update={...})`. `arbitrary_types_allowed=True` is required because payloads include non-serializable Mellea objects (`Backend`, `Context`, `Component`, `ModelOutputThunk`). This means external plugins cannot receive these payloads directly; they are designed for native in-process plugins. ### 3.3 Hook Registration (`mellea/plugins/_types.py`) @@ -431,6 +456,86 @@ def block( ``` +### 3.10 Hook Payload Policies (`mellea/plugins/_policies.py`) + +Defines the concrete per-hook-type policies for Mellea hooks. These are injected into the `PluginManager` at initialization time via the `hook_policies` parameter.
+ +```python +from mcpgateway.plugins.framework.hooks.policies import HookPayloadPolicy + +MELLEA_HOOK_PAYLOAD_POLICIES: dict[str, HookPayloadPolicy] = { + # Session Lifecycle + "session_pre_init": HookPayloadPolicy( + writable_fields=frozenset({"backend_name", "model_id", "model_options", "backend_kwargs"}), + ), + # session_post_init, session_reset, session_cleanup: observe-only (no entry) + + # Component Lifecycle + "component_pre_create": HookPayloadPolicy( + writable_fields=frozenset({ + "description", "images", "requirements", "icl_examples", + "grounding_context", "user_variables", "prefix", "template_id", + }), + ), + "component_post_create": HookPayloadPolicy( + writable_fields=frozenset({"component"}), + ), + "component_pre_execute": HookPayloadPolicy( + writable_fields=frozenset({ + "action", "context", "context_view", "requirements", + "model_options", "format", "strategy", "tool_calls_enabled", + }), + ), + "component_post_success": HookPayloadPolicy( + writable_fields=frozenset({"result"}), + ), + # component_post_error: observe-only + + # Generation Pipeline + "generation_pre_call": HookPayloadPolicy( + writable_fields=frozenset({"model_options", "tools", "format", "formatted_prompt"}), + ), + "generation_post_call": HookPayloadPolicy( + writable_fields=frozenset({"processed_output", "model_output"}), + ), + "generation_stream_chunk": HookPayloadPolicy( + writable_fields=frozenset({"chunk", "accumulated"}), + ), + + # Validation + "validation_pre_check": HookPayloadPolicy( + writable_fields=frozenset({"requirements", "model_options"}), + ), + "validation_post_check": HookPayloadPolicy( + writable_fields=frozenset({"results", "all_passed"}), + ), + + # Sampling Pipeline + "sampling_loop_start": HookPayloadPolicy( + writable_fields=frozenset({"loop_budget"}), + ), + # sampling_iteration: observe-only + "sampling_repair": HookPayloadPolicy( + writable_fields=frozenset({"repair_action", "repair_context"}), + ), + "sampling_loop_end": HookPayloadPolicy( + 
writable_fields=frozenset({"final_result"}), + ), + + # Tool Execution + "tool_pre_invoke": HookPayloadPolicy( + writable_fields=frozenset({"tool_args"}), + ), + "tool_post_invoke": HookPayloadPolicy( + writable_fields=frozenset({"tool_output"}), + ), + + # adapter_*, context_*, error_occurred: observe-only (no entry) +} +``` + +Hooks absent from this table are observe-only. With `DefaultHookPolicy.DENY` (the Mellea default), any modification attempt on an observe-only hook is rejected with a warning log. + ## 4. Plugin Manager Integration (`mellea/plugins/_manager.py`) ### 4.1 Lazy Singleton Wrapper @@ -455,7 +560,11 @@ def _ensure_plugin_manager() -> PluginManager: global _plugin_manager, _plugins_enabled if _plugin_manager is None: _register_mellea_hooks() - pm = PluginManager("", timeout=5) + pm = PluginManager( + "", + timeout=5, + hook_policies=MELLEA_HOOK_PAYLOAD_POLICIES, + ) _run_async_in_thread(pm.initialize()) _plugin_manager = pm _plugins_enabled = True @@ -467,7 +576,11 @@ async def initialize_plugins( """Initialize the PluginManager with Mellea hook registrations and optional YAML config.""" global _plugin_manager, _plugins_enabled _register_mellea_hooks() - pm = PluginManager(config_path or "", timeout=int(timeout)) + pm = PluginManager( + config_path or "", + timeout=int(timeout), + hook_policies=MELLEA_HOOK_PAYLOAD_POLICIES, + ) await pm.initialize() _plugin_manager = pm _plugins_enabled = True @@ -521,10 +634,11 @@ async def invoke_hook( if not _plugin_manager.has_hooks_for(hook_type.value): return None, payload - payload.hook = hook_type.value - payload.session_id = session_id + # Payloads are frozen — use model_copy to set dispatch-time fields + updates: dict[str, Any] = {"hook": hook_type.value, "session_id": session_id} if not payload.request_id: - payload.request_id = request_id + updates["request_id"] = request_id + payload = payload.model_copy(update=updates) global_ctx = build_global_context( session=session, backend=backend, 
context=context, @@ -772,14 +886,16 @@ async def generate_from_context_with_hooks( if has_plugins(): pre_payload = GenerationPreCallPayload( action=action, context=ctx, + formatted_prompt="", # Populated after linearization; writable by plugins model_options=model_options or {}, format=format, tools=None, ) result, pre_payload = await invoke_hook( MelleaHookType.GENERATION_PRE_CALL, pre_payload, backend=self, context=ctx, ) - if result and result.modified_payload: - model_options = result.modified_payload.model_options + # pre_payload is the policy-filtered result; read back the writable fields used below (tools and formatted_prompt are not consumed here) + model_options = pre_payload.model_options + format = pre_payload.format t0 = time.monotonic() out_result, new_ctx = await self.generate_from_context( @@ -788,8 +904,13 @@ async def generate_from_context_with_hooks( if has_plugins(): post_payload = GenerationPostCallPayload( + prompt=..., # Sent prompt (from linearization) + raw_response=..., # Full JSON response from provider + processed_output=..., # Extracted text from response model_output=out_result, + token_usage=..., # From backend response metadata latency_ms=int((time.monotonic() - t0) * 1000), + finish_reason=..., # From backend response metadata ) await invoke_hook( MelleaHookType.GENERATION_POST_CALL, post_payload, @@ -974,8 +1095,9 @@ async def fire_error_hook( | `mellea/plugins/_pluginset.py` (new) | `PluginSet` class with `flatten()` for recursive expansion | | `mellea/plugins/_registry.py` (new) | `register()`, `block()`, `_FunctionHookAdapter`, `_register_single()` | | `mellea/plugins/_manager.py` (new) | Singleton wrapper, `invoke_hook()` with session-tag filtering, `_ensure_plugin_manager()` | -| `mellea/plugins/_base.py` (new) | `MelleaBasePayload`, `MelleaPlugin` base class | +| `mellea/plugins/_base.py` (new) | `MelleaBasePayload` (frozen), `MelleaPlugin` base class | | `mellea/plugins/_types.py` (new) | `MelleaHookType` enum, `_register_mellea_hooks()` | +| `mellea/plugins/_policies.py` (new) |
`MELLEA_HOOK_PAYLOAD_POLICIES` table, injected into `PluginManager` at init | | `mellea/plugins/_context.py` (new) | `build_global_context()` factory | | `mellea/plugins/hooks/` (new) | Hook payload dataclasses (session, component, generation, etc.) | | `test/plugins/` (new) | Tests for plugins subpackage | From 1d084eacd126d610ad03b8ebea49297344a2f06f Mon Sep 17 00:00:00 2001 From: Hendrik Strobelt Date: Tue, 17 Feb 2026 11:00:13 -0500 Subject: [PATCH 09/10] Update hook_system.md Added implementation priorities to Hook Table. --- docs/dev/hook_system.md | 58 ++++++++++++++++++++--------------------- 1 file changed, 29 insertions(+), 29 deletions(-) diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md index c5247d611..7b70d2671 100644 --- a/docs/dev/hook_system.md +++ b/docs/dev/hook_system.md @@ -143,35 +143,35 @@ class BasePayload(PluginPayload): ## 3. Hook Summary Table -| Hook Point | Category | Domain | Description | -|------------|----------|--------|-------------| -| `session_pre_init` | Session Lifecycle | Session | Before session initialization | -| `session_post_init` | Session Lifecycle | Session | After session is fully initialized | -| `session_reset` | Session Lifecycle | Session | When session context is reset | -| `session_cleanup` | Session Lifecycle | Session | Before session cleanup/teardown | -| `component_pre_create` | Component Lifecycle | Component / (Backend, Context) | Before component creation | -| `component_post_create` | Component Lifecycle | Component / (Backend, Context) | After component created, before execution | -| `component_pre_execute` | Component Lifecycle | Component / (Backend, Context) | Before component execution via `aact()` | -| `component_post_success` | Component Lifecycle | Component / (Backend, Context) | After successful component execution | -| `component_post_error` | Component Lifecycle | Component / (Backend, Context) | After component execution fails | -| `generation_pre_call` | Generation Pipeline | 
(Backend, Context) | Before LLM backend call | -| `generation_post_call` | Generation Pipeline | (Backend, Context) | After LLM response received | -| `generation_stream_chunk` | Generation Pipeline | (Backend, Context) | For each streaming chunk | -| `validation_pre_check` | Validation | (Backend, Context) | Before requirement validation | -| `validation_post_check` | Validation | (Backend, Context) | After validation completes | -| `sampling_loop_start` | Sampling Pipeline | (Backend, Context) | When sampling strategy begins | -| `sampling_iteration` | Sampling Pipeline | (Backend, Context) | After each sampling attempt | -| `sampling_repair` | Sampling Pipeline | (Backend, Context) | When repair is invoked | -| `sampling_loop_end` | Sampling Pipeline | (Backend, Context) | When sampling completes | -| `tool_pre_invoke` | Tool Execution | (Backend, Context) | Before tool/function invocation | -| `tool_post_invoke` | Tool Execution | (Backend, Context) | After tool execution | -| `adapter_pre_load` | Backend Adapter Ops | Backend | Before `backend.load_adapter()` | -| `adapter_post_load` | Backend Adapter Ops | Backend | After adapter loaded | -| `adapter_pre_unload` | Backend Adapter Ops | Backend | Before `backend.unload_adapter()` | -| `adapter_post_unload` | Backend Adapter Ops | Backend | After adapter unloaded | -| `context_update` | Context Operations | Context | When context changes | -| `context_prune` | Context Operations | Context | When context is trimmed | -| `error_occurred` | Error Handling | Cross-cutting | When an unrecoverable error occurs | +| Prio | Hook Point | Category | Domain | Description | +|--|------------|----------|--------|-------------| +|3| `session_pre_init` | Session Lifecycle | Session | Before session initialization | +|3| `session_post_init` | Session Lifecycle | Session | After session is fully initialized | +|3| `session_reset` | Session Lifecycle | Session | When session context is reset | +|3| `session_cleanup` | Session 
Lifecycle | Session | Before session cleanup/teardown | +|7| `component_pre_create` | Component Lifecycle | Component / (Backend, Context) | Before component creation | +|7| `component_post_create` | Component Lifecycle | Component / (Backend, Context) | After component created, before execution | +|7| `component_pre_execute` | Component Lifecycle | Component / (Backend, Context) | Before component execution via `aact()` | +|7| `component_post_success` | Component Lifecycle | Component / (Backend, Context) | After successful component execution | +|7| `component_post_error` | Component Lifecycle | Component / (Backend, Context) | After component execution fails | +|1| `generation_pre_call` | Generation Pipeline | (Backend, Context) | Before LLM backend call | +|1| `generation_post_call` | Generation Pipeline | (Backend, Context) | After LLM response received | +|1| `generation_stream_chunk` | Generation Pipeline | (Backend, Context) | For each streaming chunk | +|1| `validation_pre_check` | Validation | (Backend, Context) | Before requirement validation | +|1| `validation_post_check` | Validation | (Backend, Context) | After validation completes | +|3| `sampling_loop_start` | Sampling Pipeline | (Backend, Context) | When sampling strategy begins | +|3| `sampling_iteration` | Sampling Pipeline | (Backend, Context) | After each sampling attempt | +|3| `sampling_repair` | Sampling Pipeline | (Backend, Context) | When repair is invoked | +|3| `sampling_loop_end` | Sampling Pipeline | (Backend, Context) | When sampling completes | +|1| `tool_pre_invoke` | Tool Execution | (Backend, Context) | Before tool/function invocation | +|1| `tool_post_invoke` | Tool Execution | (Backend, Context) | After tool execution | +|5| `adapter_pre_load` | Backend Adapter Ops | Backend | Before `backend.load_adapter()` | +|5| `adapter_post_load` | Backend Adapter Ops | Backend | After adapter loaded | +|5| `adapter_pre_unload` | Backend Adapter Ops | Backend | Before 
`backend.unload_adapter()` | +|5| `adapter_post_unload` | Backend Adapter Ops | Backend | After adapter unloaded | +|TBD| `context_update` | Context Operations | Context | When context changes | +|TBD| `context_prune` | Context Operations | Context | When context is trimmed | +|TBD| `error_occurred` | Error Handling | Cross-cutting | When an unrecoverable error occurs | ## 3b. Hook Payload Policies From 8b051acfcb5e625785dba7f8ed4d765b9e4ad938 Mon Sep 17 00:00:00 2001 From: Frederico Araujo Date: Tue, 17 Feb 2026 23:01:48 -0500 Subject: [PATCH 10/10] docs: update hook system specification to document with-block support Signed-off-by: Frederico Araujo --- docs/dev/hook_system.md | 105 +++++++++++ docs/dev/hook_system_implementation_plan.md | 190 +++++++++++++++++++- 2 files changed, 287 insertions(+), 8 deletions(-) diff --git a/docs/dev/hook_system.md b/docs/dev/hook_system.md index 7b70d2671..8eeec9cd3 100644 --- a/docs/dev/hook_system.md +++ b/docs/dev/hook_system.md @@ -96,6 +96,111 @@ Both scopes coexist. When a hook fires within a session, both global plugins and **Functional API support**: The functional API (`instruct(backend, context, ...)`) does not require a session. Hooks still fire at the same execution points. If global plugins are registered, they execute. If no plugins are registered, hooks are no-ops with zero overhead. ### With-Block-Scoped Context Managers +In addition to global and session-scoped registration, plugins can be activated for a specific block of code using the context manager protocol. Plugins are registered on entry and deregistered on exit, even if the block raises an exception.
+ +This is useful for: +- Feature flags: enable a plugin only during a specific operation +- Testing: activate a mock or spy plugin around a single call +- Middleware injection: wrap a third-party call with additional hooks without polluting global state +- Composing scopes: stack independent scopes that each clean up after themselves + +Four equivalent forms are supported: + +**1. `plugin_scope(*items)` factory** + +Accepts standalone `@hook` functions, `@plugin`-decorated instances, `PluginSet`s, or any mix: + +```python +from mellea.plugins import plugin_scope + +with plugin_scope(log_request, log_response): + result = m.instruct("Name the planets.") +# log_request and log_response are deregistered here + +with plugin_scope(pii_redactor, observability_set, enforce_budget): + result = m.instruct("Summarize the customer record.") +``` + +**2. `@plugin`-decorated class instance as context manager** + +Any instance of a `@plugin`-decorated class can be entered directly as a context manager: + +```python +guard = ContentGuard() +with guard: + result = m.instruct("What is the boiling point of water?") +# ContentGuard hooks are deregistered here +``` + +**3. `PluginSet` as context manager** + +A `PluginSet` can be entered directly, activating all of its contained hooks and plugins: + +```python +with observability: # observability is a PluginSet + result = m.instruct("What is the capital of France?") +# All observability hooks are deregistered here +``` + +**4. 
`MelleaPlugin` subclass instance as context manager** + +Any `MelleaPlugin` instance supports the same `with` syntax: + +```python +profiler = SlotProfiler() +with profiler: + result = m.instruct("Generate a report.") +``` + +**Async variants** + +All four forms also support `async with` for use in async code: + +```python +async with plugin_scope(log_request, ContentGuard()): + result = await m.ainstruct("Describe the solar system.") + +async with observability: + result = await m.ainstruct("What is the capital of France?") +``` + +**Nesting and mixing** + +Scopes stack cleanly, i.e., each exit deregisters only its own plugins. Nesting is independent of form: + +```python +with plugin_scope(log_request): # outer scope + with ContentGuard() as guard: # inner scope: @plugin instance + result = m.instruct("...") # log_request + ContentGuard active + result = m.instruct("...") # only log_request active +# no plugins active +``` + +**Cleanup guarantee** + +Plugins are always deregistered on scope exit, even if the block raises an exception. There is no resource leak on error. + +**Re-entrant restriction** + +The same instance cannot be active in two overlapping scopes simultaneously. Attempting to re-enter an already-active instance raises `RuntimeError`. To run the same plugin logic in parallel or in nested scopes, create separate instances: + +```python +guard1 = ContentGuard() +guard2 = ContentGuard() # separate instance + +with guard1: + with guard2: # OK — different instances + ... + +with guard1: + with guard1: # RuntimeError — same instance already active + ... +``` + +**Implementation note**: With-block scopes use the same `session_id` tagging mechanism as session-scoped plugins. Each `with` block gets a unique UUID scope ID; the plugin manager filters plugins by scope ID at dispatch time and deregisters them by scope ID on exit. This means with-block plugins coexist with global and session-scoped plugins: all three layers execute together, ordered by priority. 
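The cleanup and re-entrancy rules can be illustrated with a toy registry. This is a hypothetical, in-memory sketch, not the Mellea plugin manager: `_REGISTRY`, `register`, `deregister_scope`, `active_plugins`, and `ScopedPlugin` are illustrative names. It models only the behaviour described above: a fresh UUID scope ID per entry, deregistration by scope ID on exit even when the block raises, and `RuntimeError` when an already-active instance is re-entered. Priority ordering across the global, session, and with-block layers is not modelled.

```python
import uuid
from collections import defaultdict
from typing import Optional

# Hypothetical in-memory registry: scope ID -> names of plugins tagged with it
_REGISTRY: dict[str, list[str]] = defaultdict(list)


def register(name: str, scope_id: str) -> None:
    _REGISTRY[scope_id].append(name)


def deregister_scope(scope_id: str) -> None:
    _REGISTRY.pop(scope_id, None)  # Removes only this scope's plugins


def active_plugins() -> list[str]:
    return [name for names in _REGISTRY.values() for name in names]


class ScopedPlugin:
    """Toy plugin that registers itself for the duration of a with-block."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._scope_id: Optional[str] = None

    def __enter__(self) -> "ScopedPlugin":
        if self._scope_id is not None:
            raise RuntimeError(
                f"Plugin {self.name!r} is already active; create a new instance"
            )
        self._scope_id = str(uuid.uuid4())  # Fresh scope ID per entry
        register(self.name, self._scope_id)
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
        if self._scope_id is not None:
            deregister_scope(self._scope_id)
            self._scope_id = None
        # Returning None lets any exception from the block propagate after cleanup.
```

Because `__exit__` runs on both normal and exceptional exit, nested scopes unwind one layer at a time, and an exception raised inside a block still leaves the registry empty once the outermost scope closes.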
+ ### Hook Invocation Responsibilities Hooks are called from Mellea's base classes (`Component.aact()`, `Backend.generate()`, `SamplingStrategy.run()`, etc.). This means hook invocation is a framework-level concern, and authors of new backends, sampling strategies, or components do not need to manually insert hook calls. diff --git a/docs/dev/hook_system_implementation_plan.md b/docs/dev/hook_system_implementation_plan.md index 971dc932a..992e5bbba 100644 --- a/docs/dev/hook_system_implementation_plan.md +++ b/docs/dev/hook_system_implementation_plan.md @@ -669,7 +669,181 @@ def start_session( When `plugins` is provided, `start_session()` registers each item with the session's ID via `register(items, session_id=session.id)`. These plugins fire only within this session, in addition to any globally registered plugins. They are automatically deregistered when the session is cleaned up (at `session_cleanup`). -### 4.4 Dependency Management +### 4.4 With-Block-Scoped Registration (Context Managers) + +All three plugin forms — standalone `@hook` functions, `@plugin`-decorated class instances, and `MelleaPlugin` subclass instances — plus `PluginSet` support the Python context manager protocol for block-scoped activation. This is a fourth registration scope complementing global, session-scoped, and YAML-configured plugins. + +#### Mechanism + +With-block scopes reuse the existing `session_id` tagging infrastructure from section 4.3. Each `with` entry generates a fresh UUID scope ID, registers plugins with that scope ID, and deregisters them by scope ID on exit. The `_session_tags` dict in `_manager.py` tracks these scope IDs alongside session IDs — the manager makes no distinction between them at dispatch time. 
+
+#### `plugin_scope()` factory (`mellea/plugins/_registry.py`)
+
+A `_PluginScope` internal class and `plugin_scope()` public factory serve as the universal entry point, accepting any mix of standalone functions, `@plugin` instances, and `PluginSet`s:
+
+```python
+class _PluginScope:
+    """Context manager that activates a set of plugins for a block of code."""
+
+    def __init__(self, items: list[Callable | Any | PluginSet]) -> None:
+        self._items = items
+        self._scope_id: str | None = None
+
+    def _activate(self) -> None:
+        if self._scope_id is not None:
+            raise RuntimeError("This plugin_scope is already active; create a new one instead.")
+        self._scope_id = str(uuid.uuid4())
+        register(self._items, session_id=self._scope_id)
+
+    def _deactivate(self) -> None:
+        if self._scope_id is not None:
+            deregister_session_plugins(self._scope_id)
+            self._scope_id = None
+
+    def __enter__(self) -> _PluginScope:
+        self._activate()
+        return self
+
+    def __exit__(self, exc_type, exc_val, exc_tb) -> None:
+        self._deactivate()
+
+    async def __aenter__(self) -> _PluginScope:
+        self._activate()
+        return self
+
+    async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
+        self._deactivate()
+
+
+def plugin_scope(*items: Callable | Any | PluginSet) -> _PluginScope:
+    """Create a context manager that activates the given plugins for a block of code."""
+    return _PluginScope(list(items))
+```
+
+#### `@plugin`-decorated class instances (`mellea/plugins/_decorators.py`)
+
+The `@plugin` decorator injects `__enter__`, `__exit__`, `__aenter__`, and `__aexit__` into every decorated class. The methods are defined as module-level functions rather than lambdas. `__enter__` needs several statements, and `__aenter__`/`__aexit__` must be `async def` coroutine functions, which a lambda cannot express. Once assigned as class attributes, they behave as ordinary methods:
+
+```python
+def _plugin_cm_enter(self: Any) -> Any:
+    if getattr(self, "_scope_id", None) is not None:
+        meta = getattr(type(self), "_mellea_plugin_meta", None)
+        plugin_name = meta.name if meta else type(self).__name__
+        raise RuntimeError(
+            f"Plugin {plugin_name!r} is already active as a context manager. "
+            "Concurrent or nested reuse of the same instance is not supported; "
+            "create a new instance instead."
+        )
+    self._scope_id = str(uuid.uuid4())
+    register(self, session_id=self._scope_id)
+    return self
+
+
+def _plugin_cm_exit(self: Any, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
+    scope_id = getattr(self, "_scope_id", None)
+    if scope_id is not None:
+        deregister_session_plugins(scope_id)
+        self._scope_id = None
+
+
+async def _plugin_cm_aenter(self: Any) -> Any:
+    return self.__enter__()
+
+
+async def _plugin_cm_aexit(self: Any, exc_type: Any, exc_val: Any, exc_tb: Any) -> None:
+    self.__exit__(exc_type, exc_val, exc_tb)
+
+
+def plugin(name: str, *, priority: int = 50) -> Callable:
+    def decorator(cls: Any) -> Any:
+        cls._mellea_plugin_meta = PluginMeta(name=name, priority=priority)
+        cls.__enter__ = _plugin_cm_enter  # injected here
+        cls.__exit__ = _plugin_cm_exit
+        cls.__aenter__ = _plugin_cm_aenter
+        cls.__aexit__ = _plugin_cm_aexit
+        return cls
+    return decorator
+```
+
+#### `PluginSet` (`mellea/plugins/_pluginset.py`)
+
+`PluginSet` gains the same context manager protocol using the same UUID scope ID pattern:
+
+```python
+def __enter__(self) -> PluginSet:
+    if self._scope_id is not None:
+        raise RuntimeError(
+            f"PluginSet {self.name!r} is already active as a context manager. "
+            "Create a new instance to use in a separate scope."
+        )
+    self._scope_id = str(uuid.uuid4())
+    register(self, session_id=self._scope_id)
+    return self
+
+def __exit__(self, exc_type, exc_val, exc_tb) -> None:
+    if self._scope_id is not None:
+        deregister_session_plugins(self._scope_id)
+        self._scope_id = None
+
+async def __aenter__(self) -> PluginSet:
+    return self.__enter__()
+
+async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
+    self.__exit__(exc_type, exc_val, exc_tb)
+```
+
+#### `MelleaPlugin` (`mellea/plugins/_base.py`)
+
+`MelleaPlugin` gains the same protocol. Because `MelleaPlugin` subclasses ContextForge `Plugin` (which owns `__init__`), the scope ID is stored as an instance attribute accessed via `getattr` with a default rather than declared in `__init__`:
+
+```python
+def __enter__(self) -> MelleaPlugin:
+    if getattr(self, "_scope_id", None) is not None:
+        raise RuntimeError(
+            f"MelleaPlugin {self.name!r} is already active as a context manager. "
+            "Create a new instance to use in a separate scope."
+        )
+    self._scope_id = str(uuid.uuid4())
+    register(self, session_id=self._scope_id)
+    return self
+
+def __exit__(self, exc_type, exc_val, exc_tb) -> None:
+    scope_id = getattr(self, "_scope_id", None)
+    if scope_id is not None:
+        deregister_session_plugins(scope_id)
+        self._scope_id = None
+
+async def __aenter__(self) -> MelleaPlugin:
+    return self.__enter__()
+
+async def __aexit__(self, exc_type, exc_val, exc_tb) -> None:
+    self.__exit__(exc_type, exc_val, exc_tb)
+```
+
+#### Deregistration helper (`mellea/plugins/_manager.py`)
+
+A `deregister_session_plugins(scope_id)` function removes all plugins tagged with a given scope ID from the `PluginManager` and cleans up the `_session_tags` entry. This is the same function used by `session_cleanup` to deregister session-scoped plugins:
+
+```python
+def deregister_session_plugins(session_id: str) -> None:
+    """Deregister all plugins associated with a given session or scope ID."""
+    pm = _plugin_manager
+    if pm is None:
+        return
+    plugin_keys = _session_tags.pop(session_id, set())
+    for key in plugin_keys:
+        pm.deregister(key)
+```
+
+#### Public API
+
+`plugin_scope` is exported from `mellea.plugins`:
+
+```python
+from mellea.plugins import plugin_scope
+```
+
+All four forms (`plugin_scope`, `@plugin` instance, `PluginSet`, `MelleaPlugin` instance) support both `with` and `async with`. The same-instance re-entrant restriction applies to all forms: attempting to re-enter an already-active instance raises `RuntimeError`. Create separate instances to activate the same plugin logic in nested or concurrent scopes.
+
+### 4.5 Dependency Management
 
 Add to `pyproject.toml` under `[project.optional-dependencies]`:
 
@@ -679,7 +853,7 @@ plugins = ["contextforge-plugin-framework>=0.1.0"]
 
 All imports in `mellea/plugins/` are guarded with `try/except ImportError`.
 
-### 4.5 Global Registration (`mellea/plugins/_registry.py`)
+### 4.6 Global Registration (`mellea/plugins/_registry.py`)
 
 Global registration happens via `register()` at application startup:
 
@@ -1090,12 +1264,12 @@ async def fire_error_hook(
 | `mellea/backends/openai.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` |
 | `mellea/backends/huggingface.py` | 4 adapter hooks in `load_adapter()` / `unload_adapter()` |
 | `pyproject.toml` | Add `plugins` optional dependency + `plugins` test marker |
-| `mellea/plugins/__init__.py` (new) | Public API: `hook`, `plugin`, `block`, `PluginSet`, `register`, `MelleaPlugin` |
-| `mellea/plugins/_decorators.py` (new) | `@hook` and `@plugin` decorator implementations, `HookMeta`, `PluginMeta` |
-| `mellea/plugins/_pluginset.py` (new) | `PluginSet` class with `flatten()` for recursive expansion |
-| `mellea/plugins/_registry.py` (new) | `register()`, `block()`, `_FunctionHookAdapter`, `_register_single()` |
-| `mellea/plugins/_manager.py` (new) | Singleton wrapper, `invoke_hook()` with session-tag filtering, `_ensure_plugin_manager()` |
-| `mellea/plugins/_base.py` (new) | `MelleaBasePayload` (frozen), `MelleaPlugin` base class |
+| `mellea/plugins/__init__.py` (new) | Public API: `hook`, `plugin`, `block`, `PluginSet`, `register`, `MelleaPlugin`, `plugin_scope` |
+| `mellea/plugins/_decorators.py` (new) | `@hook` and `@plugin` decorator implementations, `HookMeta`, `PluginMeta`; `@plugin` injects `__enter__`/`__exit__`/`__aenter__`/`__aexit__` into decorated classes |
+| `mellea/plugins/_pluginset.py` (new) | `PluginSet` class with `flatten()` for recursive expansion; context manager protocol (`__enter__`/`__exit__`/`__aenter__`/`__aexit__`) for with-block scoping |
+| `mellea/plugins/_registry.py` (new) | `register()`, `block()`, `_FunctionHookAdapter`, `_register_single()`; `_PluginScope` class and `plugin_scope()` factory for with-block scoping |
+| `mellea/plugins/_manager.py` (new) | Singleton wrapper, `invoke_hook()` with session-tag filtering, `_ensure_plugin_manager()`; `deregister_session_plugins()` for scope cleanup (used by both session and with-block exit) |
+| `mellea/plugins/_base.py` (new) | `MelleaBasePayload` (frozen), `MelleaPlugin` base class with context manager protocol for with-block scoping |
 | `mellea/plugins/_types.py` (new) | `MelleaHookType` enum, `_register_mellea_hooks()` |
 | `mellea/plugins/_policies.py` (new) | `MELLEA_HOOK_PAYLOAD_POLICIES` table, injected into `PluginManager` at init |
 | `mellea/plugins/_context.py` (new) | `build_global_context()` factory |