feat: hook system and plugin support for Mellea#582
feat: hook system and plugin support for Mellea#582araujof wants to merge 52 commits intogenerative-computing:mainfrom
Conversation
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
65d11b1 to
5b9d663
Compare
jakelorocco
left a comment
There was a problem hiding this comment.
It's looking good! I will take a deeper look at the new tests and make sure our existing tests haven't broken.
mellea/core/backend.py
Outdated
| async def generate_from_context_with_hooks( | ||
| self, | ||
| action: Component[C] | CBlock, | ||
| ctx: Context, | ||
| *, | ||
| format: type[BaseModelSubclass] | None = None, | ||
| model_options: dict | None = None, | ||
| tool_calls: bool = False, | ||
| ) -> tuple[ModelOutputThunk[C], Context]: |
There was a problem hiding this comment.
We should change this to generate_from_context and have the other backend classes implement a generate function that gets called here.
@nrfulton @HendrikStrobelt, we should also make a note that generate_from_raw won't have these hooks. Or we should decide now if we want them to and quickly create separate hooks for generate from raw. It will mostly be the same interface except actions can be an array.
There was a problem hiding this comment.
We briefly mentioned this before, but sampling hooks will only impact sampling strategies derived from BaseSampling strategy that don't override the sample method. I think this okay for now, but we should eventually refactor our sampling classes to handle this better. @nrfulton and @HendrikStrobelt, do you agree?
pyproject.toml
Outdated
| cpex = [ | ||
| "cpex; python_version >= '3.11'", | ||
| ] |
There was a problem hiding this comment.
Will talk with the team; one option is to disable plugins if python <= 3.10. Putting our version download numbers here for future reference:
┌────────────────┬───────────┬───────┐
│ Python Version │ Downloads │ Share │
├────────────────┼───────────┼───────┤
│ 3.12 │ 76,864 │ 59.0% │
├────────────────┼───────────┼───────┤
│ 3.13 │ 27,919 │ 21.4% │
├────────────────┼───────────┼───────┤
│ 3.11 │ 11,428 │ 8.8% │
├────────────────┼───────────┼───────┤
│ 3.14 │ 7,416 │ 5.7% │
├────────────────┼───────────┼───────┤
│ 3.10 │ 4,106 │ 3.2% │
├────────────────┼───────────┼───────┤
│ 3.9 │ 50 │ ~0% │
└────────────────┴───────────┴───────┘
There was a problem hiding this comment.
Decided that we will move to stop explicitly supporting python 3.10. I think there are changes to do so (ie change testing setup and some linting rules) that are outside the scope of this PR. I think we can continue with this PR even if "breaks" python 3.10 and make the version changes in a separate PR.
| parser_angular_patch_types = "fix,perf,feat" | ||
|
|
||
| [tool.uv.sources] | ||
| cpex = { git = "https://github.com/contextforge-org/contextforge-plugins-framework.git", tag = "0.1.0.dev3" } |
There was a problem hiding this comment.
Adding a comment here so that we remember to update this before the final merge.
test/plugins/test_example_scripts.py
Outdated
There was a problem hiding this comment.
I believe you can remove this. We run our examples as part of our testing pipeline already.
There was a problem hiding this comment.
Comment in the summary. I think it's good to be able to run it as part of dev pytest. Let me know if I'm missing any other dev tooling otherwise.
|
Thanks for the thorough review, @jakelorocco! I've pushed changes addressing most of your comments. Here's the breakdown: ChangesTiming removed from Sync
Payload field docstrings with expected types (all
Renamed Misc Items Deferred (need team discussion)
|
pyproject.toml
Outdated
| ] | ||
|
|
||
| all = ["mellea[watsonx,docling,hf,vllm,litellm,tools,telemetry]"] | ||
| cpex = [ |
There was a problem hiding this comment.
Would it make sense to group this into the telemetry group? This definitely feels like it makes sense in that group (especially since I plan to update the telemetry to use the hooks)
There was a problem hiding this comment.
Hi @ajbozarth ! Good timing :)
We may want to keep it separate, at least for now. The way I see it, telemetry will be built as a plugin extension, but there are many other capabilities you could build with the new hook system that aren't observability-related. Maybe, in the future we will end-up with a superset group, but for now, I'd just keep them separate. I added cpex to the all group in my latest commit.
There was a problem hiding this comment.
in that case I'll just make telemetry group dependent on this one once I implement that then.
I do wonder about the name though, cpex is very specific and most users might not know what it is, maybe we should call it hooks?
There was a problem hiding this comment.
that's a good call; yes, we should change the group name to hooks / plugins or something of that nature
There was a problem hiding this comment.
also worth noting you'll need to rebase/merge once the merge queue is finished. I just reworked the pyproject file to clean up dependencies in #588
There was a problem hiding this comment.
good call, let's rename the group. Are we good with hooks/plugins?
There was a problem hiding this comment.
I think hooks for this because it enables hooks, whereas we might have a plugins group in the future which contains dependencies for actual plugin implementations
IMO, this must be moved before merging if we want to support this hook.
I would prefer we just move this hook to Component init. We can change Component from being a protocol, but I don't think this hook makes sense if it doesn't run for every component. I also don't like re-routing instruct / chat.
I think these fixes don't resolve the redaction issue alone.
I don't think this needs to be a large refactor. We can simply add an intermediary function. I think this is also a must have / must fix before merging.
Is this saying that we do actually prevent modification of subfields? I might have misread the code previously, but I think it only prevents it for known object types.
If this is the case, why do we duplicate this information?
I would prefer we don't keep this. Any solution to this should be done for the whole code base, not a one off solution. We will make sure examples work (mocked or not). Also, please install and run our pre-commit hooks to make sure formatting is okay. To prevent having to fix each commit individually, you should be able to run |
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
…esign drifts Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
…onal suggestions by maintainers Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
e300d2c to
af760bd
Compare
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
|
Hi @jakelorocco, thanks for the follow-up feedback. Here's a summary of the latest round of changes (commits after Addressed from your feedback1. Renamed
2.
3. Component hooks: removed 4. Sync 5. Dependency group renamed from 6. Removed New additions7. 8. return modify(payload, model_output=new_mot)9. Housekeeping
Still open / deferred
|
| backends = ["mellea[watsonx,hf,vllm,litellm]"] | ||
|
|
||
| all = ["mellea[backends,docling,tools,telemetry,server,sandbox,granite_retriever]"] | ||
| hooks = [ |
There was a problem hiding this comment.
Nit: should move this up into the relevant section above (note section comments)
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
…n branch) Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
…r objects passed to plugins Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Signed-off-by: Frederico Araujo <frederico.araujo@ibm.com>
Summary
This PR introduces a plugin and hook system for Mellea, built on top of the cpex framework. It adds an extensibility layer that allows users to register handlers at defined points in the session, generation, sampling, component, tool, and validation pipelines.
Changes
New:
mellea/plugins/packagetypes.py— DefinesHookTypeenum (30 hook points across 7 categories) andPluginModeenum (SEQUENTIAL,CONCURRENT,AUDIT,FIRE_AND_FORGET). Registers all hook types with the ContextForge hook registry.base.py—MelleaBasePayload(frozen Pydantic model shared by all payloads),MelleaPluginbase class with typed accessors for backend/context/session and context manager support, andPluginViolationErrorfor blocked-execution errors.decorators.py—@hook(hook_type, mode, priority)for standalone functions or class methods;@plugin(name, priority)for plain classes; both inject context manager support (with/async with).policies.py—HookPayloadPolicyper hook type declaring which fields plugins may write;DefaultHookPolicyenum;MELLEA_HOOK_PAYLOAD_POLICIESregistry.registry.py—register(),deregister_session_plugins(),shutdown_plugins(),has_plugins()(accepts optionalHookTypefilter), andinit_plugins().manager.py— SingletonPluginManagerwrapper;invoke_hook()dispatches payloads, appliesHookPayloadPolicyto filter plugin-proposed modifications to writable fields only, raisesPluginViolationErrorwhen a plugin blocks in enforce mode, and routesFIRE_AND_FORGEThooks to background tasks.pluginset.py—PluginSetfor grouping multiple plugins or hook functions with a shared priority and context manager support.context.py—build_global_context()assembles thePluginContextpassed to every hook invocation.hooks/— Typed payload classes for each hook category:session.py:SessionPreInitPayload,SessionPostInitPayload,SessionResetPayload,SessionCleanupPayloadgeneration.py:GenerationPreCallPayload,GenerationPostCallPayload,GenerationStreamChunkPayloadsampling.py:SamplingLoopStartPayload,SamplingIterationPayload,SamplingRepairPayload,SamplingLoopEndPayloadcomponent.py:ComponentPreCreatePayload,ComponentPostCreatePayload,ComponentPreExecutePayload,ComponentPostSuccessPayload,ComponentPostErrorPayloadtool.py:ToolPreInvokePayload,ToolPostInvokePayloadvalidation.py:ValidationPreCheckPayload,ValidationPostCheckPayloadModified: stdlib integration
mellea/stdlib/functional.py—invoke_hook()calls added at: generation pre/post call, sampling loop start/iteration/repair/end, component pre-create/post-create/pre-execute/post-success/post-error.mellea/stdlib/sampling/base.py— Hook call sites added to the sampling loop; passessample_contexts[best_failed_index](not original context) on failure path.mellea/stdlib/session.py— Session lifecycle hooks (SESSION_PRE_INIT,SESSION_POST_INIT,SESSION_RESET,SESSION_CLEANUP) added; session state threaded into global plugin context.mellea/core/backend.py— Backend reference surfaced in plugin context.Documentation
docs/dev/hook_system.md— Full specification: hook types, payload schemas, write-protection rules, execution modes, plugin scoping (global vs session), policy enforcement, known limitations.docs/dev/hook_system_implementation_plan.md— Phased implementation plan.Examples
docs/examples/plugins/— Six example scripts covering: standalone hook functions, class-based plugins (@plugin), session-scoped plugins, plugin set composition, concurrent hooks, and tool hooks.Tests
test/plugins/:test_all_payloads.py)PluginViolationError(test_blocking.py)test_build_global_context.py)test_decorators.py)test_example_scripts.py)test_execution_modes.py)test_hook_call_sites.py)test_manager.py)MelleaPluginbase class (test_mellea_plugin.py)test_payload_memory_references.py)test_payloads.py)PluginSet(test_pluginset.py)test_policies.py)test_policy_enforcement.py)test_priority_ordering.py)test_scoping.py)Dependencies
cpexextra.Notes
try/except ImportError; stdlib call sites are no-ops whencpexis not installed.PluginViolationErroris raised only when a plugin explicitly setsblock=Truein its result and the hook's policy isDefaultHookPolicy.SEQUENTIAL.__name__attribute; dynamically generated hooks with identical names will collide with "Plugin already registered" errors.