feat: normalize intrinsics interfaces #1028
Conversation
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
planetf1
left a comment
threading looks good, and catching the float → str return type mismatch was worth doing. Two things I'd want resolved before merging: the from_context tests won't actually exercise the code path they're meant to cover, and the documents positional change breaks existing callers without any mention in the issue or PR. The latter is easy to fix without the break — see inline.
```python
# Add documents to the last assistant message
last_turn = context.last_turn()
assert last_turn is not None and last_turn.output is not None
assert last_turn.output.value is not None
```
These assertions will never pass as written. _read_guardian_input builds context via context.add(Message(...)), which means the last node is a Message, not a ModelOutputThunk — so last_turn() returns ContextTurn(last_element, None) (base.py:919). .output is always None here. The test aborts before reaching the code it's supposed to cover, so the documents=None branch in factuality_detection is entirely uncovered.
The fix is to use the same extraction path that guardian.py itself falls back to:
```python
last_turn = context.last_turn()
assert last_turn is not None
if last_turn.output is not None and last_turn.output.value is not None:
    content = last_turn.output.value
else:
    assert isinstance(last_turn.model_input, Message)
    content = last_turn.model_input.content
rewound = context.previous_node
assert rewound is not None
context_with_docs: ChatContext = rewound.add(  # type: ignore[assignment]
    Message("assistant", content, documents=documents)
)
```

Same issue at lines 220–221 in test_factuality_correction_from_context.
```python
def factuality_detection(context: ChatContext, backend: AdapterMixin) -> float:
    """Determine is the last response is factually incorrect.

def factuality_detection(
    documents: collections.abc.Iterable[str | Document] | None,
```
Making documents the new first positional arg is a silent break — existing callers doing factuality_detection(ctx, backend) now pass ctx as documents and get a confusing error rather than a helpful one. Neither #1003 nor this PR mentions it as a breaking change, and our policy is to avoid that without explicit callout.
The good news is the issue just asks for parity with the other intrinsics — we can get there without the break by making documents keyword-only:
```python
def factuality_detection(
    context: ChatContext,
    backend: AdapterMixin,
    *,
    documents: collections.abc.Iterable[str | Document] | None = None,
    model_options: dict | None = None,
) -> str:
```

Existing call sites continue to work, new callers pass `documents=` explicitly, and it's consistent with how `model_options` is handled everywhere else in this PR. Worth also updating the AGENTS.md Section 13 table with the new signature.
```python
# If documents are provided, add them to the last assistant message
if documents is not None:
    # Get the last turn and add documents to it
```
This block extracting the assistant content, rewinding the context, and reattaching documents is identical in factuality_correction (~lines 303–336). If either of the fixes above lands here, it'll need to land in both places. Would be cleaner to pull it into a private helper — something like _reattach_documents(context, documents) -> ChatContext — which also gives us one place to sort out the cast() vs type: ignore question below.
```python
# Extract assistant content from either output (if generated) or model_input (if manually added)
if last_turn.output is not None and last_turn.output.value is not None:
    assistant_content = last_turn.output.value
```
Small edge case: if last_turn.output is not None but .value is None (uncomputed thunk, failed generation), the code falls through to model_input and ends up wrapping the user's question as the assistant's response. That'll produce a factuality score on the wrong text with no indication anything went wrong. Raising explicitly there rather than falling through would be much easier to debug.
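The explicit-raise shape might look like the following sketch. `StubThunk` and `extract_assistant_content` are stand-ins, since `ModelOutputThunk` and the turn types are project-specific:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubThunk:
    """Stand-in for a generated-output thunk whose value may be unset."""

    value: Optional[str]


def extract_assistant_content(output: Optional[StubThunk], input_content: str) -> str:
    # Prefer generated output; raise loudly on an uncomputed thunk instead of
    # silently falling back and scoring the wrong text.
    if output is not None:
        if output.value is None:
            raise ValueError("last turn output has no value (uncomputed thunk?)")
        return output.value
    return input_content
```

The fallback to `input_content` then only fires when there is genuinely no output node, which is the manually-added-Message case.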
```python
raise ValueError("Cannot rewind context past the root node")
# Type ignore because rewound is Context but we know it's ChatContext
context = rewound.add(  # type: ignore[assignment]
    Message(
```
# type: ignore[assignment] is hiding a real mismatch: previous_node.add(...) returns Context, not ChatContext. Worth using cast(ChatContext, rewound.add(...)) with a short note explaining why, rather than silencing mypy here — if the type hierarchy shifts this will fail at runtime instead of at type-check time. As above, extracting the helper means one fix instead of two.
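For context, `typing.cast` performs no conversion at runtime; it only records the narrowing for the type checker, so a wrong assumption surfaces as a runtime failure at the next attribute access. A minimal illustration with stand-in classes (not the real `Context` / `ChatContext`):

```python
from typing import cast


class Context:
    def add(self, item: str) -> "Context":
        return self


class ChatContext(Context):
    def last_role(self) -> str:
        return "assistant"


def rewind() -> Context:
    # Declared as the base type, though it actually returns a ChatContext here.
    return ChatContext()


# cast() is a no-op at runtime: it just tells the checker what we believe.
# If the belief is ever wrong, the last_role() call fails at runtime, which is
# the visibility the comment is asking for, instead of a silenced mypy error.
ctx = cast(ChatContext, rewind().add("msg"))
```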
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
def _reattach_documents(
    context: ChatContext, documents: collections.abc.Iterable[str | Document]
) -> ChatContext:
    """Rewind context and re-add the last message with documents attached.

    Extracts the assistant's response from the last turn (handling both generated
    output and manually-added messages), then rewinds the context to before that
    message and re-adds it with the provided documents.

    Args:
        context: Context containing the conversation.
        documents: Documents to attach to the last assistant message.

    Returns:
        New context with documents attached to the last assistant message.

    Raises:
        ValueError: If context cannot be rewound or assistant content cannot be extracted.
    """
    last_turn = context.last_turn()
    if last_turn is None:
        raise ValueError("Cannot reattach documents: context has no last turn")

    # Extract assistant content, preferring generated output over input
    if last_turn.output is not None and last_turn.output.value is not None:
        assistant_content = last_turn.output.value
    elif last_turn.output is not None and last_turn.output.value is None:
        # Uncomputed thunk — avoid silent fallthrough to model_input
        raise ValueError(
            "Cannot reattach documents: last turn output is uncomputed (thunk with no value)"
        )
    elif last_turn.model_input is not None and isinstance(
        last_turn.model_input, Message
    ):
        assistant_content = last_turn.model_input.content
    else:
        raise ValueError(
            "Cannot reattach documents: cannot extract assistant content from last turn"
        )

    # Rewind and re-add with documents
    rewound = context.previous_node
    if rewound is None:
        raise ValueError("Cannot rewind context past the root node")

    return cast(
        ChatContext,
        rewound.add(
            Message(
                "assistant",
                assistant_content,
                documents=_coerce_to_documents(documents),
            )
        ),
    )
```
We actually have two helper functions already (_resolve_question and _resolve_response) used by the other intrinsics. Could we use those instead? I believe they serve the same purpose.
I'm still working on this.
```python
context = context.add(Message(m["role"], m["content"]))

result = guardian.factuality_detection(context, call_intrinsic_backend)
result = guardian.factuality_detection(docs, context, call_intrinsic_backend)
```
Please make sure the tests run. I might be misreading things, but shouldn't this throw an error given the function signature defined above?
```python
# Extract assistant content using the same logic as _reattach_documents
last_turn = context.last_turn()
assert last_turn is not None
if last_turn.output is not None and last_turn.output.value is not None:
    content = last_turn.output.value
else:
    assert isinstance(last_turn.model_input, Message)
    content = last_turn.model_input.content

rewound = context.previous_node
assert rewound is not None
context_with_docs: ChatContext = rewound.add(  # type: ignore[assignment]
    Message("assistant", content, documents=documents)
)

result = guardian.factuality_correction(context, backend)
# Test with documents=None (should extract from context)
result = guardian.factuality_correction(context_with_docs, backend)
```
Isn't this replicating the logic from the helper function? Should we be able to use that here? I might be missing what this is testing.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Just noticed the AGENTS.md table entries for …
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
def _reattach_documents(
    context: ChatContext, documents: collections.abc.Iterable[str | Document]
) -> ChatContext:
    """Rewind context and re-add the last message with documents attached.
```
Instead of adding this new function, can you please fix the _resolve_response / _resolve_question functions? I see that you already fixed some of the message handling there. I think the current difference between those functions and this one is that you are ensuring documents aren't accidentally dropped? If so, please remove this function and fix the others. At the very least, we should not be duplicating this logic, which appears to mostly match _resolve_response.
And if this is a real issue, please add tests directly to _resolve_response and _resolve_question that ensure we don't drop the documents accidentally in the future. I'm fine if you need to change the function signature, perhaps we should pass back the actual Message / ModelOutputThunk, etc...
planetf1
left a comment
LGTM — the AGENTS.md additions for the guardian intrinsics look correct and complete.
Side note: AGENTS.md currently has two sections both numbered §13 ("Feedback Loop" and "Working with Intrinsics"). Pre-existing, not introduced by this PR, so not blocking — just flagging for a future tidy-up pass.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
    their respective UTF-8 strings.
    """
    response, context = _resolve_response(response, context)
    response, context, _ = _resolve_response(response, context)
```
If there are returned documents, shouldn't they be getting utilized here?
```python
if explicit_docs is not None and resolved_docs is not None:
    explicit_docs.extend(resolved_docs)
    context = context.add(
        Message("assistant", response, documents=explicit_docs)
    )
elif explicit_docs is not None:
    context = context.add(
        Message("assistant", response, documents=explicit_docs)
    )
elif resolved_docs is not None:
    context = context.add(
        Message("assistant", response, documents=resolved_docs)
    )
else:
    context = context.add(Message("assistant", response))
```
Can this be condensed into something that just sets the value of docs conditionally, and then creates the Message in one spot?
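One possible condensed shape, sketched with plain lists (`merge_docs` is a hypothetical helper, not part of the codebase):

```python
def merge_docs(explicit_docs, resolved_docs):
    # Merge whichever document lists exist; collapse to None when both are
    # absent so the Message constructor still sees its "no documents" default.
    merged = [*(explicit_docs or []), *(resolved_docs or [])]
    return merged or None
```

With that, the Message is constructed in exactly one spot, e.g. `context.add(Message("assistant", response, documents=merge_docs(explicit_docs, resolved_docs)))`.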
```python
if documents is not None:
    if response is not None:
        # Response was explicitly provided, add it with documents
        context = context.add(
            Message(
                "assistant", response, documents=_coerce_to_documents(documents)
            )
        )
    else:
        # Response came from context output, reattach documents
        context = _reattach_documents(context, documents)
    # Explicit documents take precedence
    context = context.add(
        Message("assistant", response, documents=_coerce_to_documents(documents))
    )
elif resolved_docs is not None:
    # Use documents from the last message if available
    context = context.add(Message("assistant", response, documents=resolved_docs))
else:
    # No documents provided, add response to context if it was explicitly provided
    if response is not None:
        context = context.add(Message("assistant", response))
    # No documents available
    context = context.add(Message("assistant", response))
```
Same as above. I also believe this one is following different logic and not adding both sets of documents when docs were attached to the Message and also passed in explicitly.
```python
    A string value of either "answerable" or "unanswerable"
    """
    question, context = _resolve_question(question, context, backend)
    question, context, _ = _resolve_question(question, context, backend)
```
In all of these use cases, we should be grabbing documents if they exist. We should not be silently dropping them.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
response, context, resolved_docs = _resolve_response(response, context)
explicit_docs = _coerce_to_documents(documents, auto_doc_id=False)
docs_to_use = explicit_docs
if resolved_docs is not None:
    if docs_to_use is not None:
        docs_to_use.extend(resolved_docs)
    else:
        docs_to_use = resolved_docs
```
I feel like this is a little verbose. Can we switch to something like:
```python
response, context, resolved_docs = _resolve_response(response, context)
explicit_docs = _coerce_to_documents(documents, auto_doc_id=False)
docs_to_use = [*(explicit_docs or []), *(resolved_docs or [])] or None
```
```python
def factuality_detection(
    response: str | None,
```
For factuality_detection and factuality_correction, let's make sure that all of our call sites using them are updated as well. Please manually run the tests and examples, since I don't think our CI/CD pipeline runs all of these. We will also need to coordinate with the intrinsic teams / external examples to update their Mellea code blocks.
I believe I looked through all the intrinsic tests and examples.
cc: @generative-computing/mellea-intrinsics
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
planetf1
left a comment
Thanks @akihikokuroda for picking this up — model_options threading and document support on the guardian helpers are exactly what #1003 asked for, and the signature alignment is overdue.
One coordination check before merging. Three items in this PR touch areas the in-flight adapter lifecycle design proposal is actively working through — epic #929, draft doc here:
1. Positional reorder of the guardian signatures. factuality_detection(context, backend) → factuality_detection(response, context, backend, …). #1003 asked for documents as an explicit parameter — that's clean as a keyword-only add (*, documents=None). The positional reorder is an extra breaking change, and it conflicts with the design proposal §6, which commits to "high-level helpers keep their current signatures" through the migration.
2. _resolve_response manual-Message fallback — the new elif turn.model_input … role == "assistant" branch. This pre-empts §17 Q5 of the proposal, which asks whether rewind logic belongs in io_contract.build_prompt or stays in the helpers. Landing it here silently answers the question.
3. 3-tuple returns in _resolve_question / _resolve_response. Scaffolding for (2); reverts cleanly if (2) defers.
Proposed split — a targeted revert across _util.py, core.py, guardian.py, and the two example/test files gets this to the pure #1003 delivery:
Keep (aligned with both #1003 and #929):
- `model_options: dict | None = None` on all top-level helpers
- `documents=None` as keyword-only on `factuality_detection` / `factuality_correction`, preserving the existing context-based fallback as default
Defer to #929 (design discussion first):
- Positional reorder of the two guardian helpers
- `_resolve_response` manual-Message fallback branch
- 3-tuple returns in the resolve utilities
Net effect: #1003 delivered in full, no breaking changes, §17 Q5 stays open for proper design.
If you'd like to preserve the rewind enhancement (item 2), a separate PR once §17 Q5 is decided would be the natural place for it — happy to coordinate the timing.
@jakelorocco @nrfulton — flagging since you've been engaged with #929. Does this split look right? Happy to be overruled if you see this differently.
@planetf1 @akihikokuroda, I think the split sounds reasonable. I think the current placement of the … In other words, …
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Heads-up: overlaps with PR #935. PR #935 (docs: migrate Guardian documentation) also edits … Suggest merging this PR first: it subsumes the annotation fix and adds the fuller …
| `rag` | `check_context_relevance(question, document, context, backend)` | Whether a document is relevant (0–1); only supported for granite-4.0, not granite-4.1 |
| `rag` | `flag_hallucinated_content(response, documents, context, backend)` | Flag potentially hallucinated sentences |
| `guardian` | `factuality_detection(response, context, backend, *, documents=None, model_options=None)` | Determine if the last response is factually incorrect ("yes"/"no") |
| `guardian` | `factuality_correction(response, context, backend, *, documents=None, model_options=None)` | Correct the last response to be factually accurate |
BLOCKER: Phantom response parameter in signature. The documented signature includes response as a positional parameter that does not exist on the function. Copy-pasting from the table will cause TypeError. Same issue on the factuality_correction row below. Please remove response, from both entries.
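The failure mode is easy to demonstrate with a stub that mirrors the keyword-only shape from this PR (the stub is illustrative, not the real function):

```python
def factuality_detection(context, backend, *, documents=None, model_options=None):
    # Stub with the keyword-only signature from this PR: no positional `response`.
    return "no"


# Calling it the way the table documents, with a leading `response` argument,
# fails immediately: the function takes 2 positional arguments, not 3.
raised = False
try:
    factuality_detection("response text", "ctx", "backend")
except TypeError:
    raised = True
assert raised
```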
```python
    Risk score as a float between 0.0 (no risk) and 1.0 (risk detected).
    """
    if documents is not None and target_role == "assistant":
        response, context, resolved_docs = _resolve_response(None, context)
```
WARNING: When documents is provided but target_role == "user", the documents are silently discarded — no error, no warning. A caller who passes documents for a user-targeted check (e.g. groundedness on the user turn) will get a result computed without them and have no signal that anything went wrong. Suggest raising ValueError("documents parameter is only supported when target_role='assistant'") here, or handling the user role analogously to the assistant path.
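A stand-alone sketch of the suggested guard (`validate_documents_usage` is a hypothetical name for illustration):

```python
def validate_documents_usage(documents, target_role):
    # Fail fast instead of silently computing a result without the documents.
    if documents is not None and target_role != "assistant":
        raise ValueError(
            "documents parameter is only supported when target_role='assistant'"
        )
```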
```python
@pytest.mark.qualitative
def test_factuality_detection_from_context(backend):
    """Verify factuality detection works when documents are in the last message."""
```
WARNING: This test body is identical to test_factuality_detection above — same input, same call with documents=documents, same assertion. It adds no coverage. The "from_context" name suggests it was meant to test that documents already attached to the last assistant Message in context are picked up when documents= is not passed. To actually test that path, build a context where Message("assistant", ..., documents=[doc]) is the last turn and call factuality_detection(context, backend) without a documents= argument. As written, this can either be deleted or repurposed.
planetf1
left a comment
Thanks for this PR — the core improvement (adding documents= and model_options= to the factuality/guardian intrinsics, and supporting manually-added assistant Messages in _resolve_response) is a solid fix for #1003 and good to land.
Three things to address before merge:
1. BLOCKER — phantom response param in AGENTS.md (inline comment on AGENTS.md)
Both new table rows list response as the first positional arg. The functions have no such parameter. Noted inline.
2. WARNING — guardian_check silently drops documents when target_role == "user" (inline comment on guardian.py)
When documents is not None and target_role == "user", the documents are quietly discarded. The caller gets no error or warning. Noted inline.
3. WARNING — test_factuality_detection_from_context / test_factuality_correction_from_context are exact duplicates (inline comment on test_guardian.py)
Both _from_context tests are byte-for-byte identical to the preceding tests. The name implies testing document extraction from context when documents= is not passed — but that path is not exercised. Either delete or repurpose. Noted inline.
Also worth noting: the float → str return type correction on factuality_detection / factuality_correction is the right fix (runtime was always a string). Worth a line in the PR description since it is technically a breaking annotation change for any callers who typed against -> float.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Misc PR
Type of PR
Description
The return type of factuality_detection / factuality_correction is changed float → str in this PR.
Testing
Attribution