feat: normalize intrinsics interfaces #1028
Conversation
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
The PR description has been updated. Please fill out the template for your PR to be reviewed.
planetf1
left a comment
threading looks good, and catching the float → str return type mismatch was worth doing. Two things I'd want resolved before merging: the from_context tests won't actually exercise the code path they're meant to cover, and the documents positional change breaks existing callers without any mention in the issue or PR. The latter is easy to fix without the break — see inline.
```python
# Add documents to the last assistant message
last_turn = context.last_turn()
assert last_turn is not None and last_turn.output is not None
assert last_turn.output.value is not None
```
These assertions will never pass as written. _read_guardian_input builds context via context.add(Message(...)), which means the last node is a Message, not a ModelOutputThunk — so last_turn() returns ContextTurn(last_element, None) (base.py:919). .output is always None here. The test aborts before reaching the code it's supposed to cover, so the documents=None branch in factuality_detection is entirely uncovered.
The fix is to use the same extraction path that guardian.py itself falls back to:
```python
last_turn = context.last_turn()
assert last_turn is not None
if last_turn.output is not None and last_turn.output.value is not None:
    content = last_turn.output.value
else:
    assert isinstance(last_turn.model_input, Message)
    content = last_turn.model_input.content
rewound = context.previous_node
assert rewound is not None
context_with_docs: ChatContext = rewound.add(  # type: ignore[assignment]
    Message("assistant", content, documents=documents)
)
```

Same issue at lines 220–221 in test_factuality_correction_from_context.
```python
def factuality_detection(context: ChatContext, backend: AdapterMixin) -> float:
    """Determine is the last response is factually incorrect.

def factuality_detection(
    documents: collections.abc.Iterable[str | Document] | None,
```
Making documents the new first positional arg is a silent break — existing callers doing factuality_detection(ctx, backend) now pass ctx as documents and get a confusing error rather than a helpful one. Neither #1003 nor this PR mentions it as a breaking change, and our policy is to avoid that without explicit callout.
The good news is the issue just asks for parity with the other intrinsics — we can get there without the break by making documents keyword-only:
```python
def factuality_detection(
    context: ChatContext,
    backend: AdapterMixin,
    *,
    documents: collections.abc.Iterable[str | Document] | None = None,
    model_options: dict | None = None,
) -> str:
```

Existing call sites continue to work, new callers pass `documents=` explicitly, and it's consistent with how `model_options` is handled everywhere else in this PR. Worth also updating the AGENTS.md Section 13 table with the new signature.
```python
# If documents are provided, add them to the last assistant message
if documents is not None:
    # Get the last turn and add documents to it
```
This block extracting the assistant content, rewinding the context, and reattaching documents is identical in factuality_correction (~lines 303–336). If either of the fixes above lands here, it'll need to land in both places. Would be cleaner to pull it into a private helper — something like _reattach_documents(context, documents) -> ChatContext — which also gives us one place to sort out the cast() vs type: ignore question below.
```python
# Extract assistant content from either output (if generated) or model_input (if manually added)
if last_turn.output is not None and last_turn.output.value is not None:
    assistant_content = last_turn.output.value
```
Small edge case: if last_turn.output is not None but .value is None (uncomputed thunk, failed generation), the code falls through to model_input and ends up wrapping the user's question as the assistant's response. That'll produce a factuality score on the wrong text with no indication anything went wrong. Raising explicitly there rather than falling through would be much easier to debug.
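The explicit-raise shape might look like the following sketch. `StubThunk` and `extract_assistant_content` are stand-ins, since `ModelOutputThunk` and the turn types are project-specific:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubThunk:
    """Stand-in for a generated-output thunk whose value may be unset."""

    value: Optional[str]


def extract_assistant_content(output: Optional[StubThunk], input_content: str) -> str:
    # Prefer generated output; raise loudly on an uncomputed thunk instead of
    # silently falling back and scoring the wrong text.
    if output is not None:
        if output.value is None:
            raise ValueError("last turn output has no value (uncomputed thunk?)")
        return output.value
    return input_content
```

The fallback to `input_content` then only fires when there is genuinely no output node, which is the manually-added-Message case.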
```python
raise ValueError("Cannot rewind context past the root node")
# Type ignore because rewound is Context but we know it's ChatContext
context = rewound.add(  # type: ignore[assignment]
    Message(
```
# type: ignore[assignment] is hiding a real mismatch: previous_node.add(...) returns Context, not ChatContext. Worth using cast(ChatContext, rewound.add(...)) with a short note explaining why, rather than silencing mypy here — if the type hierarchy shifts this will fail at runtime instead of at type-check time. As above, extracting the helper means one fix instead of two.
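For context, `typing.cast` performs no conversion at runtime; it only records the narrowing for the type checker, so a wrong assumption surfaces as a runtime failure at the next attribute access. A minimal illustration with stand-in classes (not the real `Context` / `ChatContext`):

```python
from typing import cast


class Context:
    def add(self, item: str) -> "Context":
        return self


class ChatContext(Context):
    def last_role(self) -> str:
        return "assistant"


def rewind() -> Context:
    # Declared as the base type, though it actually returns a ChatContext here.
    return ChatContext()


# cast() is a no-op at runtime: it just tells the checker what we believe.
# If the belief is ever wrong, the last_role() call fails at runtime, which is
# the visibility the comment is asking for, instead of a silenced mypy error.
ctx = cast(ChatContext, rewind().add("msg"))
```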
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
def _reattach_documents(
    context: ChatContext, documents: collections.abc.Iterable[str | Document]
) -> ChatContext:
    """Rewind context and re-add the last message with documents attached.

    Extracts the assistant's response from the last turn (handling both generated
    output and manually-added messages), then rewinds the context to before that
    message and re-adds it with the provided documents.

    Args:
        context: Context containing the conversation.
        documents: Documents to attach to the last assistant message.

    Returns:
        New context with documents attached to the last assistant message.

    Raises:
        ValueError: If context cannot be rewound or assistant content cannot be extracted.
    """
    last_turn = context.last_turn()
    if last_turn is None:
        raise ValueError("Cannot reattach documents: context has no last turn")

    # Extract assistant content, preferring generated output over input
    if last_turn.output is not None and last_turn.output.value is not None:
        assistant_content = last_turn.output.value
    elif last_turn.output is not None and last_turn.output.value is None:
        # Uncomputed thunk — avoid silent fallthrough to model_input
        raise ValueError(
            "Cannot reattach documents: last turn output is uncomputed (thunk with no value)"
        )
    elif last_turn.model_input is not None and isinstance(
        last_turn.model_input, Message
    ):
        assistant_content = last_turn.model_input.content
    else:
        raise ValueError(
            "Cannot reattach documents: cannot extract assistant content from last turn"
        )

    # Rewind and re-add with documents
    rewound = context.previous_node
    if rewound is None:
        raise ValueError("Cannot rewind context past the root node")

    return cast(
        ChatContext,
        rewound.add(
            Message(
                "assistant",
                assistant_content,
                documents=_coerce_to_documents(documents),
            )
        ),
    )
```
We actually have two helper functions already (_resolve_question and _resolve_response) used by the other intrinsics. Could we use those instead? I believe they serve the same purpose.
I'm still working on this.
```python
context = context.add(Message(m["role"], m["content"]))

result = guardian.factuality_detection(context, call_intrinsic_backend)
result = guardian.factuality_detection(docs, context, call_intrinsic_backend)
```
Please make sure the tests run. I might be misreading things, but shouldn't this throw an error given the function signature defined above?
```python
# Extract assistant content using the same logic as _reattach_documents
last_turn = context.last_turn()
assert last_turn is not None
if last_turn.output is not None and last_turn.output.value is not None:
    content = last_turn.output.value
else:
    assert isinstance(last_turn.model_input, Message)
    content = last_turn.model_input.content

rewound = context.previous_node
assert rewound is not None
context_with_docs: ChatContext = rewound.add(  # type: ignore[assignment]
    Message("assistant", content, documents=documents)
)

result = guardian.factuality_correction(context, backend)
# Test with documents=None (should extract from context)
result = guardian.factuality_correction(context_with_docs, backend)
```
Isn't this replicating the logic from the helper function? Should we be able to use that here? I might be missing what this is testing.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Just noticed the AGENTS.md table entries for …
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
def _reattach_documents(
    context: ChatContext, documents: collections.abc.Iterable[str | Document]
) -> ChatContext:
    """Rewind context and re-add the last message with documents attached.
```
Instead of adding this new function, can you please fix the _resolve_response / _resolve_question functions? I see that you already fixed some of the message handling there. I think the current difference between those functions and this one is that you are ensuring documents aren't accidentally dropped? If so, please remove this function and fix the others. At the very least, we should not be duplicating this logic, which appears to mostly match _resolve_response.
And if this is a real issue, please add tests directly to _resolve_response and _resolve_question that ensure we don't drop the documents accidentally in the future. I'm fine if you need to change the function signature, perhaps we should pass back the actual Message / ModelOutputThunk, etc...
planetf1
left a comment
LGTM — the AGENTS.md additions for the guardian intrinsics look correct and complete.
Side note: AGENTS.md currently has two sections both numbered §13 ("Feedback Loop" and "Working with Intrinsics"). Pre-existing, not introduced by this PR, so not blocking — just flagging for a future tidy-up pass.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
    their respective UTF-8 strings.
    """
    response, context = _resolve_response(response, context)
    response, context, _ = _resolve_response(response, context)
```
If there are returned documents, shouldn't they be getting utilized here?
```python
if explicit_docs is not None and resolved_docs is not None:
    explicit_docs.extend(resolved_docs)
    context = context.add(
        Message("assistant", response, documents=explicit_docs)
    )
elif explicit_docs is not None:
    context = context.add(
        Message("assistant", response, documents=explicit_docs)
    )
elif resolved_docs is not None:
    context = context.add(
        Message("assistant", response, documents=resolved_docs)
    )
else:
    context = context.add(Message("assistant", response))
```
Can this be condensed into something that just sets the value of docs conditionally, and then creates the Message in one spot?
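One possible condensed shape, sketched with plain lists (`merge_docs` is a hypothetical helper, not part of the codebase):

```python
def merge_docs(explicit_docs, resolved_docs):
    # Merge whichever document lists exist; collapse to None when both are
    # absent so the Message constructor still sees its "no documents" default.
    merged = [*(explicit_docs or []), *(resolved_docs or [])]
    return merged or None
```

With that, the Message is constructed in exactly one spot, e.g. `context.add(Message("assistant", response, documents=merge_docs(explicit_docs, resolved_docs)))`.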
```python
if documents is not None:
    if response is not None:
        # Response was explicitly provided, add it with documents
        context = context.add(
            Message(
                "assistant", response, documents=_coerce_to_documents(documents)
            )
        )
    else:
        # Response came from context output, reattach documents
        context = _reattach_documents(context, documents)
    # Explicit documents take precedence
    context = context.add(
        Message("assistant", response, documents=_coerce_to_documents(documents))
    )
elif resolved_docs is not None:
    # Use documents from the last message if available
    context = context.add(Message("assistant", response, documents=resolved_docs))
else:
    # No documents provided, add response to context if it was explicitly provided
    if response is not None:
        context = context.add(Message("assistant", response))
    # No documents available
    context = context.add(Message("assistant", response))
```
Same as above. I also believe this one is following different logic and not adding both sets of documents when docs were attached to the Message and also passed in explicitly.
```python
    A string value of either "answerable" or "unanswerable"
    """
    question, context = _resolve_question(question, context, backend)
    question, context, _ = _resolve_question(question, context, backend)
```
In all of these use cases, we should be grabbing documents if they exist. We should not be silently dropping them.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
```python
response, context, resolved_docs = _resolve_response(response, context)
explicit_docs = _coerce_to_documents(documents, auto_doc_id=False)
docs_to_use = explicit_docs
if resolved_docs is not None:
    if docs_to_use is not None:
        docs_to_use.extend(resolved_docs)
    else:
        docs_to_use = resolved_docs
```
I feel like this is a little verbose. Can we switch to something like:
```python
response, context, resolved_docs = _resolve_response(response, context)
explicit_docs = _coerce_to_documents(documents, auto_doc_id=False)
docs_to_use = [*(explicit_docs or []), *(resolved_docs or [])] or None
```
```python
def factuality_detection(
    response: str | None,
```
For factuality_detection and factuality_correction, let's make sure that all of our call sites using them are updated as well. Please manually run the tests and examples, since I don't think our CI/CD pipeline runs all of these. We will also need to coordinate with the intrinsic teams / external examples to update their Mellea code blocks.
I believe I looked through all the intrinsic tests and examples.
cc: @generative-computing/mellea-intrinsics
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
planetf1
left a comment
Thanks @akihikokuroda for picking this up — model_options threading and document support on the guardian helpers are exactly what #1003 asked for, and the signature alignment is overdue.
One coordination check before merging. Three items in this PR touch areas the in-flight adapter lifecycle design proposal is actively working through — epic #929, draft doc here:
1. Positional reorder of the guardian signatures. factuality_detection(context, backend) → factuality_detection(response, context, backend, …). #1003 asked for documents as an explicit parameter — that's clean as a keyword-only add (*, documents=None). The positional reorder is an extra breaking change, and it conflicts with the design proposal §6, which commits to "high-level helpers keep their current signatures" through the migration.
2. _resolve_response manual-Message fallback — the new elif turn.model_input … role == "assistant" branch. This pre-empts §17 Q5 of the proposal, which asks whether rewind logic belongs in io_contract.build_prompt or stays in the helpers. Landing it here silently answers the question.
3. 3-tuple returns in _resolve_question / _resolve_response. Scaffolding for (2); reverts cleanly if (2) defers.
Proposed split — a targeted revert across _util.py, core.py, guardian.py, and the two example/test files gets this to the pure #1003 delivery:
Keep (aligned with both #1003 and #929):
- `model_options: dict | None = None` on all top-level helpers
- `documents=None` as keyword-only on `factuality_detection` / `factuality_correction`, preserving the existing context-based fallback as default
Defer to #929 (design discussion first):
- Positional reorder of the two guardian helpers
- `_resolve_response` manual-Message fallback branch
- 3-tuple returns in the resolve utilities
Net effect: #1003 delivered in full, no breaking changes, §17 Q5 stays open for proper design.
If you'd like to preserve the rewind enhancement (item 2), a separate PR once §17 Q5 is decided would be the natural place for it — happy to coordinate the timing.
@jakelorocco @nrfulton — flagging since you've been engaged with #929. Does this split look right? Happy to be overruled if you see this differently.
@planetf1 @akihikokuroda, I think the split sounds reasonable. I think the current placement of the … In other words, …
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Heads-up: overlaps with PR #935. PR #935 (docs: migrate Guardian documentation) also edits … Suggest merging this PR first: it subsumes the annotation fix and adds the fuller …
| `rag` | `check_context_relevance(question, document, context, backend)` | Whether a document is relevant (0–1); only supported for granite-4.0, not granite-4.1 |
| `rag` | `flag_hallucinated_content(response, documents, context, backend)` | Flag potentially hallucinated sentences |
| `guardian` | `factuality_detection(response, context, backend, *, documents=None, model_options=None)` | Determine if the last response is factually incorrect ("yes"/"no") |
| `guardian` | `factuality_correction(response, context, backend, *, documents=None, model_options=None)` | Correct the last response to be factually accurate |
BLOCKER: Phantom response parameter in signature. The documented signature includes response as a positional parameter that does not exist on the function. Copy-pasting from the table will cause TypeError. Same issue on the factuality_correction row below. Please remove response, from both entries.
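The failure mode is easy to demonstrate with a stub that mirrors the keyword-only shape from this PR (the stub is illustrative, not the real function):

```python
def factuality_detection(context, backend, *, documents=None, model_options=None):
    # Stub with the keyword-only signature from this PR: no positional `response`.
    return "no"


# Calling it the way the table documents, with a leading `response` argument,
# fails immediately: the function takes 2 positional arguments, not 3.
raised = False
try:
    factuality_detection("response text", "ctx", "backend")
except TypeError:
    raised = True
assert raised
```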
```python
    Risk score as a float between 0.0 (no risk) and 1.0 (risk detected).
    """
    if documents is not None and target_role == "assistant":
        response, context, resolved_docs = _resolve_response(None, context)
```
WARNING: When documents is provided but target_role == "user", the documents are silently discarded — no error, no warning. A caller who passes documents for a user-targeted check (e.g. groundedness on the user turn) will get a result computed without them and have no signal that anything went wrong. Suggest raising ValueError("documents parameter is only supported when target_role='assistant'") here, or handling the user role analogously to the assistant path.
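A stand-alone sketch of the suggested guard (`validate_documents_usage` is a hypothetical name for illustration):

```python
def validate_documents_usage(documents, target_role):
    # Fail fast instead of silently computing a result without the documents.
    if documents is not None and target_role != "assistant":
        raise ValueError(
            "documents parameter is only supported when target_role='assistant'"
        )
```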
```python
@pytest.mark.qualitative
def test_factuality_detection_from_context(backend):
    """Verify factuality detection works when documents are in the last message."""
```
WARNING: This test body is identical to test_factuality_detection above — same input, same call with documents=documents, same assertion. It adds no coverage. The "from_context" name suggests it was meant to test that documents already attached to the last assistant Message in context are picked up when documents= is not passed. To actually test that path, build a context where Message("assistant", ..., documents=[doc]) is the last turn and call factuality_detection(context, backend) without a documents= argument. As written, this can either be deleted or repurposed.
planetf1
left a comment
Thanks for this PR — the core improvement (adding documents= and model_options= to the factuality/guardian intrinsics, and supporting manually-added assistant Messages in _resolve_response) is a solid fix for #1003 and good to land.
Three things to address before merge:
1. BLOCKER — phantom response param in AGENTS.md (inline comment on AGENTS.md)
Both new table rows list response as the first positional arg. The functions have no such parameter. Noted inline.
2. WARNING — guardian_check silently drops documents when target_role == "user" (inline comment on guardian.py)
When documents is not None and target_role == "user", the documents are quietly discarded. The caller gets no error or warning. Noted inline.
3. WARNING — test_factuality_detection_from_context / test_factuality_correction_from_context are exact duplicates (inline comment on test_guardian.py)
Both _from_context tests are byte-for-byte identical to the preceding tests. The name implies testing document extraction from context when documents= is not passed — but that path is not exercised. Either delete or repurpose. Noted inline.
Also worth noting: the float → str return type correction on factuality_detection / factuality_correction is the right fix (runtime was always a string). Worth a line in the PR description since it is technically a breaking annotation change for any callers who typed against -> float.
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Misc PR
Type of PR
Description
The return type of factuality_detection / factuality_correction is changed float → str in this PR.
Testing
Attribution