
feat: normalize intrinsics interfaces #1028

Open
akihikokuroda wants to merge 10 commits into
generative-computing:main from
akihikokuroda:issue1003

Conversation

@akihikokuroda
Member

@akihikokuroda akihikokuroda commented May 6, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

This PR changes the return type of factuality_detection / factuality_correction from float to str.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation once the rest of the PR is populated)

Attribution

  • AI coding assistants used: claude

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from a team as a code owner May 6, 2026 17:18
@github-actions github-actions Bot added the enhancement New feature or request label May 6, 2026
@github-actions
Contributor

github-actions Bot commented May 6, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

Contributor

@planetf1 planetf1 left a comment


threading looks good, and catching the float → str return type mismatch was worth doing. Two things I'd want resolved before merging: the from_context tests won't actually exercise the code path they're meant to cover, and the documents positional change breaks existing callers without any mention in the issue or PR. The latter is easy to fix without the break — see inline.

# Add documents to the last assistant message
last_turn = context.last_turn()
assert last_turn is not None and last_turn.output is not None
assert last_turn.output.value is not None
Contributor


These assertions will never pass as written. _read_guardian_input builds context via context.add(Message(...)), which means the last node is a Message, not a ModelOutputThunk — so last_turn() returns ContextTurn(last_element, None) (base.py:919). .output is always None here. The test aborts before reaching the code it's supposed to cover, so the documents=None branch in factuality_detection is entirely uncovered.

The fix is to use the same extraction path that guardian.py itself falls back to:

last_turn = context.last_turn()
assert last_turn is not None
if last_turn.output is not None and last_turn.output.value is not None:
    content = last_turn.output.value
else:
    assert isinstance(last_turn.model_input, Message)
    content = last_turn.model_input.content
rewound = context.previous_node
assert rewound is not None
context_with_docs: ChatContext = rewound.add(  # type: ignore[assignment]
    Message("assistant", content, documents=documents)
)

Same issue at lines 220–221 in test_factuality_correction_from_context.

def factuality_detection(context: ChatContext, backend: AdapterMixin) -> float:
"""Determine if the last response is factually incorrect.
def factuality_detection(
documents: collections.abc.Iterable[str | Document] | None,
Contributor


Making documents the new first positional arg is a silent break — existing callers doing factuality_detection(ctx, backend) now pass ctx as documents and get a confusing error rather than a helpful one. Neither #1003 nor this PR mentions it as a breaking change, and our policy is to avoid that without explicit callout.

The good news is the issue just asks for parity with the other intrinsics — we can get there without the break by making documents keyword-only:

def factuality_detection(
    context: ChatContext,
    backend: AdapterMixin,
    *,
    documents: collections.abc.Iterable[str | Document] | None = None,
    model_options: dict | None = None,
) -> str:

Existing call sites continue to work, new callers pass documents= explicitly, and it's consistent with how model_options is handled everywhere else in this PR. Worth also updating the AGENTS.md Section 13 table with the new signature.
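A minimal sketch of the keyword-only pattern, using a stub in place of the real factuality_detection (the body is illustrative only), showing why existing positional call sites keep working while a positionally-passed documents fails loudly:

```python
# Stub standing in for the real function; only the signature shape matters here.
def factuality_detection(context, backend, *, documents=None, model_options=None):
    return f"docs={'yes' if documents else 'no'}"

# Existing positional call sites continue to work:
print(factuality_detection("ctx", "backend"))  # prints docs=no

# New callers must name the parameter explicitly:
print(factuality_detection("ctx", "backend", documents=["d1"]))  # prints docs=yes

# Passing documents positionally is rejected outright instead of silently
# binding the wrong argument:
try:
    factuality_detection(["d1"], "ctx", "backend")
except TypeError as err:
    print("rejected:", type(err).__name__)
```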


# If documents are provided, add them to the last assistant message
if documents is not None:
    # Get the last turn and add documents to it
Contributor


This block extracting the assistant content, rewinding the context, and reattaching documents is identical in factuality_correction (~lines 303–336). If either of the fixes above lands here, it'll need to land in both places. Would be cleaner to pull it into a private helper — something like _reattach_documents(context, documents) -> ChatContext — which also gives us one place to sort out the cast() vs type: ignore question below.


# Extract assistant content from either output (if generated) or model_input (if manually added)
if last_turn.output is not None and last_turn.output.value is not None:
    assistant_content = last_turn.output.value
Contributor


Small edge case: if last_turn.output is not None but .value is None (uncomputed thunk, failed generation), the code falls through to model_input and ends up wrapping the user's question as the assistant's response. That'll produce a factuality score on the wrong text with no indication anything went wrong. Raising explicitly there rather than falling through would be much easier to debug.
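A hedged sketch of the suggested guard; FakeThunk and extract_assistant_content are illustrative names, not the real mellea API:

```python
# Toy stand-in for a ModelOutputThunk: .value is None until computed.
class FakeThunk:
    def __init__(self, value=None):
        self.value = value

def extract_assistant_content(output, fallback_content):
    """Return generated output if present; raise on an uncomputed thunk."""
    if output is not None:
        if output.value is None:
            # Uncomputed thunk or failed generation: raise rather than
            # silently wrapping the fallback (e.g. the user's question).
            raise ValueError("last turn output exists but has no value")
        return output.value
    return fallback_content
```

With this shape, the fallthrough to model_input only happens when there genuinely is no output object, which matches the intent described above.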

raise ValueError("Cannot rewind context past the root node")
# Type ignore because rewound is Context but we know it's ChatContext
context = rewound.add(  # type: ignore[assignment]
    Message(
Contributor


# type: ignore[assignment] is hiding a real mismatch: previous_node.add(...) returns Context, not ChatContext. Worth using cast(ChatContext, rewound.add(...)) with a short note explaining why, rather than silencing mypy here — if the type hierarchy shifts this will fail at runtime instead of at type-check time. As above, extracting the helper means one fix instead of two.
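A small illustration of the cast() pattern with toy stand-ins for Context / ChatContext (not the real classes):

```python
from typing import cast

class Context:
    def add(self, item) -> "Context":
        # Toy add(): a real Context would return a new node in the chain.
        return self

class ChatContext(Context):
    pass

node: Context = ChatContext()
# cast() records the claim "this really is a ChatContext" at the call site,
# and stays greppable and visible if the type hierarchy later shifts, unlike
# a blanket type: ignore that silences all assignment errors on the line.
chat_ctx = cast(ChatContext, node.add("msg"))
assert isinstance(chat_ctx, ChatContext)
```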

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from planetf1 May 7, 2026 14:13
@psschwei psschwei requested a review from jakelorocco May 8, 2026 15:08
Comment on lines +195 to +249
def _reattach_documents(
    context: ChatContext, documents: collections.abc.Iterable[str | Document]
) -> ChatContext:
    """Rewind context and re-add the last message with documents attached.

    Extracts the assistant's response from the last turn (handling both generated
    output and manually-added messages), then rewinds the context to before that
    message and re-adds it with the provided documents.

    Args:
        context: Context containing the conversation.
        documents: Documents to attach to the last assistant message.

    Returns:
        New context with documents attached to the last assistant message.

    Raises:
        ValueError: If context cannot be rewound or assistant content cannot be extracted.
    """
    last_turn = context.last_turn()
    if last_turn is None:
        raise ValueError("Cannot reattach documents: context has no last turn")

    # Extract assistant content, preferring generated output over input
    if last_turn.output is not None and last_turn.output.value is not None:
        assistant_content = last_turn.output.value
    elif last_turn.output is not None and last_turn.output.value is None:
        # Uncomputed thunk — avoid silent fallthrough to model_input
        raise ValueError(
            "Cannot reattach documents: last turn output is uncomputed (thunk with no value)"
        )
    elif last_turn.model_input is not None and isinstance(
        last_turn.model_input, Message
    ):
        assistant_content = last_turn.model_input.content
    else:
        raise ValueError(
            "Cannot reattach documents: cannot extract assistant content from last turn"
        )

    # Rewind and re-add with documents
    rewound = context.previous_node
    if rewound is None:
        raise ValueError("Cannot rewind context past the root node")

    return cast(
        ChatContext,
        rewound.add(
            Message(
                "assistant",
                assistant_content,
                documents=_coerce_to_documents(documents),
            )
        ),
    )
Contributor


We already have two helper functions (_resolve_question and _resolve_response) used by the other intrinsics. Could we use those here? I believe they serve the same purpose.

Member Author


I'm still working on this.

Member Author


Now it's fixed.

Comment thread mellea/stdlib/components/intrinsic/guardian.py
Comment thread test/backends/test_openai_intrinsics.py Outdated
context = context.add(Message(m["role"], m["content"]))

result = guardian.factuality_detection(context, call_intrinsic_backend)
result = guardian.factuality_detection(docs, context, call_intrinsic_backend)
Contributor


Please make sure the tests run. I might be misreading things, but shouldn't this throw an error given the function signature defined above?

Comment on lines +223 to +239
# Extract assistant content using the same logic as _reattach_documents
last_turn = context.last_turn()
assert last_turn is not None
if last_turn.output is not None and last_turn.output.value is not None:
    content = last_turn.output.value
else:
    assert isinstance(last_turn.model_input, Message)
    content = last_turn.model_input.content

rewound = context.previous_node
assert rewound is not None
context_with_docs: ChatContext = rewound.add(  # type: ignore[assignment]
    Message("assistant", content, documents=documents)
)

result = guardian.factuality_correction(context, backend)
# Test with documents=None (should extract from context)
result = guardian.factuality_correction(context_with_docs, backend)
Contributor


Isn't this replicating the logic from the helper function? Should we be able to use that here? I might be missing what this is testing.

Comment thread mellea/stdlib/components/intrinsic/guardian.py
@akihikokuroda akihikokuroda requested a review from jakelorocco May 8, 2026 22:51
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@planetf1
Contributor

planetf1 commented May 11, 2026

Just noticed the AGENTS.md table entries for factuality_detection / factuality_correction are missing response as the first arg — they show (context, backend, *, ...) but the actual signature is (response, context, backend, *, ...). Anyone copying from the table will get a confusing TypeError. Small fix.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Comment on lines +204 to +207
def _reattach_documents(
    context: ChatContext, documents: collections.abc.Iterable[str | Document]
) -> ChatContext:
    """Rewind context and re-add the last message with documents attached.
Contributor


Instead of adding this new function, can you please fix the _resolve_response / _resolve_question functions? I see that you already fixed some of the message handling there. I think the current difference between those functions and this one, is that you are ensuring documents aren't accidentally dropped? If so, please remove this function and fix the others. At the very least, we should not be duplicating this logic which appears to mostly match _resolve_response.

Contributor


And if this is a real issue, please add tests directly to _resolve_response and _resolve_question that ensure we don't drop the documents accidentally in the future. I'm fine if you need to change the function signature, perhaps we should pass back the actual Message / ModelOutputThunk, etc...

Contributor

@planetf1 planetf1 left a comment


LGTM — the AGENTS.md additions for the guardian intrinsics look correct and complete.

Side note: AGENTS.md currently has two sections both numbered §13 ("Feedback Loop" and "Working with Intrinsics"). Pre-existing, not introduced by this PR, so not blocking — just flagging for a future tidy-up pass.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
their respective UTF-8 strings.
"""
response, context = _resolve_response(response, context)
response, context, _ = _resolve_response(response, context)
Contributor


If there are returned documents, shouldn't they be getting utilized here?

Comment on lines +186 to +200
if explicit_docs is not None and resolved_docs is not None:
    explicit_docs.extend(resolved_docs)
    context = context.add(
        Message("assistant", response, documents=explicit_docs)
    )
elif explicit_docs is not None:
    context = context.add(
        Message("assistant", response, documents=explicit_docs)
    )
elif resolved_docs is not None:
    context = context.add(
        Message("assistant", response, documents=resolved_docs)
    )
else:
    context = context.add(Message("assistant", response))
Contributor


Can this be condensed into something that just sets the value of docs conditionally, and then creates the Message in one spot?
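A minimal sketch of one possible condensation, assuming explicit_docs and resolved_docs are lists or None as in the diff above; a tuple stands in for the real Message class:

```python
def build_assistant_message(response, explicit_docs, resolved_docs):
    # Decide the documents value once...
    if explicit_docs is not None and resolved_docs is not None:
        docs = [*explicit_docs, *resolved_docs]
    elif explicit_docs is not None:
        docs = explicit_docs
    else:
        docs = resolved_docs  # may be None, meaning "no documents"
    # ...then construct the Message in exactly one spot
    # (a tuple stands in for Message("assistant", response, documents=docs)).
    return ("assistant", response, docs)
```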

Comment on lines +259 to +269
if documents is not None:
    if response is not None:
        # Response was explicitly provided, add it with documents
        context = context.add(
            Message(
                "assistant", response, documents=_coerce_to_documents(documents)
            )
        )
    else:
        # Response came from context output, reattach documents
        context = _reattach_documents(context, documents)
    # Explicit documents take precedence
    context = context.add(
        Message("assistant", response, documents=_coerce_to_documents(documents))
    )
elif resolved_docs is not None:
    # Use documents from the last message if available
    context = context.add(Message("assistant", response, documents=resolved_docs))
else:
    # No documents provided, add response to context if it was explicitly provided
    if response is not None:
        context = context.add(Message("assistant", response))
    # No documents available
    context = context.add(Message("assistant", response))
Contributor


Same as above. I also believe this one is following different logic and not adding both sets of documents if there were docs attached to the Message and the passed in.

Comment on lines +316 to +326
if documents is not None:
    if response is not None:
        # Response was explicitly provided, add it with documents
        context = context.add(
            Message(
                "assistant", response, documents=_coerce_to_documents(documents)
            )
        )
    else:
        # Response came from context output, reattach documents
        context = _reattach_documents(context, documents)
    # Explicit documents take precedence
    context = context.add(
        Message("assistant", response, documents=_coerce_to_documents(documents))
    )
elif resolved_docs is not None:
    # Use documents from the last message if available
    context = context.add(Message("assistant", response, documents=resolved_docs))
else:
    # No documents provided, add response to context if it was explicitly provided
    if response is not None:
        context = context.add(Message("assistant", response))
    # No documents available
    context = context.add(Message("assistant", response))
Contributor


Same as above.

A string value of either "answerable" or "unanswerable"
"""
question, context = _resolve_question(question, context, backend)
question, context, _ = _resolve_question(question, context, backend)
Contributor


In all of these use cases, we should be grabbing documents if they exist. We should not be silently dropping them.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Comment on lines +110 to +117
response, context, resolved_docs = _resolve_response(response, context)
explicit_docs = _coerce_to_documents(documents, auto_doc_id=False)
docs_to_use = explicit_docs
if resolved_docs is not None:
    if docs_to_use is not None:
        docs_to_use.extend(resolved_docs)
    else:
        docs_to_use = resolved_docs
Contributor


I feel like this a little verbose. Can we switch to something like:

  response, context, resolved_docs = _resolve_response(response, context)
  explicit_docs = _coerce_to_documents(documents, auto_doc_id=False)
  docs_to_use = [*(explicit_docs or []), *(resolved_docs or [])] or None
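For reference, the suggested one-liner preserves all four branches of the original chain; a quick self-contained check, with plain lists standing in for Document objects:

```python
def merge_docs(explicit_docs, resolved_docs):
    # None/None collapses to None; otherwise a combined list, explicit first.
    return [*(explicit_docs or []), *(resolved_docs or [])] or None

assert merge_docs(["e"], ["r"]) == ["e", "r"]  # both present: extended
assert merge_docs(["e"], None) == ["e"]        # only explicit docs
assert merge_docs(None, ["r"]) == ["r"]        # only resolved docs
assert merge_docs(None, None) is None          # no documents at all
```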

Comment on lines +212 to +213
def factuality_detection(
    response: str | None,
Contributor


For factuality_detection and factuality_correction, lets make sure that all of our call sites using it are updated as well. Please manually run the tests and examples since our CICD pipeline doesn't run all of these I think. We will also need to coordinate with the intrinsic teams / external examples to update their Mellea code blocks.

Member Author


I believe I looked through all the intrinsic tests and examples.
cc: @generative-computing/mellea-intrinsics

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Contributor

@planetf1 planetf1 left a comment


Thanks @akihikokuroda for picking this up — model_options threading and document support on the guardian helpers are exactly what #1003 asked for, and the signature alignment is overdue.

One coordination check before merging. Three items in this PR touch areas the in-flight adapter lifecycle design proposal is actively working through — epic #929, draft doc here:

1. Positional reorder of the guardian signatures. factuality_detection(context, backend) → factuality_detection(response, context, backend, …). #1003 asked for documents as an explicit parameter — that's clean as a keyword-only add (*, documents=None). The positional reorder is an extra breaking change, and it conflicts with the design proposal §6, which commits to "high-level helpers keep their current signatures" through the migration.

2. _resolve_response manual-Message fallback — the new elif turn.model_input … role == "assistant" branch. This pre-empts §17 Q5 of the proposal, which asks whether rewind logic belongs in io_contract.build_prompt or stays in the helpers. Landing it here silently answers the question.

3. 3-tuple returns in _resolve_question / _resolve_response. Scaffolding for (2); reverts cleanly if (2) defers.

Proposed split — a targeted revert across _util.py, core.py, guardian.py, and the two example/test files gets this to the pure #1003 delivery:

Keep (aligned with both #1003 and #929):

  • model_options: dict | None = None on all top-level helpers
  • documents=None as keyword-only on factuality_detection / factuality_correction, preserving the existing context-based fallback as default

Defer to #929 (design discussion first):

  • Positional reorder of the two guardian helpers
  • _resolve_response manual-Message fallback branch
  • 3-tuple returns in the resolve utilities

Net effect: #1003 delivered in full, no breaking changes, §17 Q5 stays open for proper design.

If you'd like to preserve the rewind enhancement (item 2), a separate PR once §17 Q5 is decided would be the natural place for it — happy to coordinate the timing.

@jakelorocco @nrfulton — flagging since you've been engaged with #929. Does this split look right? Happy to be overruled if you see this differently.

@jakelorocco
Contributor

jakelorocco commented May 13, 2026

@planetf1 @akihikokuroda, I think the split sounds reasonable. I think the current placement of the _resolve_question context editing / rolling back is reasonable, but will respond in the intrinsic refactor issue.

In other words,

  • I think if we modify this PR to merge the items under Keep, we can merge it immediately.
  • After we finalize the adapter refactoring, I believe we will be able to open a follow up PR to this one with the _resolve_response and resolve utilities under the Defer section.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@planetf1
Contributor

planetf1 commented May 14, 2026

Heads-up: overlaps with PR #935

PR #935 (docs: migrate Guardian documentation) also edits mellea/stdlib/components/intrinsic/guardian.py — specifically it corrects the factuality_detection and factuality_correction return type annotations from float to str, which this PR also does as part of its broader rewrite.

Suggest merging this PR first: it subsumes the annotation fix and adds the fuller documents= / model_options= improvements. PR #935 can simply drop its guardian.py hunk on rebase with no loss.

Comment thread AGENTS.md Outdated
| `rag` | `check_context_relevance(question, document, context, backend)` | Whether a document is relevant (0–1); only supported for granite-4.0, not granite-4.1 |
| `rag` | `flag_hallucinated_content(response, documents, context, backend)` | Flag potentially hallucinated sentences |
| `guardian` | `factuality_detection(response, context, backend, *, documents=None, model_options=None)` | Determine if the last response is factually incorrect ("yes"/"no") |
| `guardian` | `factuality_correction(response, context, backend, *, documents=None, model_options=None)` | Correct the last response to be factually accurate |
Contributor


BLOCKER: Phantom response parameter in signature. The documented signature includes response as a positional parameter that does not exist on the function. Copy-pasting from the table will cause TypeError. Same issue on the factuality_correction row below. Please remove response, from both entries.

Risk score as a float between 0.0 (no risk) and 1.0 (risk detected).
"""
if documents is not None and target_role == "assistant":
    response, context, resolved_docs = _resolve_response(None, context)
Contributor


WARNING: When documents is provided but target_role == "user", the documents are silently discarded — no error, no warning. A caller who passes documents for a user-targeted check (e.g. groundedness on the user turn) will get a result computed without them and have no signal that anything went wrong. Suggest raising ValueError("documents parameter is only supported when target_role='assistant'") here, or handling the user role analogously to the assistant path.
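A sketch of the suggested validation; the function name is hypothetical, not part of the actual guardian_check implementation:

```python
def validate_documents_arg(documents, target_role):
    """Reject documents for non-assistant roles instead of dropping them silently."""
    if documents is not None and target_role != "assistant":
        raise ValueError(
            "documents parameter is only supported when target_role='assistant'"
        )

# Valid combinations pass through quietly:
validate_documents_arg(["doc1"], "assistant")
validate_documents_arg(None, "user")
```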


@pytest.mark.qualitative
def test_factuality_detection_from_context(backend):
    """Verify factuality detection works when documents are in the last message."""
Contributor


WARNING: This test body is identical to test_factuality_detection above — same input, same call with documents=documents, same assertion. It adds no coverage. The "from_context" name suggests it was meant to test that documents already attached to the last assistant Message in context are picked up when documents= is not passed. To actually test that path, build a context where Message("assistant", ..., documents=[doc]) is the last turn and call factuality_detection(context, backend) without a documents= argument. As written, this can either be deleted or repurposed.
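A rough sketch of what a repurposed test could check, with toy stand-ins (Msg, docs_from_last_assistant) rather than the real mellea types:

```python
# Toy message type standing in for mellea's Message.
class Msg:
    def __init__(self, role, content, documents=None):
        self.role, self.content, self.documents = role, content, documents

def docs_from_last_assistant(messages):
    """Return documents attached to the last message if it is an assistant turn."""
    last = messages[-1]
    return last.documents if last.role == "assistant" else None

# Context with documents already attached to the final assistant message:
ctx = [
    Msg("user", "Is the sky green?"),
    Msg("assistant", "No, it is blue.", documents=["sky facts"]),
]
# The "from_context" path should find these without a documents= argument.
assert docs_from_last_assistant(ctx) == ["sky facts"]
```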

Contributor

@planetf1 planetf1 left a comment


Thanks for this PR — the core improvement (adding documents= and model_options= to the factuality/guardian intrinsics, and supporting manually-added assistant Messages in _resolve_response) is a solid fix for #1003 and good to land.

Three things to address before merge:

1. BLOCKER — phantom response param in AGENTS.md (inline comment on AGENTS.md)
Both new table rows list response as the first positional arg. The functions have no such parameter. Noted inline.

2. WARNING — guardian_check silently drops documents when target_role == "user" (inline comment on guardian.py)
When documents is not None and target_role == "user", the documents are quietly discarded. The caller gets no error or warning. Noted inline.

3. WARNING — test_factuality_detection_from_context / test_factuality_correction_from_context are exact duplicates (inline comment on test_guardian.py)
Both _from_context tests are byte-for-byte identical to the preceding tests. The name implies testing document extraction from context when documents= is not passed — but that path is not exercised. Either delete or repurpose. Noted inline.

Also worth noting: the float → str return type correction on factuality_detection / factuality_correction is the right fix (runtime was always a string). Worth a line in the PR description since it is technically a breaking annotation change for any callers who typed against -> float.

Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
@akihikokuroda akihikokuroda requested a review from planetf1 May 14, 2026 14:16

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: intrinsic function signatures

3 participants