feat(renderer-client): give per-token prompt attribution to TrajectoryStep #1414

Open
snimu wants to merge 1 commit into main from sebastian/renderers-pass-through-info-2026-05-19

@snimu (Contributor) commented May 19, 2026

renderers.client.generate() returns the renderer's RenderedTokens for the prompt as prompt_attribution (token_ids, message_indices, sampled_mask, is_content, message_roles). This change wires that sidecar through RendererClient → ResponseTokens → TrajectoryStepTokens so prime-rl can build SFT-on-tool-body and other selective loss masks (e.g. attribution.content_mask_for_roles({"tool"})) without re-rendering at training time.
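
As a rough illustration of the kind of mask this enables, the sketch below re-implements a role-filtered content mask over the five attribution fields named above. The `AttributionSketch` class and this standalone `content_mask_for_roles` function are hypothetical stand-ins, not the renderers API, and the field semantics (for example, that `message_indices` indexes into `message_roles`) are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AttributionSketch:
    """Hypothetical stand-in for the renderer's RenderedTokens."""

    token_ids: list[int]
    message_indices: list[int]  # assumed: per token, index into message_roles
    sampled_mask: list[bool]    # per token
    is_content: list[bool]      # assumed: True for message body, False for scaffold
    message_roles: list[str]    # per message, not per token


def content_mask_for_roles(attr: AttributionSketch, roles: set[str]) -> list[bool]:
    """Mark tokens that are message body AND belong to one of the given roles."""
    mask = []
    for msg_idx, is_body in zip(attr.message_indices, attr.is_content):
        # Guard against indices that do not point at a message (assumption).
        in_range = 0 <= msg_idx < len(attr.message_roles)
        mask.append(bool(is_body and in_range and attr.message_roles[msg_idx] in roles))
    return mask


# Two messages: a user turn (tokens 0-2) and a tool result (tokens 3-5).
attribution = AttributionSketch(
    token_ids=[11, 12, 13, 21, 22, 23],
    message_indices=[0, 0, 0, 1, 1, 1],
    sampled_mask=[False] * 6,
    is_content=[False, True, True, False, True, False],
    message_roles=["user", "tool"],
)
tool_mask = content_mask_for_roles(attribution, {"tool"})
```

Here only the single body token of the tool message is selected; a trainer could multiply such a mask into its per-token loss weights.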

Plumbing:

  • ResponseTokens.prompt_attribution: Any | None and the matching NotRequired field on TrajectoryStepTokens. Stored as Any to avoid a hard renderers import at this layer — same precedent as multi_modal_data.
  • RendererClient.get_native_response now passes the bridge's RenderedTokens (already returned by _get_incremental_prompt_ids) into generate(prompt_attribution=...). On the first turn the bridge is None and generate does the render itself and surfaces the attribution on the result.
  • RendererClient.from_native_response lifts prompt_attribution from the raw result dict onto ResponseTokens. Missing-key callers (older renderers, non-renderer pathways) get None.
  • parse_response_tokens slices the per-token arrays (token_ids, message_indices, sampled_mask, is_content) in lockstep with prompt_ids on max_seq_len overflow. message_roles stays intact (per-message indexing, not per-token). The sidecar moves (not copies) from response.message.tokens to the TrajectoryStepTokens, matching the existing multi_modal_data move-not-copy policy.
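
The lockstep truncation in the last bullet can be sketched as follows. This is a minimal illustration, not the helper added by the PR: it assumes the attribution is dict-like and that overflow keeps the leading tokens of the prompt; `truncate_attribution_sketch` and `PER_TOKEN_FIELDS` are hypothetical names.

```python
from __future__ import annotations

# The four per-token arrays listed in the PR description.
PER_TOKEN_FIELDS = ("token_ids", "message_indices", "sampled_mask", "is_content")


def truncate_attribution_sketch(attribution: dict, keep: int) -> dict:
    """Return a copy with every per-token array cut to its first `keep` entries."""
    truncated = dict(attribution)  # shallow copy; the input dict is not modified
    for name in PER_TOKEN_FIELDS:
        if truncated.get(name) is not None:
            truncated[name] = list(truncated[name][:keep])
    # message_roles is indexed per message, so it is deliberately left untouched.
    return truncated


attribution = {
    "token_ids": [11, 12, 13, 21, 22],
    "message_indices": [0, 0, 0, 1, 1],
    "sampled_mask": [False, False, False, False, False],
    "is_content": [False, True, True, False, True],
    "message_roles": ["user", "tool"],
}
prompt_ids = [11, 12, 13, 21, 22]
max_prompt_len = 3  # hypothetical budget left for the prompt

short_ids = prompt_ids[:max_prompt_len]
short_attr = truncate_attribution_sketch(attribution, len(short_ids))
```

Slicing all four arrays with the same length keeps them index-aligned with the truncated prompt_ids, which is what downstream mask construction relies on.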

Non-renderer clients (chat completions, completions, responses, Anthropic) leave the field empty by construction; their tokenizations aren't byte-stable and the body/scaffold cut is unknown.

Runtime dependency: this PR requires the matching renderers PR, which introduces prompt_attribution on renderers.client.generate. The renderers version constraint in pyproject.toml stays at >=0.1.8.dev4 for now; bump it to >=0.1.8.dev5 (or whatever the next renderers tag turns out to be) in the same merge train as the renderers release, so the constraint moves only once the runtime is actually available on PyPI.

Tests:

  • tests/test_renderer_client.py — bridge path threads attribution into generate, from_native_response lifts the field onto ResponseTokens, missing-key fallback resolves to None.
  • tests/test_trajectory_processing.py — attribution survives parse, moves from response to step, truncates correctly on overlong prompt, and is left intact when only the completion needs truncation.

Description

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update
  • Test improvement

Testing

  • All existing tests pass when running uv run pytest locally.
  • New tests have been added to cover the changes

Checklist

  • My code follows the style guidelines of this project as outlined in AGENTS.md
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes


Note

Medium Risk
Medium risk because it changes the token plumbing and the truncation and move semantics used during training sample assembly; mistakes could silently corrupt masks or serialize sidecars twice.

Overview
Adds a new prompt_attribution sidecar (renderer RenderedTokens) to ResponseTokens and TrajectoryStepTokens so downstream training can build selective-loss masks without re-rendering.

Updates RendererClient.get_native_response to pass bridged RenderedTokens into renderers.client.generate(prompt_attribution=...), and from_native_response to lift prompt_attribution from the raw generate result (missing key defaults to None).
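
The missing-key fallback described here amounts to an optional dictionary lookup. The sketch below assumes the raw result is a plain dict; `ResponseTokensSketch`, `lift_tokens`, and the `prompt_ids` / `completion_ids` keys are hypothetical and only stand in for the real types.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class ResponseTokensSketch:
    """Hypothetical, reduced stand-in for ResponseTokens."""

    prompt_ids: list[int]
    completion_ids: list[int]
    # Typed as Any so this layer needs no import from the renderers package.
    prompt_attribution: Any | None = None


def lift_tokens(result: dict) -> ResponseTokensSketch:
    """Copy token fields off a raw result dict; absent attribution becomes None."""
    return ResponseTokensSketch(
        prompt_ids=list(result["prompt_ids"]),
        completion_ids=list(result["completion_ids"]),
        prompt_attribution=result.get("prompt_attribution"),
    )


with_sidecar = lift_tokens(
    {"prompt_ids": [1, 2], "completion_ids": [3], "prompt_attribution": {"token_ids": [1, 2]}}
)
without_sidecar = lift_tokens({"prompt_ids": [1, 2], "completion_ids": [3]})
```

Using `dict.get` rather than indexing is what lets results from older renderers, which never set the key, flow through unchanged.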

Extends parse_response_tokens to move-not-copy prompt_attribution onto the parsed step (clearing it on the response) and to truncate per-token attribution arrays when the prompt is truncated by max_seq_len. Adds focused tests covering pass-through, missing-key fallback, move semantics, and truncation cases.

Reviewed by Cursor Bugbot for commit 5171ae9. Bugbot is set up for automated code reviews on this repo.

Note

Add per-token prompt attribution to TrajectoryStep via RendererClient

  • Adds an optional prompt_attribution field to ResponseTokens and TrajectoryStepTokens in verifiers/types.py to carry renderer-emitted per-token attribution data.
  • RendererClient.get_native_response in renderer_client.py now passes prompt_attribution (the bridged RenderedTokens) into generate(), and from_native_response reads it back into Response.message.tokens.prompt_attribution.
  • parse_response_tokens in response_utils.py extracts prompt_attribution into the returned dict and clears it from the response (move semantics). A new _truncate_prompt_attribution helper slices token-aligned attribution fields when the prompt is truncated to max_seq_len.
  • Behavioral Change: parse_response_tokens now mutates response.message.tokens.prompt_attribution to None after extraction, mirroring existing multi_modal_data handling.
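
The move semantics in the last bullet can be pictured with a small sketch. It uses `SimpleNamespace` as a stand-in for the response's token object, and `take_prompt_attribution` is a hypothetical helper name, not the code in response_utils.py.

```python
from types import SimpleNamespace


def take_prompt_attribution(tokens):
    """Detach the sidecar from `tokens` and return it (move, not copy)."""
    attribution = getattr(tokens, "prompt_attribution", None)
    if attribution is not None:
        # Clear the source so the same object is not held, or serialized, twice.
        tokens.prompt_attribution = None
    return attribution


sidecar = {"token_ids": [1, 2, 3]}
response_tokens = SimpleNamespace(prompt_ids=[1, 2, 3], prompt_attribution=sidecar)

step_tokens = {"prompt_ids": response_tokens.prompt_ids}
moved = take_prompt_attribution(response_tokens)
if moved is not None:
    step_tokens["prompt_attribution"] = moved
```

After the call, the step holds the only reference and the response field reads None, so any caller that inspects the response after parsing has to expect the field to be empty.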

Macroscope summarized 5171ae9.

…TrajectoryStep

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

model=model,
prompt_ids=prompt_ids,
multi_modal_data=multi_modal_data,
prompt_attribution=prompt_attribution,

Renderers version constraint doesn't match runtime requirement

Medium Severity

The code now passes prompt_attribution=prompt_attribution to renderers.client.generate, which requires a renderers version that supports this keyword argument. However, pyproject.toml still constrains renderers>=0.1.8.dev4 in both the dev and optional dependency groups. Installing the minimum allowed version would cause a TypeError at runtime when generate() is called. The package configuration is inconsistent with the changed behavior, and the merge commit depends on an unpublished artifact, violating the "Releasable State" rule.
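
The failure mode described in this comment can be reproduced with two stand-in functions. `old_generate` and `new_generate` below are hypothetical and only mimic a signature with and without the keyword; they are not the renderers API. `accepts_keyword` is likewise an illustrative helper showing how the mismatch could be detected, not a proposed fix for this PR.

```python
import inspect


def accepts_keyword(func, name: str) -> bool:
    """True if `func` can be called with `name=` as a keyword argument."""
    params = inspect.signature(func).parameters
    param = params.get(name)
    if param is not None and param.kind in (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    ):
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def old_generate(*, model, prompt_ids, multi_modal_data=None):
    """Hypothetical signature without the new keyword."""
    return {"prompt_ids": prompt_ids}


def new_generate(*, model, prompt_ids, multi_modal_data=None, prompt_attribution=None):
    """Hypothetical signature with the new keyword."""
    return {"prompt_ids": prompt_ids, "prompt_attribution": prompt_attribution}


try:
    old_generate(model="m", prompt_ids=[1], prompt_attribution=None)
    old_call_failed = False
except TypeError:
    # Python rejects unexpected keyword arguments at call time.
    old_call_failed = True
```

Because the error only surfaces when the call is made, a version constraint that admits the older signature would not be caught at install time.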

Triggered by project rule: BugBot Instructions

macroscopeapp Bot commented May 19, 2026

Approvability

Verdict: Needs human review

This PR adds new feature functionality (passing prompt_attribution through the renderer pipeline) and has an unresolved review comment about a potential version constraint mismatch that could cause runtime errors with older renderers versions.
