feat(spec): add renderTiming to McpUiToolMeta for deferred View rendering#553

Open
netanelavr wants to merge 1 commit into modelcontextprotocol:main from netanelavr:feat/render-timing

Conversation

@netanelavr

Summary

Adds a new `renderTiming` field to `McpUiToolMeta` that lets servers declare when a View should appear in the conversation. This addresses a gap in the spec: hosts have no standardized way to distinguish Views that should render immediately from Views that should appear only after the agent finishes its turn.

Problem

The current spec defines displayMode (inline / fullscreen / pip) for visual layout, but has no concept of temporal presentation — i.e., when to show the View. In agentic workflows where the LLM makes multiple sequential tool calls, some Views (e.g., "Apply to Site", confirmation dialogs) should only appear after the agent is done reasoning, to prevent premature user interaction.

Today, hosts that need this behavior must invent proprietary metadata fields. This PR standardizes the pattern.

Solution

New type and field on McpUiToolMeta:

```typescript
type McpUiRenderTiming = "inline" | "end-of-turn";

interface McpUiToolMeta {
  resourceUri?: string;
  visibility?: McpUiToolVisibility[];
  renderTiming?: McpUiRenderTiming;  // NEW
}
```
  • "inline" (default) — render the View as soon as the tool returns
  • "end-of-turn" — defer rendering until the agent's turn is complete (no more tool calls)
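As a minimal sketch of how a host could resolve the field (the `getRenderTiming` helper is illustrative, not part of this PR; the type definitions are repeated here only so the snippet is self-contained):

```typescript
type McpUiRenderTiming = "inline" | "end-of-turn";
type McpUiToolVisibility = string; // placeholder for illustration

interface McpUiToolMeta {
  resourceUri?: string;
  visibility?: McpUiToolVisibility[];
  renderTiming?: McpUiRenderTiming; // NEW: optional, defaults to "inline"
}

// Hypothetical helper: a missing renderTiming is treated as "inline",
// which is what keeps existing tools backward compatible.
function getRenderTiming(meta: McpUiToolMeta): McpUiRenderTiming {
  return meta.renderTiming ?? "inline";
}

console.log(getRenderTiming({}));                               // "inline"
console.log(getRenderTiming({ renderTiming: "end-of-turn" }));  // "end-of-turn"
```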

Design decisions

  • Server-declared hint: The server has domain knowledge about whether its View needs deferred rendering; the host SHOULD respect it but MAY ignore it
  • Orthogonal to displayMode: Timing and layout are independent concerns — a View can be end-of-turn + fullscreen
  • Backward compatible: Optional field, defaults to "inline", existing tools are unaffected
  • Extensible: String union allows future values (e.g., "on-user-action") without breaking changes
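A hypothetical host-side sketch of the deferral behavior these decisions imply: render `"inline"` Views as tool results arrive, queue `"end-of-turn"` Views, and flush the queue once the agent stops making tool calls. The `PendingView` shape, `ViewScheduler` class, and `onTurnComplete` hook are all illustrative assumptions, not spec.

```typescript
type McpUiRenderTiming = "inline" | "end-of-turn";

interface PendingView {
  resourceUri: string;
  renderTiming?: McpUiRenderTiming;
}

class ViewScheduler {
  private deferred: PendingView[] = [];
  rendered: string[] = []; // resource URIs in the order they were shown

  // Called for each tool result carrying a View.
  onToolResult(view: PendingView): void {
    if ((view.renderTiming ?? "inline") === "end-of-turn") {
      this.deferred.push(view); // hold until the agent's turn is complete
    } else {
      this.rendered.push(view.resourceUri); // show immediately
    }
  }

  // Called when the agent finishes its turn (no more tool calls).
  onTurnComplete(): void {
    for (const view of this.deferred) this.rendered.push(view.resourceUri);
    this.deferred = [];
  }
}
```

Because timing is orthogonal to `displayMode`, the same queue works regardless of whether a deferred View ultimately renders inline, fullscreen, or picture-in-picture.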

Prior art

  • Elementor's Angie has shipped this pattern in production (as a vendor-specific _meta.ui.displayMode field with "inline" / "end-of-turn" values). This PR standardizes the concept.
  • Related to the deferred _meta["openai/toolInvocation/invoking"] / invoked fields tracked in Protocol discrepancies between MCP Apps and Apps SDK #201, though those are status text rather than timing control.

Changes

  • src/spec.types.ts — add McpUiRenderTiming type and renderTiming field to McpUiToolMeta
  • src/types.ts — re-export new type and schema
  • specification/draft/apps.mdx — document Render Timing section and design decision
  • src/generated/* — auto-regenerated schemas (Zod + JSON Schema + tests)

Test plan

  • npm test — all 121 tests pass
  • npm run build — builds successfully including all examples
  • Schema generation produces correct Zod and JSON Schema for the new type
  • Type-level integration tests verify McpUiRenderTiming round-trips correctly

Made with Cursor


Add a new `renderTiming` field to `McpUiToolMeta` that lets servers
declare when a View should appear in the conversation:

- "inline" (default): render as soon as the tool returns
- "end-of-turn": defer rendering until the agent's turn is complete

This addresses a gap in the spec where hosts have no standardized way
to know whether a View should be shown immediately or after the agent
finishes its turn. Tools like "Apply to Site" need deferred rendering
to prevent premature user interaction while the agent is still making
additional tool calls.

This is orthogonal to the existing visual `displayMode`
(inline/fullscreen/pip) which controls layout, not timing.

Changes:
- spec.types.ts: add McpUiRenderTiming type and renderTiming field
- types.ts: re-export new type and schema
- specification/draft/apps.mdx: document Render Timing section and
  design decision
- generated/schema.*: auto-regenerated from types

Made-with: Cursor
@idosal
Contributor

idosal commented Mar 19, 2026

Thanks @netanelavr! To understand the gap, could you please provide additional example cases that the tool definition doesn't cover? For example, in your current example, I'd imagine the "approval" tool could be forced to be called after the reasoning by requiring a reason argument.

@liady
Contributor

liady commented Mar 19, 2026

@netanelavr just to make sure: currently the host renders the view immediately (it doesn't actually wait for the tool result). What to show inside the view is decided by the view itself, based on the data it gets from the host (i.e. no data -> loading state, tool inputs -> state A, tool result -> state B).

This mechanism can theoretically be extended so that the host sends a new type of message to signal that it has finished reasoning (so that the view can respond to that).
What do you think? This might allow the most accurate visual feedback for the user.

So the view can change according to these lifecycle events:

  • The host decides to use the tool (renders the view)
  • The host calls the tool (streams tool inputs to the view)
  • The host receives the tool response (sends the tool result to the view)
  • The host finishes the agentic reasoning (sends a message to the view)
