Add structured output format with assembly line mappings #14
mattgodbolt-molty wants to merge 1 commit into main from
Conversation
Adds an optional 'format' field to requests: 'markdown' (default,
backward compatible) or 'structured' (JSON with assembly line ranges).
Structured format returns:
- summary: one-sentence overview
- sections: array of {title, asmStartLine, asmEndLine, content}
- keyInsight: the most important takeaway
Each section maps to specific 0-indexed assembly lines, enabling
frontends to highlight relevant assembly as the user reads each
section.
Uses Anthropic's structured output API (output_config with
json_schema) for guaranteed valid JSON. Tested with Sonnet 4.6:
line references are accurate across simple, complex, optimised,
and unoptimised examples.
Backward compatible: existing clients see no change. The
structuredExplanation field is null when format is 'markdown'.
🤖 Generated by LLM (Claude, via OpenClaw)
Pull request overview
This PR adds structured output support to the Claude Explain API, enabling assembly line mappings for the Compiler Explorer frontend. The API now accepts an optional format field that can be either "markdown" (default, backward compatible) or "structured" (JSON with assembly line references).
Changes:
- Added `ExplanationFormat` enum and `format` field to `ExplainRequest`, with `"markdown"` as default
- Introduced `StructuredExplanation` and `ExplanationSection` models with assembly line mapping fields (`asmStartLine`, `asmEndLine`)
- Modified explain logic to use Anthropic's structured output API (`output_config` with JSON schema) for structured format requests
- Added comprehensive test coverage for both structured and markdown formats
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| app/explain_api.py | Defines new data models (ExplanationFormat, StructuredExplanation, ExplanationSection) and adds format field to ExplainRequest and structuredExplanation to ExplainResponse |
| app/explain.py | Implements format-specific logic including output_config setup, system prompt modification, and response parsing for structured vs markdown formats |
| app/test_explain.py | Adds TestStructuredOutput class with 4 tests covering structured response format, output_config usage, and backward compatibility verification |
```python
if cache_provider is not None:
    cached_response = await get_cached_response(body, prompt, cache_provider)
```
The cache key generation doesn't account for the format field, which means structured and markdown format requests with identical inputs will share the same cache key. This could return cached responses in the wrong format.
For example, if a markdown response is cached first, a subsequent structured format request with the same inputs will receive the cached markdown text, which will then fail JSON validation at line 135 when calling StructuredExplanation.model_validate_json().
The format field must be included in the cache key calculation in cache.py's generate_cache_key function. Since the format affects the actual API parameters (system prompt via line 116-117, messages via line 115, and output_config via lines 120-125), these differences should be reflected in the cache key.
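A minimal sketch of the fix, assuming a `generate_cache_key(body, prompt)` helper along these lines lives in cache.py (the helper's shape and the hashing scheme here are illustrative, not the project's actual implementation):

```python
import hashlib
import json


def generate_cache_key(body: dict, prompt: str) -> str:
    """Build a cache key that distinguishes output formats.

    Including the requested format ensures a cached markdown response is
    never served to a structured-format request with the same inputs.
    """
    key_material = {
        "prompt": prompt,
        # Default mirrors the API's default so pre-existing cache entries
        # behave like explicit markdown requests.
        "format": body.get("format", "markdown"),
    }
    digest = hashlib.sha256(json.dumps(key_material, sort_keys=True).encode())
    return digest.hexdigest()
```

With this, `{"format": "structured"}` and an omitted `format` hash to different keys, while an explicit `"markdown"` matches the default.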
```python
structured: StructuredExplanation | None = None

if use_structured:
    structured = StructuredExplanation.model_validate_json(raw_text)
```
The JSON validation here can raise a ValidationError if the API returns JSON that doesn't match the StructuredExplanation schema. While Anthropic's structured output with output_config should guarantee schema-compliant JSON, it would be more robust to handle potential validation errors explicitly and return a meaningful error response rather than letting the exception propagate to FastAPI's default error handler.
Consider wrapping this in a try-except block to catch pydantic.ValidationError and return an appropriate error response with status='error'.
```python
class ExplanationSection(BaseModel):
    """A section of a structured explanation, mapped to assembly lines."""

    model_config = {"json_schema_extra": {"additionalProperties": False}}
```
The model_config setting may not correctly add additionalProperties: false to the generated JSON schema. In Pydantic v2, the json_schema_extra dict is merged with the generated schema, but the placement here might not achieve the desired effect.
To properly disallow additional properties in the JSON schema sent to Anthropic, you should verify that model_json_schema() actually includes "additionalProperties": false at the schema root level. Consider using Pydantic's ConfigDict with extra='forbid' instead, or verify the generated schema matches expectations:
```python
model_config = ConfigDict(extra='forbid')
```

This ensures both runtime validation and JSON schema generation correctly reject additional properties.
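A quick way to verify the claim, using a trimmed stand-in model (fields are illustrative): in Pydantic v2, `extra="forbid"` both rejects unknown fields at runtime and emits `"additionalProperties": false` in the generated schema.

```python
from pydantic import BaseModel, ConfigDict


class ExplanationSection(BaseModel):
    """Illustrative stand-in for the PR's model (fields trimmed)."""

    # extra="forbid" rejects unknown fields at validation time and also
    # adds "additionalProperties": false to model_json_schema().
    model_config = ConfigDict(extra="forbid")

    title: str
    content: str


schema = ExplanationSection.model_json_schema()
```

Checking `schema["additionalProperties"] is False` confirms the constraint actually reaches the JSON schema handed to Anthropic.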
```python
class StructuredExplanation(BaseModel):
    """Structured explanation with assembly line mappings."""

    model_config = {"json_schema_extra": {"additionalProperties": False}}
```
Same as above - the model_config setting may not correctly add additionalProperties: false to the generated JSON schema. Consider using ConfigDict(extra='forbid') instead to ensure both runtime validation and JSON schema generation correctly reject additional properties.
```python
api_kwargs["max_tokens"] = max(prompt_data["max_tokens"], 2048)
api_kwargs["output_config"] = {
    "format": {
        "type": "json_schema",
        "schema": StructuredExplanation.model_json_schema(),
    }
}
```
The max_tokens is increased to a minimum of 2048 for structured output to account for JSON overhead. While this is reasonable, it means structured format requests may consume more tokens (and cost more) than markdown requests with the same prompt configuration. Consider documenting this behavior in the API documentation or PR description so users are aware of the potential cost difference.
Suggested change:

```python
# Ensure enough tokens for JSON overhead in structured output. This may increase
# token usage and cost compared to markdown responses with the same prompt.
if prompt_data["max_tokens"] < 2048:
    LOGGER.info(
        "Structured output: increasing max_tokens from %s to 2048 to account for JSON "
        "overhead. This may increase token usage and cost versus markdown output.",
        prompt_data["max_tokens"],
    )
api_kwargs["max_tokens"] = max(prompt_data["max_tokens"], 2048)
api_kwargs["output_config"] = {
    "format": {
        "type": "json_schema",
        "schema": StructuredExplanation.model_json_schema(),
    }
}
```
```python
asmStartLine: int = Field(..., description="0-indexed start line in the assembly listing")
asmEndLine: int = Field(..., description="0-indexed end line (inclusive) in the assembly listing")
```
The asmStartLine and asmEndLine fields don't include validation to ensure they're within the bounds of the actual assembly array. While Anthropic's structured output should be accurate, there's no guarantee these indices won't exceed the assembly array length.
Consider adding a validator to check bounds, or at minimum document that API consumers (like the CE frontend) should validate these indices before attempting to use them for highlighting. You could add a Pydantic field validator that checks if end >= start at minimum.
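A sketch of such a validator (field names mirror the PR's model; the upper-bound check against the real assembly length still has to happen in the consumer, since the model never sees the array):

```python
from pydantic import BaseModel, Field, model_validator


class ExplanationSection(BaseModel):
    title: str
    content: str
    # ge=0 rejects negative indices at validation time.
    asmStartLine: int = Field(..., ge=0, description="0-indexed start line")
    asmEndLine: int = Field(..., ge=0, description="0-indexed end line (inclusive)")

    @model_validator(mode="after")
    def check_line_order(self) -> "ExplanationSection":
        # Guarantee a well-formed (start <= end) range; bounds against the
        # actual assembly array length are the frontend's responsibility.
        if self.asmEndLine < self.asmStartLine:
            raise ValueError("asmEndLine must be >= asmStartLine")
        return self
```

A section with `asmStartLine=5, asmEndLine=2` then fails validation instead of silently producing an inverted highlight range.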
```python
model_config = {"json_schema_extra": {"additionalProperties": False}}

summary: str = Field(..., description="One-sentence overview of what the compiler did")
sections: list[ExplanationSection] = Field(..., description="Explanation sections mapped to assembly lines")
```
The sections field doesn't have a minimum length constraint, meaning an empty list would be valid. For most assembly explanations, having at least one section seems reasonable. Consider adding a validator to ensure at least one section exists:
```python
sections: list[ExplanationSection] = Field(..., min_length=1, description="Explanation sections mapped to assembly lines")
```

This would make the schema more robust and prevent degenerate cases where the LLM returns no sections.
Suggested change:

```python
sections: list[ExplanationSection] = Field(
    ..., min_length=1, description="Explanation sections mapped to assembly lines"
)
```
Summary
Adds an optional `format` field to the explain API: `"markdown"` (default, fully backward compatible) or `"structured"` (JSON with assembly line mappings).

Structured format response

```json
{
  "structuredExplanation": {
    "summary": "GCC -O2 compiles square() into three instructions",
    "sections": [
      {
        "title": "Multiply the input",
        "asmStartLine": 1,
        "asmEndLine": 1,
        "content": "`imul edi, edi` multiplies the register by itself..."
      }
    ],
    "keyInsight": "No stack frame needed — everything stays in registers"
  }
}
```

What this enables in CE
Implementation
- `format` field on `ExplainRequest` (`"markdown"` default, `"structured"` opt-in)
- Uses Anthropic's structured output API (`output_config` with `json_schema`) for guaranteed valid JSON
- Structured requests set `output_config` and add a line indexing hint to the system prompt
- New `StructuredExplanation` and `ExplanationSection` models

Testing
Unit tests: 95 pass (91 existing + 4 new covering both paths)
- Structured format returns `structuredExplanation`, not `explanation`
- Structured format sends `output_config` to the API
- Markdown format is unchanged (no `output_config`, returns `explanation`)

Live testing: tested on 3 examples (square -O2, fibonacci -O2, unoptimised add):
Backward compatibility
Fully backward compatible:
- Default `format` is `"markdown"` — existing clients see identical responses
- `structuredExplanation` field is `null` for markdown requests
- No change to the `/options` endpoint
The CE frontend would need to:
- Send `format: "structured"` in requests
- Drive assembly highlighting through the `eventHub`

(I'm Molty, an AI assistant acting on behalf of @mattgodbolt)