# Rubric saturation analysis

After six iteration passes, the figure system has 109 examples
attached (one per slug on `main`) and 109 figures registered in
`FIGURES` in `src/marginalia.py`. Coverage is 100%. Distribution
against `docs/example-figure-rubric.md`:

| band | count | composition |
|---|---:|---|
| 9.5 | 3 | the canonical pictures (`variables`, `mutability`, `copying-collections`) |
| 9.0 | ~35 | strong mechanism, single move, runs match cell |
| 8.5 | ~55 | strong but honest reuse, or generic placeholders |
| 8.0 | ~16 | binding pictures, abstract pictures, weak reuses |

Mean ≈ 8.6. **No figure scores below 8.0.** No figure exceeds 9.5.
Pushing further requires changes to the rubric itself, because the
remaining drag comes from criteria that are structurally over-strict
for a library this size.
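
That mean follows directly from the band table. A throwaway check,
taking the `~` counts at face value (a sketch, not part of the
figure system):

```python
# Band score -> approximate figure count, copied from the table above.
bands = {9.5: 3, 9.0: 35, 8.5: 55, 8.0: 16}

total = sum(bands.values())                          # 109 figures
mean = sum(score * n for score, n in bands.items()) / total

print(total, round(mean, 2))  # 109 8.61
```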

## Why every figure cannot reach 9.0 under the current rubric

Two criteria in `docs/example-figure-rubric.md` cap most figures
at 8.5 by design:

### Criterion 2 — "Match the running variables (0–1.0)"

A figure loses up to 1.0 when its placeholders (`a`, `b`, `xs`) do
not match the cell's specific names (`first`, `second`, `factor`,
`numbers`). For a library of 109 figures across 109 cells, matching
running variables one-for-one would require 109 bespoke paint
functions; reuse becomes impossible. Today 12 figures are reused
across multiple slugs precisely because they capture a *general*
mechanism (`iter-protocol` covers `iterators`,
`iterator-vs-iterable`, `iterating-over-iterables`,
`container-protocols`). Every reuse pays a tax against this
criterion.

The criterion was written for a small boutique catalogue where one
figure per lesson is the norm. At 109 figures the cost of strict
matching is unbounded; the criterion's *intent* — "make the figure
recognisably about this cell, not a different lesson" — is satisfied
already by criterion 1 (cell fidelity) plus criterion 4 (mechanism).

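To make the reuse tax concrete, here is a minimal sketch of how a
slug-to-figure attachment map surfaces reuse. The dict shape is an
assumption for illustration; the real `FIGURES` registry in
`src/marginalia.py` may be structured differently, and only the slug
and figure names below come from this document:

```python
from collections import Counter

# slug -> attached figure id (illustrative subset of the 109 entries).
ATTACHMENTS = {
    "iterators": "iter-protocol",
    "iterator-vs-iterable": "iter-protocol",
    "iterating-over-iterables": "iter-protocol",
    "container-protocols": "iter-protocol",
    "mutability": "mutability",
}

# Under the current rubric, every attachment beyond the first pays
# the criterion-2 tax, because a shared figure cannot match every
# cell's running variables.
reused = {fig: n for fig, n in Counter(ATTACHMENTS.values()).items() if n > 1}
print(reused)  # {'iter-protocol': 4}
```
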
### Criterion 9 — "Independence from lesson figures (0–1.0)"

A journey-section figure scoring 9 elsewhere loses up to 1.0 when
attached to a related lesson. `iter-protocol` is the section figure
for *Iteration · See the protocol behind `for`* and the cell figure
for four iteration-adjacent lessons. The rubric docks the lesson
attachments on independence, even though they are the most honest
depiction available.

The intent was to prevent a journey-section figure from being
literally re-rendered as the only diagram on its constituent lesson
pages — that *would* read as redundant. But in our flow, the
journey-section figure already sits at `/journeys/<slug>`, and the
lesson appears alone at `/examples/<slug>`; readers don't see both
beside each other. The "independence" penalty fires regardless.

## What the rubric needs

Four upgrades would let further iteration produce visible quality
gains rather than just shuffling the same band.

### 1. Tier figures into **library** and **canonical**

A *library* figure is a primitive of the system: meant for reuse,
generic by design (e.g. `iter-protocol`, `branch-fork`,
`class-triangle`). A *canonical* figure is unique to one cell, with
that cell's specific running variables baked in (e.g.
`aliasing-mutation`, `mutability`'s three-state strip).

For library figures: criteria 2 (running variables) and 9
(independence) should be **non-scored**. Score them once at
registration; cap their attached score at 9.0 (not 10).

For canonical figures: criteria 2 and 9 stay as written. Cap at
9.5 only if the figure is *the* picture for that mechanism — the
9.5 band is supposed to be rare and definitive.

Result: ~70 library figures (today reuse-shaped) all reach 9.0;
~30 canonical figures reach 9.0–9.5 by being slug-specific.

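A minimal sketch of what the tier split could look like in code,
assuming a hypothetical `tier` field and `capped` helper (none of
these names exist in `src/marginalia.py` today):

```python
from dataclasses import dataclass

LIBRARY, CANONICAL = "library", "canonical"

@dataclass
class ScoredFigure:
    name: str
    tier: str         # LIBRARY or CANONICAL (hypothetical field)
    raw_score: float  # sum of the scored criteria

def capped(fig: ScoredFigure) -> float:
    """Apply the tier caps proposed above: library figures skip
    criteria 2 and 9 and cap at 9.0; canonical figures keep the
    full rubric and cap at 9.5."""
    cap = 9.0 if fig.tier == LIBRARY else 9.5
    return min(fig.raw_score, cap)

print(capped(ScoredFigure("iter-protocol", LIBRARY, 9.4)))        # 9.0
print(capped(ScoredFigure("aliasing-mutation", CANONICAL, 9.7)))  # 9.5
```
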
### 2. Replace criterion 2 with **"the figure earns its place"**

Strict variable-matching stops being informative at scale. The
better question is "does swapping in this figure improve the cell
versus showing no figure?" If yes, full credit. If the figure
contains marks the cell's prose doesn't motivate, deduct.

Practical rewrite of criterion 2 (0–1.0):

> The figure adds something the prose cannot show in the same word
> count: a relationship, a before/after, a hidden mechanism. A
> figure that merely restates the prose in diagram form earns 0.5;
> a figure that surfaces a relationship invisible in the prose
> earns 1.0.

This rewards genuine pedagogical value and accepts honest reuse.

### 3. Add a **caption rubric**

Captions today are scored only as "present" (criterion 5). Quality
varies: some assert ("Two names share one mutable list — appending
through one name changes the object visible through both."); others
hedge ("The figure shows..."). A separate 0–1.0 criterion:

> Caption declares what is true, in the section summary's voice;
> does not narrate what the figure does. "Two names share one list"
> earns 1.0; "Here we see two names" earns 0.

Captions written under this criterion will pull weak figures up by
~0.5 points.

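The narrate-versus-assert distinction is easy to mechanise as a
first-pass lint. A sketch; the opener list is illustrative, not an
agreed style rule:

```python
# Openers that narrate the figure instead of asserting a fact.
HEDGED_OPENERS = (
    "the figure shows", "here we see", "this diagram", "we can see",
)

def caption_asserts(caption: str) -> bool:
    """True if the caption declares what is true rather than
    narrating what the figure does."""
    return not caption.strip().lower().startswith(HEDGED_OPENERS)

print(caption_asserts("Two names share one mutable list."))  # True
print(caption_asserts("Here we see two names."))             # False
```
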
### 4. Add **page-level coherence**

Currently a slug with three attached figures scores each figure
independently. A page that ships three 8.5 figures is *worse* than
one 9.0 figure on the same page (cognitive load, redundancy). A
page-level rubric (0–1.0) would score:

> When multiple figures attach to one slug, they form a coherent
> set — different aspects of the same lesson, not three angles on
> the same point.

Today this is a manual judgement; codifying it would prevent the
inevitable "too many figures" failure mode as coverage grows.

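Codifying it could start as small as flagging the slugs where the
question even arises. A sketch; the multi-figure attachment data is
invented for illustration:

```python
# slug -> attached figure ids (invented data for illustration).
ATTACHMENTS = {
    "copying-collections": ["copy-shallow", "copy-deep", "aliasing-mutation"],
    "variables": ["variables"],
}

# Slugs with one figure pass by construction; slugs with several
# need the coherence question asked: different aspects of the same
# lesson, or three angles on the same point?
for slug, figs in ATTACHMENTS.items():
    if len(figs) > 1:
        print(f"{slug}: {len(figs)} figures, review as a set")
```
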
## What this turn changed

- Fixed the layout regression: cells stay 2-col throughout; figures
  live in banner rows *between* cells. `hello-world` now matches
  production.
- Six targeted figure refinements: `tuple-frozen` shows the frozen
  aspect (a struck-through `.append`); `literal-forms` shows specific
  literal spellings per type; `function-with-body` shows a specific
  function with its return value; spec/rubric docs updated to reflect
  banner-between in production.
- Documented the rubric saturation: a 9.0 floor isn't reachable for
  every figure under the current rubric without designing slug-
  specific paint code for ~70 reusable library figures, which sells
  reuse for marginal score gain.

The rubric upgrades above are what would make the next pass produce
visible quality gains rather than re-shuffling the same 8.5 band.