
docs: refresh top-models telemetry figure and add uv generator #734

Merged
eric-tramel merged 7 commits into main from ewt/top-models-figure Jun 2, 2026


@eric-tramel
Contributor

What

Replaces the default donut chart in the README's Top models (YTD) section with a ranked input-vs-output token breakdown, styled to match the Data Designer devnote charts (near-black canvas, NVIDIA-green duotone, bold value labels, de-emphasized Other aggregate).

Updates all three tracked copies the README and Fern docs reference:

  • docs/images/top-models.png
  • fern/assets/images/top-models.png
  • fern/images/top-models.png

Refreshing it

Adds docs/scripts/generate_top_models_figure.py — a PEP 723 uv standalone script (matches the convention in docs/assets/recipes/). No repo dependency is added; matplotlib resolves in an ephemeral environment:

# Regenerate from the committed CSV (zero args)
uv run docs/scripts/generate_top_models_figure.py

# Refresh from a new telemetry export
uv run docs/scripts/generate_top_models_figure.py --csv ~/Downloads/new-export.csv

The source telemetry export is committed at docs/scripts/top-model-usage.csv, so the figure regenerates with zero args and renders identically on any machine (it falls back to matplotlib's bundled font when Helvetica is absent, which keeps the committed asset reproducible in CI/Linux).

Replace the default donut chart with a ranked input-vs-output token breakdown styled to match the Data Designer devnote charts (near-black canvas, NVIDIA-green duotone). Updates all three tracked copies the README and Fern docs reference.

Adds docs/scripts/generate_top_models_figure.py, a PEP 723 uv standalone script (run via 'uv run', no repo dependency added), plus the source telemetry CSV so the figure regenerates with zero args.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@eric-tramel eric-tramel requested a review from a team as a code owner June 2, 2026 18:52
@github-actions
Contributor

github-actions Bot commented Jun 2, 2026

MkDocs preview: https://dcb87148.dd-docs-preview.pages.dev

Fern preview: https://nvidia-preview-pr-734.docs.buildwithfern.com/nemo/datadesigner

Fern previews include the docs-website version archive with PR changes synced into latest. Notebook tutorials are rendered without execution outputs in previews.

@eric-tramel eric-tramel requested a review from johnnygreco June 2, 2026 18:55
@greptile-apps
Contributor

greptile-apps Bot commented Jun 2, 2026

Greptile Summary

This PR refreshes the README's "Top models (YTD)" telemetry section with June 2026 data and replaces the donut chart with a ranked stacked-bar chart styled to match the Data Designer devnote aesthetic. It also adds a reproducible uv standalone script and committed CSV so the figure can be regenerated with zero args on any machine.

  • README.md: badge updated from "400+ Billion Tokens Generated" to "2.6T+ Tokens Processed", date range extended to June 1, 2026.
  • docs/scripts/generate_top_models_figure.py: new PEP 723 script that loads top-model-usage.csv, renders a dark-canvas stacked-bar figure via matplotlib (pinned to 3.9.4, Agg backend, DejaVu Sans for byte-reproducibility), and writes identical PNGs to docs/images/ and fern/images/.
  • docs/scripts/top-model-usage.csv: committed telemetry export (10 named models + Other aggregate) that drives the figure and yields a ~2.68T total token count consistent with the badge.

Confidence Score: 5/5

Documentation-only change with a new standalone script and committed data file; no production code is touched.

All changes are confined to docs assets, a uv script, and a CSV. The script logic is correct — CSV parsing, matplotlib rendering, and file copy are all consistent with the data. Token totals in the CSV align with the updated badge. No application code or APIs are affected.

No files require special attention.

Important Files Changed

  • docs/scripts/generate_top_models_figure.py: New PEP 723 uv script; clean logic with BOM-safe CSV parsing, deterministic matplotlib config, correct stacked-bar rendering, and a safe mirror copy via shutil. No issues found.
  • docs/scripts/top-model-usage.csv: New telemetry CSV with 10 named model rows plus an Other aggregate; column count, BOM, and comma-formatted numbers match what load_rows expects. Token totals (~2.68T) are consistent with the updated README badge.
  • README.md: Badge text updated to '2.6T+ Tokens Processed' and date range bumped to June 1, 2026, both consistent with the new CSV data.
  • docs/images/top-models.png: Binary PNG replaced with the new stacked-bar chart; rendered by the script from the committed CSV.
  • fern/images/top-models.png: Fern mirror PNG updated in sync with docs/images/top-models.png.
  • fern/assets/images/top-models.png: Binary updated; per previous review discussion, this copy is intentionally not a TARGETS entry in the script because no Fern content references this path.
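The summary credits `load_rows` with BOM-safe parsing of comma-formatted numbers. A sketch of one way to do that with the standard `csv` module, assuming a three-column layout (model name, input tokens, output tokens) and an aggregate row literally named `Other`; the PR shows neither the column order nor the row label, and `ModelUsage` and `parse_count` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ModelUsage:
    model: str
    input_tokens: int
    output_tokens: int

    @property
    def total(self) -> int:
        return self.input_tokens + self.output_tokens


def parse_count(text: str) -> int:
    """Convert a comma-formatted count such as '1,234,567' to an int."""
    return int(text.replace(",", "").strip())


def load_rows(path: Path) -> list[ModelUsage]:
    """Read (model, input, output) rows; rank by total with 'Other' pinned last."""
    rows: list[ModelUsage] = []
    # "utf-8-sig" drops a leading byte-order mark if present (spreadsheet exports
    # often add one); plain "utf-8" would leave it glued to the first header cell.
    with path.open(newline="", encoding="utf-8-sig") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        for record in reader:
            if not record or not record[0].strip():
                continue  # tolerate blank lines at the end of an export
            name, inp, out = record[:3]
            rows.append(ModelUsage(name.strip(), parse_count(inp), parse_count(out)))
    # False sorts before True, so named models come first, largest total on top.
    return sorted(rows, key=lambda r: (r.model == "Other", -r.total))
```

Comma-formatted numbers have to be quoted in the CSV (`"1,234"`) for this to work; `csv.reader` handles the quoting, and `parse_count` only has to strip the separators.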

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[uv run generate_top_models_figure.py] --> B{--csv provided?}
    B -- No --> C[top-model-usage.csv default CSV]
    B -- Yes --> D[User-supplied CSV]
    C --> E[load_rows: parse + strip commas]
    D --> E
    E --> F[configure_matplotlib: Agg backend, DejaVu Sans]
    F --> G[render: primary docs/images/top-models.png]
    G --> H{mirrors?}
    H -- Yes --> I[shutil.copyfile: fern/images/top-models.png]
    H -- No --> J[Print wrote paths]
    I --> J
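The last steps of the flowchart render once to the primary path and then copy to the mirror. A sketch of that pattern, where `write_targets` is a hypothetical helper and the renderer is passed in as a callable so the example runs without matplotlib:

```python
import shutil
from collections.abc import Callable, Sequence
from pathlib import Path


def write_targets(render: Callable[[Path], object], targets: Sequence[Path]) -> list[Path]:
    """Render once to targets[0], then byte-copy that file to each mirror path."""
    primary, *mirrors = targets
    primary.parent.mkdir(parents=True, exist_ok=True)
    render(primary)  # e.g. build the figure and call savefig on this path
    for mirror in mirrors:
        mirror.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(primary, mirror)  # identical bytes, no second render
    for path in targets:
        print(f"wrote {path}")
    return list(targets)
```

Copying bytes instead of rendering twice means the Fern mirror cannot drift from docs/images/top-models.png, even if rendering itself were not deterministic.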

Reviews (7): Last reviewed commit: "Merge branch 'main' into ewt/top-models-..."

@eric-tramel eric-tramel added the documentation Improvements or additions to documentation label Jun 2, 2026
eric-tramel and others added 2 commits June 2, 2026 14:59
Bumps the Top models (YTD) range and last-updated stamp to match the refreshed figure's data snapshot (year-to-date through 6/1/2026).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
The telemetry snapshot totals ~2.68T tokens (input + output across all models); the badge previously read '400+ Billion Tokens Generated'. Rounds down to 2.6T+ to stay truthful with the '+'.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
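The badge rule in this commit floors rather than rounds, so the '+' always holds. A small sketch using integer arithmetic to avoid float surprises at the boundary; `badge_label` is a hypothetical helper, and 2.68T stands in for the approximate total quoted above.

```python
def badge_label(total_tokens: int) -> str:
    """Floor a token count to one decimal place in trillions and append '+'."""
    tenths = total_tokens // 100_000_000_000  # one tenth of a trillion is 1e11
    return f"{tenths // 10}.{tenths % 10}T+"
```

`badge_label(2_680_000_000_000)` gives `"2.6T+"`, whereas ordinary rounding (`round(2.68, 1)`) would give 2.7 and overstate the total.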
@johnnygreco
Contributor

Thanks for putting this together, @eric-tramel!

Summary

This PR replaces the README/Fern top-models image with a ranked input-vs-output token chart and adds a CSV-backed standalone uv script for refreshing it. The implementation matches the stated intent; I had one non-blocking reproducibility note and one asset cleanup question.

Findings

Suggestions — Take it or leave it

docs/scripts/generate_top_models_figure.py:3 — Note: reproducibility may drift across environments

  • What: Running uv run docs/scripts/generate_top_models_figure.py in the PR worktree rewrote all three checked-in images. The committed copies are 2950 x 1849 with hash 1f302d0..., while the documented command produced byte-identical mirrors at 3047 x 1857 with hash 9c89573.... The script metadata allows any matplotlib>=3.9,<4, and select_font() depends on locally available fonts, so small output drift is possible across machines.
  • Why: I would not block on this, but it is worth being aware of: future maintainers may run the documented refresh command and get image diffs even when the CSV did not change. That can make it harder to tell whether an asset update reflects telemetry changes or just render-environment differences.
  • Suggestion: If we want this image to be stable across refreshes, consider pinning the matplotlib version used for this asset and choosing a fixed bundled or vendored font instead of selecting Helvetica/Arial opportunistically.

fern/assets/images/top-models.png — Is this extra Fern asset copy needed?

  • What: Could you confirm whether fern/assets/images/top-models.png is needed? I do not see any current Fern content referencing /assets/images/top-models.png; the README uses docs/images/top-models.png, and Fern’s documented mirror for /images/* references is fern/images/.
  • Why: Keeping an unreferenced third copy makes refreshes a little easier to get out of sync and adds binary churn without an obvious consumer.
  • Suggestion: I would drop fern/assets/images/top-models.png from this PR unless there is a planned Fern page that will reference it directly.

docs/scripts/generate_top_models_figure.py:185 — Derive x-axis ticks from the data

  • What: The generator accepts a replacement CSV, but the x-axis ticks are hard-coded to 0..700B.
  • Why: This works for the current export, but the next refresh could easily exceed or undershoot that range, leaving the chart with awkward or misleading grid labels unless the maintainer remembers to edit the script.
  • Suggestion: Consider deriving ticks from xmax, either with matplotlib.ticker.MaxNLocator plus the existing fmt() helper, or by rounding the max total up to a clean interval and building ticks from that.
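The reviewer offers two options. The sketch below takes the second one (round the maximum up to a clean interval) rather than `MaxNLocator`, which is what the later commit chose. `fmt` here is a hypothetical stand-in for the script's helper of the same name, assumed to label in B and promote to T at one trillion; `nice_step` and `data_ticks` are illustrative names.

```python
import math


def fmt(tokens: float) -> str:
    """Hypothetical tick formatter: billions as 'B', promoted to 'T' at 1e12."""
    if tokens >= 1e12:
        return f"{tokens / 1e12:g}T"
    if tokens >= 1e9:
        return f"{tokens / 1e9:g}B"
    return f"{tokens:g}"


def nice_step(xmax: float, target_ticks: int = 5) -> float:
    """Smallest 1/2/2.5/5/10 x 10^k step giving at most `target_ticks` intervals."""
    raw = xmax / target_ticks
    magnitude = 10 ** math.floor(math.log10(raw))
    for multiple in (1, 2, 2.5, 5, 10):
        if raw <= multiple * magnitude:
            return multiple * magnitude
    return 10 * magnitude  # not reached: raw < 10 * magnitude by construction


def data_ticks(xmax: float, target_ticks: int = 5) -> list[float]:
    """Round xmax up to a whole number of nice steps; return ticks from 0 to there."""
    if xmax <= 0:
        return [0.0]
    step = nice_step(xmax, target_ticks)
    n = math.ceil(xmax / step)
    return [i * step for i in range(n + 1)]
```

For a 640B maximum this yields ticks labelled 0, 200B, ..., 800B; for 2.68T it yields 0, 1T, 2T, 3T. On a matplotlib axis the result would be applied with `ax.set_xticks(ticks)`, `ax.set_xticklabels([fmt(t) for t in ticks])`, and `ax.set_xlim(0, ticks[-1])`.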

What Looks Good

  • The chart itself is much easier to scan than the old donut-style asset, and the input/output split is a useful improvement.
  • Committing the source CSV alongside the generator is a nice move; it gives future updates a clear starting point.
  • The tracked PNG copies are synchronized in the PR, and the standalone PEP 723 script keeps this out of the repo dependency set.

Verdict

Ship it (with nits) — Nothing blocking from my side. I’d recommend dropping the unreferenced fern/assets/images/top-models.png copy and keeping the reproducibility note in mind for future refreshes.


This review was generated by an AI assistant.

Pin matplotlib==3.9.4 and force matplotlib's bundled DejaVu Sans (drop opportunistic Helvetica/Arial) so the checked-in PNG is byte-reproducible across machines and CI, not dependent on locally installed fonts.

Derive x-axis ticks from the data (MaxNLocator + the existing fmt() helper, which promotes B -> T) instead of hard-coding 0..700B, so future refreshes can't mislabel the axis.

Drop the unreferenced fern/assets/images/top-models.png; docs/images/ (README) and fern/images/ (Fern's /images/* mirror) are the only consumers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
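This commit pins `matplotlib==3.9.4` and switches to the bundled DejaVu Sans. A minimal sketch of the backend and font half of that change, assuming the script sets them through `rcParams` (the Greptile summary names the Agg backend and DejaVu Sans but not how they are applied). `configure_matplotlib` borrows the name from the summary's flowchart; `render_demo_png` and its bar data are hypothetical placeholders. The version pin itself lives in the PEP 723 `dependencies` list.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive raster backend: needs no display or GUI toolkit
import matplotlib.pyplot as plt  # noqa: E402  (imported after the backend is chosen)


def configure_matplotlib() -> None:
    """Use a font that ships inside matplotlib instead of whatever the host has."""
    # DejaVu Sans is bundled with matplotlib, so text layout does not change
    # depending on whether Helvetica or Arial is installed locally.
    matplotlib.rcParams["font.family"] = "DejaVu Sans"


def render_demo_png() -> bytes:
    """Render a placeholder bar chart (hypothetical data) and return the PNG bytes."""
    fig, ax = plt.subplots(figsize=(4, 2))
    ax.barh(["model-a", "model-b"], [3.0, 5.0])
    ax.set_xlabel("tokens")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure's memory
    return buf.getvalue()
```

Two renders in one environment should match byte for byte. Across machines, identical bytes also depend on the rest of the stack matching, and the later review comments show a fresh run still changed the PNGs for one reviewer.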
@eric-tramel
Contributor Author

Thanks @johnnygreco — all three addressed in 7fb3b130:

  1. Reproducibility — pinned matplotlib==3.9.4 and switched to matplotlib's bundled DejaVu Sans (no more opportunistic Helvetica/Arial). The checked-in PNGs are now byte-identical to a fresh uv run on any machine, so future refreshes only diff when the CSV actually changes.
  2. Stray Fern copy — dropped fern/assets/images/top-models.png; the generator now writes only docs/images/ (README) and fern/images/ (the /images/* mirror).
  3. Ticks — now derived from the data via MaxNLocator + the existing fmt() helper (auto-promotes B→T), so the axis won't mislabel as totals grow.

Appreciate the review!

eric-tramel and others added 2 commits June 2, 2026 15:34
The docstring still listed fern/assets/images/top-models.png after that stray copy was removed; drop it so the docstring matches the two-entry TARGETS.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Eric W. Tramel <eric.tramel@gmail.com>
@johnnygreco
Contributor

Thanks for the quick follow-up, @eric-tramel. The intent looks addressed: the stray fern/assets/images/top-models.png file is gone, the generator targets are down to docs/images/ + fern/images/, the ticks are data-derived, and ruff is clean.

I’m holding approval for two small residual issues I hit while verifying:

  1. The module docstring still lists the deleted fern/assets/images/top-models.png path, even though TARGETS no longer writes it.
  2. The reproducibility check still dirtied the PNGs for me. Running:
uv run --python 3.12 docs/scripts/generate_top_models_figure.py

completed, but changed both remaining images (227929 -> 267240 bytes). Default uv run docs/scripts/generate_top_models_figure.py under Python 3.14 also spent multiple minutes source-building matplotlib==3.9.4, so the standalone refresh path may need either a tighter Python constraint or a matplotlib pin with wheels for the supported/default interpreters.

Could you remove the stale docstring path and either commit the PNGs produced by the current generator or adjust the render environment so a fresh run leaves the worktree clean? After that, I’m happy to approve.

@johnnygreco
Contributor

Quick clarification after re-checking the latest head (413f0d0a):

  • The stale docstring path is fixed.
  • fern/assets/images/top-models.png is gone.
  • The generator now only targets docs/images/top-models.png and fern/images/top-models.png.
  • Ruff check/format passes.

The only thing I’m still holding approval on is reproducibility: a fresh generator run still changes the committed PNGs for me.

uv run --python 3.12 docs/scripts/generate_top_models_figure.py

This completes quickly now, but leaves both PNGs modified. The committed files hash to c8b68a...; the freshly generated files hash to b4ba90....
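Comparing hashes before and after a fresh run answers the "does this leave the worktree clean?" question directly. A sketch with `hashlib`, where the regeneration step is injected as a callable; `changed_after_regeneration` and `sha256_of` are hypothetical helpers.

```python
import hashlib
from collections.abc import Callable, Iterable
from pathlib import Path


def sha256_of(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def changed_after_regeneration(
    regenerate: Callable[[], object], outputs: Iterable[Path]
) -> dict[Path, tuple[str, str]]:
    """Hash outputs, run the generator, and return {path: (before, after)} for changes."""
    paths = list(outputs)
    before = {p: sha256_of(p) for p in paths}
    regenerate()
    after = {p: sha256_of(p) for p in paths}
    # An empty dict means the regenerated files are byte-identical to the originals.
    return {p: (before[p], after[p]) for p in paths if before[p] != after[p]}
```

In practice `regenerate` might wrap `subprocess.run(["uv", "run", "--python", "3.12", "docs/scripts/generate_top_models_figure.py"], check=True)`, with `outputs` set to the two tracked PNGs.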

So from my side, the remaining ask is just: please commit the PNGs produced by the current generator, or otherwise adjust the script/render environment so this command leaves the worktree clean. Once that’s true, I’m good to approve.

@eric-tramel
Contributor Author

Quoting the request above: "So from my side, the remaining ask is just: please commit the PNGs produced by the current generator, or otherwise adjust the script/render environment so this command leaves the worktree clean. Once that’s true, I’m good to approve."

Just re-ran, no change locally for me 🤔

@eric-tramel eric-tramel merged commit 633e96d into main Jun 2, 2026
62 checks passed
