
add small models rock blog #36

Open
psschwei wants to merge 4 commits into generative-computing:main from psschwei:blog-small-models

Conversation

@psschwei
Member

@psschwei psschwei commented May 5, 2026

No description provided.

psschwei added 2 commits May 4, 2026 22:58
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
@psschwei psschwei requested review from a team and ajbozarth as code owners May 5, 2026 03:00
@psschwei psschwei requested a review from serjikibm May 5, 2026 03:00
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Contributor

@ajbozarth ajbozarth left a comment


first pass with technical review, will follow with a review of the blog content later

@@ -0,0 +1,522 @@
---
title: "Making Small Models Rock with Mellea"
date: "2026-06-05"
Contributor


did you mean

Suggested change

```diff
-date: "2026-06-05"
+date: "2026-05-06"
```

Member Author


no, I'm targeting early June for release

Contributor


oh, ok, that's a ways out. Is there a motivation for holding off publishing for over a month given that the blog is nearly ready?

Member Author


bandwidth

Comment thread: `content/blogs/small-models-rock.md` (outdated)
tags: ["mellea", "granite", "rag", "intrinsics", "small-models", "docling"]
---

![Making Small Models Rock with Mellea](/images/small-models-rock/main.png)
Contributor


This image doesn't render well in dark mode:

(screenshot attached)

Member Author


this sounds like something that should be fixed at the site level (?)

Contributor


I'm not sure what could be fixed; the issue is that the image assumes a white (or light) background in its transparency layer

Member Author


add a `--prose-img-bg` in `src/app/globals.css`?

Comment thread: `next-env.d.ts` (outdated)
Comment thread: `package-lock.json`
Signed-off-by: Paul S. Schweigert <paul@paulschweigert.com>
Contributor

@ajbozarth ajbozarth left a comment


Here's a handful of items Claude found. I also opened #39 to address the image issues.

Comment on lines +168 to +174
```python
def _bom_entry_is_well_formed(entry: BOMEntry) -> bool:
    """Quantity is either an integer or the string 'allowance'."""
    try:
        int(entry.quantity)
        return True
    except ValueError:
        return entry.quantity.lower() == "allowance"
```
Contributor


Two bugs here. `simple_validate` passes the model's raw output string to the validator, but `_bom_entry_is_well_formed` expects a `BOMEntry`. Also, line 188 calls `_bom_entries_are_well_formed` (plural), which is undefined. Rewrite the validator to accept and parse the string output:

Suggested change

```diff
-def _bom_entry_is_well_formed(entry: BOMEntry) -> bool:
-    """Quantity is either an integer or the string 'allowance'."""
-    try:
-        int(entry.quantity)
-        return True
-    except ValueError:
-        return entry.quantity.lower() == "allowance"
+def _bom_is_valid(output: str) -> bool:
+    bom = BOM.model_validate_json(output)
+    return all(
+        e.quantity.lower() == "allowance" or str(e.quantity).isdigit()
+        for e in bom.items
+    )
```
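The quantity rule itself can be sanity-checked without pydantic. Below is a hedged, stdlib-only sketch: `bom_is_valid` is a hypothetical stand-in that uses `json.loads` in place of `BOM.model_validate_json`, assuming the model emits a JSON object with an `items` list of entries that each carry a `quantity` field.

```python
import json

def bom_is_valid(output: str) -> bool:
    # Parse the model's raw JSON output (stand-in for BOM.model_validate_json)
    # and check that every quantity is an integer or the string "allowance".
    bom = json.loads(output)
    return all(
        str(e["quantity"]).lower() == "allowance" or str(e["quantity"]).isdigit()
        for e in bom["items"]
    )

print(bom_is_valid('{"items": [{"quantity": "3"}, {"quantity": "allowance"}]}'))  # True
print(bom_is_valid('{"items": [{"quantity": "about 3"}]}'))                       # False
```

The key point is the same as in the suggestion above: the validator's input is the raw output string, so parsing happens inside the validator, not before it.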

```python
requirements=[
    req(
        "Quantity should only contain an integer or Allowance",
        validation_fn=simple_validate(_bom_entries_are_well_formed),
```
Contributor


Update to match the renamed function above.

Suggested change

```diff
-        validation_fn=simple_validate(_bom_entries_are_well_formed),
+        validation_fn=simple_validate(_bom_is_valid),
```


```python
m.instruct(
    "Reformat this table to have four columns: item, quantity, type, and notes.",
```
Contributor


`BOMEntry` defines this field as `category`, not `type`. Mismatched column names cause validation failures.

Suggested change

```diff
-    "Reformat this table to have four columns: item, quantity, type, and notes.",
+    "Reformat this table to have four columns: item, quantity, category, and notes.",
```

```python
if is_material_list(m, table_markdown=table.to_markdown()) == "yes":
    bom_routines.append(m.ainstruct(..., format=BOM))

bom_thunks: list[ModelOutputThunk] = [await r for r in bom_routines]
```
Contributor


This awaits each coroutine in series. The "wall-clock scales with the slowest table" claim on line 224 only holds with `asyncio.gather`.

Suggested change

```diff
-bom_thunks: list[ModelOutputThunk] = [await r for r in bom_routines]
+bom_thunks: list[ModelOutputThunk] = await asyncio.gather(*bom_routines)
```
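The timing difference is easy to demonstrate with a stdlib-only sketch. `fake_instruct` here is a hypothetical stand-in for one model call, and the sketch assumes the list holds bare coroutines (if `ainstruct` returned already-scheduled tasks, the comprehension would overlap too):

```python
import asyncio
import time

async def fake_instruct(delay: float) -> float:
    # Hypothetical stand-in for one m.ainstruct(...) call.
    await asyncio.sleep(delay)
    return delay

async def run_serial() -> float:
    start = time.perf_counter()
    routines = [fake_instruct(0.1) for _ in range(3)]
    _ = [await r for r in routines]      # awaited one after another: ~0.3 s total
    return time.perf_counter() - start

async def run_concurrent() -> float:
    start = time.perf_counter()
    routines = [fake_instruct(0.1) for _ in range(3)]
    _ = await asyncio.gather(*routines)  # the waits overlap: ~0.1 s total
    return time.perf_counter() - start

serial_t = asyncio.run(run_serial())
concurrent_t = asyncio.run(run_concurrent())
print(serial_t > concurrent_t)  # True: the serial version pays each delay in full
```

With `gather`, wall-clock really does scale with the slowest call rather than the sum of all calls.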

Comment on lines +305 to +307
question, and context relevance drops to around 0.5 (a pricing document
about construction, but not the right one) while answerability correctly
collapses to "unanswerable." Frontier model logits don't give you this.
Contributor


`check_context_relevance` returns a categorical string (`"relevant"`, `"irrelevant"`, or `"partially relevant"`), not a float.

Suggested change

```diff
-question, and context relevance drops to around 0.5 (a pricing document
-about construction, but not the right one) while answerability correctly
-collapses to "unanswerable." Frontier model logits don't give you this.
+question, and context relevance comes back `"partially relevant"` (a pricing document
+about construction, but not the right one) while answerability correctly
+collapses to `"unanswerable"`. Frontier model logits don't give you this.
```

```python
citations = find_citations(
    response=price_response.value,
    documents=[doors_doc],
    context=ctx,
```
Contributor


`ctx` is undefined in this snippet; every other snippet in this post passes `ChatContext()` directly.

Suggested change

```diff
-    context=ctx,
+    context=ChatContext(),
```

tags: ["granite", "rag", "intrinsics", "small-models", "docling"]
---

![Making Small Models Rock with Mellea](/images/small-models-rock/main.png)
Contributor


`main.png` has a transparent background that renders poorly in dark mode. Once #39 is merged, `rehype-raw` support will be available and this can be fixed inline:

Suggested change

```diff
-![Making Small Models Rock with Mellea](/images/small-models-rock/main.png)
+<img src="/images/small-models-rock/main.png" alt="Making Small Models Rock with Mellea" style="background-color: white;" />
```

Step back from the construction example. What just happened is the general
shape of the trade.

![A small model, harnessed](/images/small-models-rock/harnessed.png)
Contributor


The `max-width: 60%` rule is currently in `globals.css` as a filename-specific selector (`.prose img[src$="harnessed.png"]`), which doesn't belong in a global stylesheet. Once #39 is merged, move it inline:

Suggested change

```diff
-![A small model, harnessed](/images/small-models-rock/harnessed.png)
+<img src="/images/small-models-rock/harnessed.png" alt="A small model, harnessed" style="max-width: 60%;" />
```

and holds those pieces together with ordinary code rather than an
ever-growing English prompt.

Three things fall out of that approach. The first is predictable cost:
Contributor


Mechanical enumeration labels ("The first is... The second is... The third is...") read as LLM-generated prose. Consider leading with the claims directly: "Cost is predictable: local inference has a fixed, knowable cost per run... Data stays local: your documents never leave the machine... The inference backend is yours to choose: Mellea talks to Ollama, vLLM..."

```python
    return prices
```

Two things matter about this loop. First, `verdict == "answerable"` is a
Contributor


"Two things matter about this loop. First, ... Second, ..." — the meta-preamble before the claims is a common LLM tell. Consider opening directly with the first claim: "verdict == \"answerable\" is a gate: items the intrinsic can't confidently answer get total_price=None..."

```python
from mellea.stdlib.components.intrinsic.rag import find_citations

citations = find_citations(
    response=price_response.value,
```
Collaborator


BLOCKER — `price_response` is never defined in this post (also `ctx` on L411 — already flagged elsewhere). The pricing loop stores results in `unit` and `total`, not `price_response`. Copy-pasting this block verbatim raises `NameError: name 'price_response' is not defined` (verified by running it).

If the intent was to cite the final price response, wiring to `total` gives a runnable example:

Suggested change

```diff
-    response=price_response.value,
+citations = find_citations(
+    response=total.value,
+    documents=[doors_doc],
+    context=ChatContext(),
+    backend=m_hf.backend,
+)
```

requirement with an explicit validator:

```python
m.instruct(
```
Collaborator


WARNING — Steps 1–3 all use `m.instruct(...)` / `m.ainstruct(...)`, but the first `m = mellea.start_session(...)` is in Step 4 (L432). A reader copy-pasting in order hits `NameError: name 'm' is not defined` on this block — confirmed by running it.

Consider adding a setup block just before Step 1 so the tutorial is runnable top-to-bottom:

```python
import mellea
from mellea.backends.model_ids import IBM_GRANITE_4_MICRO_3B

m = mellea.start_session(backend_name="ollama", model_id=IBM_GRANITE_4_MICRO_3B)
```

(Or equivalent — the session variable is load-bearing from here onward.)

step is to get a clean, typed `BOM` object.

`RichDocument` wraps [docling](https://github.com/DS4SD/docling) and
exposes tables as markdown, which small models handle much better than raw
Collaborator


WARNING — A fresh `pip install mellea` raises `ImportError: RichDocument requires extra dependencies. Please install them with: pip install "mellea[docling]"` on this import. `LocalHFBackend` in Step 3 similarly needs `mellea[hf]`. The blog never mentions installing either — it's the first copy-paste failure for a reader.

One install line before Step 1 covers it:

```shell
pip install 'mellea[docling,hf]'
```

```python
from mellea.stdlib.components.docs.richdocument import RichDocument

construction_plans = RichDocument.from_document_file(
    "construction_docs/construction_plans.pdf"
```
Collaborator


WARNING — The tutorial references construction_docs/construction_plans.pdf (and product_catalogs/*.{pdf,docx,xlsx} in Step 2) but doesn't tell readers where to get them. Running the block raises FileNotFoundError. A pointer to the linked tutorial notebook's asset directory would let readers actually follow along:

> Sample input files are in the tutorial repo under `construction_docs/` and `product_catalogs/`.

| Small open-weight (<3B) | Doesn't understand the task. Returns a generic "cost breakdown" with no prices. |
| Open-weight reasoning (~20B) | Finds categories and subtotals. No pie chart. Numbers often wrong. |
| Gemini Fast | Mostly reasonable. No chart. Some prices off. |
| GPT-5.4 Pro, extended thinking | Gets most items right. Cites sources. No chart on first shot. ~$1/run. |
Collaborator


WARNING — "GPT-5.4 Pro" appears here and on L15, L93. I can't find this as an actual OpenAI product name. For a cost-vs-capability post anchored on a frontier-model baseline, the specific model matters; an invented name undermines the table. If the intent was "whatever the top-tier frontier reasoning model is at publish time," a generic phrasing avoids ageing:

Suggested change

```diff
-| GPT-5.4 Pro, extended thinking | Gets most items right. Cites sources. No chart on first shot. ~$1/run. |
+| Frontier reasoning (GPT-5 Pro / o-series / Claude Opus) | Gets most items right. Cites sources. No chart on first shot. ~$1/run. |
```

The construction case isn't a one-off. The same three-pattern approach
generalizes. On agent benchmarks the Mellea team has run (a DB2 database
agent and a compliance agent), rewriting large prompt-based systems as
Mellea programs moves a Llama 70B setup from ~80% task completion to ~90%,
Collaborator


WARNING — "Mellea programs moves a Llama 70B setup from ~80% task completion to ~90%, and lets a Granite 8B model match or beat a Llama 70B baseline" — these are strong quantitative claims, and the argument rests on them. A link to the DB2-agent / compliance-agent results (or a benchmark table in the mellea repo) would turn this from marketing into evidence. If those numbers aren't public yet, consider softening to "in internal evaluations" and committing to publish.

```python
windows_doc = Document(text=rd_windows.to_markdown())

rd_lumber = RichDocument.from_document_file("product_catalogs/cone_mountain_lumber_catalog.xlsx")
lumber_doc = Document(text=rd_lumber.to_markdown())
```
Collaborator


SUGGESTION — `lumber_doc` is loaded here but never keyed into the pricing catalog (`{"windows": windows_doc, "doors": doors_doc}` on L350). The comment at L358 explains lumber is skipped for Colab T4 runtime, but the load is still wasted work and confuses the shape of the pipeline. Either drop the three lumber-loading lines, or include lumber in the dict and let `check_answerability` return `"unanswerable"` for items the catalog can't price — both make the "skipped lumber" story explicit in the code rather than buried in a comment.

```python
    "line-item material list with prices. At the top include the /tmp/chart.png image.",
    grounding_context=report_grounding_context,
)
open("/tmp/report.html", "w").write(report.value)
```
Collaborator


NIT — Tutorial code gets copied verbatim; `open(...).write(...)` teaches a bad habit. A context manager is the one-line fix:

Suggested change

```diff
-open("/tmp/report.html", "w").write(report.value)
+with open("/tmp/report.html", "w") as f:
+    f.write(report.value)
```

@planetf1
Collaborator

planetf1 commented May 7, 2026

A broader editorial observation, on top of the inline notes above.

Long prose stretches with no visual breaks. "The Bet", the ALoRA explanation, and the "Why intrinsics are cheap to compose" sections each run 3–5 paragraphs of uninterrupted running text. A reader skimming from the link-sharing site of your choice has nothing to hook onto — no bullet list, no pull-quote, no callout. Consider breaking the longer argumentative sections with either (a) a short bulleted summary of the three differentiators, (b) a margin callout or blockquote for the one-line claim that matters, or (c) an extra H3 that lets the reader resume after an interrupt. The Dijkstra passage is strong enough that it could earn a standalone pull-quote.

Long code blocks without internal narration. Quick scan of the 12 fenced Python blocks:

| Block | Lines | Inline comments |
| --- | --- | --- |
| BOM validator (L155) | 19 | 0 |
| Reformat instruct (L181) | 11 | 0 |
| Async extract_bom (L206) | 15 | 0 |
| Catalog load (L232) | 11 | 0 |
| Pricing loop (L332) | 57 | 2 |
| find_citations (L405) | 8 | 0 |
| Report generation (L425) | 31 | 0 |

The 57-line pricing loop is the main offender — a reader has to hold the whole thing in their head to reach the "verdict == "answerable" is the gate" claim the surrounding prose is building to. A few options that would help without gutting the post:

  1. Highlight the key line in a preceding sentence, e.g., "The one line that matters is if verdict == "answerable": — everything above it is ceremony to get that gate into place," then show the block.
  2. Split the pricing loop into a short get_catalog_for(entry) helper + the actual priced extraction, so each block is ~15 lines.
  3. Add a handful of inline # comments on the non-obvious lines (the .get(entry.category) returning None for lumber, the continue after append, the if catalog: fallthrough producing unit_price=None).

Inline comments in tutorial code are an anti-pattern in production but are the right call in a blog post — readers copy the block into a notebook and the comments are their only in-line teacher.
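To make option 2 concrete, here is a hedged, stdlib-only sketch of the split. All names (`get_catalog_for`, `price_entry`, the plain-dict catalog and entries) are hypothetical stand-ins for the post's actual Document objects and intrinsic calls, not its real API:

```python
# Hypothetical catalog: lumber is intentionally absent, mirroring the post.
CATALOGS = {"windows": "windows_doc", "doors": "doors_doc"}

def get_catalog_for(category: str):
    # .get returns None for categories we deliberately skip (e.g. lumber);
    # the pricing step turns that None into unit_price=None downstream.
    return CATALOGS.get(category)

def price_entry(entry: dict) -> dict:
    catalog = get_catalog_for(entry["category"])
    if catalog is None:
        return {**entry, "unit_price": None}  # skipped: nothing to price against
    return {**entry, "unit_price": 9.99}      # placeholder for the real intrinsic call

priced = [price_entry(e) for e in [
    {"item": "door", "category": "doors"},
    {"item": "2x4", "category": "lumber"},
]]
print([p["unit_price"] for p in priced])  # [9.99, None]
```

Each half is now short enough to read at a glance, and the "skipped lumber" behavior lives in `get_catalog_for` rather than in a buried comment.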

Neither of these is a blocker, just things that would turn this from "good if you read carefully" into "easy to follow on first scroll."

@planetf1
Collaborator

planetf1 commented May 7, 2026

One more pass, purely editorial — positioning and discoverability asks, not correctness. All are optional polish.

No fast hook for skimmers. The post is ~2,800 words / 14-min read, and the first hands-on code appears at L123. A reader linking in from HN or a social share needs a 30-second on-ramp. Consider a short callout between the excerpt and "The Bet" along these lines:

> What this post does: walks through a construction-cost-estimation pipeline that one-shot prompting needs GPT-5-tier models for, rebuilt on a 3B Granite model running locally — same accuracy, no API keys, ~$0/run. If you're paying frontier-model prices for structured extraction or matching, the same pattern applies.

"Harness" is load-bearing but undefined. The word carries most of the argument (L19, L25, L28, L30, L383, L508) but never gets a definition. A reader who hasn't already absorbed Mellea's framing has to infer it. One sentence near first use — e.g., "By 'harness' we mean the software scaffolding around the model call: decomposition, validation, retries, tool dispatch — the part that isn't the forward pass." — makes the rest of the post land harder.

Pain points skew finance-y; the dev concerns are missing. The three differentiators (cost, data sovereignty, vendor-agnostic) hit procurement and regulated-industries buyers well. The pains that devs themselves feel are under-represented:

  • Latency / rate limits — a frontier API can rate-limit you mid-backtest; local inference doesn't
  • Observability in production — when a prompt-pipeline fails at 3am, debugging is about which step went wrong; Mellea's decomposition surfaces the failure point
  • Fine-tuning vs. harness trade-off — the obvious alternative to "better harness + small model" is "fine-tune a small model"; why harness first?

These are one-paragraph each. The "Trade-offs" section at the bottom is a natural home if you don't want to expand the opening argument.

Cost comparison is one-sided. The $1/run vs "no per-token billing" framing is accurate but omits the local side of the ledger: GPU/laptop amortisation, electricity, and the engineering time to build the decomposed pipeline. The "Trade-offs" section admits "decomposition takes engineering effort" but doesn't put a number on it. Even a rough "a senior engineer can port a prompt pipeline in a day or two" would neutralise the "you're hiding the real cost" objection that readers will raise in the comments either way.

Terminology: "intrinsics" vs "adapters". The post uses both terms interchangeably — ten uses of intrinsic(s) (L64, L246, L255, L259, L301, L312, L329, L393, L406, L521) and six uses of adapter(s) (L255, L316, L319, L322, L396, L398), including the mixed phrasing on L319 "Granite intrinsics ship as ALoRA adapters". My understanding is the Mellea/Granite framing has shifted toward adapters as the external-facing term (with intrinsic still in the module path for now). If that's right, it's worth a sweep to standardise — probably adapters everywhere in prose, with one parenthetical acknowledgement that the Python import path uses intrinsic. Also drop intrinsics from the tag list; local-llm would be a sensible addition there:

```yaml
tags: ["granite", "rag", "adapters", "small-models", "docling", "local-llm"]
```

(I'll defer to you on whether adapters or intrinsics is the preferred term — the ask is consistency, not a specific choice.)


None of the above is a blocker — the core argument is strong.

Collaborator

@planetf1 planetf1 left a comment


as per comments (need evaluation - but your interpretation about what should be changed is fine)
