feat: store rewrites preserve unparseable lines (byte-level); measurements record their pin (#16) #23

Open
kiki830621 wants to merge 3 commits into main from idd/16-store-rewrite-resilience

@kiki830621

Contributor

Refs #16

Summary

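(1) rewrite/loadRaw are now fully byte-level: bad lines (including non-UTF-8 corruption) survive corpus upsert and model re-seed byte-identical and keep triggering the load warning; a file that exists but cannot be read makes rewrite throw rather than write blind. (2) Measurement rows record pin provenance: seededRow(backend,size,quantization) returns the whole row from this run's as-seeded rows, so the true family goes into the primary key and hf_revision comes along with it; the legacy migration's key derivation is aligned with the live path (fixing the existing hole where the two keys differed and a migrated row was never superseded). benchmark-store spec MODIFIED ×2.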

Verification

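The 6-AI verify master report is in issue #16: 20 findings, of which 3 groups were real defects (the String-level preservation hole, the hardcoded family polluting the primary key, and the diverging migration key). All were fixed in the same round, with a RED test written first for the byte-level hole. TDD throughout; 165/165 green. Live evidence of pin writes is tracked as an execution item in #18.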

Checklist

  • Diagnose ✓ (Spectra / A_parallel_safe)
  • Spectra store-rewrite-resilience-and-pin-provenance (tasks 12/12, including the verify-fix section)
  • Verify ✓ (post-fix 0 blocking)
  • Verify-gated: ready to merge → after merge, run /idd-close manually

Related


🤖 Generated by /idd-all. Do NOT add a GitHub close trailer.

…their pin (#16)

Table rewrites (corpus upsert, wholesale model seeding) previously
kept only the rows they could parse — a malformed line survived
load with a warning but was silently deleted by the next rewrite,
punching a data-loss hole through the 'corrupt rows degrade loudly,
not fatally' contract. rewrite() now collects undecodable lines
from the current file and appends them back verbatim, so they keep
surfacing the load warning instead of vanishing.

Measurements gain pin provenance (#15 verify DA's find): the models
table is rewritten wholesale on every seed, so a pin bump silently
re-associated historical numbers with the new snapshot. Each
appended measurement now records hf_revision resolved from the
table AS SEEDED for that run — a measure-time fact that survives
later re-seeding. Optional + snake_case key: legacy rows decode nil,
projection and routing untouched (audit-only column).
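
The "optional + snake_case key" behaviour can be sketched with Swift's synthesized Codable. This is a minimal illustration, not the store's actual type: the `Measurement` struct, its `model_id` and `seconds` fields, and the JSON-lines encoding are assumptions; only the `hf_revision` key comes from the commit message.

```swift
import Foundation

// Hypothetical measurement row. Only `hf_revision` is named in the commit;
// the other fields are placeholders for illustration.
struct Measurement: Codable, Equatable {
    var modelID: String
    var seconds: Double
    var hfRevision: String?   // Optional: rows written before this change decode as nil

    enum CodingKeys: String, CodingKey {
        case modelID = "model_id"
        case seconds
        case hfRevision = "hf_revision"   // snake_case key on disk
    }
}

// Decodes one stored line (assumed here to be a JSON object).
func decodeRow(_ line: String) throws -> Measurement {
    try JSONDecoder().decode(Measurement.self, from: Data(line.utf8))
}

// A legacy line without the key, and a new line that carries the pin.
let legacyLine = #"{"model_id":"tiny","seconds":1.5}"#
let pinnedLine = #"{"model_id":"tiny","seconds":1.5,"hf_revision":"abc123"}"#
```

Because the property is Optional, the synthesized decoder treats a missing key as nil instead of throwing, which is what lets legacy rows load unchanged.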

TDD: 3 RED tests first (byte-identical survival through both
rewrite paths + provenance round-trip with legacy-nil decode);
162/162 green. Spectra change
store-rewrite-resilience-and-pin-provenance (benchmark-store
MODIFIED ×2). Runner-path wiring gets live-benchmark evidence in
sibling #18.

Refs #16

…eded lookup, aligned migration key

The preserve path (and loadRaw) moves from String to byte-level:
a String(contentsOf:) round-trip silently dropped non-UTF-8
corruption — exactly the case the 'never deleted because we could
not parse it' contract most wants to survive — and loadRaw loaded
such a table as silently EMPTY. Lines now split on 0x0A as Data,
survive byte-identical, and an existing-but-unreadable file makes
rewrite throw instead of overwriting blind.
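
A rough sketch of the byte-level preserve path described above, under stated assumptions: `splitLines`, `existingBytes`, `rewritten`, and the `isParseable` predicate are hypothetical names, and the predicate is a stand-in for the store's real row decoder.

```swift
import Foundation

/// Splits raw bytes on 0x0A without going through String,
/// so non-UTF-8 bytes are carried along untouched.
func splitLines(_ data: Data) -> [Data] {
    data.split(separator: 0x0A, omittingEmptySubsequences: true).map { Data($0) }
}

/// Returns the current file bytes. A missing file is an empty table;
/// a file that exists but cannot be read throws, so the caller never overwrites blind.
func existingBytes(at url: URL) throws -> Data {
    guard FileManager.default.fileExists(atPath: url.path) else { return Data() }
    return try Data(contentsOf: url)
}

/// Builds the new file contents: freshly serialized rows first,
/// then every line of the old file that could not be parsed, verbatim.
func rewritten(existing: Data, fresh: [Data], isParseable: (Data) -> Bool) -> Data {
    let survivors = splitLines(existing).filter { !isParseable($0) }
    var out = Data()
    for line in fresh + survivors {
        out.append(line)
        out.append(0x0A)
    }
    return out
}

// Stand-in predicate (assumption): a row "parses" if it looks like {...}.
let looksLikeRow: (Data) -> Bool = { $0.first == 0x7B && $0.last == 0x7D }

// One parseable row plus one line containing an invalid UTF-8 sequence.
let corrupt = Data([0x7B, 0xC3, 0x28])
var file = Data(#"{"id":"old"}"#.utf8)
file.append(0x0A)
file.append(corrupt)
file.append(0x0A)

let result = rewritten(existing: file,
                       fresh: [Data(#"{"id":"new"}"#.utf8)],
                       isParseable: looksLikeRow)
```

In this sketch the old parseable row is replaced by the fresh one, while the corrupt line is appended back byte-for-byte and would still fail to parse (and so still warn) on the next load.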

The pin lookup drops the hardcoded family="whisper" key: a
testable seededRow(backend,size,quantization) helper returns the
seeded row itself, so the measurement's PRIMARY KEY carries the
row's true family (the DA showed the hardcode corrupts identity
for any future non-whisper family) and the pin rides along. Source
is the in-memory ModelGrid.rows seeded verbatim a few lines above
(the as-seeded table without re-parsing the whole store, incl.
measurement history). Legacy migration aligns its family
derivation with the live path — the old 'tiny|tiny' key meant a
re-benchmarked model NEVER superseded its migrated row in the
projection.
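
The lookup change can be illustrated roughly as follows. Everything here except the name `seededRow` and its three lookup parameters is an assumption: the `ModelRow` fields, the extra `in rows:` parameter, the pipe-joined key format, and the placeholder values are for illustration only.

```swift
// Hypothetical shape of a seeded models-table row.
struct ModelRow: Equatable {
    var family: String
    var backend: String
    var size: String
    var quantization: String
    var hfRevision: String?
}

/// Returns the whole as-seeded row, so the caller gets the true family
/// and the pin together instead of assuming a fixed family.
func seededRow(in rows: [ModelRow], backend: String, size: String, quantization: String) -> ModelRow? {
    rows.first { $0.backend == backend && $0.size == size && $0.quantization == quantization }
}

/// Assumed key format: fields joined with "|". The point is only that
/// `family` comes from the row itself rather than a hardcoded string.
func measurementKey(for row: ModelRow) -> String {
    [row.family, row.backend, row.size, row.quantization].joined(separator: "|")
}

// Placeholder rows standing in for the in-memory as-seeded table.
let rows = [
    ModelRow(family: "family-a", backend: "b1", size: "tiny", quantization: "q8", hfRevision: "rev-a"),
    ModelRow(family: "family-b", backend: "b2", size: "small", quantization: "q4", hfRevision: "rev-b"),
]
let hit = seededRow(in: rows, backend: "b2", size: "small", quantization: "q4")
```

Returning the row rather than just the pin means one lookup feeds both the measurement's key and its provenance column, and the helper can be unit-tested without touching the store.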

Tests: non-UTF-8 byte survival + loud load, re-seed-with-new-pin
leaves the measurement untouched, helper unit coverage, realistic
fixture ids. 165/165 green.

Refs #16