feat: store rewrites preserve unparseable lines (byte-level); measurements record their pin (#16) #23
Open
kiki830621 wants to merge 3 commits into
…their pin (#16)

Table rewrites (corpus upsert, wholesale model seeding) previously kept only the rows they could parse. A malformed line survived load with a warning but was silently deleted by the next rewrite, punching a data-loss hole through the "corrupt rows degrade loudly, not fatally" contract. rewrite() now collects the undecodable lines from the current file and appends them back verbatim, so they keep surfacing the load warning instead of vanishing.

Measurements gain pin provenance (a finding by the DA during the #15 verify): the models table is rewritten wholesale on every seed, so a pin bump silently re-associated historical numbers with the new snapshot. Each appended measurement now records the hf_revision resolved from the table AS SEEDED for that run, a measure-time fact that survives later re-seeding. The key is optional and snake_case: legacy rows decode as nil, and projection and routing are untouched (audit-only column).

TDD: 3 RED tests first (byte-identical survival through both rewrite paths, plus a provenance round-trip with legacy-nil decode); 162/162 green. Spectra change store-rewrite-resilience-and-pin-provenance (benchmark-store MODIFIED ×2). Runner-path wiring gets live-benchmark evidence in sibling #18.

Refs #16
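The optional, snake_case provenance column can be sketched with Swift's Codable. This is a minimal sketch, assuming JSON-lines storage; the `Measurement` type and its `backend`/`size`/`rtf` fields are hypothetical stand-ins, and only the `hf_revision` key name comes from this PR. Because `hfRevision` is Optional, the synthesized decoder uses `decodeIfPresent`, so a legacy row without the key decodes as nil instead of failing.

```swift
import Foundation

// Hypothetical measurement row: only the "hf_revision" key name comes from the PR;
// the other fields are illustrative stand-ins.
struct Measurement: Codable, Equatable {
    let backend: String
    let size: String
    let rtf: Double          // hypothetical metric column
    let hfRevision: String?  // pin provenance; nil for rows written before this change

    enum CodingKeys: String, CodingKey {
        case backend, size, rtf
        case hfRevision = "hf_revision"  // snake_case key as stored on disk
    }
}

// A legacy row has no hf_revision key; the synthesized init(from:) uses
// decodeIfPresent for Optional properties, so the field decodes as nil.
let legacyLine = #"{"backend":"coreml","size":"tiny","rtf":0.5}"#
let legacy = try! JSONDecoder().decode(Measurement.self, from: Data(legacyLine.utf8))

// A new row records the pin resolved from the table as seeded for this run.
let pinned = Measurement(backend: "coreml", size: "tiny", rtf: 0.5, hfRevision: "abc123")
let encoded = try! JSONEncoder().encode(pinned)
let roundTripped = try! JSONDecoder().decode(Measurement.self, from: encoded)

print(legacy.hfRevision as Any)  // nil
print(roundTripped.hfRevision!)  // abc123
```

Since nothing reads the column for projection or routing, making it Optional is enough to keep old rows loadable without a migration.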
…eded lookup, aligned migration key

The preserve path (and loadRaw) move from String to byte level. A String(contentsOf:) round-trip silently dropped non-UTF-8 corruption, exactly the case the "never deleted because we could not parse it" contract most wants to survive, and loadRaw loaded such a table as silently EMPTY. Lines are now split on 0x0A as Data and survive byte-identical, and an existing-but-unreadable file makes rewrite throw instead of overwriting blind.

The pin lookup drops the hardcoded family="whisper" key. A testable seededRow(backend,size,quantization) helper returns the seeded row itself, so the measurement's PRIMARY KEY carries the row's true family (the DA showed the hardcode corrupts identity for any future non-whisper family) and the pin rides along. Its source is the in-memory ModelGrid.rows, seeded verbatim a few lines above: the as-seeded table, obtained without re-parsing the whole store (including measurement history).

Legacy migration aligns its family derivation with the live path. The old 'tiny|tiny' key meant a re-benchmarked model NEVER superseded its migrated row in the projection.

Tests: non-UTF-8 byte survival + loud load, re-seeding with a new pin leaves the measurement untouched, helper unit coverage, realistic fixture ids. 165/165 green.

Refs #16
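A byte-level preserve path along these lines might look as follows. This is a sketch under stated assumptions, not the PR's code: `rewrite(url:rows:decodes:)`, `splitLines`, and `StoreError` are hypothetical names, new rows are passed in already encoded, and "decodable" is abstracted into a caller-supplied predicate. The key points are that lines are split on 0x0A as `Data` (never round-tripped through `String`) and that an existing file that cannot be read throws before anything is written.

```swift
import Foundation

// Hypothetical error type for the "exists but unreadable" case.
enum StoreError: Error {
    case unreadable(URL)  // file exists but its bytes could not be read
}

/// Splits raw bytes on 0x0A without converting to String, so lines that are
/// not valid UTF-8 are kept exactly as stored. Empty lines are dropped.
func splitLines(_ data: Data) -> [Data] {
    data.split(separator: UInt8(0x0A)).map { Data($0) }  // re-base each slice to index 0
}

/// Rewrites the table at `url` with `rows` (already-encoded lines), then re-appends,
/// byte for byte, every existing line that `decodes` rejects. Throws instead of
/// overwriting when the file exists but cannot be read.
func rewrite(url: URL, rows: [Data], decodes: (Data) -> Bool) throws {
    var preserved: [Data] = []
    if FileManager.default.fileExists(atPath: url.path) {
        guard let existing = try? Data(contentsOf: url) else {
            throw StoreError.unreadable(url)  // never overwrite a file we could not read
        }
        preserved = splitLines(existing).filter { !decodes($0) }
    }
    var output = Data()
    for line in rows + preserved {
        output.append(line)
        output.append(UInt8(0x0A))
    }
    try output.write(to: url, options: .atomic)
}
```

Because preserved lines stay undecodable, the next load still warns about them, which keeps the "degrade loudly" behavior intact across any number of rewrites.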
Refs #16
Summary
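(1) rewrite/loadRaw are fully byte-level: bad lines (including non-UTF-8 corruption) survive corpus upsert and model re-seed byte-identical and keep triggering the load warning; a file that exists but cannot be read makes rewrite throw instead of writing blind. (2) Measurement rows record pin provenance: seededRow(backend,size,quantization) fetches the whole row from this run's as-seeded rows (the true family goes into the primary key, along with hf_revision); the legacy migration's key derivation is aligned, fixing the pre-existing hole where, with two different keys, a re-benchmarked row never superseded the migrated one. benchmark-store spec MODIFIED ×2.
Verification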
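6-AI verify master report: see issue #16. 20 findings; the 3 groups of real defects (String-level preservation hole, family hardcode polluting the primary key, migration key divergence) were all fixed in the same round, with a RED test written first for the byte hole. TDD throughout; 165/165 green. Live evidence of pin writes is tracked as an action item in #18.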
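The family-hardcode fix comes down to taking the key from the seeded row instead of from a constant. A minimal sketch, with loudly stated assumptions: `ModelRow`, `MeasurementKey`, and `keyAndPin` are hypothetical, the "coreml"/"mlx" backends and the "other" family are made-up fixtures, and rows are passed explicitly for testability, whereas the PR's `seededRow(backend,size,quantization)` reads the in-memory `ModelGrid.rows`.

```swift
import Foundation

// Hypothetical seeded model row; field names mirror the PR's wording
// (family, backend, size, quantization, hf_revision) but are assumptions.
struct ModelRow: Equatable {
    let family: String
    let backend: String
    let size: String
    let quantization: String
    let hfRevision: String?
}

// Hypothetical primary key for a measurement row.
struct MeasurementKey: Hashable {
    let family: String
    let backend: String
    let size: String
    let quantization: String
}

/// Looks up the row seeded for this run (rows passed in explicitly for testability).
func seededRow(in rows: [ModelRow], backend: String, size: String, quantization: String) -> ModelRow? {
    rows.first { $0.backend == backend && $0.size == size && $0.quantization == quantization }
}

/// Builds the key from the row itself, so `family` is never hardcoded,
/// and returns the pin that rides along with it.
func keyAndPin(from row: ModelRow) -> (key: MeasurementKey, hfRevision: String?) {
    let key = MeasurementKey(family: row.family, backend: row.backend,
                             size: row.size, quantization: row.quantization)
    return (key, row.hfRevision)
}

let grid = [
    ModelRow(family: "whisper", backend: "coreml", size: "tiny", quantization: "f16", hfRevision: "rev-a"),
    ModelRow(family: "other", backend: "mlx", size: "small", quantization: "int8", hfRevision: "rev-b"),
]
if let row = seededRow(in: grid, backend: "mlx", size: "small", quantization: "int8") {
    let (key, pin) = keyAndPin(from: row)
    print(key.family, pin ?? "none")  // other rev-b
}
```

Returning the whole row, rather than just the revision, is what lets a non-whisper family land in the primary key correctly.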
Checklist
Related
🤖 Generated by /idd-all. Do NOT add a GitHub close trailer.