fix(mft): instrumented UTF-16 decoder (WI-4.1) + discard audit (WI-6.3) #347
Merged
…r (WI-4.1)

Every NTFS-name decode used String::from_utf16_lossy, silently replacing unpaired surrogates with U+FFFD across 21 sites in 7 files. Introduce the single instrumented decode_name_u16(&[u16]) -> (String, count) in io/parser/unified.rs, convert all 21 name sites + USN to it, and make io::parser pub(crate) so the parse/ + usn/ modules share the one decoder.

Loss is now MEASURED, not silent: decode_name_u16 bumps a process-global relaxed atomic, snapshotted into the new MftStats::lossy_name_count at index-build and warned once when > 0. (Eliminating loss entirely is the WI-4.4 RFC.)

The platform/system.rs fs-TYPE-label decode is not a filename and is marked AUDIT-OK(bytes).

Also: anti_pattern_gate.sh now skips `//`/`///` comment lines so the decoder's own doc comments aren't false-flagged.

Tests: lossless count=0; unpaired surrogate → 1 U+FFFD + global tally increment; lone low surrogate → 1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
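A minimal sketch of the kind of instrumented decoder this commit describes, for readers who want to see the shape of it. This is not the code from `io/parser/unified.rs`: the `usize` count type, the static name `LOSSY_NAME_COUNT`, and the decision to count per replaced code unit are all assumptions made for illustration.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Process-global tally of replaced code units.
/// Hypothetical name; the real static is not shown in this PR text.
pub static LOSSY_NAME_COUNT: AtomicU64 = AtomicU64::new(0);

/// Decode UTF-16 code units into a `String`, returning the string together
/// with the number of unpaired surrogates that were replaced by U+FFFD.
/// The count type (`usize`) is an assumption.
pub fn decode_name_u16(units: &[u16]) -> (String, usize) {
    let mut out = String::with_capacity(units.len());
    let mut replaced = 0usize;

    // `char::decode_utf16` yields `Err` for each unpaired surrogate, so the
    // count reflects real decode loss rather than U+FFFD already in the input.
    for decoded in char::decode_utf16(units.iter().copied()) {
        match decoded {
            Ok(c) => out.push(c),
            Err(_) => {
                out.push(char::REPLACEMENT_CHARACTER);
                replaced += 1;
            }
        }
    }

    // Relaxed ordering is enough for a statistics counter: no other memory
    // access is synchronized through it.
    if replaced > 0 {
        LOSSY_NAME_COUNT.fetch_add(replaced as u64, Ordering::Relaxed);
    }
    (out, replaced)
}
```

A caller that only wants the name can ignore the second tuple element; a stats snapshot would read the static with `LOSSY_NAME_COUNT.load(Ordering::Relaxed)`.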
Triaged all 30 .ok()/let _ = sites in prod (tests excluded). None are un-annotated behavior-affecting Result discards: they are Result→Option convert-and-use, infallible in-memory write!/writeln!, doc examples, best-effort diag flushes, or intentional side-effect get_or_create calls (already covered by a block #[expect] + comment). Grep confirms zero io-result .ok() discards in prod. The meaningful control writes were fixed in WI-6.1. No code change warranted; triage recorded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
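To make the triage categories concrete, here is a small sketch contrasting two of the benign patterns named above with the pattern the audit was looking for. All function names (`parse_port`, `render`, `save`) are hypothetical and do not come from this codebase.

```rust
use std::fmt::Write as _;
use std::io;
use std::path::Path;

/// Result -> Option convert-and-use: the error detail is dropped, but the
/// caller still sees failure as `None`, so behavior is not silently changed.
pub fn parse_port(s: &str) -> Option<u16> {
    s.trim().parse::<u16>().ok()
}

/// In-memory formatting: `write!` into a `String` cannot fail for these
/// argument types, so discarding the `fmt::Result` is benign.
pub fn render(name: &str, size: u64) -> String {
    let mut out = String::new();
    let _ = write!(out, "{name}: {size} bytes");
    out
}

/// The case an audit like this hunts for is a fallible I/O result thrown
/// away with `.ok();` or `let _ =`. Here the `io::Result` is propagated so
/// the caller has to handle a failed write.
pub fn save(path: &Path, data: &[u8]) -> io::Result<()> {
    std::fs::write(path, data)
}
```

The first two discard nothing the caller could act on; the third shows the alternative to a discard when the result does matter.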
Summary
Second batch of the `bugs-rust-wont-catch` hardening plan, building on the merged PR #346. Two WIs, both verified green (native + Windows-target clippy, 190 mft tests, `--locked` sanity).

WI-4.1: single instrumented UTF-16 decoder (closes the audit's top correctness issue)
The tool's core job, finding files by name, silently corrupted a class of real NTFS names: every decode used `String::from_utf16_lossy`, replacing unpaired surrogates with U+FFFD with no signal, across 21 sites in 7 files.

- `decode_name_u16(&[u16]) -> (String, count)` in `io/parser/unified.rs` is the single instrumented decoder; `decode_utf16le_into` now returns the replacement count.
- Converted all 21 name sites + USN to it (`io/parser/{index,index_extension,fragment,fragment_extension}.rs`, `parse/{direct_index,direct_index_extension}.rs`, `usn/windows.rs`); made `io::parser` `pub(crate)` so the sibling modules share one decoder.
- Loss is now measured, not silent: a process-global relaxed atomic is snapshotted into `MftStats::lossy_name_count` at index-build and `warn!`'d when > 0. (Full elimination = the WI-4.4 RFC, already landed.)
- The `platform/system.rs` fs-TYPE-label decode (`== "NTFS"`) is not a filename → `AUDIT-OK(bytes)`.
- `anti_pattern_gate.sh` now skips `//`/`///` comment lines (no false-flag on the decoder's docs).

WI-6.3: discard audit (Category 6 now fully closed)
Triaged all 30 `.ok();`/`let _ =` sites in prod: none are un-annotated behavior-affecting Result discards (they're Result→Option convert-and-use, infallible in-memory `write!`, doc examples, best-effort diag flushes, or intentional side-effect `get_or_create` already under a block `#[expect]`). `grep` confirms zero io-result `.ok()` discards in prod. Docs-only.

Honest status & follow-ups
- `from_utf16_lossy`: Windows path/exe/pipe-name decodes → WI-4.2 (`OsString`) follow-up.
- `from_utf8_lossy`: per-line subprocess-output scans (`system_status.rs`, `connect_sync_autostart.rs`) of the same benign, fail-safe class WI-4.3 already AUDIT-OK'd elsewhere; the tightened gate surfaced more than the plan enumerated → a small WI-4.3 follow-up to mark them.
- … (`checked_*`/`.get()` across ~17 `indexing_slicing` blocks; large hot-path change), WI-5.3 (fuzz, depends on 5.2), WI-7.1 + WI-8.1 (Windows-only, unverifiable on this macOS host). These need a dedicated pass.

No new deps; `--locked` clean. Healing log in `LOG/` (gitignored).
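For the deferred `checked_*`/`.get()` item in the follow-ups, the sketch below shows roughly what converting one indexing-slice site to checked access can look like. The function `read_u16_le` and its byte-buffer input are hypothetical; the actual `indexing_slicing` blocks are not shown in this PR.

```rust
/// Read a little-endian `u16` at `offset`, returning `None` instead of
/// panicking when the offset overflows or the buffer is too short.
/// Hypothetical helper, for illustration only.
pub fn read_u16_le(buf: &[u8], offset: usize) -> Option<u16> {
    // `checked_add` guards against `offset + 2` wrapping around.
    let end = offset.checked_add(2)?;
    // `.get(range)` returns `None` for an out-of-bounds range, where
    // `buf[offset..end]` would panic.
    let bytes = buf.get(offset..end)?;
    // A slice pattern avoids indexing `bytes[0]` / `bytes[1]` directly.
    match bytes {
        [lo, hi] => Some(u16::from_le_bytes([*lo, *hi])),
        _ => None,
    }
}
```

The trade-off noted above still applies: on a hot path, the extra `Option` handling at every call site is what makes this a large change rather than a mechanical one.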