Skip to content

fix(mft): instrumented UTF-16 decoder (WI-4.1) + discard audit (WI-6.3)#347

Merged
githubrobbi merged 3 commits into
mainfrom
harden/bugs-rust-wont-catch-2
Jun 4, 2026
Merged

fix(mft): instrumented UTF-16 decoder (WI-4.1) + discard audit (WI-6.3)#347
githubrobbi merged 3 commits into
mainfrom
harden/bugs-rust-wont-catch-2

Conversation

@githubrobbi

@githubrobbi githubrobbi commented Jun 4, 2026

Copy link
Copy Markdown
Collaborator

Summary

Second batch of the bugs-rust-wont-catch hardening plan, building on the merged PR #346. Two WIs, both verified green (native + Windows-target clippy, 190 mft tests, --locked sanity).

WI-4.1 — single instrumented UTF-16 decoder (closes the audit's top correctness issue)

The tool's core job — finding files by name — silently corrupted a class of real NTFS names: every decode used String::from_utf16_lossy, replacing unpaired surrogates with U+FFFD with no signal, across 21 sites in 7 files.

  • New decode_name_u16(&[u16]) -> (String, count) in io/parser/unified.rs is the single instrumented decoder; decode_utf16le_into now returns the replacement count.
  • Converted all 21 NTFS-name sites + the USN-journal name (io/parser/{index,index_extension,fragment,fragment_extension}.rs, parse/{direct_index,direct_index_extension}.rs, usn/windows.rs); made io::parser pub(crate) so the sibling modules share one decoder.
  • Loss is now measured, not silent: a process-global relaxed atomic tally is snapshotted into the new MftStats::lossy_name_count at index-build and warn!'d when > 0. (Full elimination = the WI-4.4 RFC, already landed.)
  • platform/system.rs fs-TYPE-label decode (== "NTFS") is not a filename → AUDIT-OK(bytes).
  • Gate hardened to skip ////// comment lines (no false-flag on the decoder's docs).
  • Tests: lossless count=0; unpaired surrogate → 1 U+FFFD + tally increment; lone low surrogate → 1.

WI-6.3 — discard audit (Category 6 now fully closed)

Triaged all 30 .ok();/let _ = sites in prod: none are un-annotated behavior-affecting Result discards (they're Result→Option convert-and-use, infallible in-memory write!, doc examples, best-effort diag flushes, or intentional side-effect get_or_create already under a block #[expect]). grep confirms zero io-result .ok() discards in prod. Docs-only.

Honest status & follow-ups

  • Categories now fully closed: 1, 2, 3, 6, plus 4-correctness (silent loss eliminated/measured). With fix(security): close bugs-rust-wont-catch Categories 1/2/3 + partial 4/5/6/8 (13 WIs) #346, 16 of 20 WIs done.
  • The anti-pattern gate is still red (and deliberately NOT wired into the pipeline) on:
    • 4 from_utf16_lossy — Windows path/exe/pipe-name decodes → WI-4.2 (OsString) follow-up.
    • 9 from_utf8_lossy — per-line subprocess-output scans (system_status.rs, connect_sync_autostart.rs) of the same benign, fail-safe class WI-4.3 already AUDIT-OK'd elsewhere; the tightened gate surfaced more than the plan enumerated → a small WI-4.3 follow-up to mark them.
  • Remaining WIs (flagged, not attempted here): WI-4.2 (OsString — API-coupled), WI-5.2 (parser checked_*/.get() across ~17 indexing_slicing blocks — large hot-path change), WI-5.3 (fuzz, depends on 5.2), WI-7.1 + WI-8.1 (Windows-only, unverifiable on this macOS host). These need a dedicated pass.

No new deps; --locked clean. Healing log in LOG/ (gitignored).

githubrobbi and others added 3 commits June 4, 2026 08:55
…r (WI-4.1)

Every NTFS-name decode used String::from_utf16_lossy, silently replacing
unpaired surrogates with U+FFFD across 21 sites in 7 files. Introduce the
single instrumented decode_name_u16(&[u16]) -> (String, count) in
io/parser/unified.rs, convert all 21 name sites + USN to it, and make
io::parser pub(crate) so the parse/ + usn/ modules share the one decoder.

Loss is now MEASURED, not silent: decode_name_u16 bumps a process-global
relaxed atomic, snapshotted into the new MftStats::lossy_name_count at
index-build and warned once when > 0. (Eliminating loss entirely is the
WI-4.4 RFC.) The platform/system.rs fs-TYPE-label decode is not a filename
and is marked AUDIT-OK(bytes).

Also: anti_pattern_gate.sh now skips `//`/`///` comment lines so the
decoder's own doc comments aren't false-flagged.

Tests: lossless count=0; unpaired surrogate → 1 U+FFFD + global tally
increment; lone low surrogate → 1.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Triaged all 30 .ok()/let _ = sites in prod (tests excluded). None are
un-annotated behavior-affecting Result discards: they are Result→Option
convert-and-use, infallible in-memory write!/writeln!, doc examples,
best-effort diag flushes, or intentional side-effect get_or_create calls
(already covered by a block #[expect] + comment). Grep confirms zero
io-result .ok() discards in prod. The meaningful control writes were
fixed in WI-6.1. No code change warranted; triage recorded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@githubrobbi githubrobbi enabled auto-merge (squash) June 4, 2026 16:11
@githubrobbi githubrobbi merged commit bb226e3 into main Jun 4, 2026
27 checks passed
@githubrobbi githubrobbi deleted the harden/bugs-rust-wont-catch-2 branch June 4, 2026 16:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant