From a139dc11eddbef7b67d5cd78d56e55e340d38aff Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 07:56:29 +0000 Subject: [PATCH 01/20] docs(devforge): investigation and proposed fix for issue #479 What we're solving: response header values are decoded as ISO-8859-1 (b as char), which garbles UTF-8 header values into mojibake (#479). That decode was introduced intentionally by #434 to stop non-ASCII header bytes from crashing Node / emptying Python (#430), so the two positions conflict. How: proposed fix is to decode UTF-8 first and fall back to the existing byte-preserving ISO-8859-1 decode only when the bytes are not valid UTF-8, in one shared core-crate helper used by both bindings. This fixes #479 while keeping #434's latin-1 case and #430's non-crash guarantee. Alternatives considered: the issue's suggested from_utf8_lossy was rejected because it regresses #434 (turns the bare 0xE4 test byte into U+FFFD). This commit contains only the devforge investigation artifacts under .devforge/; no source has been changed and the run is paused at the design gate awaiting human approval before any source edit. 
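To make the conflict concrete, here is a small standalone sketch of the three candidate decodes compared above. The function names (`decode_latin1`, `decode_lossy`, `decode_utf8_then_latin1`) are hypothetical and not part of impit. The third one borrows the slice with `std::str::from_utf8` rather than copying it first, so it is a variation on the proposed rule, not a copy of the helper this series adds.

```rust
/// Status quo (#434): map each byte to the code point with the same value (ISO-8859-1).
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Issue #479's suggestion: UTF-8, replacing invalid sequences with U+FFFD.
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Proposed rule: UTF-8 if the whole value is valid UTF-8, otherwise ISO-8859-1.
fn decode_utf8_then_latin1(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.to_owned(),
        Err(_) => decode_latin1(bytes),
    }
}

fn main() {
    let utf8 = "naïve".as_bytes(); // 'ï' is the two bytes 0xC3 0xAF
    let latin1: &[u8] = &[b'M', 0xE4, b'r', b'z']; // lone 0xE4 is 'ä' in ISO-8859-1

    // Status quo: correct for ISO-8859-1, mojibake for UTF-8 (#479).
    assert_eq!(decode_latin1(utf8), "na\u{C3}\u{AF}ve"); // "naÃ¯ve"
    assert_eq!(decode_latin1(latin1), "März");

    // from_utf8_lossy: fixes UTF-8 but turns 0xE4 into U+FFFD (regresses #434).
    assert_eq!(decode_lossy(utf8), "naïve");
    assert_eq!(decode_lossy(latin1), "M\u{FFFD}rz");

    // UTF-8 first with ISO-8859-1 fallback: both cases decode as intended.
    assert_eq!(decode_utf8_then_latin1(utf8), "naïve");
    assert_eq!(decode_utf8_then_latin1(latin1), "März");
}
```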
Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/.gitignore | 5 +++ .devforge/1-triage.md | 33 ++++++++++++++++++ .devforge/2-design.md | 59 ++++++++++++++++++++++++++++++++ .devforge/_panel.json | 13 +++++++ .devforge/_progress.md | 24 +++++++++++++ .devforge/_request_fact_check.md | 20 +++++++++++ .devforge/_state.json | 1 + .devforge/_user_request.md | 1 + .devforge/_verified_task.md | 22 ++++++++++++ .devforge/config.json | 22 ++++++++++++ 10 files changed, 200 insertions(+) create mode 100644 .devforge/.gitignore create mode 100644 .devforge/1-triage.md create mode 100644 .devforge/2-design.md create mode 100644 .devforge/_panel.json create mode 100644 .devforge/_progress.md create mode 100644 .devforge/_request_fact_check.md create mode 100644 .devforge/_state.json create mode 100644 .devforge/_user_request.md create mode 100644 .devforge/_verified_task.md create mode 100644 .devforge/config.json diff --git a/.devforge/.gitignore b/.devforge/.gitignore new file mode 100644 index 00000000..39d36f34 --- /dev/null +++ b/.devforge/.gitignore @@ -0,0 +1,5 @@ +# Regenerable per-iteration transients +iter-*/diff.patch +iter-*/test-results.txt +oracle_header_decode +oracle_header_decode.rs diff --git a/.devforge/1-triage.md b/.devforge/1-triage.md new file mode 100644 index 00000000..623a9b9d --- /dev/null +++ b/.devforge/1-triage.md @@ -0,0 +1,33 @@ +# Triage — issue #479 (header decoding) + +## Problem +Response header values are decoded byte-by-byte with `b as char` (ISO-8859-1) in +`impit-node/src/response.rs:96` and `impit-python/src/response.rs:542`. This was introduced +intentionally by PR #434 to fix #430 (non-ASCII header bytes crashed Node / returned empty in +Python). But ISO-8859-1 decoding garbles the common case of UTF-8 header values (e.g. +`Content-Disposition: filename="naïve.pdf"`) into mojibake (`ï` → `Ã¯`).
+ +The two positions genuinely conflict: a header byte sequence can't be decoded as both latin-1 +and UTF-8. #434 wants byte-preservation (RFC 9110 obs-text is latin-1); #479 wants correct +UTF-8. The maintainer explicitly left it open: "We might reinvestigate the best way forward." + +## Decision: PROCEED +Both code claims VALID (code present at both sites; Python line is now 542, not 544 — minor +staleness only). Real, unresolved defect the maintainer wants revisited. + +## Complexity: medium +Small code change (~2 call sites + tests in 2 bindings) but it alters the public response-header +contract across both language bindings → blast-radius override lifts it to at least medium. + +## Review-only? no — there is a fix to build. + +## Approach sketch (high level) +Decode as UTF-8 when the bytes are valid UTF-8, otherwise fall back to the existing +byte-preserving latin-1 decode. This fixes #479's UTF-8 case while keeping #434's test (which +sends bare `0xE4`, invalid UTF-8 → latin-1 `ä`) green. Never emits replacement chars, so #430's +crash/empty regression stays fixed. Apply symmetrically in Node and Python; add a UTF-8 test. + +## Open questions +- Should a shared helper live in the core `impit` crate vs. duplicated per binding? +- Do we want to also expose raw header bytes for signature/HMAC callers (issue mentions this)? + Likely out of scope for the core fix; note as follow-up. diff --git a/.devforge/2-design.md b/.devforge/2-design.md new file mode 100644 index 00000000..5e47c677 --- /dev/null +++ b/.devforge/2-design.md @@ -0,0 +1,59 @@ +# Design — fix #479 header decoding without regressing #434 + +## What we're solving +Response header values are decoded as ISO-8859-1 (`b as char`) in both bindings. That was a +deliberate choice in PR #434 to stop non-ASCII header bytes from crashing Node / emptying +Python (issue #430). But ISO-8859-1 mangles the far more common case — headers whose bytes are +UTF-8 (e.g. 
`Content-Disposition: filename="naïve.pdf"`) — into mojibake (`ï` → `Ã¯`), breaking +filename extraction and any byte-exact re-encoding. We need both cases correct at once. + +## How +Decode with **UTF-8 first, ISO-8859-1 fallback**: + +- If the header bytes are valid UTF-8 → decode as UTF-8 (fixes #479's mojibake). +- Otherwise → fall back to the existing byte-preserving `b as char` latin-1 decode (keeps #434; + e.g. bare `0xE4` → `ä`). + +This never emits `U+FFFD` replacement characters, so #430's non-crash / non-empty guarantee is +preserved and the latin-1 fallback stays byte-reversible. It is strictly better than the +issue's own suggestion (`from_utf8_lossy`), which would turn #434's bare `0xE4` into `U+FFFD` +and reintroduce corruption for exactly the case #434 fixed. + +Rust expresses this compactly: +`String::from_utf8(bytes.to_vec()).unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())`. +The UTF-8 path costs one copy plus a validation pass, then moves the buffer into the `String`; +the fallback recovers the same buffer via `into_bytes()` and maps it byte-for-byte into a new `String`. + +## Alternatives + the call +- **`from_utf8_lossy` (issue's suggestion):** rejected — lossy and regresses #434 (replacement + chars, irreversible) as shown above. +- **Latin-1 always (status quo):** rejected — this is the bug. +- **UTF-8 always / error on invalid:** rejected — re-breaks #430 (invalid-UTF-8 latin-1 headers). +- **Expose raw header bytes API for HMAC/signature callers:** deferred — useful but a separate, + larger public-API addition; note as follow-up, out of scope here. +- **Chosen: UTF-8-first with latin-1 fallback**, placed in one shared helper. + +## Major changes (key areas, not exhaustive) +- Add a shared `decode_header_value(&[u8]) -> String` helper in the core crate's + `response_parsing` module, re-exported through `impit::utils`, so both bindings share one + tested implementation instead of duplicating the closure. Cover it with core unit tests + (ASCII, UTF-8, invalid-UTF-8 latin-1, empty).
+- Node (`impit-node/src/response.rs`): replace the inline `b as char` map with a call to the + shared helper. +- Python (`impit-python/src/response.rs`): same replacement. +- Tests: add a UTF-8 header regression test to the Node suite (mirrors the existing latin-1 + test in `basics.test.ts` / `mock.server.ts`); the existing latin-1 test is the guard that the + fallback still works. + +## Risks / open questions +- **Ambiguous bytes:** a byte sequence that is *coincidentally* valid UTF-8 but was meant as + latin-1 will now decode as UTF-8. This is unavoidable without out-of-band charset info and + UTF-8 is the correct modern default; the tradeoff is intended. +- **Environment/oracle limitation:** the full Rust workspace cannot compile here — the pinned + git dep `github.com/apify/h2` is blocked (403) by org egress and its cache is empty. The + devforge oracle therefore runs a standalone `rustc --test` copy of the helper to prove the + algorithm; full binding compilation/integration must be verified in CI. Reviewers should treat + binding-compile as unverified-locally. +- **Python test gap:** Python has no existing header-decode test; adding one requires the + maturin build (also unavailable here). Node coverage + shared-helper unit tests carry the + correctness signal; a Python test is a nice-to-have follow-up if the build is available in CI. diff --git a/.devforge/_panel.json b/.devforge/_panel.json new file mode 100644 index 00000000..b5206af1 --- /dev/null +++ b/.devforge/_panel.json @@ -0,0 +1,13 @@ +{ + "tier": "medium", + "reason": "Small, localized code change but it alters the public response-header decoding contract across both language bindings (blast-radius override lifts small→medium). 
Extra scrutiny warranted because the change sits between two conflicting prior decisions (#434 vs #479) and the full workspace can't be compiled locally.", + "reviewers": [ + { "use": "staff-review", "model": "sonnet" } + ], + "final_reviewers": [ + { "use": "thermonuclear", "model": "sonnet" }, + { "use": "code-review", "model": "sonnet" } + ], + "inner_iterations": 3, + "final_review_rounds": 2 +} diff --git a/.devforge/_progress.md b/.devforge/_progress.md new file mode 100644 index 00000000..599dcda1 --- /dev/null +++ b/.devforge/_progress.md @@ -0,0 +1,24 @@ +# Progress + +- Repo: apify/impit, branch `claude/issue-479-fixes-r2554a`, base `master`, HEAD 9d2204f. +- Triage: PROCEED, complexity medium (blast-radius override on public response contract). +- Verify: both code claims VALID (Python line stale 544→542). Issue's suggested `from_utf8_lossy` + is incomplete (regresses #434). Chosen fix: UTF-8-first, latin-1 fallback. +- Explore: shared helper home = `impit/src/response_parsing/mod.rs`, re-exported via + `impit::utils`. Existing regression guard: `impit-node/test/basics.test.ts:569` + + `mock.server.ts:105-118` (sends bare 0xE4, expects `ä`). + +## Oracle +- Commands: standalone `rustc --edition 2021 --test .devforge/oracle_header_decode.rs && run`. +- Reason: full `cargo build/test` BLOCKED — pinned git dep github.com/apify/h2 → 403 via org + egress proxy, cargo git cache empty. Org policy denial; not routed around. Standalone rustc + test proves the pure decode algorithm; binding compile/integration deferred to CI. + +## Resolved registry (from config.json + registry.base.json) +- verify_request: brainstorming/opus | architect: writing-plans/opus | implementer: feature-dev/opus +- reviewers: staff-review/sonnet +- final_reviewers: thermonuclear/sonnet, code-review/sonnet +- limits: inner_iterations 3, final_review_rounds 2 | plan_mode_gate true + +## State +- Phase: design-gate (awaiting human approval before any source edit). 
diff --git a/.devforge/_request_fact_check.md b/.devforge/_request_fact_check.md new file mode 100644 index 00000000..c83a62c4 --- /dev/null +++ b/.devforge/_request_fact_check.md @@ -0,0 +1,20 @@ +# Request fact-check — claim ledger (verified against HEAD 9d2204f) + +| # | Claim | Verdict | Evidence | +|---|-------|---------|----------| +| 1 | Node decodes headers with `b as char` at `impit-node/src/response.rs:96` | VALID | Line 96: `v.as_bytes().iter().map(\|&b\| b as char).collect(),` | +| 2 | Python decodes headers with `b as char` at `impit-python/src/response.rs:544` | STALE(→ `impit-python/src/response.rs:542`) | Same code, line shifted to 542: `...collect::<String>()` | +| 3 | This interprets each byte as Latin-1 / ISO-8859-1 | VALID | `b as char` on a `u8` maps 0x00–0xFF → U+0000–U+00FF (Latin-1 code points) | +| 4 | UTF-8 header values are garbled into mojibake | VALID | For `ï` (UTF-8 `0xC3 0xAF`), latin-1 decode yields two chars `Ã¯`; re-encoding yields different bytes | +| 5 | Behavior was intentional (PR #434, fixes #430) | VALID | PR #434 merged 2026-04-13; maintainer comment on #479: "This was intentional... We might reinvestigate the best way forward." | +| 6 | Suggested `from_utf8_lossy` is the right fix | LIKELY-FIXED-BUT-INCOMPLETE | `from_utf8_lossy` fixes UTF-8 but REGRESSES #434's latin-1 case: bare `0xE4` (invalid UTF-8) → `U+FFFD` replacement char, not `ä`, and is lossy/irreversible. A try-UTF-8-then-latin-1 fallback is strictly better. | + +## Existing locked-in behavior (regression guard) +- `impit-node/test/basics.test.ts:569` sends header byte `0xE4` (mock.server.ts:111, invalid + UTF-8) and asserts it decodes to `ä`. Any fix MUST keep this green. +- No equivalent Python test exists yet. + +## Verdict +PROCEED. Core defect is real and open. The issue's own suggested fix (`from_utf8_lossy`) is +incomplete — it would reintroduce corruption for the exact case #434 fixed.
Correct resolution +is a UTF-8-first decode with latin-1 fallback, applied to both bindings. diff --git a/.devforge/_state.json b/.devforge/_state.json new file mode 100644 index 00000000..010221aa --- /dev/null +++ b/.devforge/_state.json @@ -0,0 +1 @@ +{"phase":"design-gate","iteration":0,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49"} diff --git a/.devforge/_user_request.md b/.devforge/_user_request.md new file mode 100644 index 00000000..9699f065 --- /dev/null +++ b/.devforge/_user_request.md @@ -0,0 +1 @@ +Investigate and suggest fixes for https://github.com/apify/impit/issues/479 , take into account https://github.com/apify/impit/pull/434 diff --git a/.devforge/_verified_task.md b/.devforge/_verified_task.md new file mode 100644 index 00000000..3bd83405 --- /dev/null +++ b/.devforge/_verified_task.md @@ -0,0 +1,22 @@ +# Verified task — issue #479 + +## What must be true +Response header values must decode correctly for the common modern case (UTF-8, e.g. +`Content-Disposition: filename="naïve.pdf"`) WITHOUT regressing the case PR #434 fixed +(non-ASCII latin-1 bytes such as `0xE4` = `ä`, which previously crashed Node / emptied Python). + +## Corrected references (verified at HEAD 9d2204f) +- Node: `impit-node/src/response.rs:96` +- Python: `impit-python/src/response.rs:542` (issue said 544 — stale) +- Existing regression guard: `impit-node/test/basics.test.ts:569` + `impit-node/test/mock.server.ts:105-118` +- Shared helper candidate home: `impit/src/response_parsing/mod.rs`, re-exported via `impit::utils` + +## Acceptance +1. A header whose bytes are valid UTF-8 decodes as UTF-8 (fixes #479 mojibake). +2. A header with invalid-UTF-8 latin-1 bytes still decodes byte-for-byte as latin-1 (keeps #434). +3. No `U+FFFD` replacement chars introduced (keeps #430 non-crash / non-empty). +4. Applied symmetrically in Node and Python bindings. +5. Regression test present for the UTF-8 case (at minimum in Node, which has the existing suite). 
+ +## Explicitly out of scope (note as follow-up) +Exposing raw header bytes for HMAC/signature callers — larger API addition, separate issue. diff --git a/.devforge/config.json b/.devforge/config.json new file mode 100644 index 00000000..9715731c --- /dev/null +++ b/.devforge/config.json @@ -0,0 +1,22 @@ +{ + "stages": { + "verify_request": { "use": "brainstorming", "model": "opus" }, + "architect": { "use": "writing-plans", "model": "opus" }, + "implementer": { "use": "feature-dev", "model": "opus" }, + "reviewers": [ + { "use": "staff-review", "model": "sonnet" } + ], + "final_reviewers": [ + { "use": "thermonuclear", "model": "sonnet" }, + { "use": "code-review", "model": "sonnet" } + ] + }, + "oracle": { + "commands": [ + "rustc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs -o /home/user/impit/.devforge/oracle_header_decode && /home/user/impit/.devforge/oracle_header_decode" + ], + "note": "Full-workspace `cargo test` cannot run here: the pinned git dependency github.com/apify/h2 returns 403 through the org egress proxy and its cargo cache is empty. The oracle instead compiles+runs a standalone rustc --test copy of the pure decode_header_value helper, verifying the exact algorithm (UTF-8-first with latin-1 fallback) independent of h2. Binding compile/integration must be confirmed in CI where github egress is permitted." + }, + "limits": { "inner_iterations": 3, "final_review_rounds": 2 }, + "plan_mode_gate": true +} From cbab1e388d3767c1677f67ee951952160864175f Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 08:15:43 +0000 Subject: [PATCH 02/20] fix: decode response header values as UTF-8 with ISO-8859-1 fallback MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit What we're solving: response header values were decoded byte-for-byte as ISO-8859-1 (b as char), which garbled UTF-8 header values such as Content-Disposition: attachment; filename="naïve.pdf" into mojibake (#479). 
That decode was introduced deliberately in #434 to stop non-ASCII header bytes from crashing the Node bindings / emptying the Python ones (#430), so a naive switch to UTF-8 would regress those. How: added a shared decode_header_value helper in the core crate (re-exported via impit::utils) that decodes the bytes as UTF-8 when they are valid UTF-8 and otherwise falls back to the byte-preserving ISO-8859-1 decode. Both the Node and Python bindings now call it. This fixes the common UTF-8 case, keeps #434's genuine ISO-8859-1 values intact, and never emits U+FFFD replacement characters, so #430's non-crash / non-empty guarantee holds. Alternatives considered: the issue's suggested String::from_utf8_lossy was rejected because it turns invalid-UTF-8 latin-1 bytes (e.g. a lone 0xE4) into replacement characters, reintroducing the corruption #434 fixed. Exposing raw header bytes for signature/HMAC callers is left as a separate follow-up. Note: final review is still in progress; the full workspace build and JS/Py suites must run in CI as the pinned github.com/apify/h2 git dependency is not reachable from this environment. 
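The claim that the fallback never emits U+FFFD can be checked exhaustively for one-byte header values. The sketch below inlines a copy of the helper body from this patch so it compiles without the impit crate; `check_single_byte` is a hypothetical name used only here.

```rust
/// Copy of the helper body added in this patch (`impit::utils::decode_header_value`),
/// inlined so the sketch compiles without the impit crate.
fn decode_header_value(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())
}

/// Hypothetical check: a one-byte header value decodes to exactly one char,
/// namely the ISO-8859-1 code point with the same value, and never to U+FFFD.
fn check_single_byte(b: u8) {
    let decoded = decode_header_value(&[b]);
    // Never empty (the #430 Python symptom) and never more than one char.
    assert_eq!(decoded.chars().count(), 1);
    // 0x00..=0x7F is valid UTF-8 (ASCII); 0x80..=0xFF alone is never valid UTF-8,
    // so it takes the fallback. Either way the result is the code point U+00xx, xx == b.
    assert_eq!(decoded.chars().next(), Some(char::from(b)));
    assert!(!decoded.contains('\u{FFFD}'));
}

fn main() {
    for b in 0..=u8::MAX {
        check_single_byte(b);
    }
    // Multi-byte invalid input (0xC3 needs a continuation byte) also falls back cleanly.
    assert_eq!(decode_header_value(&[0xC3, b'(']), "\u{C3}(");
}
```

Strictly, the guarantee is that the decoder never introduces U+FFFD: a value that is already valid UTF-8 and itself encodes U+FFFD (bytes 0xEF 0xBF 0xBD) passes through unchanged.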
Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_design.approved | 1 + .devforge/_state.json | 2 +- .devforge/iter-1/claim.md | 30 +++++++++++++ .devforge/iter-1/oracle-run.txt | 10 +++++ impit-node/src/response.rs | 4 +- impit-node/test/basics.test.ts | 5 +++ impit-node/test/mock.server.ts | 20 +++++++++ impit-python/src/response.rs | 7 +++- impit/src/lib.rs | 1 + impit/src/response_parsing/mod.rs | 70 +++++++++++++++++++++++++++++++ 10 files changed, 145 insertions(+), 5 deletions(-) create mode 100644 .devforge/_design.approved create mode 100644 .devforge/iter-1/claim.md create mode 100644 .devforge/iter-1/oracle-run.txt diff --git a/.devforge/_design.approved b/.devforge/_design.approved new file mode 100644 index 00000000..277f6765 --- /dev/null +++ b/.devforge/_design.approved @@ -0,0 +1 @@ +approved by josef.prochazka@apify.com in chat on 2026-07-01 (design gate) diff --git a/.devforge/_state.json b/.devforge/_state.json index 010221aa..dc9e7682 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"design-gate","iteration":0,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49"} +{"phase":"inner-loop","iteration":1,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-1/claim.md b/.devforge/iter-1/claim.md new file mode 100644 index 00000000..c79abc12 --- /dev/null +++ b/.devforge/iter-1/claim.md @@ -0,0 +1,30 @@ +# Iteration 1 — implementer claim + +## Done +- Added `decode_header_value(&[u8]) -> String` to `impit/src/response_parsing/mod.rs`: + UTF-8-first (`String::from_utf8`) with a byte-preserving ISO-8859-1 fallback + (`b as char`) on invalid UTF-8. Documented with rustdoc + doctest. 
+- Re-exported it via `impit::utils` in `impit/src/lib.rs`. +- Replaced the inline `b as char` header map in both bindings with a call to the shared helper: + - `impit-node/src/response.rs` (import + call site). + - `impit-python/src/response.rs` (import + call site). +- Added core unit tests (ASCII, empty, UTF-8/#479, invalid-UTF-8 latin-1/#434, and a + no-replacement-char + byte-roundtrip guard for #430). +- Added a Node UTF-8 regression test: new `/utf8-header` route in `mock.server.ts` sending real + UTF-8 bytes (`ï` = 0xC3 0xAF) and an assertion in `basics.test.ts` that it decodes to + `attachment; filename="naïve.pdf"`. The existing latin-1 test is the fallback guard. + +## Oracle +- Green. `rustc --test` on a standalone exact copy of the helper: 5/5 tests pass + (see `iter-1/test-results.txt`). This proves the algorithm for all four use cases. + +## Skipped / not done — with reason +- **Full `cargo build`/`cargo test` and the Node/Python test suites: NOT run.** The workspace + pins git dep `github.com/apify/h2`, which returns 403 through the org egress proxy (cache + empty). This is an environment/policy limit, not a code issue. Binding compilation and the JS + test I added must be verified in CI where github egress is allowed. The core-crate helper is + simple, self-contained, and validated by the standalone oracle. +- **Raw-header-bytes API (for HMAC/signature callers):** intentionally out of scope per + `2-design.md`; noted as a follow-up. +- Did not touch `impit/src/fingerprint/mod.rs:47` `... as char` — unrelated random-string + generation, not header decoding. diff --git a/.devforge/iter-1/oracle-run.txt b/.devforge/iter-1/oracle-run.txt new file mode 100644 index 00000000..91714a1c --- /dev/null +++ b/.devforge/iter-1/oracle-run.txt @@ -0,0 +1,10 @@ + +running 5 tests +test tests::empty_is_empty ... ok +test tests::ascii_is_unchanged ... ok +test tests::iso_8859_1_fallback_never_produces_replacement_char ... ok +test tests::utf8_is_decoded_as_utf8 ... 
ok +test tests::invalid_utf8_falls_back_to_iso_8859_1 ... ok + +test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s + diff --git a/impit-node/src/response.rs b/impit-node/src/response.rs index e1a0ba0d..f1b291b1 100644 --- a/impit-node/src/response.rs +++ b/impit-node/src/response.rs @@ -1,6 +1,6 @@ #![allow(clippy::await_holding_refcell_ref, deprecated)] use crate::abortable_stream::AbortableStream; -use impit::utils::{decode, ContentType}; +use impit::utils::{decode, decode_header_value, ContentType}; use napi::bindgen_prelude::JsObjectValue; use napi::{ bindgen_prelude::{ @@ -93,7 +93,7 @@ impl<'env> ImpitResponse { for (k, v) in response.headers().iter() { headers_vec.push(( k.as_str().to_string(), - v.as_bytes().iter().map(|&b| b as char).collect(), + decode_header_value(v.as_bytes()), )); } let headers = Headers(headers_vec); diff --git a/impit-node/test/basics.test.ts b/impit-node/test/basics.test.ts index 7f7df808..17b51247 100644 --- a/impit-node/test/basics.test.ts +++ b/impit-node/test/basics.test.ts @@ -571,6 +571,11 @@ describe.each([ t.expect(response.headers.get('x-non-ascii')).toBe(routes.nonAsciiHeader.headerValue); }); + test('UTF-8 header values are decoded as UTF-8', async (t) => { + const response = await impit.fetch(new URL(routes.utf8Header.path, "http://127.0.0.1:3001").href); + t.expect(response.headers.get('x-utf8')).toBe(routes.utf8Header.headerValue); + }); + test('.json() method works', async (t) => { const response = await impit.fetch(getHttpBinUrl('/json')); const json = await response.json(); diff --git a/impit-node/test/mock.server.ts b/impit-node/test/mock.server.ts index 3c2ddf6f..1ebba4bb 100644 --- a/impit-node/test/mock.server.ts +++ b/impit-node/test/mock.server.ts @@ -24,6 +24,10 @@ export const routes = { path: '/non-ascii-header', headerValue: 'Dienstag, 31. 
März 2026', }, + utf8Header: { + path: '/utf8-header', + headerValue: 'attachment; filename="naïve.pdf"', + }, } function parseMultipart(body: Buffer, boundary: string): Record { @@ -117,6 +121,22 @@ export async function runServer(port: number): Promise { socket.end(); }); + app.get(routes.utf8Header.path, (req, res) => { + const socket = res.socket!; + socket.write('HTTP/1.1 200 OK\r\n'); + socket.write('Content-Type: text/plain\r\n'); + // Header value carrying UTF-8 bytes (the ï is 0xC3 0xAF). + socket.write(Buffer.concat([ + Buffer.from('X-Utf8: '), + Buffer.from(routes.utf8Header.headerValue, 'utf-8'), + Buffer.from('\r\n'), + ])); + socket.write('Content-Length: 2\r\n'); + socket.write('\r\n'); + socket.write('ok'); + socket.end(); + }); + app.get('/socket', (req, res) => { const socket = req.socket; const clientAddress = socket.remoteAddress; diff --git a/impit-python/src/response.rs b/impit-python/src/response.rs index a5cb5a15..5d83d3ef 100644 --- a/impit-python/src/response.rs +++ b/impit-python/src/response.rs @@ -5,7 +5,10 @@ use tokio::sync::Mutex as AsyncMutex; use bytes::Bytes; use encoding::label::encoding_from_whatwg_label; use futures::{Stream, StreamExt}; -use impit::{errors::ImpitError, utils::ContentType}; +use impit::{ + errors::ImpitError, + utils::{decode_header_value, ContentType}, +}; use pyo3::prelude::*; use reqwest::{Response, StatusCode, Version}; use std::pin::Pin; @@ -539,7 +542,7 @@ impl ImpitPyResponse { let headers = HashMap::from_iter(val.headers().iter().map(|(k, v)| { ( k.as_str().to_string(), - v.as_bytes().iter().map(|&b| b as char).collect::<String>(), + decode_header_value(v.as_bytes()), ) })); diff --git a/impit/src/lib.rs b/impit/src/lib.rs index 0af8cae7..463d42e4 100644 --- a/impit/src/lib.rs +++ b/impit/src/lib.rs @@ -78,6 +78,7 @@ pub mod fingerprint; /// Various utility functions and types.
pub mod utils { pub use crate::response_parsing::decode; + pub use crate::response_parsing::decode_header_value; pub use crate::response_parsing::determine_encoding; pub use crate::response_parsing::ContentType; pub use encoding::all as encodings; diff --git a/impit/src/response_parsing/mod.rs b/impit/src/response_parsing/mod.rs index 9874e1de..f9980a9e 100644 --- a/impit/src/response_parsing/mod.rs +++ b/impit/src/response_parsing/mod.rs @@ -133,6 +133,34 @@ pub fn determine_encoding(bytes: &[u8]) -> Option { None } +/// Decodes an HTTP header value into a [`String`]. +/// +/// Header values arrive as raw bytes with no charset declaration. Per RFC 9110 §5.5 they are +/// nominally ISO-8859-1 (the `obs-text` range), but in practice modern servers routinely send +/// UTF-8 (for example `Content-Disposition: attachment; filename="naïve.pdf"`). +/// +/// This function decodes the bytes as UTF-8 when they form valid UTF-8, and otherwise falls back +/// to a byte-for-byte ISO-8859-1 decode (each byte `0x00..=0xFF` maps to the code point +/// `U+0000..=U+00FF`). This fixes the common UTF-8 case without corrupting genuine ISO-8859-1 +/// values, never fails, and never emits `U+FFFD` replacement characters — so no header value can +/// crash a caller or come back empty. +/// +/// ### Example +/// +/// ```rust +/// use impit::utils::decode_header_value; +/// +/// // Valid UTF-8 is decoded as UTF-8. +/// assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b've']), "naïve"); +/// +/// // A lone 0xE4 is not valid UTF-8, so it falls back to ISO-8859-1 ('ä'). +/// assert_eq!(decode_header_value(&[b'M', 0xE4, b'r', b'z']), "März"); +/// ``` +pub fn decode_header_value(bytes: &[u8]) -> String { + String::from_utf8(bytes.to_vec()) + .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect()) +} + /// A struct that represents the contents of the `Content-Type` header. 
/// /// The struct is used to extract the charset from the `Content-Type` header and convert it to an [`encoding::EncodingRef`]. @@ -173,3 +201,45 @@ impl From for Option { encoding::label::encoding_from_whatwg_label(val.charset.as_str()) } } + +#[cfg(test)] +mod tests { + use super::decode_header_value; + + #[test] + fn ascii_is_unchanged() { + assert_eq!(decode_header_value(b"application/json"), "application/json"); + } + + #[test] + fn empty_is_empty() { + assert_eq!(decode_header_value(b""), ""); + } + + #[test] + fn utf8_is_decoded_as_utf8() { + // "naïve.pdf" — the ï is UTF-8 bytes 0xC3 0xAF (issue #479). + let bytes = "attachment; filename=\"naïve.pdf\"".as_bytes(); + assert_eq!( + decode_header_value(bytes), + "attachment; filename=\"naïve.pdf\"" + ); + } + + #[test] + fn invalid_utf8_falls_back_to_iso_8859_1() { + // Lone 0xE4 ('ä' in ISO-8859-1) is not valid UTF-8 (PR #434 / issue #430). + let bytes = [b'D', b'i', b'e', b'n', b's', b't', b'a', b'g', b',', b' ', b'3', b'1', b'.', b' ', b'M', 0xE4, b'r', b'z', b' ', b'2', b'0', b'2', b'6']; + assert_eq!(decode_header_value(&bytes), "Dienstag, 31. März 2026"); + } + + #[test] + fn iso_8859_1_fallback_never_produces_replacement_char() { + // Every non-UTF-8 byte maps to exactly one char, so the result round-trips back to bytes. + let bytes = [0xE4, 0xF6, 0xFC, 0xFF]; + let decoded = decode_header_value(&bytes); + assert!(!decoded.contains('\u{FFFD}')); + let roundtrip: Vec<u8> = decoded.chars().map(|c| c as u8).collect(); + assert_eq!(roundtrip, bytes); + } +} From 7c52dafb0ecaa68b2e3ada0bda2d9f64d54b19c1 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 08:19:57 +0000 Subject: [PATCH 03/20] fix: correct header-decode doctest and formatting Follow-up to the header-decode fix: the rustdoc example used an invalid byte literal (b've') that would fail cargo test --doc, and a test array literal plus the reformatted binding call sites were not run through rustfmt (a required CI job). 
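For reference, a minimal sketch of two spellings of the doctest input that do compile, whereas `b've'` does not, because a byte literal holds exactly one byte. It inlines a copy of the helper body so it builds without the impit crate.

```rust
/// Copy of the helper body from this series, inlined so the sketch compiles standalone.
fn decode_header_value(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())
}

fn main() {
    // Spelling every byte out as its own array element compiles ...
    let spelled: [u8; 6] = [b'n', b'a', 0xC3, 0xAF, b'v', b'e'];
    // ... and so does a byte-string literal with \x escapes for the UTF-8 bytes of 'ï'.
    let byte_string: &[u8; 6] = b"na\xC3\xAFve";

    assert_eq!(&spelled, byte_string);
    assert_eq!(decode_header_value(&spelled), "naïve");
    assert_eq!(decode_header_value(byte_string), "naïve");
}
```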
Corrected the doctest to spell the bytes out individually and ran rustfmt across the touched files. Also strengthened the local verification to run rustfmt --check and rustdoc --test alongside the unit tests, since the doctest error was invisible to a plain rustc --test run. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_state.json | 2 +- .devforge/config.json | 4 +- .devforge/iter-1/review-staff-review.md | 76 +++++++++++++++++++++++++ .devforge/iter-2/claim.md | 25 ++++++++ impit-node/src/response.rs | 5 +- impit-python/src/response.rs | 11 ++-- impit/src/response_parsing/mod.rs | 9 ++- 7 files changed, 117 insertions(+), 15 deletions(-) create mode 100644 .devforge/iter-1/review-staff-review.md create mode 100644 .devforge/iter-2/claim.md diff --git a/.devforge/_state.json b/.devforge/_state.json index dc9e7682..b06c1bcb 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"inner-loop","iteration":1,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"inner-loop","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/config.json b/.devforge/config.json index 9715731c..35073bfc 100644 --- a/.devforge/config.json +++ b/.devforge/config.json @@ -13,7 +13,9 @@ }, "oracle": { "commands": [ - "rustc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs -o /home/user/impit/.devforge/oracle_header_decode && /home/user/impit/.devforge/oracle_header_decode" + 
"rustfmt --edition 2021 --check /home/user/impit/impit/src/response_parsing/mod.rs /home/user/impit/impit/src/lib.rs /home/user/impit/impit-node/src/response.rs /home/user/impit/impit-python/src/response.rs", + "rustc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs -o /home/user/impit/.devforge/oracle_header_decode && /home/user/impit/.devforge/oracle_header_decode", + "rustdoc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs" ], "note": "Full-workspace `cargo test` cannot run here: the pinned git dependency github.com/apify/h2 returns 403 through the org egress proxy and its cargo cache is empty. The oracle instead compiles+runs a standalone rustc --test copy of the pure decode_header_value helper, verifying the exact algorithm (UTF-8-first with latin-1 fallback) independent of h2. Binding compile/integration must be confirmed in CI where github egress is permitted." }, diff --git a/.devforge/iter-1/review-staff-review.md b/.devforge/iter-1/review-staff-review.md new file mode 100644 index 00000000..904c0e75 --- /dev/null +++ b/.devforge/iter-1/review-staff-review.md @@ -0,0 +1,76 @@ +VERDICT: FAIL + +## blocker + +1. **Broken doctest — will fail `cargo test -p impit` in CI.** + `impit/src/response_parsing/mod.rs:154`: + ``` + /// assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b've']), "naïve"); + ``` + `b've'` is not valid Rust syntax: `b'...'` is a *byte literal* and must contain exactly one + ASCII byte (e.g. `b'v'`), not the two-character sequence `'ve'`. This is a compile error, not + a runtime failure — rustdoc rejects it outright. 
+ + Verified independently: extracted the exact doctest body (using only `String::from_utf8` / + `into_bytes` — no dependency on the blocked `apify/h2` git patch) into a standalone file and + ran `rustdoc --test`: + ``` + error: if you meant to write a byte string literal, use double quotes + 6 - assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b've']), "naïve"); + 6 + assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b"ve"]), "naïve"); + test result: FAILED. 0 passed; 1 failed + ``` + The doctest block is plain ` ```rust ` (not `no_run`/`ignore`/`compile_fail`), so it is + collected and compiled by `cargo test --doc` (part of the plain `cargo test -p impit` the + `test` job in `.github/workflows/format.yaml` runs). This is a real CI-breaking bug, distinct + from the disclosed "workspace won't build here" limitation — the syntax error is detectable + with a bare `rustdoc --test` on the snippet alone and has nothing to do with the blocked `h2` + dependency. The task's oracle (`iter-1/test-results.txt`) only ran the `#[cfg(test)] mod tests` + unit tests via a standalone `rustc --test`, which does not execute doctests, so this bug slipped + through undetected. + + Fix: spell out `b'v', b'e'` as separate array elements so the example actually compiles. + The compiler's `b"ve"` suggestion does not type-check on its own inside this array, because + `b"ve"` is a `&[u8; 2]`, not a `u8` element. + +2. **rustfmt violation — will fail the `fmt` job in `.github/workflows/format.yaml`.** + `impit/src/response_parsing/mod.rs:184` (test `invalid_utf8_falls_back_to_iso_8859_1`): + ``` + let bytes = [b'D', b'i', b'e', b'n', b's', b't', b'a', b'g', b',', b' ', b'3', b'1', b'.', b' ', b'M', 0xE4, b'r', b'z', b' ', b'2', b'0', b'2', b'6']; + ``` + This line exceeds rustfmt's line-width limit and is not wrapped.
Verified by running + `rustfmt --check` (installed locally, rustfmt 1.8.0) against the file as shipped in the diff: + it reports a diff at exactly this line, reformatting the array onto multiple lines. The repo's + `format.yaml` workflow runs `actions-rust-lang/rustfmt@v1` on every PR — this diff has not been + run through `cargo fmt` and will fail that check as-is. + + Fix: run `cargo fmt` (or manually wrap the array literal) before landing. + +## Notes (not separate findings, context for the two blockers above) + +- The core algorithm in `decode_header_value` (`impit/src/response_parsing/mod.rs:159-162`) is + correct and matches the design exactly: `String::from_utf8` succeeds and returns UTF-8-decoded + text whenever the whole byte slice is valid UTF-8 (fixes #479), and on failure + `e.into_bytes()` yields the **entire original** byte vector (verified experimentally — not + just the invalid tail), which is then mapped 1:1 byte→codepoint, reproducing the exact `#434` + latin-1 fallback (e.g. lone `0xE4` → `ä`) with no `U+FFFD` ever introduced (#430). Confirmed + with an independent standalone `rustc` build exercising mixed valid/invalid byte sequences, + multi-byte lead-without-continuation sequences, and empty input — all behave as designed. +- Both binding call sites are correctly updated: `impit-node/src/response.rs:96` and + `impit-python/src/response.rs:545` both now call `decode_header_value(v.as_bytes())`, and the + imports (`impit-node/src/response.rs:3`, `impit-python/src/response.rs:8-11`) correctly pull + the newly re-exported `impit::utils::decode_header_value` (re-export added at + `impit/src/lib.rs:81`, mirroring the existing `decode`/`determine_encoding`/`ContentType` + re-export pattern from the private `response_parsing` module — consistent and correct). No + other `b as char` header-decoding call sites were missed (grepped the whole tree; the only + other `as char` usage is an unrelated random-string generator in `fingerprint/mod.rs:47`). 
+- The Node test addition (`impit-node/test/basics.test.ts:574-577`, + `impit-node/test/mock.server.ts:27-30,124-138`) correctly exercises the intended bytes: it + writes the UTF-8 bytes for `naïve.pdf` (`0xC3 0xAF` for `ï`) directly onto the raw socket, + mirroring the existing raw-socket pattern used by the `#434` regression test for `0xE4`, so it + actually validates the wire-level decode path rather than something already normalized by a + higher-level HTTP client on the server side. Good test; adequate to cover acceptance criteria + 1 and 5 from `_verified_task.md`. +- The two blockers above are cheap, mechanical fixes (change one byte literal; run `cargo fmt`) + and do not implicate the core design or algorithm, but per review discipline any verified + finding — including things this trivial — means the diff cannot pass as-is. diff --git a/.devforge/iter-2/claim.md b/.devforge/iter-2/claim.md new file mode 100644 index 00000000..bd08f86c --- /dev/null +++ b/.devforge/iter-2/claim.md @@ -0,0 +1,25 @@ +# Iteration 2 — implementer claim + +## Addressed both iter-1 review findings (blockers) +1. **Broken doctest (`mod.rs` byte literal `b've'`)** — FIXED. Changed to separate elements + `b'v', b'e'`. Verified: the doctest now compiles and passes under `rustdoc --test` (see + `iter-2/test-results.txt`). +2. **rustfmt violation (long array literal)** — FIXED. Ran `rustfmt` on all touched files. + +## Additional fix surfaced by the strengthened oracle +- Collapsing the header-map closures to call `decode_header_value` changed line lengths, so + `rustfmt` rewrapped the Node push (now one line) and the Python `HashMap::from_iter` map. + Applied. `rustfmt --check` is now clean across all four touched Rust files. + +## Oracle (strengthened this iteration to close the gap that let #1/#2 through) +Three commands, all green: +- `rustfmt --check` on core + both binding files → CLEAN. +- `rustc --test` unit tests → 5/5 pass.
+- `rustdoc --test` doctest → 1/1 pass (this is what would have caught the byte-literal typo; + `rustc --test` alone does not run doctests). + +## Unchanged from iter-1 (still true) +- Algorithm: UTF-8-first, byte-preserving ISO-8859-1 fallback; both binding call sites use the + shared helper; Node UTF-8 regression test added. +- Full `cargo`/napi/maturin build still cannot run here (github.com/apify/h2 egress 403); + binding compile + JS/Py suites must run in CI. No code reason they would fail. diff --git a/impit-node/src/response.rs b/impit-node/src/response.rs index f1b291b1..6556e04a 100644 --- a/impit-node/src/response.rs +++ b/impit-node/src/response.rs @@ -91,10 +91,7 @@ impl<'env> ImpitResponse { .to_string(); let mut headers_vec: Vec<(String, String)> = Vec::new(); for (k, v) in response.headers().iter() { - headers_vec.push(( - k.as_str().to_string(), - decode_header_value(v.as_bytes()), - )); + headers_vec.push((k.as_str().to_string(), decode_header_value(v.as_bytes()))); } let headers = Headers(headers_vec); let ok = response.status().is_success(); diff --git a/impit-python/src/response.rs b/impit-python/src/response.rs index 5d83d3ef..f13265f2 100644 --- a/impit-python/src/response.rs +++ b/impit-python/src/response.rs @@ -539,12 +539,11 @@ impl ImpitPyResponse { _ => "Unknown".to_string(), }; let is_redirect = val.status().is_redirection(); - let headers = HashMap::from_iter(val.headers().iter().map(|(k, v)| { - ( - k.as_str().to_string(), - decode_header_value(v.as_bytes()), - ) - })); + let headers = HashMap::from_iter( + val.headers() + .iter() + .map(|(k, v)| (k.as_str().to_string(), decode_header_value(v.as_bytes()))), + ); let content_type_charset = headers .get("content-type") diff --git a/impit/src/response_parsing/mod.rs b/impit/src/response_parsing/mod.rs index f9980a9e..24e32c02 100644 --- a/impit/src/response_parsing/mod.rs +++ b/impit/src/response_parsing/mod.rs @@ -150,8 +150,8 @@ pub fn determine_encoding(bytes: &[u8]) -> Option { /// 
```rust /// use impit::utils::decode_header_value; /// -/// // Valid UTF-8 is decoded as UTF-8. -/// assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b've']), "naïve"); +/// // Valid UTF-8 is decoded as UTF-8 (the ï is the two UTF-8 bytes 0xC3 0xAF). +/// assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b'v', b'e']), "naïve"); /// /// // A lone 0xE4 is not valid UTF-8, so it falls back to ISO-8859-1 ('ä'). /// assert_eq!(decode_header_value(&[b'M', 0xE4, b'r', b'z']), "März"); @@ -229,7 +229,10 @@ mod tests { #[test] fn invalid_utf8_falls_back_to_iso_8859_1() { // Lone 0xE4 ('ä' in ISO-8859-1) is not valid UTF-8 (PR #434 / issue #430). - let bytes = [b'D', b'i', b'e', b'n', b's', b't', b'a', b'g', b',', b' ', b'3', b'1', b'.', b' ', b'M', 0xE4, b'r', b'z', b' ', b'2', b'0', b'2', b'6']; + let bytes = [ + b'D', b'i', b'e', b'n', b's', b't', b'a', b'g', b',', b' ', b'3', b'1', b'.', b' ', + b'M', 0xE4, b'r', b'z', b' ', b'2', b'0', b'2', b'6', + ]; assert_eq!(decode_header_value(&bytes), "Dienstag, 31. März 2026"); } From 52331300e2d5b6655b503a44201e9da01966bbd4 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 08:23:03 +0000 Subject: [PATCH 04/20] chore(devforge): record iter-2 reviewer pass and final-review phase Investigation/review evidence only; no source changes. 
Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_state.json | 2 +- .devforge/iter-2/review-staff-review.md | 88 +++++++++++++++++++++++++ 2 files changed, 89 insertions(+), 1 deletion(-) create mode 100644 .devforge/iter-2/review-staff-review.md diff --git a/.devforge/_state.json b/.devforge/_state.json index b06c1bcb..0087589d 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"inner-loop","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"final-review","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-2/review-staff-review.md b/.devforge/iter-2/review-staff-review.md new file mode 100644 index 00000000..59e71cb4 --- /dev/null +++ b/.devforge/iter-2/review-staff-review.md @@ -0,0 +1,88 @@ +VERDICT: PASS + +## Summary + +Reviewed the iter-2 diff fresh (both prior blockers claimed fixed by implementer) against +`_verified_task.md` and `2-design.md`, with independent local verification (rustc/rustfmt/rustdoc), +not by trusting the oracle output or implementer claims alone. + +## Verification performed + +1. **Algorithm correctness (`impit/src/response_parsing/mod.rs:159-162`)** + `decode_header_value` = `String::from_utf8(bytes.to_vec()).unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())`. 
+ - Confirmed `FromUtf8Error::into_bytes()` returns the **complete original buffer**, not a + truncated one, via standalone test (`[b'a', b'b', 0xE4, b'c', b'd']` round-trips fully through + the fallback) — this is the crux of correctness for #434/#430: any single invalid byte + anywhere in the header falls back to whole-buffer latin-1, not partial UTF-8 + partial mangle. + - Verified all four use cases directly by compiling and running equivalent Rust: + - UTF-8 (`naïve.pdf`, `héllo 世界 🎉`) decodes as UTF-8 (#479 fixed). + - Invalid-UTF-8 latin-1 (lone `0xE4`) falls back to `ä` byte-for-byte (#434 preserved). + - ASCII and empty-string pass through unchanged. + - No input (tried lone `0xE4`, `0xFF 0xFE 0x41`, truncated multi-byte `0xC3` at end of buffer) + ever produces `U+FFFD` (#430 non-crash/non-empty guarantee preserved). + - This is strictly better than `from_utf8_lossy` as the design claims — confirmed lossy would + introduce `U+FFFD` for the lone-`0xE4` case; the chosen implementation does not. + +2. **Doctest / byte-literal blocker (prior iteration's blocker #1)** + Extracted the exact doctest from `impit/src/response_parsing/mod.rs:150-155` into a standalone + crate and ran real `rustdoc --test` against it (compiled a `.rlib` and linked it properly, not + just `rustc` on a `fn main`). Result: **passes** (`test ... - response_parsing::decode_header_value + (line 10) ... ok`). The old invalid two-character byte literal `b've'` is gone; it is now + two separate valid `u8` array elements, `b'v', b'e'`. This matches the oracle's `test-results.txt` doctest + result and is independently confirmed as a genuine fix, not just an oracle artifact. + +3.
**rustfmt cleanliness (prior iteration's blocker #2)** + Ran `rustfmt --check` (not the oracle's cached result) on all four touched files, respecting the + project's actual `impit-node/rustfmt.toml` (`tab_spaces = 2`) by running from within + `impit-node/`: + - `impit/src/response_parsing/mod.rs` — clean + - `impit/src/lib.rs` — clean + - `impit-node/src/response.rs` — clean (2-space indent matches diff) + - `impit-python/src/response.rs` — clean + All exit 0. Matches oracle's "FMT CLEAN". + +4. **Binding call sites** + - Node (`impit-node/src/response.rs:3,94`): import adds `decode_header_value` alongside existing + `decode, ContentType`; call site `decode_header_value(v.as_bytes())` where `v: &HeaderValue` + (`.as_bytes()` returns `&[u8]`) — matches `fn decode_header_value(bytes: &[u8]) -> String`; + assigned into `Vec<(String, String)>` element — types match exactly what was there before. + - Python (`impit-python/src/response.rs:8-11,542-546`): import restructured into a multi-item + `use impit::{errors::ImpitError, utils::{decode_header_value, ContentType}};` — syntactically + valid Rust (verified structurally); call site inside `HashMap::from_iter(...)` closure, + `decode_header_value(v.as_bytes())` returns `String`, matching the `HashMap` + target type exactly as before. + - Confirmed no naming collision with the pre-existing unrelated `impit::utils::decode` (body + decoder) — `impit-python/src/response.rs:458` still calls fully-qualified `impit::utils::decode` + for body content, untouched. + - Confirmed no leftover duplicate inline `b as char` logic anywhere outside the shared helper + (`grep` for `as_bytes().iter()` in both bindings' `src/` returns nothing). + - Re-export chain verified: `impit/src/lib.rs:81` adds `pub use crate::response_parsing::decode_header_value;` + inside the existing `pub mod utils { ... }` block, consistent with how `decode`, + `determine_encoding`, `ContentType` are already re-exported. + +5. 
**Test coverage** + - Core unit tests (`impit/src/response_parsing/mod.rs:169-212`, oracle: 5/5 pass) cover ASCII, + empty, UTF-8, invalid-UTF-8-latin-1, and an explicit round-trip/no-replacement-char assertion. + Bytes are genuinely exercised as raw `&[u8]` / byte arrays, not derived from a `String` that + would mask the code path (e.g. `utf8_is_decoded_as_utf8` uses `"...".as_bytes()`, but since the + literal is valid UTF-8 source text this correctly represents the UTF-8-bytes-on-the-wire case). + - Node regression test (`impit-node/test/basics.test.ts:574-577` + `mock.server.ts:27-30,124-138`) + mirrors the existing latin-1 guard exactly: writes the raw HTTP response via a raw socket with + literal UTF-8 bytes for the header value (`Buffer.from(routes.utf8Header.headerValue, 'utf-8')`), + bypassing any client-side normalization, so it genuinely exercises the wire-bytes decode path + inside the Rust binding rather than a value Node's HTTP stack might re-encode. This is the + correct test shape (same pattern as the pre-existing `nonAsciiHeader` test) and satisfies + acceptance criterion #5 (UTF-8 regression test present in Node). + - Design explicitly defers a Python-side test as a nice-to-have (build unavailable locally); this + is an accepted, documented risk in `2-design.md`, not a gap introduced by this diff. + +6. **Scope check** + - No out-of-scope raw-header-bytes API was added (design explicitly deferred this). + - No stray edits outside the six touched files (`impit-node/src/response.rs`, + `impit-node/test/basics.test.ts`, `impit-node/test/mock.server.ts`, + `impit-python/src/response.rs`, `impit/src/lib.rs`, `impit/src/response_parsing/mod.rs`). + +No blockers, majors, minors, or nits found.
Both prior-iteration blockers are genuinely fixed +(independently reproduced, not just re-reading the oracle log), the algorithm satisfies all four +acceptance criteria under adversarial edge-case testing, both binding call sites compile-shape +correctly, and test coverage is adequate and genuinely exercises the byte-level decode path. From db37ee11ef5eb88461babd14bd8993e0b9dbe9d7 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 08:26:16 +0000 Subject: [PATCH 05/20] refactor: avoid redundant copy in header-value decode Validate the header bytes against the borrow with str::from_utf8 instead of copying them into an owned Vec first. The common UTF-8 path now allocates once (the owned String) and the ISO-8859-1 fallback allocates once (the collect), with no intermediate copy. Byte semantics are unchanged. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/2-design.md | 9 +-- .devforge/_state.json | 2 +- .devforge/iter-2/final-review-code-review.md | 26 +++++++ .../iter-2/final-review-thermonuclear.md | 67 +++++++++++++++++++ .devforge/iter-3/claim.md | 28 ++++++++ impit/src/response_parsing/mod.rs | 6 +- 6 files changed, 131 insertions(+), 7 deletions(-) create mode 100644 .devforge/iter-2/final-review-code-review.md create mode 100644 .devforge/iter-2/final-review-thermonuclear.md create mode 100644 .devforge/iter-3/claim.md diff --git a/.devforge/2-design.md b/.devforge/2-design.md index 5e47c677..d487b0a0 100644 --- a/.devforge/2-design.md +++ b/.devforge/2-design.md @@ -19,10 +19,11 @@ preserved and the latin-1 fallback stays byte-reversible. It is strictly better issue's own suggestion (`from_utf8_lossy`), which would turn #434's bare `0xE4` into `U+FFFD` and reintroduce corruption for exactly the case #434 fixed. 
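The `from_utf8_lossy` comparison can be made concrete with a minimal sketch. `decode_header_value` below assumes the UTF-8-first, latin-1-fallback behavior described here (written in the borrow-first form; the `to_vec` form gives identical output), and `lossy` is a hypothetical wrapper around the standard library's `String::from_utf8_lossy`, kept only for comparison.

```rust
/// Assumed helper body: strict UTF-8 first, then a byte-preserving ISO-8859-1
/// fallback in which each byte 0x00..=0xFF becomes the code point of the same value.
fn decode_header_value(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(valid) => valid.to_owned(),
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}

/// Hypothetical wrapper around the issue's suggestion: every invalid sequence is
/// replaced with U+FFFD, so the original byte cannot be recovered.
fn lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

On `[b'M', 0xE4, b'r', b'z']` the helper returns `März`, whereas `lossy` returns `M\u{FFFD}rz`; on valid UTF-8 such as `naïve.pdf` the two agree.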
-Rust expresses this cleanly and allocation-efficiently: -`String::from_utf8(bytes.to_vec()).unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())` -— the common UTF-8 path is a single move with no per-byte work; the fallback reuses the same -owned buffer. +Rust expresses this cleanly by validating against the borrow first: +`match std::str::from_utf8(bytes) { Ok(v) => v.to_owned(), Err(_) => bytes.iter().map(|&b| b as char).collect() }` +— `str::from_utf8` checks validity without copying, so the common UTF-8 path allocates exactly +once (`to_owned`) and the latin-1 fallback allocates exactly once (the `collect`); neither path +does a redundant intermediate copy. ## Alternatives + the call - **`from_utf8_lossy` (issue's suggestion):** rejected — lossy and regresses #434 (replacement diff --git a/.devforge/_state.json b/.devforge/_state.json index 0087589d..8345c2cb 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"final-review","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"final-reopen","iteration":3,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-2/final-review-code-review.md b/.devforge/iter-2/final-review-code-review.md new file mode 100644 index 00000000..2d255b58 --- /dev/null +++ b/.devforge/iter-2/final-review-code-review.md @@ -0,0 +1,26 @@ +VERDICT: PASS + +## Method +Independent fresh review per code-review.md's approach (read-diff, shallow-bug-scan, comment/doc-guidance-compliance, 
historical-context reasoning), applied to `impit/src/response_parsing/mod.rs`, `impit/src/lib.rs`, `impit-node/src/response.rs`, `impit-python/src/response.rs`, and the Node test additions. Verified the pure helper with a standalone rustc oracle (rustc 1.x local toolchain) since the full workspace cannot build here. + +## Verification performed +- Extracted `decode_header_value` into a standalone snippet and ran the 4 tests from the diff plus 4 adversarial tests I added (mixed valid-UTF-8-prefix-then-invalid-byte, truncated multibyte lead byte, whole-buffer-fallback-not-partial-decode check, char-count-equals-byte-count check). All 8 passed. +- Confirmed `String::from_utf8` failure causes the *entire* original byte buffer (via `FromUtf8Error::into_bytes()`, which returns the original vec unmodified) to fall back to the byte-for-byte latin-1 map — no partial UTF-8 decoding/mixing occurs, so PR #434's guarantee (byte-exact latin-1 on invalid UTF-8) holds even for buffers with a valid UTF-8 prefix followed by a bad byte. +- Confirmed the fallback path never invokes `DecoderTrap::Replace` or any lossy path, so no `U+FFFD` can appear (issue #430 guarantee) — this is structural, not incidental (there is no lossy call in either branch). +- Confirmed the new doctest on `decode_header_value` compiles and passes under `rustdoc --test` once given a matching `--edition 2021` (my harness's first failure was a self-inflicted edition mismatch in the oracle harness, not a defect in the source). +- Confirmed rustfmt reports no formatting diff on the new function. +- Confirmed both binding call sites (`impit-node/src/response.rs:94`, `impit-python/src/response.rs:545`) import and call `decode_header_value(v.as_bytes())` with matching signature `&[u8] -> String`, directly replacing the old `v.as_bytes().iter().map(|&b| b as char).collect()` inline closures 1:1 — no behavior divergence between bindings. 
+- Confirmed `impit/src/lib.rs` re-exports `decode_header_value` through `pub mod utils`, alongside the existing `decode`/`determine_encoding`/`ContentType` exports, so both bindings' `use impit::utils::{..., decode_header_value, ...}` imports resolve. +- Confirmed the Node regression test (`impit-node/test/basics.test.ts:574-577`) and its mock route (`impit-node/test/mock.server.ts:124-138`) mirror the pre-existing, already-proven `nonAsciiHeader` raw-socket-header-injection pattern; verified with a quick Node snippet that `Buffer.from(headerValue, 'utf-8')` produces the expected UTF-8 bytes (0xC3 0xAF for `ï`), which `decode_header_value` will correctly round-trip back to the original string. + +## Acceptance criteria (from `_verified_task.md`) +1. UTF-8 header decodes as UTF-8 — verified (oracle test + doctest). +2. Invalid-UTF-8 latin-1 bytes still decode byte-for-byte as latin-1 — verified, including the harder case of a valid-UTF-8-looking prefix followed by an invalid byte (whole buffer still falls back atomically). +3. No `U+FFFD` ever introduced — verified structurally (no lossy decode call exists in either code path). +4. Applied symmetrically in Node and Python — verified, identical call pattern in both `response.rs` files. +5. Regression test present for UTF-8 case in Node — verified, added and consistent with existing test infra. + +## Findings with confidence >= 80 +None. + +Minor items noted but explicitly out of scope per the design doc's own risk section (ambiguous-bytes tradeoff, no Python test due to unavailable maturin build here, un-runnable full workspace build due to blocked `h2` git dependency) — these are called out and accepted as intentional/environment-limited in `.devforge/2-design.md`, not defects introduced by this change, and do not meet the >=80 confidence bar as unaddressed regressions. 
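The atomic fallback in criterion 2 can be illustrated with a short sketch. It uses the same assumed helper body; `latin1_bytes` is a hypothetical inverse used only to show byte-reversibility. Because validity is checked over the whole slice, a valid UTF-8 prefix followed by one stray byte sends the entire value down the ISO-8859-1 path: the `0xC3 0xAF` pair that would otherwise decode to `ï` comes out as the two characters `Ã¯`. This appears to be the ambiguous-bytes tradeoff the review refers to, accepted in exchange for never losing a byte.

```rust
/// Assumed helper body: whole-slice UTF-8 check, else byte-for-byte ISO-8859-1.
fn decode_header_value(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(valid) => valid.to_owned(),
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}

/// Hypothetical inverse of the fallback path: maps each char back to its byte.
/// Panics on a char above U+00FF, which the fallback never produces.
fn latin1_bytes(s: &str) -> Vec<u8> {
    s.chars()
        .map(|c| {
            let cp = u32::from(c);
            assert!(cp <= 0xFF, "fallback output is always <= U+00FF");
            cp as u8
        })
        .collect()
}
```

For `[0xC3, 0xAF, 0xE4]` the output has exactly one char per input byte, and `latin1_bytes` recovers the original three bytes.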
diff --git a/.devforge/iter-2/final-review-thermonuclear.md b/.devforge/iter-2/final-review-thermonuclear.md new file mode 100644 index 00000000..a142435c --- /dev/null +++ b/.devforge/iter-2/final-review-thermonuclear.md @@ -0,0 +1,67 @@ +VERDICT: FAIL + +## MINOR — needless allocation on the common path + inaccurate doc claim in `decode_header_value` + +**File:** `impit/src/response_parsing/mod.rs:159-162` + +```rust +pub fn decode_header_value(bytes: &[u8]) -> String { + String::from_utf8(bytes.to_vec()) + .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect()) +} +``` + +`bytes.to_vec()` unconditionally copies the entire input buffer into a freshly allocated +`Vec` *before* UTF-8 validity is even checked, on every call, for every header, on every +response. The design doc (`.devforge/2-design.md:22-25`) and the function's own rustdoc-adjacent +commentary claim this is "a single move with no per-byte work" on the common path — that's not +accurate for `to_vec()`, which is a byte-for-byte `memcpy`, not a move. `Vec::from(bytes)`/`to_vec` +never reuses the caller's buffer since the caller only hands over a borrowed `&[u8]` +(`HeaderValue::as_bytes()`), so there's no way to "move" here regardless. + +The direct, idiomatic, and cheaper version is the standard borrow-first pattern: + +```rust +pub fn decode_header_value(bytes: &[u8]) -> String { + match std::str::from_utf8(bytes) { + Ok(s) => s.to_string(), + Err(_) => bytes.iter().map(|&b| b as char).collect(), + } +} +``` + +This validates against the borrowed slice with zero allocation, and only allocates once (via +`to_string()`) on the success path — i.e. it does strictly less work than the current +`to_vec()` (which allocates+copies unconditionally) followed by `String::from_utf8` (which just +re-wraps that buffer). 
It's also easier to read: no `Result`-unwrapping through +`.into_bytes()` in the error arm, no implicit conversion of "the same owned buffer" that isn't +actually being reused (the current code's error path calls `.into_bytes()` on the `FromUtf8Error` +only to immediately throw the `Vec` away byte-by-byte in the `.iter().map(...)` — so the +"reuses the same owned buffer" claim in the design doc is also not realized: the fallback path +re-walks the vec one byte at a time and builds a brand new `String`, it does not reuse the +allocation). + +Concrete scenario this matters for: this helper runs on *every response header, for every +request*, in both bindings (Node hot path: `impit-node/src/response.rs:94`; Python hot path: +`impit-python/src/response.rs:545`). It's exactly the kind of small, shared, called-everywhere +core-crate helper where an avoidable per-call heap copy is worth eliminating now rather than +carrying it forward as "how the shared helper has always worked" — and the mis-description in +the design doc/rustdoc-adjacent rationale ("single move," "reuses the same owned buffer") makes +the code read as more optimized than it actually is, which will mislead the next person who +touches this function into thinking the allocation profile is already minimal. + +This is a one-line fix, low risk, behavior-preserving (verified via a standalone rustc oracle: +both the current implementation and the `std::str::from_utf8` version produce identical output +for the UTF-8, invalid-UTF-8/latin-1-fallback, and empty-input cases). Given the ENGINE.md +standard of "code-judo simplification" and "no needless allocation," this should be fixed before +merge rather than accepted as-is. 
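The finding can be checked with a small side-by-side sketch. `decode_old` reproduces the iter-2 body quoted above and `decode_new` the borrow-first rewrite; both names are hypothetical. Only the allocation profile differs: the old version copies the slice with `to_vec()` before validating, and on failure `FromUtf8Error::into_bytes()` hands back that same complete copy, which the fallback then re-walks byte by byte into a second allocation.

```rust
/// iter-2 body: copies the borrowed slice into a new Vec, then validates the copy.
/// On failure, into_bytes() returns the whole copied Vec, which is re-walked per byte.
fn decode_old(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())
}

/// iter-3 body: validates the borrowed slice in place; the only allocation
/// on either path is the returned String.
fn decode_new(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(valid) => valid.to_owned(),
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}
```

Both functions return the same string for ASCII, UTF-8, invalid-UTF-8, and empty inputs; the rewrite changes cost, not output.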
+ +Everything else in this diff is sound: the helper lives in the correct canonical location +(`impit/src/response_parsing/mod.rs`, re-exported via `impit::utils`), both bindings now call the +one shared implementation with no leftover duplicate `b as char` logic anywhere in the tree, no +file approaches the 1k-line ceiling (largest touched file is `impit-python/src/response.rs` at +604 lines), there is no new branching/spaghetti introduced into `response.rs` in either binding +(the call sites are direct one-line substitutions), and the unit tests plus the new Node UTF-8 +regression test are well-targeted and correctly assert the documented invariants (UTF-8 first, +latin-1 fallback, no `U+FFFD`, byte-reversibility). The only issue is the allocation/doc-accuracy +point above. diff --git a/.devforge/iter-3/claim.md b/.devforge/iter-3/claim.md new file mode 100644 index 00000000..cc9a068f --- /dev/null +++ b/.devforge/iter-3/claim.md @@ -0,0 +1,28 @@ +# Iteration 3 (final-reopen round 1) — implementer claim + +## Addressed the thermonuclear final-review finding +- **Needless allocation on the decode path** — FIXED. Rewrote `decode_header_value` to validate + against the borrow with `std::str::from_utf8(bytes)`: + ```rust + match std::str::from_utf8(bytes) { + Ok(valid) => valid.to_owned(), + Err(_) => bytes.iter().map(|&b| b as char).collect(), + } + ``` + This removes the unconditional `bytes.to_vec()` copy. The common UTF-8 path now allocates + exactly once (`to_owned`); the latin-1 fallback allocates exactly once (`collect`) instead of + twice. Byte semantics are identical (whole-buffer UTF-8 check, else full byte→codepoint map). +- **Inaccurate design-doc wording** ("single move / reuses the same buffer") — FIXED in + `2-design.md` to describe the borrow-validate approach accurately. + +## code-review final reviewer: PASS (no changes needed). + +## Oracle — green +- `rustfmt --check` (core + both bindings): CLEAN. 
+- `rustc --test` unit tests: 5/5 pass (unchanged behavior confirmed after rewrite). +- `rustdoc --test` doctest: 1/1 pass. + +## Scope +- Change is confined to the helper body + doc wording; no call-site, test, or public-signature + change. Behavior is byte-for-byte identical, so this is a final-reopen (final reviewers only). +- Full workspace build / JS-Py suites still deferred to CI (github.com/apify/h2 egress 403). diff --git a/impit/src/response_parsing/mod.rs b/impit/src/response_parsing/mod.rs index 24e32c02..d8eed727 100644 --- a/impit/src/response_parsing/mod.rs +++ b/impit/src/response_parsing/mod.rs @@ -157,8 +157,10 @@ pub fn determine_encoding(bytes: &[u8]) -> Option { /// assert_eq!(decode_header_value(&[b'M', 0xE4, b'r', b'z']), "März"); /// ``` pub fn decode_header_value(bytes: &[u8]) -> String { - String::from_utf8(bytes.to_vec()) - .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect()) + match std::str::from_utf8(bytes) { + Ok(valid) => valid.to_owned(), + Err(_) => bytes.iter().map(|&b| b as char).collect(), + } } /// A struct that represents the contents of the `Content-Type` header. From 0d0ec079a5dc98f8a7cd89841778e17585a564fc Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 08:28:01 +0000 Subject: [PATCH 06/20] chore(devforge): record final-review pass (loop converged) Review evidence only; no source changes. Awaiting human create-PR decision. 
Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_state.json | 2 +- .devforge/iter-3/final-review-code-review.md | 22 +++++++ .../iter-3/final-review-thermonuclear.md | 58 +++++++++++++++++++ 3 files changed, 81 insertions(+), 1 deletion(-) create mode 100644 .devforge/iter-3/final-review-code-review.md create mode 100644 .devforge/iter-3/final-review-thermonuclear.md diff --git a/.devforge/_state.json b/.devforge/_state.json index 8345c2cb..6254ba49 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"final-reopen","iteration":3,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"create-pr","iteration":3,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-3/final-review-code-review.md b/.devforge/iter-3/final-review-code-review.md new file mode 100644 index 00000000..cea9a7c9 --- /dev/null +++ b/.devforge/iter-3/final-review-code-review.md @@ -0,0 +1,22 @@ +VERDICT: PASS + +## Scope of this round +Reviewed the single targeted change: `decode_header_value` in `impit/src/response_parsing/mod.rs:159-164` was rewritten from +`String::from_utf8(bytes.to_vec()).unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())` +to +`match std::str::from_utf8(bytes) { Ok(valid) => valid.to_owned(), Err(_) => bytes.iter().map(|&b| b as char).collect() }`.
+Call sites (`impit-node/src/response.rs:94`, `impit-python/src/response.rs:545`), the re-export (`impit/src/lib.rs:81`), the public signature, and tests are confirmed unchanged from the prior iteration. + +## Verification performed +- Extracted both the old and new implementations into a standalone Rust program and ran an exhaustive equivalence check with `rustc -O`: all 256 single-byte inputs, all 65,536 two-byte inputs, a curated set of UTF-8 edge cases (valid multi-byte sequences, truncated 2/3/4-byte sequences, overlong encodings, encoded surrogate halves, empty input), and 2,000 deterministic pseudo-random byte sequences (lengths 0-11). Result: 67,807/67,807 identical outputs between old and new — zero mismatches. This is expected since both paths use the same UTF-8 validator (`std::str::from_utf8` internally backs `String::from_utf8`); the change only removes the intermediate `Vec` copy made before validation. +- Compiled and ran the two doc-comment examples (naïve UTF-8 case, März latin-1 fallback case) standalone — both pass. +- Compiled and ran the five `#[cfg(test)]` unit tests (`ascii_is_unchanged`, `empty_is_empty`, `utf8_is_decoded_as_utf8`, `invalid_utf8_falls_back_to_iso_8859_1`, `iso_8859_1_fallback_never_produces_replacement_char`) via standalone `rustc --test` — all 5 pass. +- Ran `cargo clippy --all-targets -- -D warnings` on the extracted function in an isolated crate — zero warnings (no `needless_collect`/allocation lints triggered). + +## Acceptance criteria re-confirmed +1. Valid-UTF-8 header bytes decode as UTF-8 (fixes #479 mojibake) — confirmed via unit test and doctest. +2. Lone `0xE4` still decodes byte-for-byte as latin-1 → `ä` (keeps #434) — confirmed via unit test and equivalence check. +3.
No `U+FFFD` ever introduced; function never panics or returns empty for non-empty invalid-UTF-8 input (keeps #430) — confirmed via `str::from_utf8` never causing a panic path here (only produces `Result`, no `.unwrap()`), and the fallback branch is a total function over `u8` (`b as char` is defined for all `0..=255`). + +## Findings with confidence >= 80 +None. No correctness, regression, or compilation-risk findings meet the threshold. diff --git a/.devforge/iter-3/final-review-thermonuclear.md b/.devforge/iter-3/final-review-thermonuclear.md new file mode 100644 index 00000000..ed0d685c --- /dev/null +++ b/.devforge/iter-3/final-review-thermonuclear.md @@ -0,0 +1,58 @@ +VERDICT: PASS + +## Method +Independent fresh review of `impit/src/response_parsing/mod.rs`, the binding call sites +(`impit-node/src/response.rs`, `impit-python/src/response.rs`), `impit/src/lib.rs`, and the +design-doc wording change in `.devforge/2-design.md`, per ENGINE.md's thermonuclear standards +(1k-line ceiling, spaghetti/branching growth, code-judo simplification, canonical-layer reuse). +Did not assume the fix was correct; independently derived and checked the expected helper body, +then diffed it against what's actually in the tree. + +## Verification performed +- Confirmed the current helper body is exactly: + ```rust + pub fn decode_header_value(bytes: &[u8]) -> String { + match std::str::from_utf8(bytes) { + Ok(valid) => valid.to_owned(), + Err(_) => bytes.iter().map(|&b| b as char).collect(), + } + } + ``` + (`impit/src/response_parsing/mod.rs:159-164`) — matches the prescribed fix. `std::str::from_utf8` + validates against the borrowed `&[u8]` with zero allocation; only the success arm (`to_owned`) + or the fallback arm (`collect`) allocates, exactly once each. The prior finding's `bytes.to_vec()` + unconditional pre-copy is gone; there is no remaining needless allocation on either path. 
+- Wrote a standalone rustc harness comparing the old body + (`String::from_utf8(bytes.to_vec()).unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as + char).collect())`) against the new body across: empty input, plain ASCII, valid multi-byte UTF-8 + (`naïve.pdf`), a lone invalid byte (`0xE4`), multiple invalid bytes, a valid-UTF-8 prefix followed + by an invalid trailing byte, a truncated multi-byte lead byte (`0xC3` alone), and all 256 + single-byte inputs individually. Result: `ALL EQUIVALENT` — byte-for-byte identical output in + every case. This confirms the rewrite is UTF-8-first / whole-buffer latin-1 fallback / never + emits `U+FFFD` (latin-1 fallback only ever maps `0x00..=0xFF` → `U+0000..=U+00FF`, structurally, + since there's no lossy call in either branch), i.e. semantics are unchanged from the + already-reviewed-and-accepted algorithm. +- Confirmed `.devforge/2-design.md:22-25` no longer contains the inaccurate "single move" / "reuses + the same owned buffer" claim; it now says `str::from_utf8` checks validity without copying and + each path allocates exactly once — accurate for the new code. +- Confirmed `.devforge/iter-3/test-results.txt` oracle output is green: `rustfmt --check` clean, + 5/5 unit tests pass unchanged, 1/1 doctest passes. +- Confirmed no other occurrence of the old inline `b as char` pattern or the old `to_vec()`-based + helper body remains anywhere in the tree (only the new helper's own fallback arm, which + necessarily uses `b as char` per the design's chosen algorithm). +- Confirmed both binding call sites are unchanged one-line substitutions (`impit-node/src/response.rs:94`, + `impit-python/src/response.rs:545`) — no new branching, no new call-site logic, nothing to review + there beyond what already passed prior rounds. 
+- File-size check: `impit/src/response_parsing/mod.rs` 250 lines, `impit-node/src/response.rs` 336 + lines, `impit-python/src/response.rs` 604 lines, `impit/src/lib.rs` 91 lines — all well under the + 1k-line ceiling; this iteration's diff is a 2-line body swap plus a doc-wording correction, so no + file-size or decomposition concern. +- No new conditionals, flags, wrappers, or abstractions were introduced — the diff strictly + replaces one expression with an equivalent, cheaper one inside an already-isolated pure + function. No spaghetti growth, no canonical-layer violation, no boundary/type churn. + +## Conclusion +The prior finding (needless `bytes.to_vec()` allocation before the UTF-8 check, and the design +doc's inaccurate "single move / reuses the buffer" description) is genuinely resolved: the code +now validates against the borrow first and allocates minimally on both paths, and the doc wording +was corrected to match. No new issue was introduced by the fix. Zero findings. From d236997cdb77277e3264ca46a0a7db09c0d3bb0a Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:11:16 +0000 Subject: [PATCH 07/20] docs(devforge): revise design to per-ecosystem decode + raw-bytes accessor What we're solving: keep #479's UTF-8 fix but make each binding faithful to its reference client - httpx (UTF-8-first) in Python, Fetch (strict latin-1) in JS - and add a raw-header-bytes accessor to both for HMAC/signature callers who need exact wire bytes. Reopens the design gate: the approval marker is removed and no source is changed until re-approved. Awaiting human approval of the revised design and the raw_headers / rawHeaders API shapes. 
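Below is a plain-Rust sketch of the revised design's split. The helper names (`decode_utf8_first`, `decode_latin1`, `latin1_to_bytes`) are hypothetical stand-ins for the Python and JS binding paths. The sketch also illustrates the rationale for the raw accessor: the UTF-8-first decode is not injective, so two different wire values can produce the same string. The strict latin-1 decode, by contrast, can always be reversed.

```rust
/// Python/httpx-style decode: UTF-8 first, ISO-8859-1 fallback
/// (same body as the core `decode_header_value` helper).
fn decode_utf8_first(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(valid) => valid.to_owned(),
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}

/// JS/Fetch-style isomorphic decode: each byte 0x00..=0xFF becomes U+0000..=U+00FF.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Inverse of `decode_latin1`, the Rust analogue of `Buffer.from(v, 'latin1')`.
/// Only meaningful for strings whose chars are all <= U+00FF, which
/// `decode_latin1` guarantees; `as u8` would truncate anything larger.
fn latin1_to_bytes(s: &str) -> Vec<u8> {
    s.chars().map(|c| c as u8).collect()
}
```

For example, the UTF-8 bytes `[0xC3, 0xA4]` and the lone latin-1 byte `[0xE4]` both decode to `"ä"` under `decode_utf8_first`. Only the raw bytes tell them apart. Separately, `latin1_to_bytes(&decode_latin1(b))` returns `b` for any byte slice `b`.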
Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/2-design.md | 106 ++++++++++++++++++------------------ .devforge/_design.approved | 1 - .devforge/_panel.json | 4 +- .devforge/_state.json | 2 +- .devforge/_verified_task.md | 20 ++++--- 5 files changed, 69 insertions(+), 64 deletions(-) delete mode 100644 .devforge/_design.approved diff --git a/.devforge/2-design.md b/.devforge/2-design.md index d487b0a0..2d38b6d7 100644 --- a/.devforge/2-design.md +++ b/.devforge/2-design.md @@ -1,60 +1,62 @@ -# Design — fix #479 header decoding without regressing #434 +# Design (rev 2) — fix #479: per-ecosystem header decoding + raw-bytes accessor ## What we're solving -Response header values are decoded as ISO-8859-1 (`b as char`) in both bindings. That was a -deliberate choice in PR #434 to stop non-ASCII header bytes from crashing Node / emptying -Python (issue #430). But ISO-8859-1 mangles the far more common case — headers whose bytes are -UTF-8 (e.g. `Content-Disposition: filename="naïve.pdf"`) — into mojibake (`ï` → `ï`), breaking -filename extraction and any byte-exact re-encoding. We need both cases correct at once. +Header values are decoded as ISO-8859-1 (`b as char`), garbling UTF-8 headers (#479). Rather +than force one behavior on both bindings, we make each binding faithful to the reference client +it already claims to implement, and give callers who need exact bytes (HMAC/signature checks) a +raw accessor — mirroring what httpx already offers. -## How -Decode with **UTF-8 first, ISO-8859-1 fallback**: - -- If the header bytes are valid UTF-8 → decode as UTF-8 (fixes #479's mojibake). -- Otherwise → fall back to the existing byte-preserving `b as char` latin-1 decode (keeps #434; - e.g. bare `0xE4` → `ä`). +Decision (confirmed with maintainer): **behave like httpx in Python, like Fetch in JS**, and +**add a raw-header-bytes accessor to both**. 
-This never emits `U+FFFD` replacement characters, so #430's non-crash / non-empty guarantee is -preserved and the latin-1 fallback stays byte-reversible. It is strictly better than the -issue's own suggestion (`from_utf8_lossy`), which would turn #434's bare `0xE4` into `U+FFFD` -and reintroduce corruption for exactly the case #434 fixed. - -Rust expresses this cleanly by validating against the borrow first: -`match std::str::from_utf8(bytes) { Ok(v) => v.to_owned(), Err(_) => bytes.iter().map(|&b| b as char).collect() }` -— `str::from_utf8` checks validity without copying, so the common UTF-8 path allocates exactly -once (`to_owned`) and the latin-1 fallback allocates exactly once (the `collect`); neither path -does a redundant intermediate copy. +## How +**Decoding (the split):** +- **Python = httpx semantics.** Decode UTF-8-first with an ISO-8859-1 fallback (httpx tries + ascii→utf-8→iso-8859-1). This is the shared `decode_header_value` helper we already built. +- **JS = Fetch semantics.** Keep strict ISO-8859-1 isomorphic decode (`b as char`, i.e. PR + #434's behavior). This means impit-node's string headers stay byte-recoverable via the standard + `Buffer.from(v, 'binary')` idiom, matching `fetch()`/undici/axios. Revert the JS call site to + `b as char` and drop the JS "UTF-8 header" string test. + +**Raw-bytes accessor (new public API, both bindings):** +- **Python** — mirror httpx's `Response.headers.raw`: expose `raw_headers` on the response as + `list[tuple[bytes, bytes]]` (name, value), preserving order and duplicates (reqwest yields + repeated headers separately). This closes a real httpx-compat gap. +- **JS** — no Fetch precedent, so this is an explicit impit extension: expose `rawHeaders` on the + response as `Array<[string, Uint8Array]>` (name as string per Fetch conventions, value as raw + bytes), same order/duplicate semantics. 
Justified because HMAC callers need exact bytes and + latin-1 strings, while recoverable, are error-prone to reverse by hand. + +Both accessors return the untouched wire bytes, so a signature/HMAC caller never depends on any +string decoding. ## Alternatives + the call -- **`from_utf8_lossy` (issue's suggestion):** rejected — lossy and regresses #434 (replacement - chars, irreversible) as shown above. -- **Latin-1 always (status quo):** rejected — this is the bug. -- **UTF-8 always / error on invalid:** rejected — re-breaks #430 (invalid-UTF-8 latin-1 headers). -- **Expose raw header bytes API for HMAC/signature callers:** deferred — useful but a separate, - larger public-API addition; note as follow-up, out of scope here. -- **Chosen: UTF-8-first with latin-1 fallback**, placed in one shared helper. - -## Major changes (key areas, not exhaustive) -- Add a shared `decode_header_value(&[u8]) -> String` helper in the core crate's - `response_parsing` module, re-exported through `impit::utils`, so both bindings share one - tested implementation instead of duplicating the closure. Cover it with core unit tests - (ASCII, UTF-8, invalid-UTF-8 latin-1, empty). -- Node (`impit-node/src/response.rs`): replace the inline `b as char` map with a call to the - shared helper. -- Python (`impit-python/src/response.rs`): same replacement. -- Tests: add a UTF-8 header regression test to the Node suite (mirrors the existing latin-1 - test in `basics.test.ts` / `mock.server.ts`); the existing latin-1 test is the guard that the - fallback still works. +- **Symmetric UTF-8-first in both (previous rev):** rejected per maintainer — deviates from + strict Fetch on JS and breaks the `Buffer.from(v,'binary')` recovery idiom. +- **Latin-1 everywhere + raw bytes only:** rejected — leaves Python worse than httpx. +- **Skip the raw accessor:** rejected — HMAC/signature callers have no correct alternative once + decoding is lossy (distinct byte sequences can decode to the same string). 
+- **Chosen:** per-ecosystem decode + raw accessor in both. + +## Major changes (key areas) +- Core crate: keep `decode_header_value` (UTF-8-first); Python consumes it, JS does not. +- `impit-python/src/response.rs`: keep helper for the string dict; add `raw_headers` getter + returning byte-pair tuples. +- `impit-node/src/response.rs`: revert string decode to `b as char`; add `rawHeaders` accessor + returning name/`Uint8Array` pairs; update the `.d.ts`/napi surface. +- Tests: Python — UTF-8 decodes correctly + `raw_headers` returns exact bytes. JS — existing + latin-1 test stays; add a `rawHeaders` test asserting exact wire bytes (and that string decode + remains latin-1). Core — existing `decode_header_value` unit tests unchanged. +- Docs: note the intentional Python/JS decoding difference; note JS `rawHeaders` is an impit + extension beyond Fetch. ## Risks / open questions -- **Ambiguous bytes:** a byte sequence that is *coincidentally* valid UTF-8 but was meant as - latin-1 will now decode as UTF-8. This is unavoidable without out-of-band charset info and - UTF-8 is the correct modern default; the tradeoff is intended. -- **Environment/oracle limitation:** the full Rust workspace cannot compile here — the pinned - git dep `github.com/apify/h2` is blocked (403) by org egress and its cache is empty. The - devforge oracle therefore runs a standalone `rustc --test` copy of the helper to prove the - algorithm; full binding compilation/integration must be verified in CI. Reviewers should treat - binding-compile as unverified-locally. -- **Python test gap:** Python has no existing header-decode test; adding one requires the - maturin build (also unavailable here). Node coverage + shared-helper unit tests carry the - correctness signal; a Python test is a nice-to-have follow-up if the build is available in CI. 
+- **#479 becomes a partial fix by design:** JS UTF-8 headers stay latin-1 (mojibake) on the + string API; the fix for JS callers is `rawHeaders` + their own decode, matching Fetch. The + issue/PR must state this explicitly so it isn't read as "not fixed." +- **New public API surface in both bindings** — naming (`raw_headers` / `rawHeaders`), return + shapes, and multi-value/duplicate semantics are the things to lock at this gate. +- **Build/oracle limit unchanged:** full workspace + napi/maturin can't build here + (github.com/apify/h2 egress 403). Core helper is oracle-tested via standalone rustc; the new + accessors' binding compilation + JS/Py tests must be verified in CI. The `rawHeaders` napi/pyo3 + wiring in particular is only compile-checkable in CI. diff --git a/.devforge/_design.approved b/.devforge/_design.approved deleted file mode 100644 index 277f6765..00000000 --- a/.devforge/_design.approved +++ /dev/null @@ -1 +0,0 @@ -approved by josef.prochazka@apify.com in chat on 2026-07-01 (design gate) diff --git a/.devforge/_panel.json b/.devforge/_panel.json index b5206af1..3915fad5 100644 --- a/.devforge/_panel.json +++ b/.devforge/_panel.json @@ -1,6 +1,6 @@ { - "tier": "medium", - "reason": "Small, localized code change but it alters the public response-header decoding contract across both language bindings (blast-radius override lifts small→medium). Extra scrutiny warranted because the change sits between two conflicting prior decisions (#434 vs #479) and the full workspace can't be compiled locally.", + "tier": "large", + "reason": "Now a public-API change in both bindings: adds a new raw-bytes accessor (raw_headers / rawHeaders) plus an intentional per-binding behavior split. 
New public contract surface in two language bindings lifts this from medium to large.", "reviewers": [ { "use": "staff-review", "model": "sonnet" } ], diff --git a/.devforge/_state.json b/.devforge/_state.json index 6254ba49..62a316e6 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"create-pr","iteration":3,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"medium","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"design-gate","iteration":0,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/_verified_task.md b/.devforge/_verified_task.md index 3bd83405..d4353084 100644 --- a/.devforge/_verified_task.md +++ b/.devforge/_verified_task.md @@ -11,12 +11,16 @@ Response header values must decode correctly for the common modern case (UTF-8, - Existing regression guard: `impit-node/test/basics.test.ts:569` + `impit-node/test/mock.server.ts:105-118` - Shared helper candidate home: `impit/src/response_parsing/mod.rs`, re-exported via `impit::utils` -## Acceptance -1. A header whose bytes are valid UTF-8 decodes as UTF-8 (fixes #479 mojibake). -2. A header with invalid-UTF-8 latin-1 bytes still decodes byte-for-byte as latin-1 (keeps #434). -3. No `U+FFFD` replacement chars introduced (keeps #430 non-crash / non-empty). -4. Applied symmetrically in Node and Python bindings. -5. Regression test present for the UTF-8 case (at minimum in Node, which has the existing suite). +## Acceptance (rev 2 — per-ecosystem, confirmed with maintainer) +1. 
Python decodes headers httpx-style: UTF-8-first with ISO-8859-1 fallback, never crash/empty/ + `U+FFFD` (fixes #479 for Python; keeps #430/#434 guarantees). +2. JS decodes headers Fetch-style: strict ISO-8859-1 isomorphic decode (`b as char`), so string + values stay byte-recoverable via `Buffer.from(v,'binary')`. JS UTF-8 mojibake is intentional. +3. Both bindings expose a raw-bytes accessor returning the exact wire bytes, order + duplicates + preserved: Python `raw_headers: list[tuple[bytes,bytes]]` (httpx `.raw` parity); JS + `rawHeaders: Array<[string, Uint8Array]>` (impit extension). +4. Tests: Python UTF-8 decode + raw bytes exact; JS latin-1 decode retained + raw bytes exact. +5. #479 resolution documented as intentionally split (JS = Fetch parity + rawHeaders escape hatch). -## Explicitly out of scope (note as follow-up) -Exposing raw header bytes for HMAC/signature callers — larger API addition, separate issue. +## Now in scope (was previously deferred) +Raw header bytes accessor for HMAC/signature callers — included per maintainer decision. From 4a0e40e5fdcb8c69692e082ba47173cd5e286250 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:25:58 +0000 Subject: [PATCH 08/20] feat: per-ecosystem header decoding + raw-header-bytes accessor What we're solving: response header values were decoded as ISO-8859-1, garbling UTF-8 values (#479). Rather than impose one behavior everywhere, each binding now follows the reference client it emulates, and both gain a way to read the exact wire bytes for signature/HMAC use. How: Python decodes header values httpx-style (UTF-8 first, ISO-8859-1 fallback) via a shared core helper; JS keeps strict ISO-8859-1 to match the Fetch API, so its string values stay byte-recoverable. Both bindings expose the untouched header bytes - Python `Response.raw_headers` as (bytes, bytes) pairs (httpx Headers.raw parity), JS `response.rawHeaders` as [name, Uint8Array] pairs. 
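The capture step can be sketched without reqwest. In this sketch, a plain slice of `(name, value bytes)` pairs stands in for the response's header iterator; this is an assumption, since the real code iterates a reqwest `HeaderMap`. The helper names `capture_raw` and `decoded_map` are hypothetical. The sketch contrasts two outputs:

- The raw pairs keep every duplicate.
- A `HashMap`, built the way the Python binding builds its decoded `headers`, keeps only the last value for a repeated name.

```rust
use std::collections::HashMap;

/// UTF-8-first decode with an ISO-8859-1 fallback (same body as `decode_header_value`).
fn decode_header_value(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(valid) => valid.to_owned(),
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}

/// Hypothetical helper: keeps every (name, value) pair as exact bytes, in input order,
/// duplicates included (the shape `raw_headers` exposes).
fn capture_raw(wire: &[(&str, &[u8])]) -> Vec<(Vec<u8>, Vec<u8>)> {
    wire.iter()
        .map(|(name, value)| (name.as_bytes().to_vec(), value.to_vec()))
        .collect()
}

/// Hypothetical helper: decodes into a map like `HashMap::from_iter` in the Python binding;
/// a repeated name keeps only its last value because map insertion overwrites.
fn decoded_map(wire: &[(&str, &[u8])]) -> HashMap<String, String> {
    wire.iter()
        .map(|(name, value)| (name.to_string(), decode_header_value(value)))
        .collect()
}
```

Given two `set-cookie` entries, `capture_raw` returns both, while `decoded_map` keeps only the second. This is one more reason signature and HMAC callers should read the raw pairs.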
Alternatives considered: symmetric UTF-8-first in both (deviates from Fetch on JS and breaks the Buffer.from(v,'latin1') recovery idiom); latin-1 everywhere (leaves Python worse than httpx); skipping the raw accessor (HMAC callers have no correct alternative once decoding is lossy). Note: the Rust binding glue could not be compiled in the authoring environment; binding compilation and the JS/Python test runs are verified in CI. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_design.approved | 1 + .devforge/_state.json | 2 +- .devforge/iter-1-rev2/claim.md | 43 ++++++++++++++++++++++++++++ impit-node/src/response.rs | 45 ++++++++++++++++++++++++++++-- impit-node/test/basics.test.ts | 17 +++++++++-- impit-python/src/response.rs | 32 +++++++++++++++++++++ impit-python/test/response_test.py | 13 +++++++++ 7 files changed, 148 insertions(+), 5 deletions(-) create mode 100644 .devforge/_design.approved create mode 100644 .devforge/iter-1-rev2/claim.md diff --git a/.devforge/_design.approved b/.devforge/_design.approved new file mode 100644 index 00000000..022284ca --- /dev/null +++ b/.devforge/_design.approved @@ -0,0 +1 @@ +approved by josef.prochazka@apify.com in chat on 2026-07-01 (design gate rev 2: asymmetric decode + raw_headers/rawHeaders accessors, API shapes approved) diff --git a/.devforge/_state.json b/.devforge/_state.json index 62a316e6..2c747cf5 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"design-gate","iteration":0,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} 
+{"phase":"inner-loop","iteration":1,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-1-rev2/claim.md b/.devforge/iter-1-rev2/claim.md new file mode 100644 index 00000000..a52934ac --- /dev/null +++ b/.devforge/iter-1-rev2/claim.md @@ -0,0 +1,43 @@ +# Iteration 1 (rev 2) — implementer claim + +## Done — asymmetric decode +- **Python (httpx):** keeps `decode_header_value` (UTF-8-first, latin-1 fallback) for the string + `headers` dict. Unchanged from before. +- **JS (Fetch):** reverted the string decode to strict ISO-8859-1 (`b as char`) in + `try_from_response`, and removed `decode_header_value` from the Node import. String header + values now match Fetch and stay byte-recoverable via `Buffer.from(v, 'latin1')`. + +## Done — raw-bytes accessor (new public API) +- **Python** `Response.raw_headers` → `list[tuple[bytes, bytes]]` (httpx `Headers.raw` parity), + wire order + duplicates preserved. New private field `raw_headers: Vec<(Vec<u8>, Vec<u8>)>` + captured in `from_async` (exact wire bytes) and in `new` (UTF-8 bytes of the string headers); + getter builds `PyBytes`. +- **JS** `response.rawHeaders` → `Array<[string, Uint8Array]>` (impit extension). New private + field `raw_header_pairs: Vec<(String, Vec<u8>)>` captured in `try_from_response`; napi getter + maps to `Uint8Array`. The JS wrapper (`index.wrapper.js`) returns the patched native object and + does not overwrite `rawHeaders`, so the native getter surfaces to users without extra plumbing + (verified by reading `#wrapResponse`, which returns `originalResponse`). + +## Tests +- **Core:** existing 5 `decode_header_value` unit tests unchanged (Python path).
+- **JS** (`basics.test.ts` + `mock.server.ts`): existing latin-1 test (#434) kept; my previous + UTF-8-string test replaced with a `rawHeaders` test — asserts the string form is latin-1 + (mojibake), `rawHeaders` yields the exact UTF-8 bytes, and `Buffer.from(latin1,'latin1')` + round-trips to those bytes. +- **Python** (`response_test.py`): new `test_response_raw_headers` asserting `(bytes, bytes)` + shape and exact UTF-8 bytes for a non-ASCII value. + +## Oracle — green (what it can cover) +- `rustfmt --check` on all four touched Rust files: CLEAN. +- `rustc --test` core unit tests: 5/5. `rustdoc --test`: 1/1. + +## NOT verifiable in this environment — must be confirmed by CI (disclosed at the design gate) +- **Binding compilation.** napi (`Uint8Array::from(Vec<u8>)`, `Vec<(String, Uint8Array)>` getter + return) and pyo3 (`PyBytes::new`, `Vec<(Bound<PyBytes>, Bound<PyBytes>)>` getter) glue cannot + be compiled here — the `github.com/apify/h2` git dep is egress-blocked (403). These follow the + existing patterns in each crate but are UNVERIFIED against the compiler. +- **napi `index.d.ts` regeneration** for the new `rawHeaders` getter happens at `napi build` in + CI; the committed `.d.ts` is intentionally not hand-edited. +- **JS/Python test execution** needs the built native module (napi/maturin) — CI only. +- Highest-risk specifics to watch in CI: the exact `Uint8Array` constructor, tuple→array + ToNapiValue, and the pyo3 `Bound` tuple return.
diff --git a/impit-node/src/response.rs b/impit-node/src/response.rs index 6556e04a..2702cca9 100644 --- a/impit-node/src/response.rs +++ b/impit-node/src/response.rs @@ -1,10 +1,11 @@ #![allow(clippy::await_holding_refcell_ref, deprecated)] use crate::abortable_stream::AbortableStream; -use impit::utils::{decode, decode_header_value, ContentType}; +use impit::utils::{decode, ContentType}; use napi::bindgen_prelude::JsObjectValue; use napi::{ bindgen_prelude::{ BufferSlice, FromNapiValue, Function, Object, ReadableStream, Result, This, ToNapiValue, + Uint8Array, }, sys, Env, JsValue, Unknown, }; @@ -61,6 +62,9 @@ pub struct ImpitResponse { /// /// In case of redirects, this will be the final URL after all redirects have been followed. pub url: String, + // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). + // Exposed to JS through the `rawHeaders` getter. + raw_header_pairs: Vec<(String, Vec<u8>)>, // Shared sender used to immediately signal abort to the JS ReadableStream without polling. abort_receiver: Arc>>>, abort_sender: Arc>>>, @@ -89,9 +93,17 @@ impl<'env> ImpitResponse { .canonical_reason() .unwrap_or("") .to_string(); + // JS Fetch semantics: header values are decoded as ISO-8859-1 (each byte 0x00..=0xFF maps to + // the code point U+0000..=U+00FF). This keeps the string form byte-recoverable via + // `Buffer.from(value, 'latin1')`; callers needing exact UTF-8 use the `rawHeaders` accessor.
let mut headers_vec: Vec<(String, String)> = Vec::new(); + let mut raw_header_pairs: Vec<(String, Vec<u8>)> = Vec::new(); for (k, v) in response.headers().iter() { - headers_vec.push((k.as_str().to_string(), decode_header_value(v.as_bytes()))); + headers_vec.push(( + k.as_str().to_string(), + v.as_bytes().iter().map(|&b| b as char).collect(), + )); + raw_header_pairs.push((k.as_str().to_string(), v.as_bytes().to_vec())); } let headers = Headers(headers_vec); let ok = response.status().is_success(); @@ -104,11 +116,40 @@ impl<'env> ImpitResponse { headers, ok, url, + raw_header_pairs, abort_receiver: Arc::new(tokio::sync::Mutex::new(None)), abort_sender: Arc::new(tokio::sync::Mutex::new(None)), }) } + /// Raw, undecoded response header values as `[name, bytes]` pairs, in the order the server + /// sent them (duplicate header names preserved). + /// + /// Unlike {@link headers}, whose values are decoded as ISO-8859-1 strings (matching the Fetch + /// API), this exposes the exact bytes received on the wire. Use it when a header carries UTF-8 + /// (e.g. a `Content-Disposition` filename) or when verifying a header signature/HMAC, where the + /// precise bytes matter: + /// + /// @example + /// ```ts + /// const [, raw] = response.rawHeaders.find(([k]) => k.toLowerCase() === 'content-disposition')!; + /// const value = new TextDecoder('utf-8').decode(raw); + /// ``` + /// + /// This is an impit extension; the standard Fetch `Response` has no raw-header accessor.
+ #[napi( + getter, + js_name = "rawHeaders", + ts_return_type = "Array<[string, Uint8Array]>" + )] + pub fn raw_headers(&self) -> Vec<(String, Uint8Array)> { + self + .raw_header_pairs + .iter() + .map(|(name, value)| (name.clone(), Uint8Array::from(value.clone()))) + .collect() + } + fn get_inner_response(&self, env: &Env, mut this: This) -> Result> { let cached_response = this.get::(INNER_RESPONSE_PROPERTY_NAME)?; diff --git a/impit-node/test/basics.test.ts b/impit-node/test/basics.test.ts index 17b51247..8e561a15 100644 --- a/impit-node/test/basics.test.ts +++ b/impit-node/test/basics.test.ts @@ -571,9 +571,22 @@ describe.each([ t.expect(response.headers.get('x-non-ascii')).toBe(routes.nonAsciiHeader.headerValue); }); - test('UTF-8 header values are decoded as UTF-8', async (t) => { + test('raw header bytes preserve the exact wire value while the string stays ISO-8859-1 (Fetch-style)', async (t) => { const response = await impit.fetch(new URL(routes.utf8Header.path, "http://127.0.0.1:3001").href); - t.expect(response.headers.get('x-utf8')).toBe(routes.utf8Header.headerValue); + + // Fetch semantics: the string form is ISO-8859-1, so a UTF-8 value reads back as mojibake. + const latin1 = response.headers.get('x-utf8'); + t.expect(latin1).not.toBe(routes.utf8Header.headerValue); + + // rawHeaders exposes the exact wire bytes, which decode to the real UTF-8 value. + const rawHeaders = (response as unknown as { rawHeaders: Array<[string, Uint8Array]> }).rawHeaders; + const rawPair = rawHeaders.find(([k]) => k.toLowerCase() === 'x-utf8'); + t.expect(rawPair).toBeDefined(); + const rawBytes = Buffer.from(rawPair![1]); + t.expect(rawBytes.toString('utf8')).toBe(routes.utf8Header.headerValue); + + // The ISO-8859-1 string also round-trips back to those exact bytes (the standard Fetch workaround). 
+ t.expect(Buffer.from(latin1!, 'latin1').equals(rawBytes)).toBe(true); }); test('.json() method works', async (t) => { diff --git a/impit-python/src/response.rs b/impit-python/src/response.rs index f13265f2..990209ba 100644 --- a/impit-python/src/response.rs +++ b/impit-python/src/response.rs @@ -10,6 +10,7 @@ use impit::{ utils::{decode_header_value, ContentType}, }; use pyo3::prelude::*; +use pyo3::types::PyBytes; use reqwest::{Response, StatusCode, Version}; use std::pin::Pin; @@ -226,6 +227,9 @@ pub struct ImpitPyResponse { content: Option>, inner: Option, inner_state: InnerResponseState, + // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). + // Exposed to Python through the `raw_headers` getter (httpx `Headers.raw` equivalent). + raw_headers: Vec<(Vec<u8>, Vec<u8>)>, } #[pymethods] @@ -241,6 +245,12 @@ impl ImpitPyResponse { ) -> Self { let headers = headers.unwrap_or_default(); + // No wire bytes for a manually constructed response; use the UTF-8 bytes of the strings. + let raw_headers: Vec<(Vec<u8>, Vec<u8>)> = headers + .iter() + .map(|(k, v)| (k.clone().into_bytes(), v.clone().into_bytes())) + .collect(); + let encoding = match headers .iter() .find(|(k, _)| k.to_lowercase() == "content-type") @@ -270,6 +280,7 @@ impl ImpitPyResponse { content: Some(content.unwrap_or_default()), inner: None, inner_state: InnerResponseState::Read, + raw_headers, } } @@ -442,6 +453,19 @@ impl ImpitPyResponse { Ok(()) } + /// Raw, undecoded header name/value pairs as `(bytes, bytes)`, in the order the server sent + /// them (duplicate names preserved). Equivalent to httpx's `Response.headers.raw`. + /// + /// Unlike `headers` (str values decoded UTF-8-first), this returns the exact wire bytes, for + /// callers that need them - e.g. verifying a header signature/HMAC.
+ #[getter] + fn raw_headers<'py>(&self, py: Python<'py>) -> Vec<(Bound<'py, PyBytes>, Bound<'py, PyBytes>)> { + self.raw_headers + .iter() + .map(|(name, value)| (PyBytes::new(py, name), PyBytes::new(py, value))) + .collect() + } + #[getter] fn content(&mut self, py: Python<'_>) -> PyResult> { self.read(py) @@ -539,11 +563,18 @@ impl ImpitPyResponse { _ => "Unknown".to_string(), }; let is_redirect = val.status().is_redirection(); + // Python/httpx semantics: decode header values UTF-8-first with an ISO-8859-1 fallback. let headers = HashMap::from_iter( val.headers() .iter() .map(|(k, v)| (k.as_str().to_string(), decode_header_value(v.as_bytes()))), ); + // Exact wire bytes for callers that need them (httpx `Headers.raw` equivalent). + let raw_headers: Vec<(Vec<u8>, Vec<u8>)> = val + .headers() + .iter() + .map(|(k, v)| (k.as_str().as_bytes().to_vec(), v.as_bytes().to_vec())) + .collect(); let content_type_charset = headers .get("content-type") @@ -599,6 +630,7 @@ impl ImpitPyResponse { is_stream_consumed, inner_state, inner, + raw_headers, }) } } diff --git a/impit-python/test/response_test.py b/impit-python/test/response_test.py index ba09610f..a60dd8fb 100644 --- a/impit-python/test/response_test.py +++ b/impit-python/test/response_test.py @@ -40,6 +40,19 @@ def test_response_constructor_with_headers() -> None: assert response.headers['Content-Type'] == 'application/json' +def test_response_raw_headers() -> None: + # raw_headers exposes header name/value pairs as exact bytes (httpx Headers.raw equivalent). + response = Response(200, headers={'Content-Type': 'application/json', 'X-Unicode': 'naïve'}) + + raw = response.raw_headers + + assert isinstance(raw, list) + assert all(isinstance(k, bytes) and isinstance(v, bytes) for k, v in raw) + assert (b'Content-Type', b'application/json') in raw + # A non-ASCII value is preserved as its exact UTF-8 bytes.
+ assert (b'X-Unicode', 'naïve'.encode('utf-8')) in raw + + def test_response_headers_encoding() -> None: response = Response( 200, headers={'Content-Type': 'text/plain; charset=cp1250'}, content=b'\x9e\x64\xe1\xf8\x65\x6e\xed' From 1b523154191b5b207b7614dca48d352f9b295500 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:34:06 +0000 Subject: [PATCH 09/20] test: cover Python from_async header decode + raw_headers over a real socket Adds a wire-level integration test (raw-socket server) asserting httpx-style UTF-8 decoding, ISO-8859-1 fallback, and exact raw_headers bytes on the real fetch path, mirroring the JS mock-server test. Also records the ecosystem consistency references for the PR description. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_state.json | 2 +- .devforge/config.json | 5 +- .devforge/iter-1-rev2/review-staff-review.md | 18 ++++++ .devforge/iter-2-rev2/claim.md | 22 +++++++ .devforge/pr-ecosystem-section.md | 59 +++++++++++++++++++ impit-python/test/async_client_test.py | 56 ++++++++++++++++++ impit-python/test/response_test.py | 2 +- napi-0.2.1.zip | Bin 0 -> 11484 bytes 8 files changed, 161 insertions(+), 3 deletions(-) create mode 100644 .devforge/iter-1-rev2/review-staff-review.md create mode 100644 .devforge/iter-2-rev2/claim.md create mode 100644 .devforge/pr-ecosystem-section.md create mode 100644 napi-0.2.1.zip diff --git a/.devforge/_state.json b/.devforge/_state.json index 2c747cf5..a30758df 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"inner-loop","iteration":1,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} 
+{"phase":"inner-loop","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/config.json b/.devforge/config.json index 35073bfc..5f94129e 100644 --- a/.devforge/config.json +++ b/.devforge/config.json @@ -15,7 +15,10 @@ "commands": [ "rustfmt --edition 2021 --check /home/user/impit/impit/src/response_parsing/mod.rs /home/user/impit/impit/src/lib.rs /home/user/impit/impit-node/src/response.rs /home/user/impit/impit-python/src/response.rs", "rustc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs -o /home/user/impit/.devforge/oracle_header_decode && /home/user/impit/.devforge/oracle_header_decode", - "rustdoc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs" + "rustdoc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs", + "ruff check /home/user/impit/impit-python/test/response_test.py /home/user/impit/impit-python/test/async_client_test.py", + "ruff format --check /home/user/impit/impit-python/test/response_test.py /home/user/impit/impit-python/test/async_client_test.py", + "python3 -m py_compile /home/user/impit/impit-python/test/response_test.py /home/user/impit/impit-python/test/async_client_test.py" ], "note": "Full-workspace `cargo test` cannot run here: the pinned git dependency github.com/apify/h2 returns 403 through the org egress proxy and its cargo cache is empty. The oracle instead compiles+runs a standalone rustc --test copy of the pure decode_header_value helper, verifying the exact algorithm (UTF-8-first with latin-1 fallback) independent of h2. Binding compile/integration must be confirmed in CI where github egress is permitted." 
}, diff --git a/.devforge/iter-1-rev2/review-staff-review.md b/.devforge/iter-1-rev2/review-staff-review.md new file mode 100644 index 00000000..f9e67b51 --- /dev/null +++ b/.devforge/iter-1-rev2/review-staff-review.md @@ -0,0 +1,18 @@ +VERDICT: FAIL + +## Findings + +### major: Python has no test exercising the real-fetch path (`from_async`) for either UTF-8 header decoding or `raw_headers` + +- File: `impit-python/test/response_test.py:43-53` (the only new Python test, `test_response_raw_headers`) +- File: `impit-python/src/response.rs:545-635` (`from_async`, the constructor actually used for real HTTP responses; wires `decode_header_value` at line 570 and populates `raw_headers` at lines 573-577) + +`test_response_raw_headers` builds its `Response` via the `#[new]` constructor (`impit-python/src/response.rs:237-285`), which never calls `decode_header_value` and derives `raw_headers` by simply re-encoding the Python string headers as UTF-8 (`impit-python/src/response.rs:249-252`). That path can't fail: any Python `str` header value round-trips through `.encode('utf-8')` trivially, so the test cannot detect a bug in `decode_header_value`'s UTF-8-first/ISO-8859-1-fallback logic, nor in `from_async`'s wiring of `headers`/`raw_headers` from real `reqwest::HeaderMap` bytes, nor an order/duplicate mismatch between the two collections built by two separate `val.headers().iter()` passes (lines 567-571 and 573-577). + +This matters because issue #479 and the design are specifically about *real HTTP responses* carrying UTF-8 header bytes — the `new()` constructor path is not where the bug lived. Concretely, if a regression were introduced in `from_async` (e.g. 
a typo mapping `raw_headers` from `k.as_str()` on the wrong header, or `decode_header_value` never actually being called on the live path), the added test suite would not catch it; only the pre-existing `test_response_headers_encoding`/ASCII-header tests exercise `from_async`, and none of them use non-ASCII or UTF-8 header bytes. + +The JS side, by contrast, does exercise the equivalent real-fetch path end-to-end: `impit-node/test/basics.test.ts:574-590` fetches through `impit.fetch(...)` (going through `try_from_response`, the wrapper, and the `rawHeaders` getter) against a raw-socket mock server route (`impit-node/test/mock.server.ts:124-138`) that writes literal UTF-8 bytes on the wire, then asserts both the latin-1 string mojibake and the exact `rawHeaders` bytes. + +Python test infrastructure for this already exists and is used elsewhere in the same manner needed here: `impit-python/test/async_client_test.py:16-46` defines raw-socket servers (`thread_server`, `truncating_server`) that hand-craft an HTTP response header block and are exercised via `AsyncClient`/`Client` — i.e., through `from_async`. The acceptance doc (`_verified_task.md` item 4: "Python UTF-8 decode + raw bytes exact") and the design (`2-design.md`: "Python — UTF-8 decodes correctly + `raw_headers` returns exact bytes") both call for this; the change as submitted only satisfies it for the constructor path, not the fetch path, leaving the actually-fixed behavior (issue #479) unverified by any Python test. + +**Fix scope**: add one raw-socket-based Python test (mirroring `thread_server`/`truncating_server`) that sends a header value with UTF-8 bytes (e.g. `naïve.pdf`) over `AsyncClient`/`Client`, and asserts (a) `response.headers[...]` decodes to the correct UTF-8 string and (b) `response.raw_headers` contains the exact wire bytes for that header, in order, matching what `headers` decoded. 
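The fix scope asks for a comparison against exact wire bytes. The minimal Python sketch below makes that target concrete: it shows what a raw-socket fixture sends and what `raw_headers` should then contain. `build_response` and `parse_raw_headers` are hypothetical test helpers, not impit APIs. Header names are lowercased on the assumption that the live path reports them as the `http` crate's `HeaderName` normalizes them, and values are kept as exact bytes.

```python
def build_response(headers: list[tuple[bytes, bytes]], body: bytes) -> bytes:
    """Hand-build an HTTP/1.1 response the way the raw-socket test servers do."""
    lines = [b"HTTP/1.1 200 OK"]
    lines += [name + b": " + value for name, value in headers]
    lines.append(b"Content-Length: " + str(len(body)).encode())
    return b"\r\n".join(lines) + b"\r\n\r\n" + body


def parse_raw_headers(response: bytes) -> list[tuple[bytes, bytes]]:
    """Return (name, value) byte pairs in wire order, duplicates preserved.

    Names are lowercased (assumed to mirror `http::HeaderName` normalization);
    values keep their exact bytes apart from surrounding whitespace.
    """
    head, _, _ = response.partition(b"\r\n\r\n")
    pairs = []
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        pairs.append((name.strip().lower(), value.strip()))
    return pairs


wire = build_response(
    [
        (b"X-Utf8", 'attachment; filename="naïve.pdf"'.encode()),
        (b"X-Latin1", b"M\xe4rz"),  # lone 0xE4: ISO-8859-1, not valid UTF-8
    ],
    b"ok",
)
raw = parse_raw_headers(wire)
```

Comparing `raw_headers` against a full list like this also pins down wire order and duplicate names. Converting the result to a `dict` would check neither.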
diff --git a/.devforge/iter-2-rev2/claim.md b/.devforge/iter-2-rev2/claim.md new file mode 100644 index 00000000..a3ab9b7d --- /dev/null +++ b/.devforge/iter-2-rev2/claim.md @@ -0,0 +1,22 @@ +# Iteration 2 (rev 2) — implementer claim + +## Addressed the iter-1-rev2 review finding (major) +- **Python `raw_headers`/decode only tested via the manual `new` constructor, not the real + `from_async` fetch path** — FIXED. Added a wire-level integration test + (`test_header_value_decoding_and_raw_bytes` in `async_client_test.py`) using a new raw-socket + `header_encoding_server`, mirroring the JS mock-server approach. It sends a UTF-8 header value + (`X-Utf8`) and a lone `0xE4` latin-1 byte (`X-Latin1`), then asserts: + - `response.headers['x-utf8']` decodes correctly as UTF-8 (httpx path, exercises + `decode_header_value` on a real response), + - `response.headers['x-latin1'] == 'März'` (latin-1 fallback), + - `response.raw_headers` yields the exact wire bytes for both. + +## Oracle — green (extended this iteration to cover Python) +- Rust: `rustfmt --check` CLEAN; `rustc --test` 5/5; `rustdoc --test` 1/1. +- Python (new): `ruff check` clean; `ruff format --check` clean; `py_compile` OK. (ruff caught a + real `UP012` `.encode('utf-8')` lint on the first pass — fixed.) + +## Still CI-gated (unchanged, disclosed at design gate) +- Binding compilation (napi/pyo3) and execution of the JS/Python tests need the built native + module; the `github.com/apify/h2` git dep is egress-blocked here. The new Python test's + behavior is verified by CI, not locally. diff --git a/.devforge/pr-ecosystem-section.md b/.devforge/pr-ecosystem-section.md new file mode 100644 index 00000000..75ed0a8c --- /dev/null +++ b/.devforge/pr-ecosystem-section.md @@ -0,0 +1,59 @@ +## Consistency with ecosystem + +impit's two bindings each emulate a reference client, so header decoding is deliberately +**asymmetric** — and each side matches its reference exactly. 
Both bindings additionally expose +the raw header bytes, following the byte-access pattern each ecosystem already relies on. + +### Python — matches `httpx` (which impit-python implements) + +impit-python advertises the httpx interface ("drop-in replacement for `httpx.AsyncClient`"), and +httpx decodes header values **UTF-8-first with an ISO-8859-1 fallback** — exactly what this PR +does via the shared `decode_header_value` helper: + +- httpx `Headers.encoding` tries `ascii`, then `utf-8`, then falls back to `iso-8859-1`: + [`httpx/_models.py` @ v0.28.1, `encoding` property](https://github.com/encode/httpx/blob/0.28.1/httpx/_models.py#L125-L145) + — *"Header encoding is mandated as ascii, but we allow fallbacks to utf-8 or iso-8859-1."* +- httpx exposes raw bytes via `Headers.raw: list[tuple[bytes, bytes]]`: + [`httpx/_models.py` @ v0.28.1, `raw` property](https://github.com/encode/httpx/blob/0.28.1/httpx/_models.py#L152-L156). + Our new `Response.raw_headers` returns the same `list[tuple[bytes, bytes]]` shape. + +So Python callers get the same decoded strings *and* the same raw-bytes escape hatch they would +from httpx. + +### JavaScript — matches the Fetch API / undici (which impit-node implements) + +impit-node is "API-compatible with the Fetch API `Response`". In Fetch, header values are a +**byte sequence** exposed to JS as a `ByteString`, i.e. via **isomorphic decode** — each byte +`0x00–0xFF` maps to the code point of equal value (ISO-8859-1). This PR keeps impit-node on that +exact behavior (`b as char`): + +- Fetch Standard: a header value is a [byte sequence](https://fetch.spec.whatwg.org/#concept-header-value), + and the `Headers` interface types names/values as + [`ByteString`](https://fetch.spec.whatwg.org/#headers-class) (`ByteString get(ByteString name)`). +- [WebIDL `ByteString`](https://webidl.spec.whatwg.org/#idl-ByteString) is the isomorphic + (byte ↔ code-point) mapping — i.e. ISO-8859-1. 
+- undici (Node's `fetch`) implements exactly this: [nodejs/undici#1560 "ByteString checks & + conversion in Headers"](https://github.com/nodejs/undici/pull/1560) and + [#1317](https://github.com/nodejs/undici/issues/1317) confirm header values are handled as + Latin-1 `ByteString`s. +- Node's core `http` parser likewise decodes header values as `latin1`/`binary` + ([nodejs/node#17390](https://github.com/nodejs/node/issues/17390), + [#58240](https://github.com/nodejs/node/issues/58240)); **axios** inherits this because its + Node adapter reads `http.IncomingMessage` headers and its browser adapter reads + `XMLHttpRequest`/Fetch headers. + +Because ISO-8859-1 is isomorphic, the JS string stays **byte-recoverable** — the standard Fetch +workaround `Buffer.from(value, 'latin1')` (or `Uint8Array.from(value, c => c.charCodeAt(0))` in +the browser) reproduces the exact wire bytes, so a UTF-8 header can be recovered with +`Buffer.from(value, 'latin1').toString('utf8')`. + +The Fetch `Headers` interface has **no** raw-byte accessor, so `response.rawHeaders` +(`Array<[string, Uint8Array]>`) is an explicit impit extension. It's justified because +signature/HMAC callers need the exact bytes without the manual round-trip, and it mirrors the +byte-pair access httpx already offers on the Python side. + +### Net effect on #479 + +- **Python**: fully fixed — UTF-8 header values decode correctly (httpx behavior). +- **JavaScript**: string values remain ISO-8859-1 **by design** (Fetch parity, byte-recoverable); + callers needing the decoded UTF-8 value read `response.rawHeaders` and decode with `TextDecoder`. 
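The two behaviors above come down to two ways of decoding the same header bytes. The Python sketch below is illustrative only:

- `decode_header_value` imitates the shared Rust helper's described rule (UTF-8 first, ISO-8859-1 fallback) one value at a time. httpx itself picks a single encoding for the whole header set.
- `isomorphic_decode` is a hypothetical name for the Fetch `ByteString` mapping.
- `recover_utf8` is a hypothetical name for the `Buffer.from(value, 'latin1')` workaround.

```python
def decode_header_value(raw: bytes) -> str:
    """Python/httpx-style: try UTF-8 (a superset of ASCII), else ISO-8859-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte 0x00-0xFF to a code point, so it cannot fail.
        return raw.decode("iso-8859-1")


def isomorphic_decode(raw: bytes) -> str:
    """Fetch-style ByteString: each byte becomes the code point of equal value."""
    return raw.decode("latin-1")


def recover_utf8(byte_string: str) -> str:
    """Fetch workaround: re-encode as latin-1 to get the wire bytes, then decode UTF-8."""
    return byte_string.encode("latin-1").decode("utf-8")


utf8_wire = 'filename="naïve.pdf"'.encode("utf-8")
latin1_wire = b"M\xe4rz"  # lone 0xE4, not valid UTF-8

python_view = decode_header_value(utf8_wire)  # 'filename="naïve.pdf"'
node_view = isomorphic_decode(utf8_wire)  # 'filename="naÃ¯ve.pdf"' (mojibake)
```

The latin-1 re-encode always reproduces the wire bytes, because the isomorphic decode maps each byte to exactly one code point. The final UTF-8 decode in `recover_utf8` only succeeds when the server actually sent UTF-8.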
diff --git a/impit-python/test/async_client_test.py b/impit-python/test/async_client_test.py index 8a0b11b5..6ea26e5b 100644 --- a/impit-python/test/async_client_test.py +++ b/impit-python/test/async_client_test.py @@ -46,6 +46,39 @@ def truncating_server(port_holder: list[int]) -> None: server.close() +def header_encoding_server(port_holder: list[int]) -> None: + """Send a response carrying a UTF-8 header value and a lone ISO-8859-1 byte.""" + server = socket.socket(socket.AF_INET6, socket.SOCK_STREAM) + server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) + server.setsockopt(socket.IPPROTO_IPV6, socket.IPV6_V6ONLY, 0) + server.bind(('::', 0)) + port_holder[0] = server.getsockname()[1] + server.listen(1) + + conn, _ = server.accept() + conn.recv(1024) + body = b'ok' + response = b''.join( + [ + b'HTTP/1.1 200 OK\r\n', + b'Content-Type: text/plain\r\n', + b'X-Utf8: ', + 'attachment; filename="naïve.pdf"'.encode(), + b'\r\n', + b'X-Latin1: M', + bytes([0xE4]), # 'a' with diaeresis in ISO-8859-1; not valid UTF-8 on its own + b'rz\r\n', + b'Content-Length: ', + str(len(body)).encode(), + b'\r\n\r\n', + body, + ] + ) + conn.send(response) + conn.close() + server.close() + + @pytest.mark.asyncio @pytest.mark.parametrize( ('browser', 'ja4'), @@ -425,6 +458,29 @@ async def test_local_address(self, browser: Browser, addresses: tuple[str, str]) assert response.status_code == 200 thread.join() + @pytest.mark.asyncio + async def test_header_value_decoding_and_raw_bytes(self, browser: Browser) -> None: + port_holder = [0] + thread = threading.Thread(target=header_encoding_server, args=(port_holder,)) + thread.start() + await asyncio.sleep(0.1) + + impit = AsyncClient(browser=browser) + response = await impit.get(f'http://127.0.0.1:{port_holder[0]}/', timeout=5) + thread.join() + + utf8_value = 'attachment; filename="naïve.pdf"' + + # Python follows httpx semantics: a UTF-8 header value decodes correctly as str... 
+ assert response.headers['x-utf8'] == utf8_value + # ...and a lone non-UTF-8 byte falls back to ISO-8859-1. + assert response.headers['x-latin1'] == 'März' + + # raw_headers exposes the exact wire bytes (httpx Headers.raw equivalent). + raw = dict(response.raw_headers) + assert raw[b'x-utf8'] == utf8_value.encode('utf-8') + assert raw[b'x-latin1'] == b'M' + bytes([0xE4]) + b'rz' + @pytest.mark.parametrize( ('browser'), diff --git a/impit-python/test/response_test.py b/impit-python/test/response_test.py index a60dd8fb..a72af162 100644 --- a/impit-python/test/response_test.py +++ b/impit-python/test/response_test.py @@ -50,7 +50,7 @@ def test_response_raw_headers() -> None: assert all(isinstance(k, bytes) and isinstance(v, bytes) for k, v in raw) assert (b'Content-Type', b'application/json') in raw # A non-ASCII value is preserved as its exact UTF-8 bytes. - assert (b'X-Unicode', 'naïve'.encode('utf-8')) in raw + assert (b'X-Unicode', 'naïve'.encode()) in raw def test_response_headers_encoding() -> None: diff --git a/napi-0.2.1.zip b/napi-0.2.1.zip new file mode 100644 index 0000000000000000000000000000000000000000..8cfe868c5f5b678a109f8d17291f2d9517ad7959 GIT binary patch literal 11484 zcmZ{qV|1kJ+N~?-*fu&&I<}p3l8$ZL72CEuX2(Xy>e#j_wma78z0dg0TKn7U)cd1G z)sJhuKjt%^d)#v>$Us110000O08+EP=x;nLT@P>ofEf}1!20vAouPvzJre^912f|{ zDN%7bWpM^)cV}H?nYc_QlrFMsVhL_ribkIePp6(04GJ7w^Em?8@1Rr=3~$W+mRqQa zTT@E>bVt_^Lea%5%Zf2EDE|p^tIPW>2XHD+*;D_n9OsB3`6FF!(d$&AmdKMey$_zC z#1o-dHZL<8Y)N=-X>NubgE^R+5(}KNPBL2S!^oj%(9pTT33QUAwf? 
zEv5<;MQce%&J4kY$wo5**DcGk4@|&kLhglJg z^UaDZXu|&&%CCaoD;Moutr7FNmSx*rl=A7;mvY3N8Y=U;#i&vfeDgdyYvWIAYym9u zTdjr;>cFLzx+as+kGww1FN@F*OXI$F>O{Iit(=(7p|>xNmlRt|eE?{uEj$+OR|M(@ zE6dNH1VGRRr(lzI88jcw&s;JZPNjR&m*00tXsJ`Yv2dphdTFDvVb6l~#;Tys6e_2^ z;^_q2dyjcfX17q8# z+BO(ZFV65=Afhm3`cr;&5jYfuvR2Vr77cZCc`~LX814>Mfd);{Hoo4GWYGY@l{rG5 zRs$bZHJ6b#Ti(~tMfK}+*Nuy{sUCyOt$U?0^iICJBKnq-{geUGw#(?bCFH;I_)pql zgEumiV{vc=%CPE5sLH-H{$dg?l7A;9j`1i%R?frtts7YF6PS%1j%7Mg;M1?E=vMQ>+RylJrt?67p55 zn)c|-D86yYA57t*wkbBN7y*q1fg;4={^XzctS zngc+}iZ4cC4bW`AQr1I`+mA-S!I+7*1X4iW!!{5ODEGs7~D4)y5s;zA+IdRAOe4j7G zWdOH4KZCm-M3-XT45I)1#9mDU1|VP{3{p5&@CXqB z>~V38p`{7^=K^{YKDcMi6pwx=aH9kTmbCAX`?gNj^3t<)lO~2lP+sE0}_29MU8UU@$YWU|I~s!*%>~He}H4W5ls}-#+#{ zJrP+eC`w6WTPL57z=G1291!d$OMSoXyI-B4Z1&Ju1tKum_CR@K@b%8M%|olBP6tSS zRP)9^&4N}`PInySGrEoNZMD2xbtsubmUr51phi8^FcOx|tsX9#$IeP=4^Y1}bvo9P zJNmaV=jV1@^jOZul_hC6jxfA9n`Z-Z0#KpE2r-kflPk7CzRr9R)Hr;2$=1PB&0 zW~XS)(@xvPlo2h+!EJ@+h87Y#Zwp{VS0k1QSU!GYsGXK5xM8{;hn6%AOq&jQ_&v>5 z2eF@EGzQ``0mmGIeID<$TP z!I*ioJ;8qmsqymPF6@OVr0cZ}RSjaSPEp#}xY(s9V_vc9UI@i01y{4}UDhp>e$+sf zsBnIkHaR+xBt+WLmV>*(|LZsj{W(tD2j1u>U;seLpCg6&pT|i_Tv$w2oB`0Y#o9)>*kAfLg>MnbtR02ytvi`QEImMh1r zLsP=&BpM`f7Eh|5F+z_blO3HG{aThQz_@E^-FU~P#Y55%?oBYcBL&17`#mb+d=jLp zvi)9dE+JFozJg*gmKo0{RI@o;*jcxy)Bmo#dxu6>p{MJ%_ zymNro+!7&MdSwCeqxJ=4tihc8xI+VG{OY6~tN1iI-7(+R$~WvIcg_AeVY~Eza!z1|UP+izi6h zDv8}+aib>?*iJ)Io7S)Hhw&Mfa`FUQtSQ%X3WPoxdnqfdR$$3rYbN^;g;ToNPi2*P zJF;)JnMHt99l(8Rb?*=8eFyv{j(-gX3b;5aw0{PJ{Xc`j|MFTVQ)d?k1_zI9WyL?z zK=DQ16I{=kF`7UbDq6T$}ncSNdyreBD+0?r<)<0~FMQfK@bnA(CO z0dgn4Op9Q}iEqy4`UKVD)2$~!x-jYs6;=A66N6g+BrNNmBEfvDWX12bxZf|{6V*dW z0@oPhCB9)M;gdlzTez9w&guza*6W2>=GD=(DyE;Mjc!g=8|3!H>ATt;>s_BK#Co%3erF)fj;cy^ z+(ok0UuEFpoA}=3E=N$cTH6uMF`#;Yo&Js0RoqPITa0WVf_59KVt5VW?2;2mkyf1# z$(F=Lu7X;?y%FH{w43_p68F&Yr0>fB83B z;27p5a8^Jr+}AIwRj}@Ru&M|#qMpGxavJfS{(njf+akK`?vL*jLIVH<|LHsb8b2A$ zTI38hnVR4_Ub0#emmsBuLbxk8%%?q_)^yr>EN^00ARS^4|R zY8_%|y{)n(;~1%d*}?uqrbngvscHy2jJJ_&WQiOpYT6E5Y>c-FBB9-n9y$vXZle0L 
zCHvyo#K>Yyhd?RjIB1tv(Z@Q$tI7l&?9GP9OwhS+i%fI|4FRl9B4FTu{fv7 zQixE|IDO-~UBSGOt-z~wEz0XVnMk|>oXM!;H!sk$nsKx7j8G->Vf=YY{Fhu{n}DJ? zgU46ee_W^m2>`(T2f5f9np+zGODjLrx9qdH{yqEkMmmGv-%lS9B>+##+P)@StR`hN zFTiyegk$#f-*arJJZOyfejy=7(L3$PIO zrE`O6=j*V-SwF*_@V+YK7i5LteneX(l}3s3WMp874i~}*zG$>44P8ND8u!OnL~Bze z8E02~QJb^(H~2P?lkv{3nqdVMkU^4l_MR{*oqn_sgk3voSnugdJ%eivLE8_eh|D{f zqrqgk!=Be{fYqs7-n@y|YBJEH-n^f$tLz0p$U$n&Omx z+2YGRn~;FH{p0$a(cQF%x>cvwY4u>qC>Pwg8cxrVP(lVIEV&z86$WWb;+BI#=NElz zOb~Pnwl1Z1V_%C?B&@GO{AwH(7?@HNQjdgLA>&nESL*eVPb3(rCZmY%Gvu3Jd;J7c zlA$18f5yosC6;MAX6Sk>53(@F4Je$4SJT8IX4^<6hc>Mwi6h_{N1|;K_jN zgY&4F#-h+tQJs*8{ep#}GE$Mvw#kSU(G7PRq?r=8aTO7US)m;>muo?pNzOVgNRZmB ziq&ngXK2)~)+;O+OnL09V3V3HvE`6TTBwFz3!8;O&8yHVl5Y%~A|hf<5pL32s`cI< z86i@GzrCx4y!aKN^*|SApvm0?O=4j|V^Lr|hbAT3#VC8g((Vx%8zvkHz~|NQ*oAGR zd_!(*QAk_q+L!M(hf+QKa7glV0WY-o@YLD{BUkX#%&PLwh`|#cSV=a>A9JN0{0)<{ z)yB4{IcRAnF^{8P?86{d=8UI`C}`46z(0#Wc{brbsWou0xdUCq8DwFjGcGF5Jbhea zBlQhjMXF3ZskC#6m;8I>+o$duBAU1P*ZM{nQupYRv;#PE`AJz3TR;kl&d3T{x8PYC zXa<<~mU(6-CX}0GpOcIs)IvJrmnNBQ1`O=OOthwcX=*rEPL^=FZA4(gUPV1TI$h0T z&Ls&l?SLc45YHq(ao>j_L>s+3Q630zppWk#-~gmOyRsozwd4$^_D5;bm#!O<-^guQ z!`yd9WUT`^v1vVL9#;1@Hn|u#CpWV86IS%tVSIc#iE55peukvux@B5coa}c%8C7Dtp1h!3VkB9rM$uQzsa0)+Zk>xy zuQBOMb%6+JgZa+xb-kR(#GIF5lVb);|pKE@3g}@ds z>~9|@qVi?vt$B?5uwH!ySlaBdhI9Iay$7iCsdt2A7EWlErp*cV!#qzS|K&(O%YmXq z^XKviPyhfKJ^+C8502z)>g4<{LpsNmwO`}-O+@k$aEg@3!d{!yCKTwtjC5`};s0}a zfJsNX0NoI+Od7~bHSD*wOGXock;0?f+@}DMZ_*{-WzVywl^h%un*CD*194>)c~^Wla0DQ5E= z8ehM_-p+8N$RqpW##PkA)%*ARdCk(L-pBj>L`C!EaAG9^>M60L$fAN(~b+3GXR#L>X zQh^-ld-?)3u@{uJEAwrSmrqYjl;vsVuyRrF{-Jr&z*Y zS9FI7xw?1|rD4JolILWn8=!*Hc;>C59c*BrVQ7k_R&QG*k$&o2+K-sF6-GgSMJHG~ zR}(k@rjPu1;?huV@azEN61$FPj7031xO3tVPDZ1k3^53huf3>T92(i(jFP$(SlSH% zI4iQ_*KMf0+^b#5w%2H&a~P?Ye4)gZapfFaYP0~sW7Se*bj%f)O;lSL|9UUiSPf*a zv*G#OV&$n?pZbWDz}FL7*`pL~3P-nS5c~pt)0em|2w9Q{Yb&bVlDU!;hEFg$d8gLq zt>L#}Q>{u<%a1t@Yl7pUfBD@x#d`;;&9&(sLftx=>`p+xeG}OXau1hQMb!=`6#@qs zd>LsNcD%a+pG#Mj{P^-jkuJ!GsNG2<6nxm?R|~nqd^UKUyk=zSOQ?t<6Bklney@v? 
z0oaK_pO$UwuLSmaU<~q8R43F>wSwWlHH0siUk%byl-CI(Z7>~CYvn@zd@+h!!y~}} zGqDRo{Az?#!27A)xT)43Zrm{kg@zhM!lyWo4#XtK6AUIUuN*E8ia5;wFpm#ZFQvCD;PCLJ%OiX6b z!j}~{XC!;-EHs6m$zEK#7M=I!h=D}`kU3=D6G^-g4 zx5LYq1~4EkYpoM9!aIVyVbGMrcWuz>tvmOQ;qKjQPKFh&(IksKQ;uoLDEXJ@4pM}} z!B?(_nYD$Li6+uZ4IoY=(1gD6ds;e`B{oN4nlg`HVf^*d>Xe5Ud}D2DWk@=jY<5^}&RTv#joz*X=Z7{ZL`2lcmLegLvcOW1j1`Jqs`&T#CPLc0H0y`SJ>!O)OBg>*|oADo{RZ}41#rJ5Au%;fcONi%;AKcY6)#biS148=y z*F=J(QVeA}EUh>^U4GGWBWJKpVh&x#e*vEvu_}hvUB8uo# zkJL2%Tq>OL<<2hiu?L$ceY2<(W}nQhqGq@*6v1zN^S>3U#6DYpNwtiCidt$J#QfYnVEyhKf9*E>) zd1$e#ZrvygC@V8Y^=uAG#I@?4NJfWRczo|Y3?qRd2n)w!{F!9j#zs@kAC7`DLgUNZdE{->zsVvJY;HnfVo$XQ!gr0nQ zgL;wf%>TFr{r$@RuN#E+&rBQeRoWjH1pqJ?0|1Es=>`EB+Buoo18q%#|Gr4@t>e}> zZoK?op+64@CswF7bP^#koq&6F+M?=YK#u4MuIG@Bq1jfXP>2^boKF1celrU#5>u>S znt*7zFXyn1B0vheJ&5=HeKfN`iEX28+8)>tU2~HNpZX=lCQ*f@j?ol2wpYBJC9F)N zrrNL{WAHvuL5C_N|9ORyrXesY3^}n_>b|nvBWbvU_7lmM8@7awWl3aWke_1zjVB|g zDt9)^-TpSj;IhiuN!zx_H&i%0HQ6T9eLL+-&915niF(+k(=?VcC(j{9#lEH1_>PF? z5gz$Bf6w%B zPJBImyqu!fDipo-ho1&w|srty26q7Zv;!KC|itV)%L=ft<+Ev4stsA8t zuAW|*mAz?k8=*u?X{czka#1xX87qHV|@BT|b}dKnZw#!j8qA%{{U)l-Mc19Y+^ z#6sS%D40hfAF9Kts;ZEcR?CF;wA(L-+v?jMd21=r3AXqsvzi&4#|&wUkyKn$Vz3Y` z!K3qGrmwO}XBZFuWsNr`V=IV93sG{qpk7!HB`Z7iyhP0b@&2#5w^;mB$-^kF=1s>Gw>a}tYzegi4AgE#mWWKF#KFD62Jv?kU)d(9Rp7T^vjh+~R$dFB&u z-B6;W+06)@Bm0zCG$hgrmZ^_a1A@56DDRrvxp&NS4Uq0OT*j!kQl{P~N6Ex3Fc(B<>HBXsiWQ9;Pu+l1ZA^HH;tfhK z__%8#PJ<_sk$t%WeAyKIZ*&UZre|}&Q@wTfPQ5i9;A575;0Mw5ZsP!i3WS#B&ceqj zEc`NVw0^O8U=JFtob8#7awsj%F?J1?tB6Uv6_#uIDejFyfsbH|m;sS{Kd%nkUgO4a zi((H=kz&^7Ka!aHq@O-vFs3y9}H&Oq})ZBboaPhVsSE}s4Yfg$QQE2 z?aKK4q5S$lT04MxKDY&P<(Zp_Ev1>&q>31_%EgE({$@N7wk?NRZ@*)B`tn&%E(zdY zLW;-WCM)IthTGeZe;m}c%LfCS@va0$W#GlVg5j-MEdH6g_S^BAGgu+nS9Gks$WI!8 zHUPbPNaSiAZlHi?3fz}&!(l^~w|#lOl+SF96rgC9D+D%V(<+NSIAuMvuY62FSEJn2kD0X7S0O zR~b8*13NCgAhZ(mIywErJ;PqxTG~9H`Uaw6^+|eYzUS7d#zS?$P4(!%yI=oa*vI@; zh#w>X5pe;p&khv-iA@l>CP?R$L$GkYLBP``T49&PXkqv+ZhYM*?xxddT* 
zf9h&1@5NI=-yb8*{N~#gprfua_#}G68nztI$=M3RL3x8#E}Pyt~V`^1}=Z5mc`D&(bFuY3cIzYK*a9Y;42sl=%cL zj&WAeosZ0{s(m8p$Px&$!UU`=->BIOdQiuy3A0i3j@Bj2Ma{|K8``saKoKb`DrTp% z#d`8;HU+iv(YI)OAET_D!c<#dJ?ILf$rsVSfBRZNi9Wj3l(``lkEhO!dLZH{pc0Pt zz9nLKs>5TeUWYhn4UXAGAt_(o?7L6pTLJt8j^ zONya?cS;wPp6KL^eV);$#zZ8jpD|W=bAePb_6ZVeZ74RRl7!2)GodXtiv%rMJ+fGL zI)_fLM8~*qeF#!e<2cphrZDB$r_PfPUrf#7H&1s^{UAI~sT;5f%<|6T01r(J>cYU`!}pt^>dIOB4&LGnUdu>C%3beb_5ZEB(;Ks-B0k=zdEfyW8h4}<>3T+t??WCbj}xBZ3;G&Wyq&3 zmqPCBbw~vG-84#+$aiOVKtZ=#%C>%VOIZs5rHxs>3m4o3a%moj$#QWk54)V9f*4Z$eFz=1zd$c~KS%+;Lp(*nO&5}yl ze6E$OL6Et1tu0Un-?@f~V!sVHcjpik=a9or(7idMAb-R;4T>lzCqmQ?FXjcL-#tD* zfjS0wL>!e72h=hmBfsxP_kq_=!PA&6X*AZHP{&GX@8WbBx(LL^U9B7Ws3xJF{+RG; z+~DO#{Ig!KnyG%;fgO#W-1gpEMED49+`)afl!qA@S7R~9tgsaZI;a9CWH$FKzrpQ@ z;4~vwiJB@=g+H9mf&wQMRc9$Wkd$k%(Y8?u8_-nf<4~p~HpI>_RFx$?3avQMu;hC+ z*uzlpvR&qX+3XFn@ANfo8(RcAUQdFZgtfUhB)p&yZ|Nc{3pZp#K3Kbtx?i7+F6QeW z-z??7D{~gIc_}6yfz5Ug z?6DWL-yn^2PZeoRaG07n;x@it>u79j9~%fN_fYEMiJJ7E=dn)E??Zr(tA0vDn%lG$ zPIk`64l<%M*sY57NQfU_-OmuHp%R^F@XfWkM`|?X*0QWd4cG-e(Qi5#et^7IRbxOn zLwr45M18)i2t(#|W-R7KVZ}T?JR;J((k6|Y3TS>?U99**F63Rh2 zYlAe=h^DtZdf6K-qHx#vBq67;nD7Nx;r0^V_tgW*>5s#4_z8`eCz+*@6g= zJav^i9FTRT-BUBC*ICH;dKhkY78aQd8&kXYitJXm6XjUa`r>wWk7ZLZguuy~E3`ew zIN2|rd@ZJy-zG#p&}B(X99&F=*hM^^>-?=1-u)DK<%juc2^dGET8$M)x*hdKzXIqq z$c)_h9CIZP<_DAj=+Y9kNHq?c%Xy-gcZSO6(0DQ02DO-_>Lf?i;&{)3hBLj6-`;X7 zN;6kMaIW!)zt_?A(0=6zujKRESf7RlVp!JoNPj{Myi_&z66E7$cc&b&Wwb~<;_QQ_=rfFoI4R);@8DWoXkPTFuuY^C`SXmEi*8_OnPY_}a0}n~ zclwFN*>n8dlHLB0oIyJ~F?Wb6kHcP|W?X)!w|vdcq(N^)AslYld3qQzp$hj?;t)h%bOG-7gf?=+u=6;x==;Sd;G9z zyC}t&HT>DwAo z(L+6Q!Ix2^h83a1bE0TxIO?Q3)J;TJ?}5-v1LUq=D<)j;gtjl$?Ln4Go~5UqD#g{} z5*#%9v;h?*y*H4OOD##LIgV8NOyslDGRQN(js)E=!>b1suqI+}cDV26NAU@XcmuK6 z15jPMGcuBDDOED)Z{%+&x!QxR13`EK;APZ?5bDuYU_ z;O6%8ik2=tR+N~r%0lQal`ib?uPCWJrX;^#pP7-__V}N<`TUa%FOo%0wXAfP<{u&O zhlWynzGWO<3BNT+y8B_#4AzqN!E1&qPyP}JmvQVxo3_9O_m0X*jR`xdsAQeCtSdCN 
z0KyR|xQz)P-f1wrDZSl&-NpDW=PH!~TC$%MGa@IOTC}m#Ch7AW5usEqW;11$mIxUXuv|cB7Usfr+X+(*O}Z*jF0Oj`$$yj_qK8~T#6Xlr z_4z~aj0?E!sYbz3zpq-Yhb1Kyr;Yj9(^I$c;B)wlp`Q@*BZEFD4%qJ#Ri{=R z$7}sq8he6P1dbykya+lGQ{#5jW4%(=s$Lgjc=l!Ab@a-;mFF{~7z5YGcSf~i{k`IK zI`mZ%UyO-Ad*;fm#`GFmK?V#Q1MvTEl>X7-zdt_%{`K|G{nEcP{;BBqe+dA<&w#2w z^8Op+e` Date: Wed, 1 Jul 2026 09:37:22 +0000 Subject: [PATCH 10/20] chore(devforge): enter final review (inner loop converged, rev 2) Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_state.json | 2 +- .devforge/iter-2-rev2/review-staff-review.md | 73 ++++++++++++++++++++ 2 files changed, 74 insertions(+), 1 deletion(-) create mode 100644 .devforge/iter-2-rev2/review-staff-review.md diff --git a/.devforge/_state.json b/.devforge/_state.json index a30758df..4744fdb1 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"inner-loop","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"final-review","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-2-rev2/review-staff-review.md b/.devforge/iter-2-rev2/review-staff-review.md new file mode 100644 index 00000000..58833fee --- /dev/null +++ b/.devforge/iter-2-rev2/review-staff-review.md @@ -0,0 +1,73 @@ +VERDICT: PASS + +## Scope of this round's diff (verified independently) + +Diffed `iter-2-rev2/diff.patch` against the correct prior-round baseline +(`iter-1-rev2/diff.patch` — the design-rev-2 implementation that 
received the major finding; +`iter-2`/`iter-3` belong to the earlier, abandoned design-rev-1 track and are not the right +baseline). Result: + +- `impit-node/src/response.rs`, `impit-node/test/basics.test.ts`, `impit-node/test/mock.server.ts`, + `impit-python/src/response.rs`, `impit/src/lib.rs`, `impit/src/response_parsing/mod.rs` — **byte- + identical** to the prior round. No regression risk introduced. +- `impit-python/test/response_test.py` — one line changed: `'naïve'.encode('utf-8')` → + `'naïve'.encode()`. Behaviorally identical (`str.encode()` defaults to UTF-8); cosmetic only. +- `impit-python/test/async_client_test.py` — new `header_encoding_server` helper and new test + `test_header_value_decoding_and_raw_bytes`, purely additive (file didn't exist in the prior + round's diff in this form; no existing test modified). + +This matches the expected shape: the fix is scoped exactly to adding the missing Python +integration test, with no incidental changes to the asymmetric-decode/raw-accessor logic itself. + +## Verification of the new test + +- **Real wire path, not the `#[new]` shortcut**: `test_header_value_decoding_and_raw_bytes` uses + `AsyncClient(browser=browser).get(...)` against a real `socket`-based server + (`header_encoding_server`), landing in the `From` conversion in + `impit-python/src/response.rs:566-577` (the code path with `decode_header_value` and the + `raw_headers` Vec built from `val.headers()`), not the manually-constructed `Response(...)` + path. This is the exact gap the prior review flagged, and it's now closed — it mirrors the + existing JS wire-level pattern (`utf8Header` route in `mock.server.ts` + `basics.test.ts`). +- **Hand-built HTTP/1.1 response is well-formed**: reconstructed the exact byte sequence locally + and confirmed `Content-Length: 2` matches the 2-byte body `ok`, headers are correctly + `\r\n`-terminated, and the header/body boundary (`\r\n\r\n`) is correct. 
Verified via a live + IPv4 socket round-trip that a client reading this stream would parse it exactly as intended: + `X-Utf8: attachment; filename="naïve.pdf"` (0xC3 0xAF for `ï`) and `X-Latin1: März`-style value + carrying a lone `0xE4` byte, matching PR #434's existing regression scenario. +- **Header-name case handled correctly**: the `From` conversion builds both the string + `headers` dict and `raw_headers` from `k.as_str()` on `reqwest`'s `HeaderName`, which the `http` + crate always normalizes to lowercase. The test asserts against lowercase keys + (`response.headers['x-utf8']`, `raw[b'x-utf8']`) even though the server sends `X-Utf8`/ + `X-Latin1`; this is correct given `as_str()` semantics, and is consistent with how the + pre-existing, unchanged constructor-path test (`test_response_constructor_with_headers`, uses + `'Content-Type'` verbatim) differs because that path does *not* lowercase (no `HeaderName` + involved): no contradiction, just two different, correctly-modeled code paths. +- **Assertions are correct**: UTF-8 value decodes as the original Python `str` (httpx path via + `decode_header_value`'s UTF-8-first branch); the lone `0xE4` byte falls back to ISO-8859-1 + producing `'März'`; `raw_headers` (as a `dict`) returns the exact wire bytes for both headers, + matching the manually reconstructed byte sequences. +- **IPv6 dual-stack binding pattern**: `header_encoding_server` binds `('::', 0)` with + `IPV6_V6ONLY=0` and the test connects via `127.0.0.1:{port}`; this exactly follows the + pre-existing, proven pattern already used by `truncating_server`/`test_truncated_response` in + the same file (not a new/untested pattern). +- **Test isolation**: uses its own dedicated `header_encoding_server` (own port via `port_holder`, + own thread), doesn't interfere with or reuse state from other tests.
+ +## Style / lint + +- `ruff check` (`select = ["ALL"]`) and `ruff format --check` both pass clean on + `async_client_test.py` and `response_test.py` (verified locally, matches `test-results.txt`). +- `rustfmt --check --edition 2021` passes clean on all three touched Rust files. +- `py_compile` succeeds on both test files. +- New test function/class placement is consistent with siblings (correct indentation inside + `TestBasicRequests`, correct blank-line spacing, no orphaned/duplicate definitions). + +## Conclusion + +The prior major finding — Python's `raw_headers`/decode behavior being tested only via the +`#[new]` manual-construction path and not the real `from_async` fetch path — is genuinely +resolved. The new test exercises the identical wire-level scenario the JS suite already covered, +the hand-built response is protocol-correct, and the assertions correctly reflect httpx-style +decode semantics and exact-byte `raw_headers` parity. No new issues found; the diff since the +last review is exactly the two Python test files (plus one no-op cosmetic edit), with zero +changes to the reviewed decode/raw-accessor implementation itself. 
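The framing claims above can also be checked mechanically. This sketch rebuilds the byte sequence that `header_encoding_server` sends, then verifies the `\r\n\r\n` boundary and the `Content-Length`. `check_framing` is a hypothetical helper that only covers this simple case: a single `Content-Length` header and no `Transfer-Encoding`.

```python
def check_framing(response: bytes) -> tuple[list[bytes], bytes]:
    """Split a response into header lines and body, and verify Content-Length.

    Assumes a single Content-Length header and no Transfer-Encoding.
    """
    head, sep, body = response.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header block is not terminated by CRLF CRLF")
    lines = head.split(b"\r\n")
    lengths = [
        int(line.split(b":", 1)[1])  # int() accepts bytes and ignores surrounding spaces
        for line in lines[1:]
        if line.split(b":", 1)[0].strip().lower() == b"content-length"
    ]
    if lengths != [len(body)]:
        raise ValueError(f"Content-Length {lengths} does not match body length {len(body)}")
    return lines, body


# Same bytes as header_encoding_server in async_client_test.py.
body = b"ok"
wire = b"".join(
    [
        b"HTTP/1.1 200 OK\r\n",
        b"Content-Type: text/plain\r\n",
        b"X-Utf8: ",
        'attachment; filename="naïve.pdf"'.encode(),
        b"\r\n",
        b"X-Latin1: M",
        bytes([0xE4]),
        b"rz\r\n",
        b"Content-Length: ",
        str(len(body)).encode(),
        b"\r\n\r\n",
        body,
    ]
)
lines, parsed_body = check_framing(wire)
```

Dropping one body byte makes the check fail, which is the kind of mismatch a hand-built response could silently introduce.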
From 750e31a385d0473958f066e9e04aeb6d3016d65f Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:40:38 +0000 Subject: [PATCH 11/20] chore(devforge): record thermonuclear final review (rev 2, round 1) Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .../iter-2-rev2/final-review-thermonuclear.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) create mode 100644 .devforge/iter-2-rev2/final-review-thermonuclear.md diff --git a/.devforge/iter-2-rev2/final-review-thermonuclear.md b/.devforge/iter-2-rev2/final-review-thermonuclear.md new file mode 100644 index 00000000..0cdbc9f6 --- /dev/null +++ b/.devforge/iter-2-rev2/final-review-thermonuclear.md @@ -0,0 +1,16 @@ +VERDICT: FAIL + +## Major: `rawHeaders` getter is missing from the committed `impit-node/index.d.ts`, so the public TS surface silently lags the Rust source + +- File: `impit-node/src/response.rs:140-151` adds `#[napi(getter, js_name = "rawHeaders", ...)] pub fn raw_headers(&self) -> Vec<(String, Uint8Array)>` on `ImpitResponse`. +- File: `impit-node/index.d.ts:117-146` (the `ImpitResponse` class declaration, which is checked into the repo and listed as `"types": "index.d.ts"` in `impit-node/package.json`) still only declares `status`, `statusText`, `headers`, `ok`, `url`, `decodeBuffer`, etc. — no `rawHeaders` entry anywhere in the file. +- The design doc (`2-design.md`, "Major changes" section) explicitly lists updating "the `.d.ts`/napi surface" as part of this change's scope, so this isn't an incidental miss — it's a stated deliverable that wasn't done. +- Concrete symptom: `impit-node/test/basics.test.ts:582` has to write `(response as unknown as { rawHeaders: Array<[string, Uint8Array]> }).rawHeaders` to access the new accessor, whereas the `response.headers.get(...)` call a few lines above it (and the one at line 87 of the same file) needs no cast.
The test itself is proof the declared public type doesn't match the implementation. +- Impact: any TypeScript consumer calling `response.rawHeaders` today gets a compile error (property does not exist on `ImpitResponse`) until someone manually re-runs `napi build` to regenerate `index.d.ts` from source — the exact feature this PR is supposed to ship is unusable from TypeScript without an undocumented extra build step, and the checked-in artifact is inconsistent with the checked-in source in the same commit. + +## Major: `response.clone()` silently drops `rawHeaders`, with zero test coverage and no documentation of the gap + +- File: `impit-node/index.wrapper.js:417-491` (`Object.defineProperty(originalResponse, 'clone', ...)`). The clone path explicitly re-threads `headers` into the new object (`new Response(stream2, { status: this.status, statusText: this.statusText, headers: this.headers })`, line 474-478) but the `clone` produced this way is a plain global `Response` (confirmed by the existing test at `impit-node/test/basics.test.ts:719`: `expect(clone).toBeInstanceOf(Response)`), not an `ImpitResponse`. The global `Response` has no `rawHeaders` property at all. +- Concrete scenario: an HMAC/signature-verification caller — the exact use case this feature and the design doc (`2-design.md`, "Justified because HMAC callers need exact bytes") were built for — calls `response.clone()` (e.g. to inspect headers while leaving the original body stream intact for a downstream consumer) and then reads `clone.rawHeaders`. This returns `undefined` silently; there is no error, no `TypeError`, nothing to signal that the raw-bytes escape hatch stopped working across the clone boundary. The consumer only discovers this at HMAC-mismatch time, one step removed from the code that dropped the data — a hard bug to trace. 
+- No test in this diff (or pre-existing) exercises `clone()` together with `rawHeaders`; the new `rawHeaders` test (`impit-node/test/basics.test.ts:573-591`) only checks the pre-clone response, and the `clone()` describe block (`impit-node/test/basics.test.ts:714-791`) predates this PR and knows nothing about `rawHeaders`. +- Given the design doc frames `rawHeaders` as an explicit, first-class, HMAC-oriented API (not an afterthought), this silent data loss on a documented, tested code path (`clone()`) is a real correctness gap introduced by this change, not a pre-existing limitation being merely inherited — `headers` (the string map) was carried across `clone()` on purpose in the same function; `rawHeaders` was not, and nothing calls that asymmetry out. From 82d87561f27af2f2b212b7754ce6fb68932cb697 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:48:07 +0000 Subject: [PATCH 12/20] fix: preserve rawHeaders across clone(), declare it in .d.ts, correct raw-header docs Addresses final-review findings on the raw-header accessor: - clone() now carries the impit rawHeaders bytes onto the cloned response - ImpitResponse.rawHeaders is declared in index.d.ts (public TS surface) - docs no longer claim original wire order or header-name casing: reqwest's HeaderMap lowercases names and does not retain wire order, so raw_headers / rawHeaders is httpx-.raw-like, with exact header VALUE bytes (what matters for signature/HMAC) and duplicate values preserved Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/2-design.md | 17 ++++--- .devforge/_state.json | 2 +- .devforge/_verified_task.md | 9 ++-- .../iter-2-rev2/final-review-code-review.md | 51 +++++++++++++++++++ .devforge/iter-3-rev2/claim.md | 29 +++++++++++ .devforge/pr-ecosystem-section.md | 10 +++- impit-node/index.d.ts | 12 +++++ impit-node/index.wrapper.js | 5 ++ impit-node/src/response.rs | 13 ++--- impit-node/test/basics.test.ts | 11 ++-- 
impit-python/src/response.rs | 8 +-- 11 files changed, 142 insertions(+), 25 deletions(-) create mode 100644 .devforge/iter-2-rev2/final-review-code-review.md create mode 100644 .devforge/iter-3-rev2/claim.md diff --git a/.devforge/2-design.md b/.devforge/2-design.md index 2d38b6d7..0d191bbf 100644 --- a/.devforge/2-design.md +++ b/.devforge/2-design.md @@ -19,16 +19,19 @@ Decision (confirmed with maintainer): **behave like httpx in Python, like Fetch `b as char` and drop the JS "UTF-8 header" string test. **Raw-bytes accessor (new public API, both bindings):** -- **Python** — mirror httpx's `Response.headers.raw`: expose `raw_headers` on the response as - `list[tuple[bytes, bytes]]` (name, value), preserving order and duplicates (reqwest yields - repeated headers separately). This closes a real httpx-compat gap. +- **Python** — httpx-`.raw`-like: expose `raw_headers` on the response as + `list[tuple[bytes, bytes]]` (name, value). - **JS** — no Fetch precedent, so this is an explicit impit extension: expose `rawHeaders` on the response as `Array<[string, Uint8Array]>` (name as string per Fetch conventions, value as raw - bytes), same order/duplicate semantics. Justified because HMAC callers need exact bytes and - latin-1 strings, while recoverable, are error-prone to reverse by hand. + bytes). Justified because HMAC callers need exact bytes and latin-1 strings, while recoverable, + are error-prone to reverse by hand. Preserved across `clone()`. -Both accessors return the untouched wire bytes, so a signature/HMAC caller never depends on any -string decoding. +Both accessors return the untouched header VALUE bytes, so a signature/HMAC caller never depends +on string decoding. 
**Caveat (reqwest limitation):** by the time impit sees a response, reqwest's +`HeaderMap` has already normalized header names to lowercase and discarded the original +cross-header wire order, so — unlike httpx's `.raw`, which keeps the raw list — names are +lowercased and order is not the wire order. Duplicate values for a given name are preserved. The +value bytes (the part that matters for signatures) are exact. ## Alternatives + the call - **Symmetric UTF-8-first in both (previous rev):** rejected per maintainer — deviates from diff --git a/.devforge/_state.json b/.devforge/_state.json index 4744fdb1..623263c6 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"final-review","iteration":2,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"final-reopen","iteration":3,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/_verified_task.md b/.devforge/_verified_task.md index d4353084..889b4045 100644 --- a/.devforge/_verified_task.md +++ b/.devforge/_verified_task.md @@ -16,9 +16,12 @@ Response header values must decode correctly for the common modern case (UTF-8, `U+FFFD` (fixes #479 for Python; keeps #430/#434 guarantees). 2. JS decodes headers Fetch-style: strict ISO-8859-1 isomorphic decode (`b as char`), so string values stay byte-recoverable via `Buffer.from(v,'binary')`. JS UTF-8 mojibake is intentional. -3. 
Both bindings expose a raw-bytes accessor returning the exact wire bytes, order + duplicates - preserved: Python `raw_headers: list[tuple[bytes,bytes]]` (httpx `.raw` parity); JS - `rawHeaders: Array<[string, Uint8Array]>` (impit extension). +3. Both bindings expose a raw-bytes accessor returning the exact header VALUE bytes (duplicate + values preserved): Python `raw_headers: list[tuple[bytes,bytes]]`; JS + `rawHeaders: Array<[string, Uint8Array]>` (impit extension). Caveat imposed by reqwest's + `HeaderMap`: header names are normalized to lowercase and original cross-header wire order is + NOT preserved — so this is httpx-`.raw`-*like*, not byte-identical. JS `rawHeaders` survives + `clone()`. 4. Tests: Python UTF-8 decode + raw bytes exact; JS latin-1 decode retained + raw bytes exact. 5. #479 resolution documented as intentionally split (JS = Fetch parity + rawHeaders escape hatch). diff --git a/.devforge/iter-2-rev2/final-review-code-review.md b/.devforge/iter-2-rev2/final-review-code-review.md new file mode 100644 index 00000000..f58affc3 --- /dev/null +++ b/.devforge/iter-2-rev2/final-review-code-review.md @@ -0,0 +1,51 @@ +VERDICT: FAIL + +## Findings (confidence >= 80) + +### 1. [High severity] JS `rawHeaders` does not survive `Response.clone()` — silently `undefined` + +**File:** `impit-node/index.wrapper.js:417-492` (the `clone` override inside `#wrapResponse`), specifically lines 474-478. + +The task's own verification step asks whether `#wrapResponse` clobbers `rawHeaders`. It does not clobber it on the *primary* response object — `#wrapResponse` only calls `Object.defineProperty` on `text`, `bytes`, `arrayBuffer`, `json`, `headers`, and `clone` (`index.wrapper.js:379,387,396,405,413,417`), so `rawHeaders`, being a napi prototype getter on the returned `ImpitResponse` instance (`impit-node/src/response.rs:140-151`), is reachable on `originalResponse` as-is. 
+ +However, `clone()` (redefined at `index.wrapper.js:417-492`) constructs its return value via `new Response(stream2, { status, statusText, headers })` at line 474 — the **standard built-in Web `Response`**, not an `ImpitResponse`. That class has no `rawHeaders` getter at all (own or inherited). Only `url` and `text` are manually stapled onto the clone (lines 479-488). + +**Concrete failing scenario:** +```js +const response = await impit.fetch(url); // has rawHeaders, works +const clone = response.clone(); +clone.rawHeaders; // undefined — silently absent, not an error +``` +Any caller who clones a response (a standard, encouraged Fetch pattern, e.g. to read headers in one branch and body in another) loses access to the new raw-bytes accessor with no error or warning. Since `rawHeaders` is the first `ImpitResponse`-only extension with no standard-`Response` equivalent (unlike `body`/`text`/`json`/`arrayBuffer`, which the clone already re-implements), this is a real, newly-introduced gap in the "surfaces to users through `index.wrapper.js`" requirement, not a pre-existing limitation that this diff merely inherits. + +### 2. [High severity] `raw_headers`/`rawHeaders` do not actually preserve wire order when duplicate header names are interleaved with other headers + +**Files:** +- `impit-node/src/response.rs:65-67` (doc comment) and `:101-107` (construction via `response.headers().iter()`) +- `impit-python/src/response.rs:230-231` (doc comment) and `:566-577` (construction via `val.headers().iter()`) + +Both doc comments explicitly claim: *"Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved)."* Both are built by a single pass over `reqwest::Response::headers().iter()`, which delegates to `http::HeaderMap::iter()`. + +`http::HeaderMap`'s own documentation states iteration order is "arbitrary, but consistent... Each key will be yielded once per associated value" — i.e., it does NOT guarantee wire order. 
Structurally, `HeaderMap` stores the first value per distinct name in a primary table and all subsequent values for a repeated name in a separate `extra_values` side-list; `Iter::next()` drains all of a repeated name's extra values as soon as it reaches that name's slot, before moving to the next distinct name — regardless of what other headers arrived on the wire in between. + +**Concrete failing scenario:** A server sends, in this exact wire order: +``` +Content-Type: ... +X-Trace: span-1 +X-Multi: first +X-Trace: span-2 +X-Multi: second +``` +`raw_headers`/`rawHeaders` (and the underlying `HeaderMap::iter()`) will yield `X-Trace: span-1`, `X-Trace: span-2`, `X-Multi: first`, `X-Multi: second` — the second `X-Trace` value is reordered ahead of `X-Multi: first`, even though `X-Multi: first` arrived earlier on the wire. This was verified empirically by compiling and running a reproduction against the exact `http` crate version pinned in this repo's `Cargo.lock`. + +This directly undermines the acceptance criterion "both raw accessors return EXACT wire bytes with order + duplicates preserved" for any response with interleaved duplicate header names — precisely the scenario where the stated HMAC/signature use case (order-sensitive by nature) would silently get wrong data with no indication of failure. Existing tests only exercise duplicates that are sent back-to-back (e.g. consecutive `Set-Cookie` headers), which happens to preserve apparent order and does not catch this. + +### 3. [Medium-high severity] Python `raw_headers` header **names** are lowercased, not the exact wire bytes, contradicting the docstring's httpx-parity claim + +**File:** `impit-python/src/response.rs:456-461` (docstring) and `:573-577` (construction via `val.headers().iter()`, using `k.as_str().as_bytes()`). + +The docstring states this getter is the "httpx `Response.headers.raw` equivalent" and returns "exact wire bytes." 
`k` here is an `http::HeaderName`, whose `as_str()` always returns the lowercased ASCII form — `HeaderName` normalizes and stores names case-insensitively at construction and retains no original casing. Verified against real httpx 0.28.1 (`httpx.Headers.raw` returns `raw_key`, the original casing exactly as received, e.g. `[(b'X-Utf8', b'val')]` stays `X-Utf8`, not lowercased). + +**Concrete failing scenario:** A server sends `X-Signature: abc123`. `response.raw_headers` yields `(b'x-signature', b'abc123')` — the name is lowercased, unlike httpx's `.raw`, which would preserve `b'X-Signature'`. A caller building an HMAC over the literal header line (name included) using impit's `raw_headers` to match behavior documented as httpx-equivalent gets a different byte sequence than httpx would produce for the same wire response. Existing tests (`impit-python/test/async_client_test.py:481-482`) don't catch this because they only assert against already-lowercase expected keys (`raw[b'x-utf8']`), and the JS test defensively lowercases before comparing (`impit-node/test/basics.test.ts:583`, `k.toLowerCase() === 'x-utf8'`), so neither test suite exercises or would catch a case-sensitivity mismatch. + +Note: this is not a finding for the JS side — the design doc explicitly scopes JS's `rawHeaders` name as "name as string per Fetch conventions" (not claiming byte-exactness for the name), so JS's behavior matches its own documented contract. Only Python's docstring makes the stronger "exact wire bytes" / "httpx equivalent" claim that this contradicts. diff --git a/.devforge/iter-3-rev2/claim.md b/.devforge/iter-3-rev2/claim.md new file mode 100644 index 00000000..b648fa82 --- /dev/null +++ b/.devforge/iter-3-rev2/claim.md @@ -0,0 +1,29 @@ +# Iteration 3 (rev 2, final-reopen round 1) — implementer claim + +Addressed all findings from both final reviewers (thermonuclear: 2; code-review: 3; one overlaps). + +## Fixed +1. 
**`rawHeaders` dropped after `clone()`** (both reviewers) — the clone is a plain Fetch + `Response`. `index.wrapper.js` now copies `rawHeaders` onto the clone via `Object.defineProperty`. + Added a clone-preservation assertion to the JS test. +2. **`rawHeaders` missing from public TS surface / test needed `as unknown as`** (thermonuclear) — + added `get rawHeaders(): Array<[string, Uint8Array]>` to the `ImpitResponse` class in + `index.d.ts` (fetch returns `ImpitResponse`). The main test now accesses `response.rawHeaders` + without a cast (only `clone()`, typed `Response`, still casts — honest, since clone returns a + Fetch Response augmented at runtime). +3. **Overstated "wire order + original casing" claims** (code-review findings 2 & 3) — corrected. + Root cause: reqwest's `HeaderMap` normalizes header names to lowercase and discards original + cross-header wire order before impit sees the response, so true httpx-`.raw` parity is + impossible. Softened the docstrings (Node + Python), `2-design.md`, `_verified_task.md`, and + the PR ecosystem section to state: header **values** are exact bytes (the part that matters for + HMAC), duplicate values are preserved, but **names are lowercased and cross-header order is not + guaranteed**. No code change needed for values — they were already exact. + +## No compile-breaking issues +Both final reviewers verified (against published crate source) that the napi tuple return and +pyo3 per-method lifetime signatures compile; no code change there. + +## Oracle — green +Rust: `rustfmt --check` CLEAN; `rustc --test` 5/5; `rustdoc --test` 1/1. +Python: `ruff check` + `ruff format --check` clean; `py_compile` OK. +(Binding compilation + JS/Py test execution remain CI-gated — h2 egress block.) 
diff --git a/.devforge/pr-ecosystem-section.md b/.devforge/pr-ecosystem-section.md index 75ed0a8c..cf742f72 100644 --- a/.devforge/pr-ecosystem-section.md +++ b/.devforge/pr-ecosystem-section.md @@ -17,8 +17,14 @@ does via the shared `decode_header_value` helper: [`httpx/_models.py` @ v0.28.1, `raw` property](https://github.com/encode/httpx/blob/0.28.1/httpx/_models.py#L152-L156). Our new `Response.raw_headers` returns the same `list[tuple[bytes, bytes]]` shape. -So Python callers get the same decoded strings *and* the same raw-bytes escape hatch they would -from httpx. +So Python callers get the same decoded strings *and* a raw-bytes escape hatch like httpx's. + +> **Caveat vs. httpx `.raw`:** impit is built on `reqwest`, whose `HeaderMap` normalizes header +> names to lowercase and does not retain the original cross-header wire order — that information +> is gone before impit ever sees the response. So `raw_headers` (and JS `rawHeaders`) is +> httpx-`.raw`-*like* but not byte-identical: header **names** are lowercased and cross-header +> order is not guaranteed. Header **values** — the bytes that matter for signature/HMAC +> verification — are exact, and duplicate values for a name are preserved. ### JavaScript — matches the Fetch API / undici (which impit-node implements) diff --git a/impit-node/index.d.ts b/impit-node/index.d.ts index 08b21fb9..9bce9d6e 100644 --- a/impit-node/index.d.ts +++ b/impit-node/index.d.ts @@ -143,6 +143,18 @@ export declare class ImpitResponse { * In case of redirects, this will be the final URL after all redirects have been followed. */ url: string + /** + * Raw, undecoded response header values as `[name, bytes]` pairs. + * + * Unlike {@link headers}, whose values are decoded as ISO-8859-1 strings (matching the Fetch + * API), this exposes the exact value bytes received on the wire. Use it when a header carries + * UTF-8 (e.g. a `Content-Disposition` filename) or when verifying a header signature/HMAC. 
+ * + * Names are lowercased and the original wire order is not preserved (the underlying HTTP client + * normalizes header names into a map); duplicate values for a name are kept. This is an impit + * extension - the standard Fetch `Response` has no raw-header accessor. + */ + get rawHeaders(): Array<[string, Uint8Array]> /** @ignore */ decodeBuffer(buffer: Buffer): string /** diff --git a/impit-node/index.wrapper.js b/impit-node/index.wrapper.js index 3e9ed22e..1c5dc6a7 100644 --- a/impit-node/index.wrapper.js +++ b/impit-node/index.wrapper.js @@ -480,6 +480,11 @@ class Impit extends native.Impit { value: this.url, enumerable: true, }); + // Preserve the impit-specific raw header bytes across clone(). + Object.defineProperty(clone, 'rawHeaders', { + value: this.rawHeaders, + enumerable: true, + }); Object.defineProperty(clone, 'text', { value: async function () { const buffer = await clone.arrayBuffer(); diff --git a/impit-node/src/response.rs b/impit-node/src/response.rs index 2702cca9..7eafbe55 100644 --- a/impit-node/src/response.rs +++ b/impit-node/src/response.rs @@ -122,13 +122,12 @@ impl<'env> ImpitResponse { }) } - /// Raw, undecoded response header values as `[name, bytes]` pairs, in the order the server - /// sent them (duplicate header names preserved). + /// Raw, undecoded response header values as `[name, bytes]` pairs. /// /// Unlike {@link headers}, whose values are decoded as ISO-8859-1 strings (matching the Fetch - /// API), this exposes the exact bytes received on the wire. Use it when a header carries UTF-8 - /// (e.g. a `Content-Disposition` filename) or when verifying a header signature/HMAC, where the - /// precise bytes matter: + /// API), this exposes the exact value bytes received on the wire. Use it when a header carries + /// UTF-8 (e.g. 
a `Content-Disposition` filename) or when verifying a header signature/HMAC, + /// where the precise bytes matter: /// /// @example /// ```ts @@ -136,7 +135,9 @@ impl<'env> ImpitResponse { /// const value = new TextDecoder('utf-8').decode(raw); /// ``` /// - /// This is an impit extension; the standard Fetch `Response` has no raw-header accessor. + /// Header names are lowercased and the original wire order is not preserved (the underlying + /// HTTP client normalizes header names into a map); duplicate values for a name are kept. This + /// is an impit extension; the standard Fetch `Response` has no raw-header accessor. #[napi( getter, js_name = "rawHeaders", diff --git a/impit-node/test/basics.test.ts b/impit-node/test/basics.test.ts index 8e561a15..15813088 100644 --- a/impit-node/test/basics.test.ts +++ b/impit-node/test/basics.test.ts @@ -578,15 +578,20 @@ describe.each([ const latin1 = response.headers.get('x-utf8'); t.expect(latin1).not.toBe(routes.utf8Header.headerValue); - // rawHeaders exposes the exact wire bytes, which decode to the real UTF-8 value. - const rawHeaders = (response as unknown as { rawHeaders: Array<[string, Uint8Array]> }).rawHeaders; - const rawPair = rawHeaders.find(([k]) => k.toLowerCase() === 'x-utf8'); + // rawHeaders exposes the exact value bytes, which decode to the real UTF-8 value. + const rawPair = response.rawHeaders.find(([k]) => k.toLowerCase() === 'x-utf8'); t.expect(rawPair).toBeDefined(); const rawBytes = Buffer.from(rawPair![1]); t.expect(rawBytes.toString('utf8')).toBe(routes.utf8Header.headerValue); // The ISO-8859-1 string also round-trips back to those exact bytes (the standard Fetch workaround). t.expect(Buffer.from(latin1!, 'latin1').equals(rawBytes)).toBe(true); + + // rawHeaders survives clone() (clone() returns a Fetch Response augmented by impit). 
+ const cloned = response.clone() as unknown as { rawHeaders: Array<[string, Uint8Array]> }; + const clonedPair = cloned.rawHeaders.find(([k]) => k.toLowerCase() === 'x-utf8'); + t.expect(clonedPair).toBeDefined(); + t.expect(Buffer.from(clonedPair![1]).equals(rawBytes)).toBe(true); }); test('.json() method works', async (t) => { diff --git a/impit-python/src/response.rs b/impit-python/src/response.rs index 990209ba..bb5009cc 100644 --- a/impit-python/src/response.rs +++ b/impit-python/src/response.rs @@ -453,10 +453,12 @@ impl ImpitPyResponse { Ok(()) } - /// Raw, undecoded header name/value pairs as `(bytes, bytes)`, in the order the server sent - /// them (duplicate names preserved). Equivalent to httpx's `Response.headers.raw`. + /// Raw, undecoded header name/value pairs as `(bytes, bytes)`. Similar to httpx's + /// `Response.headers.raw`, but note two differences imposed by the underlying HTTP client: + /// header names are normalized to lowercase and the original wire order is not preserved + /// (duplicate values for a name are kept). Header *values* are the exact bytes received. /// - /// Unlike `headers` (str values decoded UTF-8-first), this returns the exact wire bytes, for + /// Unlike `headers` (str values decoded UTF-8-first), this returns the exact value bytes, for /// callers that need them - e.g. verifying a header signature/HMAC. #[getter] fn raw_headers<'py>(&self, py: Python<'py>) -> Vec<(Bound<'py, PyBytes>, Bound<'py, PyBytes>)> { From 6a40868a6cfb682ab1ce7750cc0c2f51ff8cc5a2 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:51:36 +0000 Subject: [PATCH 13/20] docs: align raw-header field comments with the accessor's documented behavior Removes a leftover "in wire order" phrase from the internal field comments so they match the getter docs: values are exact bytes, names are lowercased, and original wire order is not preserved. 
Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_state.json | 2 +- .../iter-3-rev2/final-review-code-review.md | 76 +++++++++++++++++++ .../iter-3-rev2/final-review-thermonuclear.md | 73 ++++++++++++++++++ .devforge/iter-4-rev2/claim.md | 17 +++++ impit-node/src/response.rs | 4 +- impit-python/src/response.rs | 4 +- 6 files changed, 171 insertions(+), 5 deletions(-) create mode 100644 .devforge/iter-3-rev2/final-review-code-review.md create mode 100644 .devforge/iter-3-rev2/final-review-thermonuclear.md create mode 100644 .devforge/iter-4-rev2/claim.md diff --git a/.devforge/_state.json b/.devforge/_state.json index 623263c6..713179ad 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"final-reopen","iteration":3,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"final-reopen","iteration":4,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-3-rev2/final-review-code-review.md b/.devforge/iter-3-rev2/final-review-code-review.md new file mode 100644 index 00000000..192a82fd --- /dev/null +++ b/.devforge/iter-3-rev2/final-review-code-review.md @@ -0,0 +1,76 @@ +VERDICT: PASS + +## Verification of prior findings + +### Finding 1 — JS `rawHeaders` dropped after `clone()` — RESOLVED +`impit-node/index.wrapper.js` (`clone()`, ~line 483-487) now does: +```js +Object.defineProperty(clone, 'rawHeaders', { + value: this.rawHeaders, + enumerable: true, +}); +``` +`this` here is the native 
`ImpitResponse` (`originalResponse`), so `this.rawHeaders` invokes the +napi getter (`impit-node/src/response.rs:141-152`) and copies the resulting `Array<[string, +Uint8Array]>` onto the clone as a static value — correct, since `rawHeaders` is plain data, not a +stream that can be double-consumed. Covered by a new test in +`impit-node/test/basics.test.ts:574-595` (`'raw header bytes preserve the exact wire value while +the string stays ISO-8859-1 (Fetch-style)'`), which explicitly clones the response and asserts +`cloned.rawHeaders` still contains the exact bytes. Confirmed rustfmt-clean on +`impit-node/src/response.rs`. + +### Finding 2 — `HeaderMap::iter()` order claim — RESOLVED +All public-facing wording now correctly states the wire order is NOT preserved (removed the prior +overclaim of "in wire order"): +- `impit-node/index.d.ts:153-155` (`rawHeaders` getter doc) +- `impit-node/src/response.rs:138-140` (napi getter `///` doc) +- `impit-python/src/response.rs:456-459` (`raw_headers` `///` doc, "note two differences... the + original wire order is not preserved") +- `.devforge/2-design.md:30-34`, `.devforge/_verified_task.md:20-24` + +Verified against the actual `http` crate (v1.4.2, vendored at +`/root/.cargo/registry/src/.../http-1.4.2/src/header/map.rs:943`): `HeaderMap::iter()` docs state +"The iterator order is arbitrary, but consistent across platforms for the same crate version" — +i.e., not wire order and not even a stable/documented order guarantee beyond same-key insertion +order for duplicates. The corrected docs ("wire order not preserved") are accurate (if anything, +conservative, since the map's overall order isn't even loosely tied to wire order). + +### Finding 3 — Python `raw_headers` lowercases names vs. httpx `.raw` — RESOLVED +`impit-python/src/response.rs:456-459` no longer claims unqualified "httpx `Response.headers.raw` +equivalent." 
It now reads: "Similar to httpx's `Response.headers.raw`, but note two differences +imposed by the underlying HTTP client: header names are normalized to lowercase and the original +wire order is not preserved... Header *values* are the exact bytes received." This is accurate: +`k.as_str()` on an `http::HeaderName` always returns the lowercased form (confirmed prior round), +and the new doc no longer implies name-case parity. `.devforge/pr-ecosystem-section.md:22-27` and +`2-design.md` state the same caveat consistently. New test coverage: +`impit-python/test/async_client_test.py` (`test_header_value_decoding_and_raw_bytes`) asserts +`raw[b'x-utf8']` / `raw[b'x-latin1']` using lowercase keys, consistent with actual behavior; ruff +check passes clean on the touched test files. The unrelated constructor path +(`ImpitPyResponse::new`, used by `impit-python/test/response_test.py::test_response_raw_headers`) +builds `raw_headers` directly from the caller-supplied Python dict (no `HeaderMap` involved), so +it correctly preserves `Content-Type` casing there — consistent with the getter doc, which only +promises byte-exact values and flags the lowercasing caveat as a limitation "imposed by the +underlying HTTP client" (i.e., only applies to responses that actually went through reqwest). + +## Additional checks performed +- `rustfmt --check` on all touched Rust files (`impit-node/src/response.rs`, + `impit-python/src/response.rs`, `impit/src/response_parsing/mod.rs`, `impit/src/lib.rs`): clean. +- Standalone `rustc` compile of the `decode_header_value` logic: UTF-8 input decodes as UTF-8 + (`naïve.pdf`), invalid-UTF-8 single byte (`0xE4`) falls back to ISO-8859-1 (`März`) — matches + docstring and unit tests in `impit/src/response_parsing/mod.rs:454-497`. +- `ruff check` on `impit-python/test/async_client_test.py` and `impit-python/test/response_test.py`: + all checks passed. +- Working tree matches `diff.patch` exactly (`git status` clean); no drift to account for. 
+- No other file references stale "wire order preserved" / unqualified "httpx equivalent" wording + in any public doc, design doc, or verified-task doc. + +## Notes (not reported as findings, confidence < 80 / non-doc) +- `impit-node/src/response.rs:65` and `impit-python/src/response.rs:230` retain stale plain `//` + (non-doc) comments above the private `raw_header_pairs`/`raw_headers` struct fields ("in wire + order (duplicates preserved)", "httpx `Headers.raw` equivalent"). These are internal + implementation comments, not rendered rustdoc/public API documentation, and the actual `///` + getter docs immediately below are correct and consistent with the design. Cosmetic only; no + functional or user-facing documentation impact. + +No correctness or compatibility regressions found. Value bytes remain exact end-to-end in both +bindings. diff --git a/.devforge/iter-3-rev2/final-review-thermonuclear.md b/.devforge/iter-3-rev2/final-review-thermonuclear.md new file mode 100644 index 00000000..1f0d1423 --- /dev/null +++ b/.devforge/iter-3-rev2/final-review-thermonuclear.md @@ -0,0 +1,73 @@ +VERDICT: FAIL + +## Findings + +### Major — stale "in wire order" comments contradict the corrected public docstrings (docs accuracy regression not fully fixed) + +The rev-2 fix correctly rewrote the *public* docstrings for both `rawHeaders` (Node) and `raw_headers` +(Python) to state that header names are lowercased and the original wire order is **not** preserved +(only duplicate values for a given name are preserved). However, the private field comments sitting +right next to the struct definitions in the same files were left unchanged and still assert the +opposite: + +- `impit-node/src/response.rs:65` + ```rust + // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). 
+ ``` + This directly contradicts the getter's own doc comment 70 lines below it + (`impit-node/src/response.rs:138`): *"the original wire order is not preserved."* + +- `impit-python/src/response.rs:230` + ```rust + // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). + ``` + Same contradiction against `impit-python/src/response.rs:458`: *"the original wire order is not + preserved."* + +Concrete scenario: a maintainer skimming the struct definition (the first thing you read when +opening the file) sees "in wire order" and walks away with the exact overstated/incorrect claim the +prior review round flagged and that the design doc explicitly calls out as a reqwest `HeaderMap` +caveat. The public-facing docs were fixed, but the adjacent internal comments were not brought into +sync, leaving self-contradictory documentation in both `impit-node/src/response.rs` and +`impit-python/src/response.rs`. This is exactly the kind of leftover overstated claim the re-review +was asked to confirm is gone — it is not gone, it just moved one comment upward. + +Fix: reword both field comments to match the getter docs, e.g. "Raw, undecoded header name/value +byte pairs (names lowercased, cross-header wire order not preserved by the underlying `HeaderMap`; +duplicate values for the same name are preserved in insertion order)." + +## Verified as correctly fixed (no findings) + +- `impit-node/index.d.ts:157` now declares `get rawHeaders(): Array<[string, Uint8Array]>` on + `ImpitResponse`, in the same getter style as `get body()`. The main test's direct access + (`impit-node/test/basics.test.ts:582`, `response.rawHeaders.find(...)`) no longer casts. +- `impit-node/index.wrapper.js:483-487` propagates `rawHeaders` onto the `clone()`-created plain + `Response` object via `Object.defineProperty(clone, 'rawHeaders', { value: this.rawHeaders, ... 
})`, + reading from the original native `ImpitResponse` (`this` inside the `clone` function is + `originalResponse`, the patched native instance) — correct source of truth. The test + (`impit-node/test/basics.test.ts:591-594`) asserts the raw bytes survive `clone()` and match the + pre-clone bytes. + - Note: `clone()` is declared as `clone(): Response` (`impit-node/index.d.ts:250`), the global DOM + `Response` type, which has no `rawHeaders` member — so the test's + `response.clone() as unknown as { rawHeaders: ... }` cast at + `impit-node/test/basics.test.ts:591` is still present. This is a *different*, pre-existing typing + gap (the clone's declared return type was always the plain DOM `Response`, unrelated to this + fix) rather than a recurrence of the originally flagged issue, which was about accessing + `rawHeaders` directly on `ImpitResponse` before the getter existed. Not treated as a blocker since + the acceptance criteria only require runtime survival across `clone()`, not a fully-typed clone + contract — but flagged here for visibility in case the intent was to eliminate all casts. +- Docstrings in `impit-node/index.d.ts:147-156`, `impit-node/src/response.rs:125-140`, and + `impit-python/src/response.rs:456-462` now accurately state: names lowercased, wire order not + preserved (reqwest `HeaderMap` caveat), duplicate values preserved, value bytes exact. Matches + `2-design.md`'s caveat section and `_verified_task.md` acceptance item 3. +- Header-pair construction (`impit-node/src/response.rs:99-106`, `impit-python/src/response.rs:574-579`) + correctly derives both the decoded-string headers and the raw byte pairs from the same + `response.headers().iter()` pass, so the "duplicates preserved, values exact" guarantee actually + holds against `HeaderMap::iter()` semantics — not just claimed.
+- No new maintainability/structural issue introduced: the raw-header plumbing is a small, additive + field + getter in each binding, reusing the existing `decode_header_value` core helper where + applicable; no file crossed a size threshold, no new branching was bolted onto unrelated code + paths, and no bespoke duplicate-of-an-existing-helper was added. +- No concrete compile hazard spotted in the new Rust code (napi `Uint8Array` import, `getter`/ + `js_name`/`ts_return_type` attributes, and pyo3 `PyBytes` construction all follow existing patterns + already used elsewhere in the same files). diff --git a/.devforge/iter-4-rev2/claim.md b/.devforge/iter-4-rev2/claim.md new file mode 100644 index 00000000..a2494ca2 --- /dev/null +++ b/.devforge/iter-4-rev2/claim.md @@ -0,0 +1,17 @@ +# Iteration 4 (rev 2, final-reopen round 2) — implementer claim + +## Addressed the remaining thermonuclear finding +- **Internal field comments still said "in wire order"** (`impit-node/src/response.rs:65`, + `impit-python/src/response.rs:230`), contradicting the corrected getter docstrings — FIXED. + Both field comments now state values are exact, names lowercased, order not the original wire + order, and point to the getter docs. Also dropped the stale "httpx `Headers.raw` equivalent" + phrasing from the Python field comment. +- Verified no remaining overstated "wire order" claim: the only surviving mentions are accurate + negations ("… not preserved"). + +## code-review: PASS last round (no further changes needed). + +## Oracle — green +Rust: `rustfmt --check` CLEAN; `rustc --test` 5/5; `rustdoc --test` 1/1. +Python: `ruff` + `ruff format --check` clean; `py_compile` OK. +(Comment-only change since iter-3-rev2; binding compile + JS/Py tests remain CI-gated.) 
diff --git a/impit-node/src/response.rs b/impit-node/src/response.rs index 7eafbe55..a0ad04d8 100644 --- a/impit-node/src/response.rs +++ b/impit-node/src/response.rs @@ -62,8 +62,8 @@ pub struct ImpitResponse { /// /// In case of redirects, this will be the final URL after all redirects have been followed. pub url: String, - // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). - // Exposed to JS through the `rawHeaders` getter. + // Raw, undecoded header name/value byte pairs (values exact; names lowercased, order not the + // original wire order - see the `rawHeaders` getter docs). Exposed via the `rawHeaders` getter. raw_header_pairs: Vec<(String, Vec<u8>)>, // Shared sender used to immediately signal abort to the JS ReadableStream without polling. abort_receiver: Arc>>>, diff --git a/impit-python/src/response.rs b/impit-python/src/response.rs index bb5009cc..2f7b0685 100644 --- a/impit-python/src/response.rs +++ b/impit-python/src/response.rs @@ -227,8 +227,8 @@ pub struct ImpitPyResponse { content: Option>, inner: Option, inner_state: InnerResponseState, - // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). - // Exposed to Python through the `raw_headers` getter (httpx `Headers.raw` equivalent). + // Raw, undecoded header name/value byte pairs (values exact; names lowercased, order not the + // original wire order - see the `raw_headers` getter docs). Exposed via the `raw_headers` getter.
raw_headers: Vec<(Vec<u8>, Vec<u8>)>, } From e8289e56e21c4dd0ac852e3de2dafc4bbd92ccd6 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:52:57 +0000 Subject: [PATCH 14/20] chore(devforge): final review converged (rev 2, both reviewers PASS) Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_state.json | 2 +- .../iter-4-rev2/final-review-code-review.md | 11 ++++++++ .../iter-4-rev2/final-review-thermonuclear.md | 26 +++++++++++++++++++ 3 files changed, 38 insertions(+), 1 deletion(-) create mode 100644 .devforge/iter-4-rev2/final-review-code-review.md create mode 100644 .devforge/iter-4-rev2/final-review-thermonuclear.md diff --git a/.devforge/_state.json b/.devforge/_state.json index 713179ad..463c8ee0 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"final-reopen","iteration":4,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"create-pr","iteration":4,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} diff --git a/.devforge/iter-4-rev2/final-review-code-review.md b/.devforge/iter-4-rev2/final-review-code-review.md new file mode 100644 index 00000000..27b959f9 --- /dev/null +++ b/.devforge/iter-4-rev2/final-review-code-review.md @@ -0,0 +1,11 @@ +VERDICT: PASS + +Reviewed the comment-only change to `impit-node/src/response.rs` (field comment above `raw_header_pairs`, ~line 65-66) and `impit-python/src/response.rs` (field comment above `raw_headers`, ~line 230-231), plus their corresponding 
public
getter docstrings (`impit-node/src/response.rs:90-105` `rawHeaders`; `impit-python/src/response.rs:447-454` `raw_headers`). + +Confirmed: +- The leftover "in wire order" phrasing flagged in prior review rounds is gone from both internal field comments. +- Both field comments now read "values exact; names lowercased, order not the original wire order," which accurately matches the getter docstrings' claims (names lowercased, wire order not preserved, duplicate values kept, value bytes exact). +- No other lines in either file, or elsewhere in the diff, still contain the stale "in wire order" phrase (checked via full-repo grep). +- No other code changed — this iteration is comment-only, as expected; working tree is clean and matches `diff.patch` from the prior iteration plus this comment edit. + +No findings with confidence >= 80. Nothing regressed; no overclaim remains. diff --git a/.devforge/iter-4-rev2/final-review-thermonuclear.md b/.devforge/iter-4-rev2/final-review-thermonuclear.md new file mode 100644 index 00000000..bf3bd580 --- /dev/null +++ b/.devforge/iter-4-rev2/final-review-thermonuclear.md @@ -0,0 +1,26 @@ +VERDICT: PASS + +No findings. The prior finding (internal field comments at `impit-node/src/response.rs:65` and +`impit-python/src/response.rs:230` still saying "in wire order", contradicting the corrected +getter docstrings) is fully resolved: + +- `impit-node/src/response.rs:65-66` now reads: "Raw, undecoded header name/value byte pairs + (values exact; names lowercased, order not the original wire order - see the `rawHeaders` + getter docs)." This matches the `rawHeaders` getter docstring (lines 125-140), which states + names are lowercased, wire order is not preserved, and duplicates are kept. +- `impit-python/src/response.rs:230-231` now reads the equivalent: "values exact; names + lowercased, order not the original wire order - see the `raw_headers` getter docs." 
This + matches the `raw_headers` getter docstring (lines 456-462), which states the same caveats and + that the getter is "similar to httpx's `Response.headers.raw`" (correctly qualified, not an + unqualified equivalence claim). + +Repo-wide scan for "wire order" / "httpx equivalent" / "httpx-like" wording turned up no other +unqualified overclaims in touched source, `.d.ts`, or test files. Every remaining mention (in +`impit-node/index.d.ts:153`, `impit-node/src/response.rs:138`, `impit-python/src/response.rs:458`, +and test comments in `async_client_test.py`/`response_test.py`) correctly states the order is +*not* preserved and/or qualifies the httpx comparison as approximate. + +Diff scope: the only change relative to the previous review round is the two comment edits +described above, in the two `.rs` files. No other lines in `impit-node/src/response.rs` or +`impit-python/src/response.rs` changed, and no other files in the diff were touched by this +round's fix — no regression introduced. 
From 3bc991f661a033a25c3c3595dbe5f9ad03871ef6 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:54:25 +0000 Subject: [PATCH 15/20] chore(devforge): record create-PR approval Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_create_pr.approved | 1 + 1 file changed, 1 insertion(+) create mode 100644 .devforge/_create_pr.approved diff --git a/.devforge/_create_pr.approved b/.devforge/_create_pr.approved new file mode 100644 index 00000000..93b4fdee --- /dev/null +++ b/.devforge/_create_pr.approved @@ -0,0 +1 @@ +approved by josef.prochazka@apify.com in chat on 2026-07-01 (create-PR) From dd95522ef874f5116ab6e41dae9671037ed01a7e Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:55:38 +0000 Subject: [PATCH 16/20] chore(devforge): record PR #492 and finish run Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/_progress.md | 9 +++++++++ .devforge/_state.json | 2 +- 2 files changed, 10 insertions(+), 1 deletion(-) diff --git a/.devforge/_progress.md b/.devforge/_progress.md index 599dcda1..8dee281e 100644 --- a/.devforge/_progress.md +++ b/.devforge/_progress.md @@ -22,3 +22,12 @@ ## State - Phase: design-gate (awaiting human approval before any source edit). + +## Finish (rev 2) +- PR #492 opened: https://github.com/apify/impit/pull/492 (base master). +- Oracle green throughout (rustfmt/rustc/rustdoc + ruff/py_compile). Binding compile + JS/Py test + execution are CI-gated (h2 egress 403 blocks local build). +- Reviewer staff-review: PASS. Final reviewers thermonuclear + code-review: both PASS after 2 + final-reopen rounds (fixed: clone() dropping rawHeaders, index.d.ts, wire-order/casing overclaims). +- Approvals: design gate rev1 + rev2 (chat), create-PR (chat) 2026-07-01. +- Phase: done. 
diff --git a/.devforge/_state.json b/.devforge/_state.json index 463c8ee0..26e959d3 100644 --- a/.devforge/_state.json +++ b/.devforge/_state.json @@ -1 +1 @@ -{"phase":"create-pr","iteration":4,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","panel":{"tier":"large","reviewers":[{"use":"staff-review","model":"sonnet"}],"final_reviewers":[{"use":"thermonuclear","model":"sonnet"},{"use":"code-review","model":"sonnet"}],"inner_iterations":3,"final_review_rounds":2}} +{"phase":"done","iteration":4,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","pr":"https://github.com/apify/impit/pull/492"} From b7733626c143c294d58bd1736873ed57dd5bf5f4 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 09:58:40 +0000 Subject: [PATCH 17/20] chore: remove stray napi-0.2.1.zip committed by mistake This unrelated archive was accidentally staged by a `git add -A`; it is not part of this change and not present on master. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- napi-0.2.1.zip | Bin 11484 -> 0 bytes 1 file changed, 0 insertions(+), 0 deletions(-) delete mode 100644 napi-0.2.1.zip diff --git a/napi-0.2.1.zip b/napi-0.2.1.zip deleted file mode 100644 index 8cfe868c5f5b678a109f8d17291f2d9517ad7959..0000000000000000000000000000000000000000 GIT binary patch literal 0 HcmV?d00001 literal 11484 zcmZ{qV|1kJ+N~?-*fu&&I<}p3l8$ZL72CEuX2(Xy>e#j_wma78z0dg0TKn7U)cd1G z)sJhuKjt%^d)#v>$Us110000O08+EP=x;nLT@P>ofEf}1!20vAouPvzJre^912f|{ zDN%7bWpM^)cV}H?nYc_QlrFMsVhL_ribkIePp6(04GJ7w^Em?8@1Rr=3~$W+mRqQa zTT@E>bVt_^Lea%5%Zf2EDE|p^tIPW>2XHD+*;D_n9OsB3`6FF!(d$&AmdKMey$_zC z#1o-dHZL<8Y)N=-X>NubgE^R+5(}KNPBL2S!^oj%(9pTT33QUAwf? 
zEv5<;MQce%&J4kY$wo5**DcGk4@|&kLhglJg z^UaDZXu|&&%CCaoD;Moutr7FNmSx*rl=A7;mvY3N8Y=U;#i&vfeDgdyYvWIAYym9u zTdjr;>cFLzx+as+kGww1FN@F*OXI$F>O{Iit(=(7p|>xNmlRt|eE?{uEj$+OR|M(@ zE6dNH1VGRRr(lzI88jcw&s;JZPNjR&m*00tXsJ`Yv2dphdTFDvVb6l~#;Tys6e_2^ z;^_q2dyjcfX17q8# z+BO(ZFV65=Afhm3`cr;&5jYfuvR2Vr77cZCc`~LX814>Mfd);{Hoo4GWYGY@l{rG5 zRs$bZHJ6b#Ti(~tMfK}+*Nuy{sUCyOt$U?0^iICJBKnq-{geUGw#(?bCFH;I_)pql zgEumiV{vc=%CPE5sLH-H{$dg?l7A;9j`1i%R?frtts7YF6PS%1j%7Mg;M1?E=vMQ>+RylJrt?67p55 zn)c|-D86yYA57t*wkbBN7y*q1fg;4={^XzctS zngc+}iZ4cC4bW`AQr1I`+mA-S!I+7*1X4iW!!{5ODEGs7~D4)y5s;zA+IdRAOe4j7G zWdOH4KZCm-M3-XT45I)1#9mDU1|VP{3{p5&@CXqB z>~V38p`{7^=K^{YKDcMi6pwx=aH9kTmbCAX`?gNj^3t<)lO~2lP+sE0}_29MU8UU@$YWU|I~s!*%>~He}H4W5ls}-#+#{ zJrP+eC`w6WTPL57z=G1291!d$OMSoXyI-B4Z1&Ju1tKum_CR@K@b%8M%|olBP6tSS zRP)9^&4N}`PInySGrEoNZMD2xbtsubmUr51phi8^FcOx|tsX9#$IeP=4^Y1}bvo9P zJNmaV=jV1@^jOZul_hC6jxfA9n`Z-Z0#KpE2r-kflPk7CzRr9R)Hr;2$=1PB&0 zW~XS)(@xvPlo2h+!EJ@+h87Y#Zwp{VS0k1QSU!GYsGXK5xM8{;hn6%AOq&jQ_&v>5 z2eF@EGzQ``0mmGIeID<$TP z!I*ioJ;8qmsqymPF6@OVr0cZ}RSjaSPEp#}xY(s9V_vc9UI@i01y{4}UDhp>e$+sf zsBnIkHaR+xBt+WLmV>*(|LZsj{W(tD2j1u>U;seLpCg6&pT|i_Tv$w2oB`0Y#o9)>*kAfLg>MnbtR02ytvi`QEImMh1r zLsP=&BpM`f7Eh|5F+z_blO3HG{aThQz_@E^-FU~P#Y55%?oBYcBL&17`#mb+d=jLp zvi)9dE+JFozJg*gmKo0{RI@o;*jcxy)Bmo#dxu6>p{MJ%_ zymNro+!7&MdSwCeqxJ=4tihc8xI+VG{OY6~tN1iI-7(+R$~WvIcg_AeVY~Eza!z1|UP+izi6h zDv8}+aib>?*iJ)Io7S)Hhw&Mfa`FUQtSQ%X3WPoxdnqfdR$$3rYbN^;g;ToNPi2*P zJF;)JnMHt99l(8Rb?*=8eFyv{j(-gX3b;5aw0{PJ{Xc`j|MFTVQ)d?k1_zI9WyL?z zK=DQ16I{=kF`7UbDq6T$}ncSNdyreBD+0?r<)<0~FMQfK@bnA(CO z0dgn4Op9Q}iEqy4`UKVD)2$~!x-jYs6;=A66N6g+BrNNmBEfvDWX12bxZf|{6V*dW z0@oPhCB9)M;gdlzTez9w&guza*6W2>=GD=(DyE;Mjc!g=8|3!H>ATt;>s_BK#Co%3erF)fj;cy^ z+(ok0UuEFpoA}=3E=N$cTH6uMF`#;Yo&Js0RoqPITa0WVf_59KVt5VW?2;2mkyf1# z$(F=Lu7X;?y%FH{w43_p68F&Yr0>fB83B z;27p5a8^Jr+}AIwRj}@Ru&M|#qMpGxavJfS{(njf+akK`?vL*jLIVH<|LHsb8b2A$ zTI38hnVR4_Ub0#emmsBuLbxk8%%?q_)^yr>EN^00ARS^4|R zY8_%|y{)n(;~1%d*}?uqrbngvscHy2jJJ_&WQiOpYT6E5Y>c-FBB9-n9y$vXZle0L 
zCHvyo#K>Yyhd?RjIB1tv(Z@Q$tI7l&?9GP9OwhS+i%fI|4FRl9B4FTu{fv7 zQixE|IDO-~UBSGOt-z~wEz0XVnMk|>oXM!;H!sk$nsKx7j8G->Vf=YY{Fhu{n}DJ? zgU46ee_W^m2>`(T2f5f9np+zGODjLrx9qdH{yqEkMmmGv-%lS9B>+##+P)@StR`hN zFTiyegk$#f-*arJJZOyfejy=7(L3$PIO zrE`O6=j*V-SwF*_@V+YK7i5LteneX(l}3s3WMp874i~}*zG$>44P8ND8u!OnL~Bze z8E02~QJb^(H~2P?lkv{3nqdVMkU^4l_MR{*oqn_sgk3voSnugdJ%eivLE8_eh|D{f zqrqgk!=Be{fYqs7-n@y|YBJEH-n^f$tLz0p$U$n&Omx z+2YGRn~;FH{p0$a(cQF%x>cvwY4u>qC>Pwg8cxrVP(lVIEV&z86$WWb;+BI#=NElz zOb~Pnwl1Z1V_%C?B&@GO{AwH(7?@HNQjdgLA>&nESL*eVPb3(rCZmY%Gvu3Jd;J7c zlA$18f5yosC6;MAX6Sk>53(@F4Je$4SJT8IX4^<6hc>Mwi6h_{N1|;K_jN zgY&4F#-h+tQJs*8{ep#}GE$Mvw#kSU(G7PRq?r=8aTO7US)m;>muo?pNzOVgNRZmB ziq&ngXK2)~)+;O+OnL09V3V3HvE`6TTBwFz3!8;O&8yHVl5Y%~A|hf<5pL32s`cI< z86i@GzrCx4y!aKN^*|SApvm0?O=4j|V^Lr|hbAT3#VC8g((Vx%8zvkHz~|NQ*oAGR zd_!(*QAk_q+L!M(hf+QKa7glV0WY-o@YLD{BUkX#%&PLwh`|#cSV=a>A9JN0{0)<{ z)yB4{IcRAnF^{8P?86{d=8UI`C}`46z(0#Wc{brbsWou0xdUCq8DwFjGcGF5Jbhea zBlQhjMXF3ZskC#6m;8I>+o$duBAU1P*ZM{nQupYRv;#PE`AJz3TR;kl&d3T{x8PYC zXa<<~mU(6-CX}0GpOcIs)IvJrmnNBQ1`O=OOthwcX=*rEPL^=FZA4(gUPV1TI$h0T z&Ls&l?SLc45YHq(ao>j_L>s+3Q630zppWk#-~gmOyRsozwd4$^_D5;bm#!O<-^guQ z!`yd9WUT`^v1vVL9#;1@Hn|u#CpWV86IS%tVSIc#iE55peukvux@B5coa}c%8C7Dtp1h!3VkB9rM$uQzsa0)+Zk>xy zuQBOMb%6+JgZa+xb-kR(#GIF5lVb);|pKE@3g}@ds z>~9|@qVi?vt$B?5uwH!ySlaBdhI9Iay$7iCsdt2A7EWlErp*cV!#qzS|K&(O%YmXq z^XKviPyhfKJ^+C8502z)>g4<{LpsNmwO`}-O+@k$aEg@3!d{!yCKTwtjC5`};s0}a zfJsNX0NoI+Od7~bHSD*wOGXock;0?f+@}DMZ_*{-WzVywl^h%un*CD*194>)c~^Wla0DQ5E= z8ehM_-p+8N$RqpW##PkA)%*ARdCk(L-pBj>L`C!EaAG9^>M60L$fAN(~b+3GXR#L>X zQh^-ld-?)3u@{uJEAwrSmrqYjl;vsVuyRrF{-Jr&z*Y zS9FI7xw?1|rD4JolILWn8=!*Hc;>C59c*BrVQ7k_R&QG*k$&o2+K-sF6-GgSMJHG~ zR}(k@rjPu1;?huV@azEN61$FPj7031xO3tVPDZ1k3^53huf3>T92(i(jFP$(SlSH% zI4iQ_*KMf0+^b#5w%2H&a~P?Ye4)gZapfFaYP0~sW7Se*bj%f)O;lSL|9UUiSPf*a zv*G#OV&$n?pZbWDz}FL7*`pL~3P-nS5c~pt)0em|2w9Q{Yb&bVlDU!;hEFg$d8gLq zt>L#}Q>{u<%a1t@Yl7pUfBD@x#d`;;&9&(sLftx=>`p+xeG}OXau1hQMb!=`6#@qs zd>LsNcD%a+pG#Mj{P^-jkuJ!GsNG2<6nxm?R|~nqd^UKUyk=zSOQ?t<6Bklney@v? 
z0oaK_pO$UwuLSmaU<~q8R43F>wSwWlHH0siUk%byl-CI(Z7>~CYvn@zd@+h!!y~}} zGqDRo{Az?#!27A)xT)43Zrm{kg@zhM!lyWo4#XtK6AUIUuN*E8ia5;wFpm#ZFQvCD;PCLJ%OiX6b z!j}~{XC!;-EHs6m$zEK#7M=I!h=D}`kU3=D6G^-g4 zx5LYq1~4EkYpoM9!aIVyVbGMrcWuz>tvmOQ;qKjQPKFh&(IksKQ;uoLDEXJ@4pM}} z!B?(_nYD$Li6+uZ4IoY=(1gD6ds;e`B{oN4nlg`HVf^*d>Xe5Ud}D2DWk@=jY<5^}&RTv#joz*X=Z7{ZL`2lcmLegLvcOW1j1`Jqs`&T#CPLc0H0y`SJ>!O)OBg>*|oADo{RZ}41#rJ5Au%;fcONi%;AKcY6)#biS148=y z*F=J(QVeA}EUh>^U4GGWBWJKpVh&x#e*vEvu_}hvUB8uo# zkJL2%Tq>OL<<2hiu?L$ceY2<(W}nQhqGq@*6v1zN^S>3U#6DYpNwtiCidt$J#QfYnVEyhKf9*E>) zd1$e#ZrvygC@V8Y^=uAG#I@?4NJfWRczo|Y3?qRd2n)w!{F!9j#zs@kAC7`DLgUNZdE{->zsVvJY;HnfVo$XQ!gr0nQ zgL;wf%>TFr{r$@RuN#E+&rBQeRoWjH1pqJ?0|1Es=>`EB+Buoo18q%#|Gr4@t>e}> zZoK?op+64@CswF7bP^#koq&6F+M?=YK#u4MuIG@Bq1jfXP>2^boKF1celrU#5>u>S znt*7zFXyn1B0vheJ&5=HeKfN`iEX28+8)>tU2~HNpZX=lCQ*f@j?ol2wpYBJC9F)N zrrNL{WAHvuL5C_N|9ORyrXesY3^}n_>b|nvBWbvU_7lmM8@7awWl3aWke_1zjVB|g zDt9)^-TpSj;IhiuN!zx_H&i%0HQ6T9eLL+-&915niF(+k(=?VcC(j{9#lEH1_>PF? z5gz$Bf6w%B zPJBImyqu!fDipo-ho1&w|srty26q7Zv;!KC|itV)%L=ft<+Ev4stsA8t zuAW|*mAz?k8=*u?X{czka#1xX87qHV|@BT|b}dKnZw#!j8qA%{{U)l-Mc19Y+^ z#6sS%D40hfAF9Kts;ZEcR?CF;wA(L-+v?jMd21=r3AXqsvzi&4#|&wUkyKn$Vz3Y` z!K3qGrmwO}XBZFuWsNr`V=IV93sG{qpk7!HB`Z7iyhP0b@&2#5w^;mB$-^kF=1s>Gw>a}tYzegi4AgE#mWWKF#KFD62Jv?kU)d(9Rp7T^vjh+~R$dFB&u z-B6;W+06)@Bm0zCG$hgrmZ^_a1A@56DDRrvxp&NS4Uq0OT*j!kQl{P~N6Ex3Fc(B<>HBXsiWQ9;Pu+l1ZA^HH;tfhK z__%8#PJ<_sk$t%WeAyKIZ*&UZre|}&Q@wTfPQ5i9;A575;0Mw5ZsP!i3WS#B&ceqj zEc`NVw0^O8U=JFtob8#7awsj%F?J1?tB6Uv6_#uIDejFyfsbH|m;sS{Kd%nkUgO4a zi((H=kz&^7Ka!aHq@O-vFs3y9}H&Oq})ZBboaPhVsSE}s4Yfg$QQE2 z?aKK4q5S$lT04MxKDY&P<(Zp_Ev1>&q>31_%EgE({$@N7wk?NRZ@*)B`tn&%E(zdY zLW;-WCM)IthTGeZe;m}c%LfCS@va0$W#GlVg5j-MEdH6g_S^BAGgu+nS9Gks$WI!8 zHUPbPNaSiAZlHi?3fz}&!(l^~w|#lOl+SF96rgC9D+D%V(<+NSIAuMvuY62FSEJn2kD0X7S0O zR~b8*13NCgAhZ(mIywErJ;PqxTG~9H`Uaw6^+|eYzUS7d#zS?$P4(!%yI=oa*vI@; zh#w>X5pe;p&khv-iA@l>CP?R$L$GkYLBP``T49&PXkqv+ZhYM*?xxddT* 
zf9h&1@5NI=-yb8*{N~#gprfua_#}G68nztI$=M3RL3x8#E}Pyt~V`^1}=Z5mc`D&(bFuY3cIzYK*a9Y;42sl=%cL zj&WAeosZ0{s(m8p$Px&$!UU`=->BIOdQiuy3A0i3j@Bj2Ma{|K8``saKoKb`DrTp% z#d`8;HU+iv(YI)OAET_D!c<#dJ?ILf$rsVSfBRZNi9Wj3l(``lkEhO!dLZH{pc0Pt zz9nLKs>5TeUWYhn4UXAGAt_(o?7L6pTLJt8j^ zONya?cS;wPp6KL^eV);$#zZ8jpD|W=bAePb_6ZVeZ74RRl7!2)GodXtiv%rMJ+fGL zI)_fLM8~*qeF#!e<2cphrZDB$r_PfPUrf#7H&1s^{UAI~sT;5f%<|6T01r(J>cYU`!}pt^>dIOB4&LGnUdu>C%3beb_5ZEB(;Ks-B0k=zdEfyW8h4}<>3T+t??WCbj}xBZ3;G&Wyq&3 zmqPCBbw~vG-84#+$aiOVKtZ=#%C>%VOIZs5rHxs>3m4o3a%moj$#QWk54)V9f*4Z$eFz=1zd$c~KS%+;Lp(*nO&5}yl ze6E$OL6Et1tu0Un-?@f~V!sVHcjpik=a9or(7idMAb-R;4T>lzCqmQ?FXjcL-#tD* zfjS0wL>!e72h=hmBfsxP_kq_=!PA&6X*AZHP{&GX@8WbBx(LL^U9B7Ws3xJF{+RG; z+~DO#{Ig!KnyG%;fgO#W-1gpEMED49+`)afl!qA@S7R~9tgsaZI;a9CWH$FKzrpQ@ z;4~vwiJB@=g+H9mf&wQMRc9$Wkd$k%(Y8?u8_-nf<4~p~HpI>_RFx$?3avQMu;hC+ z*uzlpvR&qX+3XFn@ANfo8(RcAUQdFZgtfUhB)p&yZ|Nc{3pZp#K3Kbtx?i7+F6QeW z-z??7D{~gIc_}6yfz5Ug z?6DWL-yn^2PZeoRaG07n;x@it>u79j9~%fN_fYEMiJJ7E=dn)E??Zr(tA0vDn%lG$ zPIk`64l<%M*sY57NQfU_-OmuHp%R^F@XfWkM`|?X*0QWd4cG-e(Qi5#et^7IRbxOn zLwr45M18)i2t(#|W-R7KVZ}T?JR;J((k6|Y3TS>?U99**F63Rh2 zYlAe=h^DtZdf6K-qHx#vBq67;nD7Nx;r0^V_tgW*>5s#4_z8`eCz+*@6g= zJav^i9FTRT-BUBC*ICH;dKhkY78aQd8&kXYitJXm6XjUa`r>wWk7ZLZguuy~E3`ew zIN2|rd@ZJy-zG#p&}B(X99&F=*hM^^>-?=1-u)DK<%juc2^dGET8$M)x*hdKzXIqq z$c)_h9CIZP<_DAj=+Y9kNHq?c%Xy-gcZSO6(0DQ02DO-_>Lf?i;&{)3hBLj6-`;X7 zN;6kMaIW!)zt_?A(0=6zujKRESf7RlVp!JoNPj{Myi_&z66E7$cc&b&Wwb~<;_QQ_=rfFoI4R);@8DWoXkPTFuuY^C`SXmEi*8_OnPY_}a0}n~ zclwFN*>n8dlHLB0oIyJ~F?Wb6kHcP|W?X)!w|vdcq(N^)AslYld3qQzp$hj?;t)h%bOG-7gf?=+u=6;x==;Sd;G9z zyC}t&HT>DwAo z(L+6Q!Ix2^h83a1bE0TxIO?Q3)J;TJ?}5-v1LUq=D<)j;gtjl$?Ln4Go~5UqD#g{} z5*#%9v;h?*y*H4OOD##LIgV8NOyslDGRQN(js)E=!>b1suqI+}cDV26NAU@XcmuK6 z15jPMGcuBDDOED)Z{%+&x!QxR13`EK;APZ?5bDuYU_ z;O6%8ik2=tR+N~r%0lQal`ib?uPCWJrX;^#pP7-__V}N<`TUa%FOo%0wXAfP<{u&O zhlWynzGWO<3BNT+y8B_#4AzqN!E1&qPyP}JmvQVxo3_9O_m0X*jR`xdsAQeCtSdCN 
z0KyR|xQz)P-f1wrDZSl&-NpDW=PH!~TC$%MGa@IOTC}m#Ch7AW5usEqW;11$mIxUXuv|cB7Usfr+X+(*O}Z*jF0Oj`$$yj_qK8~T#6Xlr z_4z~aj0?E!sYbz3zpq-Yhb1Kyr;Yj9(^I$c;B)wlp`Q@*BZEFD4%qJ#Ri{=R z$7}sq8he6P1dbykya+lGQ{#5jW4%(=s$Lgjc=l!Ab@a-;mFF{~7z5YGcSf~i{k`IK zI`mZ%UyO-Ad*;fm#`GFmK?V#Q1MvTEl>X7-zdt_%{`K|G{nEcP{;BBqe+dA<&w#2w z^8Op+e` Date: Wed, 1 Jul 2026 10:03:14 +0000 Subject: [PATCH 18/20] chore: drop .devforge working files from the PR and gitignore them These are internal devforge run artifacts, not part of the change. Untracked and ignored so they no longer appear in the PR diff. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .devforge/.gitignore | 5 -- .devforge/1-triage.md | 33 ------- .devforge/2-design.md | 65 -------------- .devforge/_create_pr.approved | 1 - .devforge/_design.approved | 1 - .devforge/_panel.json | 13 --- .devforge/_progress.md | 33 ------- .devforge/_request_fact_check.md | 20 ----- .devforge/_state.json | 1 - .devforge/_user_request.md | 1 - .devforge/_verified_task.md | 29 ------ .devforge/config.json | 27 ------ .devforge/iter-1-rev2/claim.md | 43 --------- .devforge/iter-1-rev2/review-staff-review.md | 18 ---- .devforge/iter-1/claim.md | 30 ------- .devforge/iter-1/oracle-run.txt | 10 --- .devforge/iter-1/review-staff-review.md | 76 ---------------- .devforge/iter-2-rev2/claim.md | 22 ----- .../iter-2-rev2/final-review-code-review.md | 51 ----------- .../iter-2-rev2/final-review-thermonuclear.md | 16 ---- .devforge/iter-2-rev2/review-staff-review.md | 73 --------------- .devforge/iter-2/claim.md | 25 ------ .devforge/iter-2/final-review-code-review.md | 26 ------ .../iter-2/final-review-thermonuclear.md | 67 -------------- .devforge/iter-2/review-staff-review.md | 88 ------------------- .devforge/iter-3-rev2/claim.md | 29 ------ .../iter-3-rev2/final-review-code-review.md | 76 ---------------- .../iter-3-rev2/final-review-thermonuclear.md | 73 --------------- .devforge/iter-3/claim.md | 28 ------ 
.devforge/iter-3/final-review-code-review.md | 22 ----- .../iter-3/final-review-thermonuclear.md | 58 ------------ .devforge/iter-4-rev2/claim.md | 17 ---- .../iter-4-rev2/final-review-code-review.md | 11 --- .../iter-4-rev2/final-review-thermonuclear.md | 26 ------ .devforge/pr-ecosystem-section.md | 65 -------------- .gitignore | 1 + 36 files changed, 1 insertion(+), 1179 deletions(-) delete mode 100644 .devforge/.gitignore delete mode 100644 .devforge/1-triage.md delete mode 100644 .devforge/2-design.md delete mode 100644 .devforge/_create_pr.approved delete mode 100644 .devforge/_design.approved delete mode 100644 .devforge/_panel.json delete mode 100644 .devforge/_progress.md delete mode 100644 .devforge/_request_fact_check.md delete mode 100644 .devforge/_state.json delete mode 100644 .devforge/_user_request.md delete mode 100644 .devforge/_verified_task.md delete mode 100644 .devforge/config.json delete mode 100644 .devforge/iter-1-rev2/claim.md delete mode 100644 .devforge/iter-1-rev2/review-staff-review.md delete mode 100644 .devforge/iter-1/claim.md delete mode 100644 .devforge/iter-1/oracle-run.txt delete mode 100644 .devforge/iter-1/review-staff-review.md delete mode 100644 .devforge/iter-2-rev2/claim.md delete mode 100644 .devforge/iter-2-rev2/final-review-code-review.md delete mode 100644 .devforge/iter-2-rev2/final-review-thermonuclear.md delete mode 100644 .devforge/iter-2-rev2/review-staff-review.md delete mode 100644 .devforge/iter-2/claim.md delete mode 100644 .devforge/iter-2/final-review-code-review.md delete mode 100644 .devforge/iter-2/final-review-thermonuclear.md delete mode 100644 .devforge/iter-2/review-staff-review.md delete mode 100644 .devforge/iter-3-rev2/claim.md delete mode 100644 .devforge/iter-3-rev2/final-review-code-review.md delete mode 100644 .devforge/iter-3-rev2/final-review-thermonuclear.md delete mode 100644 .devforge/iter-3/claim.md delete mode 100644 .devforge/iter-3/final-review-code-review.md delete mode 100644 
.devforge/iter-3/final-review-thermonuclear.md delete mode 100644 .devforge/iter-4-rev2/claim.md delete mode 100644 .devforge/iter-4-rev2/final-review-code-review.md delete mode 100644 .devforge/iter-4-rev2/final-review-thermonuclear.md delete mode 100644 .devforge/pr-ecosystem-section.md diff --git a/.devforge/.gitignore b/.devforge/.gitignore deleted file mode 100644 index 39d36f34..00000000 --- a/.devforge/.gitignore +++ /dev/null @@ -1,5 +0,0 @@ -# Regenerable per-iteration transients -iter-*/diff.patch -iter-*/test-results.txt -oracle_header_decode -oracle_header_decode.rs diff --git a/.devforge/1-triage.md b/.devforge/1-triage.md deleted file mode 100644 index 623a9b9d..00000000 --- a/.devforge/1-triage.md +++ /dev/null @@ -1,33 +0,0 @@ -# Triage — issue #479 (header decoding) - -## Problem -Response header values are decoded byte-by-byte with `b as char` (ISO-8859-1) in -`impit-node/src/response.rs:96` and `impit-python/src/response.rs:542`. This was introduced -intentionally by PR #434 to fix #430 (non-ASCII header bytes crashed Node / returned empty in -Python). But ISO-8859-1 decoding garbles the common case of UTF-8 header values (e.g. -`Content-Disposition: filename="naïve.pdf"`) into mojibake (`ï` → `ï`). - -The two positions genuinely conflict: a header byte sequence can't be decoded as both latin-1 -and UTF-8. #434 wants byte-preservation (RFC 9110 obs-text is latin-1); #479 wants correct -UTF-8. The maintainer explicitly left it open: "We might reinvestigate the best way forward." - -## Decision: PROCEED -Both code claims VALID (code present at both sites; Python line is now 542, not 544 — minor -staleness only). Real, unresolved defect the maintainer wants revisited. - -## Complexity: medium -Small code change (~2 call sites + tests in 2 bindings) but it alters the public response-header -contract across both language bindings → blast-radius override lifts it to at least medium. - -## Review-only? no — there is a fix to build. 
- -## Approach sketch (high level) -Decode as UTF-8 when the bytes are valid UTF-8, otherwise fall back to the existing -byte-preserving latin-1 decode. This fixes #479's UTF-8 case while keeping #434's test (which -sends bare `0xE4`, invalid UTF-8 → latin-1 `ä`) green. Never emits replacement chars, so #430's -crash/empty regression stays fixed. Apply symmetrically in Node and Python; add a UTF-8 test. - -## Open questions -- Should a shared helper live in the core `impit` crate vs. duplicated per binding? -- Do we want to also expose raw header bytes for signature/HMAC callers (issue mentions this)? - Likely out of scope for the core fix; note as follow-up. diff --git a/.devforge/2-design.md b/.devforge/2-design.md deleted file mode 100644 index 0d191bbf..00000000 --- a/.devforge/2-design.md +++ /dev/null @@ -1,65 +0,0 @@ -# Design (rev 2) — fix #479: per-ecosystem header decoding + raw-bytes accessor - -## What we're solving -Header values are decoded as ISO-8859-1 (`b as char`), garbling UTF-8 headers (#479). Rather -than force one behavior on both bindings, we make each binding faithful to the reference client -it already claims to implement, and give callers who need exact bytes (HMAC/signature checks) a -raw accessor — mirroring what httpx already offers. - -Decision (confirmed with maintainer): **behave like httpx in Python, like Fetch in JS**, and -**add a raw-header-bytes accessor to both**. - -## How -**Decoding (the split):** -- **Python = httpx semantics.** Decode UTF-8-first with an ISO-8859-1 fallback (httpx tries - ascii→utf-8→iso-8859-1). This is the shared `decode_header_value` helper we already built. -- **JS = Fetch semantics.** Keep strict ISO-8859-1 isomorphic decode (`b as char`, i.e. PR - #434's behavior). This means impit-node's string headers stay byte-recoverable via the standard - `Buffer.from(v, 'binary')` idiom, matching `fetch()`/undici/axios. Revert the JS call site to - `b as char` and drop the JS "UTF-8 header" string test. 
- -**Raw-bytes accessor (new public API, both bindings):** -- **Python** — httpx-`.raw`-like: expose `raw_headers` on the response as - `list[tuple[bytes, bytes]]` (name, value). -- **JS** — no Fetch precedent, so this is an explicit impit extension: expose `rawHeaders` on the - response as `Array<[string, Uint8Array]>` (name as string per Fetch conventions, value as raw - bytes). Justified because HMAC callers need exact bytes and latin-1 strings, while recoverable, - are error-prone to reverse by hand. Preserved across `clone()`. - -Both accessors return the untouched header VALUE bytes, so a signature/HMAC caller never depends -on string decoding. **Caveat (reqwest limitation):** by the time impit sees a response, reqwest's -`HeaderMap` has already normalized header names to lowercase and discarded the original -cross-header wire order, so — unlike httpx's `.raw`, which keeps the raw list — names are -lowercased and order is not the wire order. Duplicate values for a given name are preserved. The -value bytes (the part that matters for signatures) are exact. - -## Alternatives + the call -- **Symmetric UTF-8-first in both (previous rev):** rejected per maintainer — deviates from - strict Fetch on JS and breaks the `Buffer.from(v,'binary')` recovery idiom. -- **Latin-1 everywhere + raw bytes only:** rejected — leaves Python worse than httpx. -- **Skip the raw accessor:** rejected — HMAC/signature callers have no correct alternative once - decoding is lossy (distinct byte sequences can decode to the same string). -- **Chosen:** per-ecosystem decode + raw accessor in both. - -## Major changes (key areas) -- Core crate: keep `decode_header_value` (UTF-8-first); Python consumes it, JS does not. -- `impit-python/src/response.rs`: keep helper for the string dict; add `raw_headers` getter - returning byte-pair tuples. 
-- `impit-node/src/response.rs`: revert string decode to `b as char`; add `rawHeaders` accessor - returning name/`Uint8Array` pairs; update the `.d.ts`/napi surface. -- Tests: Python — UTF-8 decodes correctly + `raw_headers` returns exact bytes. JS — existing - latin-1 test stays; add a `rawHeaders` test asserting exact wire bytes (and that string decode - remains latin-1). Core — existing `decode_header_value` unit tests unchanged. -- Docs: note the intentional Python/JS decoding difference; note JS `rawHeaders` is an impit - extension beyond Fetch. - -## Risks / open questions -- **#479 becomes a partial fix by design:** JS UTF-8 headers stay latin-1 (mojibake) on the - string API; the fix for JS callers is `rawHeaders` + their own decode, matching Fetch. The - issue/PR must state this explicitly so it isn't read as "not fixed." -- **New public API surface in both bindings** — naming (`raw_headers` / `rawHeaders`), return - shapes, and multi-value/duplicate semantics are the things to lock at this gate. -- **Build/oracle limit unchanged:** full workspace + napi/maturin can't build here - (github.com/apify/h2 egress 403). Core helper is oracle-tested via standalone rustc; the new - accessors' binding compilation + JS/Py tests must be verified in CI. The `rawHeaders` napi/pyo3 - wiring in particular is only compile-checkable in CI. 
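The argument above, that distinct byte sequences can decode to the same string, can be checked directly. That argument is the reason for keeping a raw accessor. The sketch below re-implements the UTF-8-first/latin-1-fallback rule as a standalone function; it is not the committed helper. It shows a UTF-8 `ï` (0xC3 0xAF) and a lone latin-1 `ï` (0xEF) producing the identical string.

```rust
/// Standalone re-implementation of the UTF-8-first, latin-1-fallback rule (illustrative).
fn decode_header_value(bytes: &[u8]) -> String {
    match String::from_utf8(bytes.to_vec()) {
        Ok(s) => s,
        Err(e) => e.into_bytes().into_iter().map(|b| b as char).collect(),
    }
}

fn main() {
    let utf8_wire: &[u8] = &[0xC3, 0xAF]; // valid UTF-8 encoding of U+00EF
    let latin1_wire: &[u8] = &[0xEF]; // truncated UTF-8 lead byte, so latin-1 U+00EF
    let a = decode_header_value(utf8_wire);
    let b = decode_header_value(latin1_wire);
    assert_eq!(a, "\u{EF}");
    // Two different wire values collapse to one string, so the decoded form
    // cannot be used to reconstruct the bytes a signature was computed over.
    assert_eq!(a, b);
    assert_ne!(utf8_wire, latin1_wire);
}
```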
diff --git a/.devforge/_create_pr.approved b/.devforge/_create_pr.approved deleted file mode 100644 index 93b4fdee..00000000 --- a/.devforge/_create_pr.approved +++ /dev/null @@ -1 +0,0 @@ -approved by josef.prochazka@apify.com in chat on 2026-07-01 (create-PR) diff --git a/.devforge/_design.approved b/.devforge/_design.approved deleted file mode 100644 index 022284ca..00000000 --- a/.devforge/_design.approved +++ /dev/null @@ -1 +0,0 @@ -approved by josef.prochazka@apify.com in chat on 2026-07-01 (design gate rev 2: asymmetric decode + raw_headers/rawHeaders accessors, API shapes approved) diff --git a/.devforge/_panel.json b/.devforge/_panel.json deleted file mode 100644 index 3915fad5..00000000 --- a/.devforge/_panel.json +++ /dev/null @@ -1,13 +0,0 @@ -{ - "tier": "large", - "reason": "Now a public-API change in both bindings: adds a new raw-bytes accessor (raw_headers / rawHeaders) plus an intentional per-binding behavior split. New public contract surface in two language bindings lifts this from medium to large.", - "reviewers": [ - { "use": "staff-review", "model": "sonnet" } - ], - "final_reviewers": [ - { "use": "thermonuclear", "model": "sonnet" }, - { "use": "code-review", "model": "sonnet" } - ], - "inner_iterations": 3, - "final_review_rounds": 2 -} diff --git a/.devforge/_progress.md b/.devforge/_progress.md deleted file mode 100644 index 8dee281e..00000000 --- a/.devforge/_progress.md +++ /dev/null @@ -1,33 +0,0 @@ -# Progress - -- Repo: apify/impit, branch `claude/issue-479-fixes-r2554a`, base `master`, HEAD 9d2204f. -- Triage: PROCEED, complexity medium (blast-radius override on public response contract). -- Verify: both code claims VALID (Python line stale 544→542). Issue's suggested `from_utf8_lossy` - is incomplete (regresses #434). Chosen fix: UTF-8-first, latin-1 fallback. -- Explore: shared helper home = `impit/src/response_parsing/mod.rs`, re-exported via - `impit::utils`. 
Existing regression guard: `impit-node/test/basics.test.ts:569` + - `mock.server.ts:105-118` (sends bare 0xE4, expects `ä`). - -## Oracle -- Commands: standalone `rustc --edition 2021 --test .devforge/oracle_header_decode.rs && run`. -- Reason: full `cargo build/test` BLOCKED — pinned git dep github.com/apify/h2 → 403 via org - egress proxy, cargo git cache empty. Org policy denial; not routed around. Standalone rustc - test proves the pure decode algorithm; binding compile/integration deferred to CI. - -## Resolved registry (from config.json + registry.base.json) -- verify_request: brainstorming/opus | architect: writing-plans/opus | implementer: feature-dev/opus -- reviewers: staff-review/sonnet -- final_reviewers: thermonuclear/sonnet, code-review/sonnet -- limits: inner_iterations 3, final_review_rounds 2 | plan_mode_gate true - -## State -- Phase: design-gate (awaiting human approval before any source edit). - -## Finish (rev 2) -- PR #492 opened: https://github.com/apify/impit/pull/492 (base master). -- Oracle green throughout (rustfmt/rustc/rustdoc + ruff/py_compile). Binding compile + JS/Py test - execution are CI-gated (h2 egress 403 blocks local build). -- Reviewer staff-review: PASS. Final reviewers thermonuclear + code-review: both PASS after 2 - final-reopen rounds (fixed: clone() dropping rawHeaders, index.d.ts, wire-order/casing overclaims). -- Approvals: design gate rev1 + rev2 (chat), create-PR (chat) 2026-07-01. -- Phase: done. 
diff --git a/.devforge/_request_fact_check.md b/.devforge/_request_fact_check.md deleted file mode 100644 index c83a62c4..00000000 --- a/.devforge/_request_fact_check.md +++ /dev/null @@ -1,20 +0,0 @@ -# Request fact-check — claim ledger (verified against HEAD 9d2204f) - -| # | Claim | Verdict | Evidence | -|---|-------|---------|----------| -| 1 | Node decodes headers with `b as char` at `impit-node/src/response.rs:96` | VALID | Line 96: `v.as_bytes().iter().map(|&b| b as char).collect(),` | -| 2 | Python decodes headers with `b as char` at `impit-python/src/response.rs:544` | STALE(→ `impit-python/src/response.rs:542`) | Same code, line shifted to 542: `...collect::<String>()` | -| 3 | This interprets each byte as Latin-1 / ISO-8859-1 | VALID | `b as char` on a `u8` maps 0x00–0xFF → U+0000–U+00FF (Latin-1 code points) | -| 4 | UTF-8 header values are garbled into mojibake | VALID | For `ï` (UTF-8 `0xC3 0xAF`), latin-1 decode yields two chars `ï`; re-encoding yields different bytes | -| 5 | Behavior was intentional (PR #434, fixes #430) | VALID | PR #434 merged 2026-04-13; maintainer comment on #479: "This was intentional... We might reinvestigate the best way forward." | -| 6 | Suggested `from_utf8_lossy` is the right fix | LIKELY-FIXED-BUT-INCOMPLETE | `from_utf8_lossy` fixes UTF-8 but REGRESSES #434's latin-1 case: bare `0xE4` (invalid UTF-8) → `U+FFFD` replacement char, not `ä`, and is lossy/irreversible. A try-UTF-8-then-latin-1 fallback is strictly better. | - -## Existing locked-in behavior (regression guard) -- `impit-node/test/basics.test.ts:569` sends header byte `0xE4` (mock.server.ts:111, invalid - UTF-8) and asserts it decodes to `ä`. Any fix MUST keep this green. -- No equivalent Python test exists yet. - -## Verdict -PROCEED. Core defect is real and open. The issue's own suggested fix (`from_utf8_lossy`) is -incomplete — it would reintroduce corruption for the exact case #434 fixed.
Correct resolution -is a UTF-8-first decode with latin-1 fallback, applied to both bindings. diff --git a/.devforge/_state.json b/.devforge/_state.json deleted file mode 100644 index 26e959d3..00000000 --- a/.devforge/_state.json +++ /dev/null @@ -1 +0,0 @@ -{"phase":"done","iteration":4,"head_sha":"9d2204fec0b3ac1acf3c42b3e0c86463e09e8f49","pr":"https://github.com/apify/impit/pull/492"} diff --git a/.devforge/_user_request.md b/.devforge/_user_request.md deleted file mode 100644 index 9699f065..00000000 --- a/.devforge/_user_request.md +++ /dev/null @@ -1 +0,0 @@ -Investigate and suggest fixes for https://github.com/apify/impit/issues/479 , take into account https://github.com/apify/impit/pull/434 diff --git a/.devforge/_verified_task.md b/.devforge/_verified_task.md deleted file mode 100644 index 889b4045..00000000 --- a/.devforge/_verified_task.md +++ /dev/null @@ -1,29 +0,0 @@ -# Verified task — issue #479 - -## What must be true -Response header values must decode correctly for the common modern case (UTF-8, e.g. -`Content-Disposition: filename="naïve.pdf"`) WITHOUT regressing the case PR #434 fixed -(non-ASCII latin-1 bytes such as `0xE4` = `ä`, which previously crashed Node / emptied Python). - -## Corrected references (verified at HEAD 9d2204f) -- Node: `impit-node/src/response.rs:96` -- Python: `impit-python/src/response.rs:542` (issue said 544 — stale) -- Existing regression guard: `impit-node/test/basics.test.ts:569` + `impit-node/test/mock.server.ts:105-118` -- Shared helper candidate home: `impit/src/response_parsing/mod.rs`, re-exported via `impit::utils` - -## Acceptance (rev 2 — per-ecosystem, confirmed with maintainer) -1. Python decodes headers httpx-style: UTF-8-first with ISO-8859-1 fallback, never crash/empty/ - `U+FFFD` (fixes #479 for Python; keeps #430/#434 guarantees). -2. JS decodes headers Fetch-style: strict ISO-8859-1 isomorphic decode (`b as char`), so string - values stay byte-recoverable via `Buffer.from(v,'binary')`. 
JS UTF-8 mojibake is intentional. -3. Both bindings expose a raw-bytes accessor returning the exact header VALUE bytes (duplicate - values preserved): Python `raw_headers: list[tuple[bytes,bytes]]`; JS - `rawHeaders: Array<[string, Uint8Array]>` (impit extension). Caveat imposed by reqwest's - `HeaderMap`: header names are normalized to lowercase and original cross-header wire order is - NOT preserved — so this is httpx-`.raw`-*like*, not byte-identical. JS `rawHeaders` survives - `clone()`. -4. Tests: Python UTF-8 decode + raw bytes exact; JS latin-1 decode retained + raw bytes exact. -5. #479 resolution documented as intentionally split (JS = Fetch parity + rawHeaders escape hatch). - -## Now in scope (was previously deferred) -Raw header bytes accessor for HMAC/signature callers — included per maintainer decision. diff --git a/.devforge/config.json b/.devforge/config.json deleted file mode 100644 index 5f94129e..00000000 --- a/.devforge/config.json +++ /dev/null @@ -1,27 +0,0 @@ -{ - "stages": { - "verify_request": { "use": "brainstorming", "model": "opus" }, - "architect": { "use": "writing-plans", "model": "opus" }, - "implementer": { "use": "feature-dev", "model": "opus" }, - "reviewers": [ - { "use": "staff-review", "model": "sonnet" } - ], - "final_reviewers": [ - { "use": "thermonuclear", "model": "sonnet" }, - { "use": "code-review", "model": "sonnet" } - ] - }, - "oracle": { - "commands": [ - "rustfmt --edition 2021 --check /home/user/impit/impit/src/response_parsing/mod.rs /home/user/impit/impit/src/lib.rs /home/user/impit/impit-node/src/response.rs /home/user/impit/impit-python/src/response.rs", - "rustc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs -o /home/user/impit/.devforge/oracle_header_decode && /home/user/impit/.devforge/oracle_header_decode", - "rustdoc --edition 2021 --test /home/user/impit/.devforge/oracle_header_decode.rs", - "ruff check /home/user/impit/impit-python/test/response_test.py 
/home/user/impit/impit-python/test/async_client_test.py", - "ruff format --check /home/user/impit/impit-python/test/response_test.py /home/user/impit/impit-python/test/async_client_test.py", - "python3 -m py_compile /home/user/impit/impit-python/test/response_test.py /home/user/impit/impit-python/test/async_client_test.py" - ], - "note": "Full-workspace `cargo test` cannot run here: the pinned git dependency github.com/apify/h2 returns 403 through the org egress proxy and its cargo cache is empty. The oracle instead compiles+runs a standalone rustc --test copy of the pure decode_header_value helper, verifying the exact algorithm (UTF-8-first with latin-1 fallback) independent of h2. Binding compile/integration must be confirmed in CI where github egress is permitted." - }, - "limits": { "inner_iterations": 3, "final_review_rounds": 2 }, - "plan_mode_gate": true -} diff --git a/.devforge/iter-1-rev2/claim.md b/.devforge/iter-1-rev2/claim.md deleted file mode 100644 index a52934ac..00000000 --- a/.devforge/iter-1-rev2/claim.md +++ /dev/null @@ -1,43 +0,0 @@ -# Iteration 1 (rev 2) — implementer claim - -## Done — asymmetric decode -- **Python (httpx):** keeps `decode_header_value` (UTF-8-first, latin-1 fallback) for the string - `headers` dict. Unchanged from before. -- **JS (Fetch):** reverted the string decode to strict ISO-8859-1 (`b as char`) in - `try_from_response`, and removed `decode_header_value` from the Node import. String header - values now match Fetch and stay byte-recoverable via `Buffer.from(v, 'latin1')`. - -## Done — raw-bytes accessor (new public API) -- **Python** `Response.raw_headers` → `list[tuple[bytes, bytes]]` (httpx `Headers.raw` parity), - wire order + duplicates preserved. New private field `raw_headers: Vec<(Vec<u8>, Vec<u8>)>` - captured in `from_async` (exact wire bytes) and in `new` (UTF-8 bytes of the string headers); - getter builds `PyBytes`. -- **JS** `response.rawHeaders` → `Array<[string, Uint8Array]>` (impit extension).
New private - field `raw_header_pairs: Vec<(String, Vec<u8>)>` captured in `try_from_response`; napi getter - maps to `Uint8Array`. The JS wrapper (`index.wrapper.js`) returns the patched native object and - does not overwrite `rawHeaders`, so the native getter surfaces to users without extra plumbing - (verified by reading `#wrapResponse`, which returns `originalResponse`). - -## Tests -- **Core:** existing 5 `decode_header_value` unit tests unchanged (Python path). -- **JS** (`basics.test.ts` + `mock.server.ts`): existing latin-1 test (#434) kept; my previous - UTF-8-string test replaced with a `rawHeaders` test — asserts the string form is latin-1 - (mojibake), `rawHeaders` yields the exact UTF-8 bytes, and `Buffer.from(latin1,'latin1')` - round-trips to those bytes. -- **Python** (`response_test.py`): new `test_response_raw_headers` asserting `(bytes, bytes)` - shape and exact UTF-8 bytes for a non-ASCII value. - -## Oracle — green (what it can cover) -- `rustfmt --check` on all four touched Rust files: CLEAN. -- `rustc --test` core unit tests: 5/5. `rustdoc --test`: 1/1. - -## NOT verifiable in this environment — must be confirmed by CI (disclosed at the design gate) -- **Binding compilation.** napi (`Uint8Array::from(Vec<u8>)`, `Vec<(String, Uint8Array)>` getter - return) and pyo3 (`PyBytes::new`, `Vec<(Bound, Bound)>` getter) glue cannot - be compiled here — the `github.com/apify/h2` git dep is egress-blocked (403). These follow the - existing patterns in each crate but are UNVERIFIED against the compiler. -- **napi `index.d.ts` regeneration** for the new `rawHeaders` getter happens at `napi build` in - CI; the committed `.d.ts` is intentionally not hand-edited. -- **JS/Python test execution** needs the built native module (napi/maturin) — CI only. -- Highest-risk specifics to watch in CI: the exact `Uint8Array` constructor, tuple→array - ToNapiValue, and the pyo3 `Bound` tuple return.
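The claim that strict latin-1 header strings stay byte-recoverable holds because `b as char` maps every byte to a code point at or below U+00FF, which makes the mapping invertible. Here is a minimal Rust sketch of that round trip. The `latin1_decode`/`latin1_encode` names are illustrative only. The inverse plays the role that Node's `Buffer.from(v, 'latin1')` plays for such strings.

```rust
/// Strict ISO-8859-1 ("isomorphic") decode: one byte becomes one char in U+0000..=U+00FF.
fn latin1_decode(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Inverse of `latin1_decode`; returns None if any char is above U+00FF.
fn latin1_encode(s: &str) -> Option<Vec<u8>> {
    s.chars().map(|c| u8::try_from(u32::from(c)).ok()).collect()
}

fn main() {
    // Raw header value containing UTF-8 "ï" (0xC3 0xAF).
    let wire = b"attachment; filename=\"na\xC3\xAFve.pdf\"";
    let header = latin1_decode(wire);
    // The string API shows mojibake: U+00C3 followed by U+00AF.
    assert_eq!(header, "attachment; filename=\"na\u{C3}\u{AF}ve.pdf\"");
    // Re-encoding recovers the exact wire bytes.
    assert_eq!(latin1_encode(&header).as_deref(), Some(&wire[..]));
    // With the bytes back, the caller can decode them as UTF-8 on its own.
    assert_eq!(
        std::str::from_utf8(wire).unwrap(),
        "attachment; filename=\"naïve.pdf\""
    );
}
```

The round trip is what makes the string recoverable. It is also why reversing it by hand is error-prone, and that difficulty is the motivation for the explicit `rawHeaders` accessor.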
diff --git a/.devforge/iter-1-rev2/review-staff-review.md b/.devforge/iter-1-rev2/review-staff-review.md deleted file mode 100644 index f9e67b51..00000000 --- a/.devforge/iter-1-rev2/review-staff-review.md +++ /dev/null @@ -1,18 +0,0 @@ -VERDICT: FAIL - -## Findings - -### major: Python has no test exercising the real-fetch path (`from_async`) for either UTF-8 header decoding or `raw_headers` - -- File: `impit-python/test/response_test.py:43-53` (the only new Python test, `test_response_raw_headers`) -- File: `impit-python/src/response.rs:545-635` (`from_async`, the constructor actually used for real HTTP responses; wires `decode_header_value` at line 570 and populates `raw_headers` at lines 573-577) - -`test_response_raw_headers` builds its `Response` via the `#[new]` constructor (`impit-python/src/response.rs:237-285`), which never calls `decode_header_value` and derives `raw_headers` by simply re-encoding the Python string headers as UTF-8 (`impit-python/src/response.rs:249-252`). That path can't fail: any Python `str` header value round-trips through `.encode('utf-8')` trivially, so the test cannot detect a bug in `decode_header_value`'s UTF-8-first/ISO-8859-1-fallback logic, nor in `from_async`'s wiring of `headers`/`raw_headers` from real `reqwest::HeaderMap` bytes, nor an order/duplicate mismatch between the two collections built by two separate `val.headers().iter()` passes (lines 567-571 and 573-577). - -This matters because issue #479 and the design are specifically about *real HTTP responses* carrying UTF-8 header bytes — the `new()` constructor path is not where the bug lived. Concretely, if a regression were introduced in `from_async` (e.g. 
a typo mapping `raw_headers` from `k.as_str()` on the wrong header, or `decode_header_value` never actually being called on the live path), the added test suite would not catch it; only the pre-existing `test_response_headers_encoding`/ASCII-header tests exercise `from_async`, and none of them use non-ASCII or UTF-8 header bytes. - -The JS side, by contrast, does exercise the equivalent real-fetch path end-to-end: `impit-node/test/basics.test.ts:574-590` fetches through `impit.fetch(...)` (going through `try_from_response`, the wrapper, and the `rawHeaders` getter) against a raw-socket mock server route (`impit-node/test/mock.server.ts:124-138`) that writes literal UTF-8 bytes on the wire, then asserts both the latin-1 string mojibake and the exact `rawHeaders` bytes. - -Python test infrastructure for this already exists and is used elsewhere in the same manner needed here: `impit-python/test/async_client_test.py:16-46` defines raw-socket servers (`thread_server`, `truncating_server`) that hand-craft an HTTP response header block and are exercised via `AsyncClient`/`Client` — i.e., through `from_async`. The acceptance doc (`_verified_task.md` item 4: "Python UTF-8 decode + raw bytes exact") and the design (`2-design.md`: "Python — UTF-8 decodes correctly + `raw_headers` returns exact bytes") both call for this; the change as submitted only satisfies it for the constructor path, not the fetch path, leaving the actually-fixed behavior (issue #479) unverified by any Python test. - -**Fix scope**: add one raw-socket-based Python test (mirroring `thread_server`/`truncating_server`) that sends a header value with UTF-8 bytes (e.g. `naïve.pdf`) over `AsyncClient`/`Client`, and asserts (a) `response.headers[...]` decodes to the correct UTF-8 string and (b) `response.raw_headers` contains the exact wire bytes for that header, in order, matching what `headers` decoded. 
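The raw-socket test requested above ultimately compares against specific wire bytes. The sketch below is a hedged, standalone illustration: the `parse_header_block` helper is hypothetical and belongs to neither impit nor reqwest. It shows the exact value bytes such a test would send and expect back, for a UTF-8 header and for a bare latin-1 byte. A hand parser like this keeps name casing and wire order. impit's reqwest-based accessors cannot do that, as the design caveat notes.

```rust
/// Hypothetical helper: split a raw HTTP/1.1 header block (status line already removed)
/// into (name, value) byte pairs, keeping original name casing and wire order.
fn parse_header_block(block: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    let mut pairs = Vec::new();
    for line in block.split(|&b| b == b'\n') {
        // Drop the '\r' of a CRLF line ending.
        let line = match line.last() {
            Some(b'\r') => &line[..line.len() - 1],
            _ => line,
        };
        // Lines without a colon (including the blank terminator line) are skipped.
        let Some(colon) = line.iter().position(|&b| b == b':') else {
            continue;
        };
        let name = line[..colon].to_vec();
        // Strip leading optional whitespace (SP / HTAB) from the value.
        let mut value: Vec<u8> = line[colon + 1..]
            .iter()
            .copied()
            .skip_while(|&b| b == b' ' || b == b'\t')
            .collect();
        // Strip trailing optional whitespace as well.
        while matches!(value.last(), Some(b' ') | Some(b'\t')) {
            value.pop();
        }
        pairs.push((name, value));
    }
    pairs
}

fn main() {
    // Bytes a raw-socket mock would write: UTF-8 "ï" (0xC3 0xAF) and a bare latin-1 0xE4.
    let block: &[u8] =
        b"X-Utf8: attachment; filename=\"na\xC3\xAFve.pdf\"\r\nX-Latin1: M\xE4rz\r\n\r\n";
    let pairs = parse_header_block(block);
    assert_eq!(pairs.len(), 2);
    assert_eq!(pairs[0].0, b"X-Utf8".to_vec());
    assert_eq!(pairs[0].1, b"attachment; filename=\"na\xC3\xAFve.pdf\"".to_vec());
    assert_eq!(pairs[1], (b"X-Latin1".to_vec(), vec![b'M', 0xE4, b'r', b'z']));
}
```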
diff --git a/.devforge/iter-1/claim.md b/.devforge/iter-1/claim.md deleted file mode 100644 index c79abc12..00000000 --- a/.devforge/iter-1/claim.md +++ /dev/null @@ -1,30 +0,0 @@ -# Iteration 1 — implementer claim - -## Done -- Added `decode_header_value(&[u8]) -> String` to `impit/src/response_parsing/mod.rs`: - UTF-8-first (`String::from_utf8`) with a byte-preserving ISO-8859-1 fallback - (`b as char`) on invalid UTF-8. Documented with rustdoc + doctest. -- Re-exported it via `impit::utils` in `impit/src/lib.rs`. -- Replaced the inline `b as char` header map in both bindings with a call to the shared helper: - - `impit-node/src/response.rs` (import + call site). - - `impit-python/src/response.rs` (import + call site). -- Added core unit tests (ASCII, empty, UTF-8/#479, invalid-UTF-8 latin-1/#434, and a - no-replacement-char + byte-roundtrip guard for #430). -- Added a Node UTF-8 regression test: new `/utf8-header` route in `mock.server.ts` sending real - UTF-8 bytes (`ï` = 0xC3 0xAF) and an assertion in `basics.test.ts` that it decodes to - `attachment; filename="naïve.pdf"`. The existing latin-1 test is the fallback guard. - -## Oracle -- Green. `rustc --test` on a standalone exact copy of the helper: 5/5 tests pass - (see `iter-1/test-results.txt`). This proves the algorithm for all four use cases. - -## Skipped / not done — with reason -- **Full `cargo build`/`cargo test` and the Node/Python test suites: NOT run.** The workspace - pins git dep `github.com/apify/h2`, which returns 403 through the org egress proxy (cache - empty). This is an environment/policy limit, not a code issue. Binding compilation and the JS - test I added must be verified in CI where github egress is allowed. The core-crate helper is - simple, self-contained, and validated by the standalone oracle. -- **Raw-header-bytes API (for HMAC/signature callers):** intentionally out of scope per - `2-design.md`; noted as a follow-up. -- Did not touch `impit/src/fingerprint/mod.rs:47` `... 
as char` — unrelated random-string - generation, not header decoding. diff --git a/.devforge/iter-1/oracle-run.txt b/.devforge/iter-1/oracle-run.txt deleted file mode 100644 index 91714a1c..00000000 --- a/.devforge/iter-1/oracle-run.txt +++ /dev/null @@ -1,10 +0,0 @@ - -running 5 tests -test tests::empty_is_empty ... ok -test tests::ascii_is_unchanged ... ok -test tests::iso_8859_1_fallback_never_produces_replacement_char ... ok -test tests::utf8_is_decoded_as_utf8 ... ok -test tests::invalid_utf8_falls_back_to_iso_8859_1 ... ok - -test result: ok. 5 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s - diff --git a/.devforge/iter-1/review-staff-review.md b/.devforge/iter-1/review-staff-review.md deleted file mode 100644 index 904c0e75..00000000 --- a/.devforge/iter-1/review-staff-review.md +++ /dev/null @@ -1,76 +0,0 @@ -VERDICT: FAIL - -## blocker - -1. **Broken doctest — will fail `cargo test -p impit` in CI.** - `impit/src/response_parsing/mod.rs:154`: - ``` - /// assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b've']), "naïve"); - ``` - `b've'` is not valid Rust syntax: `b'...'` is a *byte literal* and must contain exactly one - ASCII byte (e.g. `b'v'`), not the two-character sequence `'ve'`. This is a compile error, not - a runtime failure — rustdoc rejects it outright. - - Verified independently: extracted the exact doctest body (using only `String::from_utf8` / - `into_bytes` — no dependency on the blocked `apify/h2` git patch) into a standalone file and - ran `rustdoc --test`: - ``` - error: if you meant to write a byte string literal, use double quotes - 6 - assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b've']), "naïve"); - 6 + assert_eq!(decode_header_value(&[b'n', b'a', 0xC3, 0xAF, b"ve"]), "naïve"); - test result: FAILED. 
0 passed; 1 failed - ``` - The doctest block is plain ` ```rust ` (not `no_run`/`ignore`/`compile_fail`), so it is - collected and compiled by `cargo test --doc` (part of the plain `cargo test -p impit` the - `test` job in `.github/workflows/format.yaml` runs). This is a real CI-breaking bug, distinct - from the disclosed "workspace won't build here" limitation — the syntax error is detectable - with a bare `rustdoc --test` on the snippet alone and has nothing to do with the blocked `h2` - dependency. The task's oracle (`iter-1/test-results.txt`) only ran the `#[cfg(test)] mod tests` - unit tests via a standalone `rustc --test`, which does not execute doctests, so this bug slipped - through undetected. - - Fix: change `b've'` to `b"ve"` and adjust the closing type (e.g. build the array with a byte - string / `..*b"ve"` or just spell out `b'v', b'e'` as separate elements) so the example - actually compiles. - -2. **rustfmt violation — will fail the `fmt` job in `.github/workflows/format.yaml`.** - `impit/src/response_parsing/mod.rs:184` (test `invalid_utf8_falls_back_to_iso_8859_1`): - ``` - let bytes = [b'D', b'i', b'e', b'n', b's', b't', b'a', b'g', b',', b' ', b'3', b'1', b'.', b' ', b'M', 0xE4, b'r', b'z', b' ', b'2', b'0', b'2', b'6']; - ``` - This line exceeds rustfmt's line-width limit and is not wrapped. Verified by running - `rustfmt --check` (installed locally, rustfmt 1.8.0) against the file as shipped in the diff: - it reports a diff at exactly this line, reformatting the array onto multiple lines. The repo's - `format.yaml` workflow runs `actions-rust-lang/rustfmt@v1` on every PR — this diff has not been - run through `cargo fmt` and will fail that check as-is. - - Fix: run `cargo fmt` (or manually wrap the array literal) before landing. 
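For the doctest fix in blocker 1, either of the following spellings compiles. Each `b'v'`-style literal holds exactly one byte, while a `b"..."` byte string may hold several bytes and accepts `\x` escapes. This small standalone check confirms that both spellings produce the same six bytes and that those bytes decode to `naïve`:

```rust
fn main() {
    // One byte per element: the form the corrected doctest can pass to the helper.
    let spelled_out = [b'n', b'a', 0xC3, 0xAF, b'v', b'e'];
    // Equivalent byte-string literal, with \x escapes for the two non-ASCII bytes.
    let byte_string: &[u8; 6] = b"na\xC3\xAFve";
    assert_eq!(&spelled_out, byte_string);
    // The six bytes are valid UTF-8 for "naïve".
    assert_eq!(std::str::from_utf8(&spelled_out).unwrap(), "naïve");
}
```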
- -## Notes (not separate findings, context for the two blockers above) - -- The core algorithm in `decode_header_value` (`impit/src/response_parsing/mod.rs:159-162`) is - correct and matches the design exactly: `String::from_utf8` succeeds and returns UTF-8-decoded - text whenever the whole byte slice is valid UTF-8 (fixes #479), and on failure - `e.into_bytes()` yields the **entire original** byte vector (verified experimentally — not - just the invalid tail), which is then mapped 1:1 byte→codepoint, reproducing the exact `#434` - latin-1 fallback (e.g. lone `0xE4` → `ä`) with no `U+FFFD` ever introduced (#430). Confirmed - with an independent standalone `rustc` build exercising mixed valid/invalid byte sequences, - multi-byte lead-without-continuation sequences, and empty input — all behave as designed. -- Both binding call sites are correctly updated: `impit-node/src/response.rs:96` and - `impit-python/src/response.rs:545` both now call `decode_header_value(v.as_bytes())`, and the - imports (`impit-node/src/response.rs:3`, `impit-python/src/response.rs:8-11`) correctly pull - the newly re-exported `impit::utils::decode_header_value` (re-export added at - `impit/src/lib.rs:81`, mirroring the existing `decode`/`determine_encoding`/`ContentType` - re-export pattern from the private `response_parsing` module — consistent and correct). No - other `b as char` header-decoding call sites were missed (grepped the whole tree; the only - other `as char` usage is an unrelated random-string generator in `fingerprint/mod.rs:47`). 
-- The Node test addition (`impit-node/test/basics.test.ts:574-577`, - `impit-node/test/mock.server.ts:27-30,124-138`) correctly exercises the intended bytes: it - writes the UTF-8 bytes for `naïve.pdf` (`0xC3 0xAF` for `ï`) directly onto the raw socket, - mirroring the existing raw-socket pattern used by the `#434` regression test for `0xE4`, so it - actually validates the wire-level decode path rather than something already normalized by a - higher-level HTTP client on the server side. Good test; adequate to cover acceptance criterion - 1 and 5 from `_verified_task.md`. -- The two blockers above are cheap, mechanical fixes (change one byte literal; run `cargo fmt`) - and do not implicate the core design or algorithm, but per review discipline any verified - finding — including things this trivial — means the diff cannot pass as-is. diff --git a/.devforge/iter-2-rev2/claim.md b/.devforge/iter-2-rev2/claim.md deleted file mode 100644 index a3ab9b7d..00000000 --- a/.devforge/iter-2-rev2/claim.md +++ /dev/null @@ -1,22 +0,0 @@ -# Iteration 2 (rev 2) — implementer claim - -## Addressed the iter-1-rev2 review finding (major) -- **Python `raw_headers`/decode only tested via the manual `new` constructor, not the real - `from_async` fetch path** — FIXED. Added a wire-level integration test - (`test_header_value_decoding_and_raw_bytes` in `async_client_test.py`) using a new raw-socket - `header_encoding_server`, mirroring the JS mock-server approach. It sends a UTF-8 header value - (`X-Utf8`) and a lone `0xE4` latin-1 byte (`X-Latin1`), then asserts: - - `response.headers['x-utf8']` decodes correctly as UTF-8 (httpx path, exercises - `decode_header_value` on a real response), - - `response.headers['x-latin1'] == 'März'` (latin-1 fallback), - - `response.raw_headers` yields the exact wire bytes for both. - -## Oracle — green (extended this iteration to cover Python) -- Rust: `rustfmt --check` CLEAN; `rustc --test` 5/5; `rustdoc --test` 1/1. 
-- Python (new): `ruff check` clean; `ruff format --check` clean; `py_compile` OK. (ruff caught a - real `UP012` `.encode('utf-8')` lint on the first pass — fixed.) - -## Still CI-gated (unchanged, disclosed at design gate) -- Binding compilation (napi/pyo3) and execution of the JS/Python tests need the built native - module; the `github.com/apify/h2` git dep is egress-blocked here. The new Python test's - behavior is verified by CI, not locally. diff --git a/.devforge/iter-2-rev2/final-review-code-review.md b/.devforge/iter-2-rev2/final-review-code-review.md deleted file mode 100644 index f58affc3..00000000 --- a/.devforge/iter-2-rev2/final-review-code-review.md +++ /dev/null @@ -1,51 +0,0 @@ -VERDICT: FAIL - -## Findings (confidence >= 80) - -### 1. [High severity] JS `rawHeaders` does not survive `Response.clone()` — silently `undefined` - -**File:** `impit-node/index.wrapper.js:417-492` (the `clone` override inside `#wrapResponse`), specifically lines 474-478. - -The task's own verification step asks whether `#wrapResponse` clobbers `rawHeaders`. It does not clobber it on the *primary* response object — `#wrapResponse` only calls `Object.defineProperty` on `text`, `bytes`, `arrayBuffer`, `json`, `headers`, and `clone` (`index.wrapper.js:379,387,396,405,413,417`), so `rawHeaders`, being a napi prototype getter on the returned `ImpitResponse` instance (`impit-node/src/response.rs:140-151`), is reachable on `originalResponse` as-is. - -However, `clone()` (redefined at `index.wrapper.js:417-492`) constructs its return value via `new Response(stream2, { status, statusText, headers })` at line 474 — the **standard built-in Web `Response`**, not an `ImpitResponse`. That class has no `rawHeaders` getter at all (own or inherited). Only `url` and `text` are manually stapled onto the clone (lines 479-488). 
- -**Concrete failing scenario:** -```js -const response = await impit.fetch(url); // has rawHeaders, works -const clone = response.clone(); -clone.rawHeaders; // undefined — silently absent, not an error -``` -Any caller who clones a response (a standard, encouraged Fetch pattern, e.g. to read headers in one branch and body in another) loses access to the new raw-bytes accessor with no error or warning. Since `rawHeaders` is the first `ImpitResponse`-only extension with no standard-`Response` equivalent (unlike `body`/`text`/`json`/`arrayBuffer`, which the clone already re-implements), this is a real, newly-introduced gap in the "surfaces to users through `index.wrapper.js`" requirement, not a pre-existing limitation that this diff merely inherits. - -### 2. [High severity] `raw_headers`/`rawHeaders` do not actually preserve wire order when duplicate header names are interleaved with other headers - -**Files:** -- `impit-node/src/response.rs:65-67` (doc comment) and `:101-107` (construction via `response.headers().iter()`) -- `impit-python/src/response.rs:230-231` (doc comment) and `:566-577` (construction via `val.headers().iter()`) - -Both doc comments explicitly claim: *"Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved)."* Both are built by a single pass over `reqwest::Response::headers().iter()`, which delegates to `http::HeaderMap::iter()`. - -`http::HeaderMap`'s own documentation states iteration order is "arbitrary, but consistent... Each key will be yielded once per associated value" — i.e., it does NOT guarantee wire order. Structurally, `HeaderMap` stores the first value per distinct name in a primary table and all subsequent values for a repeated name in a separate `extra_values` side-list; `Iter::next()` drains all of a repeated name's extra values as soon as it reaches that name's slot, before moving to the next distinct name — regardless of what other headers arrived on the wire in between. 
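To make the reordering concrete without depending on the `http` crate, here is a simplified model of the iteration order described above. It is an assumption, not `HeaderMap`'s actual implementation. In the model, distinct names are yielded in first-seen order, and all values of a repeated name are drained when that name is reached. Unlike the real `HeaderMap`, the model keeps the first-seen name casing instead of lowercasing it.

```rust
/// Simplified model (not the `http` crate) of grouped header iteration:
/// names in first-seen order, all values for a name yielded together.
fn grouped_iteration<'a>(wire: &[(&'a str, &'a str)]) -> Vec<(&'a str, &'a str)> {
    let mut names: Vec<&'a str> = Vec::new();
    let mut values: Vec<Vec<&'a str>> = Vec::new();
    for &(name, value) in wire {
        // Header names compare case-insensitively.
        match names.iter().position(|n| n.eq_ignore_ascii_case(name)) {
            Some(i) => values[i].push(value),
            None => {
                names.push(name);
                values.push(vec![value]);
            }
        }
    }
    names
        .into_iter()
        .zip(values)
        .flat_map(|(n, vs)| vs.into_iter().map(move |v| (n, v)))
        .collect()
}

fn main() {
    // Wire order from the failing scenario, with interleaved duplicates.
    let wire = [
        ("Content-Type", "text/plain"),
        ("X-Trace", "span-1"),
        ("X-Multi", "first"),
        ("X-Trace", "span-2"),
        ("X-Multi", "second"),
    ];
    let iterated = grouped_iteration(&wire);
    assert_eq!(
        iterated,
        vec![
            ("Content-Type", "text/plain"),
            ("X-Trace", "span-1"),
            ("X-Trace", "span-2"),
            ("X-Multi", "first"),
            ("X-Multi", "second"),
        ]
    );
    // "X-Trace: span-2" now precedes "X-Multi: first", which is not the wire order.
    assert_ne!(iterated, wire.to_vec());
}
```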
- -**Concrete failing scenario:** A server sends, in this exact wire order: -``` -Content-Type: ... -X-Trace: span-1 -X-Multi: first -X-Trace: span-2 -X-Multi: second -``` -`raw_headers`/`rawHeaders` (and the underlying `HeaderMap::iter()`) will yield `X-Trace: span-1`, `X-Trace: span-2`, `X-Multi: first`, `X-Multi: second` — the second `X-Trace` value is reordered ahead of `X-Multi: first`, even though `X-Multi: first` arrived earlier on the wire. This was verified empirically by compiling and running a reproduction against the exact `http` crate version pinned in this repo's `Cargo.lock`. - -This directly undermines the acceptance criterion "both raw accessors return EXACT wire bytes with order + duplicates preserved" for any response with interleaved duplicate header names — precisely the scenario where the stated HMAC/signature use case (order-sensitive by nature) would silently get wrong data with no indication of failure. Existing tests only exercise duplicates that are sent back-to-back (e.g. consecutive `Set-Cookie` headers), which happens to preserve apparent order and does not catch this. - -### 3. [Medium-high severity] Python `raw_headers` header **names** are lowercased, not the exact wire bytes, contradicting the docstring's httpx-parity claim - -**File:** `impit-python/src/response.rs:456-461` (docstring) and `:573-577` (construction via `val.headers().iter()`, using `k.as_str().as_bytes()`). - -The docstring states this getter is the "httpx `Response.headers.raw` equivalent" and returns "exact wire bytes." `k` here is an `http::HeaderName`, whose `as_str()` always returns the lowercased ASCII form — `HeaderName` normalizes and stores names case-insensitively at construction and retains no original casing. Verified against real httpx 0.28.1 (`httpx.Headers.raw` returns `raw_key`, the original casing exactly as received, e.g. `[(b'X-Utf8', b'val')]` stays `X-Utf8`, not lowercased). 
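This finding turns on the exact bytes a signature would cover. A small sketch shows why lowercasing the name matters for that use case. The `signed_line` helper is hypothetical: it assumes a caller signs a literal `name: value` line, and it lowercases with `str::to_ascii_lowercase` to stand in for the normalization the finding attributes to `http::HeaderName::as_str()`.

```rust
/// Hypothetical helper: the literal "name: value" bytes a caller might feed to an HMAC.
/// The value bytes are passed through untouched; only the name's casing varies.
fn signed_line(name: &str, value: &[u8]) -> Vec<u8> {
    let mut line = Vec::with_capacity(name.len() + 2 + value.len());
    line.extend_from_slice(name.as_bytes());
    line.extend_from_slice(b": ");
    line.extend_from_slice(value);
    line
}
```

Called once with the name as sent (`X-Signature`) and once with the lowercased form, the helper produces byte sequences that differ only in the name. That difference alone is enough to change any MAC computed over the line, even though the value bytes are identical.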
- -**Concrete failing scenario:** A server sends `X-Signature: abc123`. `response.raw_headers` yields `(b'x-signature', b'abc123')` — the name is lowercased, unlike httpx's `.raw`, which would preserve `b'X-Signature'`. A caller building an HMAC over the literal header line (name included) using impit's `raw_headers` to match behavior documented as httpx-equivalent gets a different byte sequence than httpx would produce for the same wire response. Existing tests (`impit-python/test/async_client_test.py:481-482`) don't catch this because they only assert against already-lowercase expected keys (`raw[b'x-utf8']`), and the JS test defensively lowercases before comparing (`impit-node/test/basics.test.ts:583`, `k.toLowerCase() === 'x-utf8'`), so neither test suite exercises or would catch a case-sensitivity mismatch. - -Note: this is not a finding for the JS side — the design doc explicitly scopes JS's `rawHeaders` name as "name as string per Fetch conventions" (not claiming byte-exactness for the name), so JS's behavior matches its own documented contract. Only Python's docstring makes the stronger "exact wire bytes" / "httpx equivalent" claim that this contradicts. diff --git a/.devforge/iter-2-rev2/final-review-thermonuclear.md b/.devforge/iter-2-rev2/final-review-thermonuclear.md deleted file mode 100644 index 0cdbc9f6..00000000 --- a/.devforge/iter-2-rev2/final-review-thermonuclear.md +++ /dev/null @@ -1,16 +0,0 @@ -VERDICT: FAIL - -## Major: `rawHeaders` getter is missing from the committed `impit-node/index.d.ts`, so the public TS surface silently lags the Rust source - -- File: `impit-node/src/response.rs:140-151` adds `#[napi(getter, js_name = "rawHeaders", ...)] pub fn raw_headers(&self) -> Vec<(String, Uint8Array)>` on `ImpitResponse`. 
-- File: `impit-node/index.d.ts:117-146` (the `ImpitResponse` class declaration, which is checked into the repo and listed as `"types": "index.d.ts"` in `impit-node/package.json`) still only declares `status`, `statusText`, `headers`, `ok`, `url`, `decodeBuffer`, etc. — no `rawHeaders` entry anywhere in the file. -- The design doc (`2-design.md`, "Major changes" section) explicitly lists updating "the `.d.ts`/napi surface" as part of this change's scope, so this isn't an incidental miss — it's a stated deliverable that wasn't done. -- Concrete symptom: `impit-node/test/basics.test.ts:582` has to write `(response as unknown as { rawHeaders: Array<[string, Uint8Array]> }).rawHeaders` to access the new accessor, whereas the adjacent `response.headers.get(...)` call three lines above (line 87 elsewhere in the same file) needs no cast. The test itself is proof the declared public type doesn't match the implementation. -- Impact: any TypeScript consumer calling `response.rawHeaders` today gets a compile error (property does not exist on `ImpitResponse`) until someone manually re-runs `napi build` to regenerate `index.d.ts` from source — the exact feature this PR is supposed to ship is unusable from TypeScript without an undocumented extra build step, and the checked-in artifact is inconsistent with the checked-in source in the same commit. - -## Major: `response.clone()` silently drops `rawHeaders`, with zero test coverage and no documentation of the gap - -- File: `impit-node/index.wrapper.js:417-491` (`Object.defineProperty(originalResponse, 'clone', ...)`). The clone path explicitly re-threads `headers` into the new object (`new Response(stream2, { status: this.status, statusText: this.statusText, headers: this.headers })`, line 474-478) but the `clone` produced this way is a plain global `Response` (confirmed by the existing test at `impit-node/test/basics.test.ts:719`: `expect(clone).toBeInstanceOf(Response)`), not an `ImpitResponse`. 
The global `Response` has no `rawHeaders` property at all. -- Concrete scenario: an HMAC/signature-verification caller — the exact use case this feature and the design doc (`2-design.md`, "Justified because HMAC callers need exact bytes") were built for — calls `response.clone()` (e.g. to inspect headers while leaving the original body stream intact for a downstream consumer) and then reads `clone.rawHeaders`. This returns `undefined` silently; there is no error, no `TypeError`, nothing to signal that the raw-bytes escape hatch stopped working across the clone boundary. The consumer only discovers this at HMAC-mismatch time, one step removed from the code that dropped the data — a hard bug to trace. -- No test in this diff (or pre-existing) exercises `clone()` together with `rawHeaders`; the new `rawHeaders` test (`impit-node/test/basics.test.ts:573-591`) only checks the pre-clone response, and the `clone()` describe block (`impit-node/test/basics.test.ts:714-791`) predates this PR and knows nothing about `rawHeaders`. -- Given the design doc frames `rawHeaders` as an explicit, first-class, HMAC-oriented API (not an afterthought), this silent data loss on a documented, tested code path (`clone()`) is a real correctness gap introduced by this change, not a pre-existing limitation being merely inherited — `headers` (the string map) was carried across `clone()` on purpose in the same function; `rawHeaders` was not, and nothing calls that asymmetry out. 
diff --git a/.devforge/iter-2-rev2/review-staff-review.md b/.devforge/iter-2-rev2/review-staff-review.md deleted file mode 100644 index 58833fee..00000000 --- a/.devforge/iter-2-rev2/review-staff-review.md +++ /dev/null @@ -1,73 +0,0 @@ -VERDICT: PASS - -## Scope of this round's diff (verified independently) - -Diffed `iter-2-rev2/diff.patch` against the correct prior-round baseline -(`iter-1-rev2/diff.patch` — the design-rev-2 implementation that received the major finding; -`iter-2`/`iter-3` belong to the earlier, abandoned design-rev-1 track and are not the right -baseline). Result: - -- `impit-node/src/response.rs`, `impit-node/test/basics.test.ts`, `impit-node/test/mock.server.ts`, - `impit-python/src/response.rs`, `impit/src/lib.rs`, `impit/src/response_parsing/mod.rs` — **byte- - identical** to the prior round. No regression risk introduced. -- `impit-python/test/response_test.py` — one line changed: `'naïve'.encode('utf-8')` → - `'naïve'.encode()`. Behaviorally identical (`str.encode()` defaults to UTF-8); cosmetic only. -- `impit-python/test/async_client_test.py` — new `header_encoding_server` helper and new test - `test_header_value_decoding_and_raw_bytes`, purely additive (file didn't exist in the prior - round's diff in this form; no existing test modified). - -This matches the expected shape: the fix is scoped exactly to adding the missing Python -integration test, with no incidental changes to the asymmetric-decode/raw-accessor logic itself. - -## Verification of the new test - -- **Real wire path, not the `#[new]` shortcut**: `test_header_value_decoding_and_raw_bytes` uses - `AsyncClient(browser=browser).get(...)` against a real `socket`-based server - (`header_encoding_server`), landing in the `From` conversion in - `impit-python/src/response.rs:566-577` (the code path with `decode_header_value` and the - `raw_headers` Vec built from `val.headers()`), not the manually-constructed `Response(...)` - path. 
This is the exact gap the prior review flagged, and it's now closed — it mirrors the - existing JS wire-level pattern (`utf8Header` route in `mock.server.ts` + `basics.test.ts`). -- **Hand-built HTTP/1.1 response is well-formed**: reconstructed the exact byte sequence locally - and confirmed `Content-Length: 2` matches the 2-byte body `ok`, headers are correctly - `\r\n`-terminated, and the header/body boundary (`\r\n\r\n`) is correct. Verified via a live - IPv4 socket round-trip that a client reading this stream would parse it exactly as intended: - `X-Utf8: attachment; filename="naïve.pdf"` (0xC3 0xAF for `ï`) and `X-Latin1: März`-style value - carrying a lone `0xE4` byte, matching PR #434's existing regression scenario. -- **Header-name case handled correctly**: the `From` conversion builds both the string - `headers` dict and `raw_headers` from `k.as_str()` on `reqwest`'s `HeaderName`, which the `http` - crate always normalizes to lowercase. The test asserts against lowercase keys - (`response.headers['x-utf8']`, `raw[b'x-utf8']`) even though the server sends `X-Utf8`/ - `X-Latin1` — this is correct given `as_str()` semantics, and is consistent with how the - pre-existing, unchanged constructor-path test (`test_response_constructor_with_headers`, uses - `'Content-Type'` verbatim) differs because that path does *not* lowercase (no `HeaderName` - involved) — no contradiction, just two different, correctly-modeled code paths. -- **Assertions are correct**: UTF-8 value decodes as the original Python `str` (httpx path via - `decode_header_value`'s UTF-8-first branch); the lone `0xE4` byte falls back to ISO-8859-1 - producing `'März'`; `raw_headers` (as a `dict`) returns the exact wire bytes for both headers, - matching the manually reconstructed byte sequences.
-- **IPv6 dual-stack binding pattern**: `header_encoding_server` binds `('::', 0)` with - `IPV6_V6ONLY=0` and the test connects via `127.0.0.1:{port}` — this exactly follows the - pre-existing, proven pattern already used by `truncating_server`/`test_truncated_response` in - the same file (not a new/untested pattern). -- **Test isolation**: uses its own dedicated `header_encoding_server` (own port via `port_holder`, - own thread), doesn't interfere with or reuse state from other tests. - -## Style / lint - -- `ruff check` (`select = ["ALL"]`) and `ruff format --check` both pass clean on - `async_client_test.py` and `response_test.py` (verified locally, matches `test-results.txt`). -- `rustfmt --check --edition 2021` passes clean on all three touched Rust files. -- `py_compile` succeeds on both test files. -- New test function/class placement is consistent with siblings (correct indentation inside - `TestBasicRequests`, correct blank-line spacing, no orphaned/duplicate definitions). - -## Conclusion - -The prior major finding — Python's `raw_headers`/decode behavior being tested only via the -`#[new]` manual-construction path and not the real `from_async` fetch path — is genuinely -resolved. The new test exercises the identical wire-level scenario the JS suite already covered, -the hand-built response is protocol-correct, and the assertions correctly reflect httpx-style -decode semantics and exact-byte `raw_headers` parity. No new issues found; the diff since the -last review is exactly the two Python test files (plus one no-op cosmetic edit), with zero -changes to the reviewed decode/raw-accessor implementation itself. diff --git a/.devforge/iter-2/claim.md b/.devforge/iter-2/claim.md deleted file mode 100644 index bd08f86c..00000000 --- a/.devforge/iter-2/claim.md +++ /dev/null @@ -1,25 +0,0 @@ -# Iteration 2 — implementer claim - -## Addressed both iter-1 review findings (blockers) -1. **Broken doctest (`mod.rs` byte literal `b've'`)** — FIXED. 
Changed to separate elements - `b'v', b'e'`. Verified: the doctest now compiles and passes under `rustdoc --test` (see - `iter-2/test-results.txt`). -2. **rustfmt violation (long array literal)** — FIXED. Ran `rustfmt` on all touched files. - -## Additional fix surfaced by the strengthened oracle -- Collapsing the header-map closures to call `decode_header_value` changed line lengths, so - `rustfmt` rewrapped the Node push (now one line) and the Python `HashMap::from_iter` map. - Applied. `rustfmt --check` is now clean across all four touched Rust files. - -## Oracle (strengthened this iteration to close the gap that let #1/#2 through) -Three commands, all green: -- `rustfmt --check` on core + both binding files → CLEAN. -- `rustc --test` unit tests → 5/5 pass. -- `rustdoc --test` doctest → 1/1 pass (this is what would have caught the byte-literal typo; - `rustc --test` alone does not run doctests). - -## Unchanged from iter-1 (still true) -- Algorithm: UTF-8-first, byte-preserving ISO-8859-1 fallback; both binding call sites use the - shared helper; Node UTF-8 regression test added. -- Full `cargo`/napi/maturin build still cannot run here (github.com/apify/h2 egress 403); - binding compile + JS/Py suites must run in CI. No code reason they would fail. diff --git a/.devforge/iter-2/final-review-code-review.md b/.devforge/iter-2/final-review-code-review.md deleted file mode 100644 index 2d255b58..00000000 --- a/.devforge/iter-2/final-review-code-review.md +++ /dev/null @@ -1,26 +0,0 @@ -VERDICT: PASS - -## Method -Independent fresh review per code-review.md's approach (read-diff, shallow-bug-scan, comment/doc-guidance-compliance, historical-context reasoning), applied to `impit/src/response_parsing/mod.rs`, `impit/src/lib.rs`, `impit-node/src/response.rs`, `impit-python/src/response.rs`, and the Node test additions. Verified the pure helper with a standalone rustc oracle (rustc 1.x local toolchain) since the full workspace cannot build here. 
- -## Verification performed -- Extracted `decode_header_value` into a standalone snippet and ran the 4 tests from the diff plus 4 adversarial tests I added (mixed valid-UTF-8-prefix-then-invalid-byte, truncated multibyte lead byte, whole-buffer-fallback-not-partial-decode check, char-count-equals-byte-count check). All 8 passed. -- Confirmed `String::from_utf8` failure causes the *entire* original byte buffer (via `FromUtf8Error::into_bytes()`, which returns the original vec unmodified) to fall back to the byte-for-byte latin-1 map — no partial UTF-8 decoding/mixing occurs, so PR #434's guarantee (byte-exact latin-1 on invalid UTF-8) holds even for buffers with a valid UTF-8 prefix followed by a bad byte. -- Confirmed the fallback path never invokes `DecoderTrap::Replace` or any lossy path, so no `U+FFFD` can appear (issue #430 guarantee) — this is structural, not incidental (there is no lossy call in either branch). -- Confirmed the new doctest on `decode_header_value` compiles and passes under `rustdoc --test` once given a matching `--edition 2021` (my harness's first failure was a self-inflicted edition mismatch in the oracle harness, not a defect in the source). -- Confirmed rustfmt reports no formatting diff on the new function. -- Confirmed both binding call sites (`impit-node/src/response.rs:94`, `impit-python/src/response.rs:545`) import and call `decode_header_value(v.as_bytes())` with matching signature `&[u8] -> String`, directly replacing the old `v.as_bytes().iter().map(|&b| b as char).collect()` inline closures 1:1 — no behavior divergence between bindings. -- Confirmed `impit/src/lib.rs` re-exports `decode_header_value` through `pub mod utils`, alongside the existing `decode`/`determine_encoding`/`ContentType` exports, so both bindings' `use impit::utils::{..., decode_header_value, ...}` imports resolve. 
-- Confirmed the Node regression test (`impit-node/test/basics.test.ts:574-577`) and its mock route (`impit-node/test/mock.server.ts:124-138`) mirror the pre-existing, already-proven `nonAsciiHeader` raw-socket-header-injection pattern; verified with a quick Node snippet that `Buffer.from(headerValue, 'utf-8')` produces the expected UTF-8 bytes (0xC3 0xAF for `ï`), which `decode_header_value` will correctly round-trip back to the original string. - -## Acceptance criteria (from `_verified_task.md`) -1. UTF-8 header decodes as UTF-8 — verified (oracle test + doctest). -2. Invalid-UTF-8 latin-1 bytes still decode byte-for-byte as latin-1 — verified, including the harder case of a valid-UTF-8-looking prefix followed by an invalid byte (whole buffer still falls back atomically). -3. No `U+FFFD` ever introduced — verified structurally (no lossy decode call exists in either code path). -4. Applied symmetrically in Node and Python — verified, identical call pattern in both `response.rs` files. -5. Regression test present for UTF-8 case in Node — verified, added and consistent with existing test infra. - -## Findings with confidence >= 80 -None. - -Minor items noted but explicitly out of scope per the design doc's own risk section (ambiguous-bytes tradeoff, no Python test due to unavailable maturin build here, un-runnable full workspace build due to blocked `h2` git dependency) — these are called out and accepted as intentional/environment-limited in `.devforge/2-design.md`, not defects introduced by this change, and do not meet the >=80 confidence bar as unaddressed regressions. 
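The adversarial cases listed in this review can be reproduced against a standalone copy of the helper. The body below is the iteration-2 version quoted in the thermonuclear review; treat it as an illustrative extract, not the crate source. The checks follow the review's list:

- a valid UTF-8 prefix followed by an invalid byte,
- a truncated multibyte lead byte,
- whole-buffer fallback rather than a partial decode,
- output char count equal to input byte count on the fallback path.

They also contrast the helper with `String::from_utf8_lossy`, which the design rejected.

```rust
/// Extract of the UTF-8-first, ISO-8859-1-fallback helper as quoted in the
/// iteration-2 review (illustrative copy, not the crate source).
pub fn decode_header_value(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        // `FromUtf8Error::into_bytes()` returns the complete original buffer, so on
        // failure every byte (including any valid UTF-8 prefix) maps to the code
        // point with the same value: a byte-for-byte ISO-8859-1 decode, never U+FFFD.
        .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())
}
```

With this helper, `[0xC3, 0xAF, 0xE4]` decodes as `Ã¯ä` rather than `ï` plus a stray byte. That confirms the fallback is atomic over the whole buffer. By contrast, `String::from_utf8_lossy(&[0xE4])` yields a single U+FFFD, which is the #434 regression the design avoided.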
diff --git a/.devforge/iter-2/final-review-thermonuclear.md b/.devforge/iter-2/final-review-thermonuclear.md deleted file mode 100644 index a142435c..00000000 --- a/.devforge/iter-2/final-review-thermonuclear.md +++ /dev/null @@ -1,67 +0,0 @@ -VERDICT: FAIL - -## MINOR — needless allocation on the common path + inaccurate doc claim in `decode_header_value` - -**File:** `impit/src/response_parsing/mod.rs:159-162` - -```rust -pub fn decode_header_value(bytes: &[u8]) -> String { - String::from_utf8(bytes.to_vec()) - .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect()) -} -``` - -`bytes.to_vec()` unconditionally copies the entire input buffer into a freshly allocated -`Vec` *before* UTF-8 validity is even checked, on every call, for every header, on every -response. The design doc (`.devforge/2-design.md:22-25`) and the function's own rustdoc-adjacent -commentary claim this is "a single move with no per-byte work" on the common path — that's not -accurate for `to_vec()`, which is a byte-for-byte `memcpy`, not a move. `Vec::from(bytes)`/`to_vec` -never reuses the caller's buffer since the caller only hands over a borrowed `&[u8]` -(`HeaderValue::as_bytes()`), so there's no way to "move" here regardless. - -The direct, idiomatic, and cheaper version is the standard borrow-first pattern: - -```rust -pub fn decode_header_value(bytes: &[u8]) -> String { - match std::str::from_utf8(bytes) { - Ok(s) => s.to_string(), - Err(_) => bytes.iter().map(|&b| b as char).collect(), - } -} -``` - -This validates against the borrowed slice with zero allocation, and only allocates once (via -`to_string()`) on the success path — i.e. it does strictly less work than the current -`to_vec()` (which allocates+copies unconditionally) followed by `String::from_utf8` (which just -re-wraps that buffer). 
It's also easier to read: no `Result`-unwrapping through -`.into_bytes()` in the error arm, no implicit conversion of "the same owned buffer" that isn't -actually being reused (the current code's error path calls `.into_bytes()` on the `FromUtf8Error` -only to immediately throw the `Vec` away byte-by-byte in the `.iter().map(...)` — so the -"reuses the same owned buffer" claim in the design doc is also not realized: the fallback path -re-walks the vec one byte at a time and builds a brand new `String`, it does not reuse the -allocation). - -Concrete scenario this matters for: this helper runs on *every response header, for every -request*, in both bindings (Node hot path: `impit-node/src/response.rs:94`; Python hot path: -`impit-python/src/response.rs:545`). It's exactly the kind of small, shared, called-everywhere -core-crate helper where an avoidable per-call heap copy is worth eliminating now rather than -carrying it forward as "how the shared helper has always worked" — and the mis-description in -the design doc/rustdoc-adjacent rationale ("single move," "reuses the same owned buffer") makes -the code read as more optimized than it actually is, which will mislead the next person who -touches this function into thinking the allocation profile is already minimal. - -This is a one-line fix, low risk, behavior-preserving (verified via a standalone rustc oracle: -both the current implementation and the `std::str::from_utf8` version produce identical output -for the UTF-8, invalid-UTF-8/latin-1-fallback, and empty-input cases). Given the ENGINE.md -standard of "code-judo simplification" and "no needless allocation," this should be fixed before -merge rather than accepted as-is. 
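The behavior-preservation claim in this finding can be checked by running both shapes side by side. The function names `decode_header_value_to_vec` and `decode_header_value_borrowed` are hypothetical labels for the two versions quoted above. The inputs cover the cases the review says its oracle compared: empty, plain ASCII, UTF-8, and invalid UTF-8 that falls back to latin-1.

```rust
/// Current shape quoted in the review: copies the borrowed slice before validating.
fn decode_header_value_to_vec(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        .unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())
}

/// Borrow-first shape the review proposes: validate the borrowed slice with no
/// allocation, then allocate exactly once (on success via `to_string`, on
/// failure via `collect` into a new `String`).
fn decode_header_value_borrowed(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(s) => s.to_string(),
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}

/// True when both shapes produce identical output for every input.
fn shapes_agree(inputs: &[&[u8]]) -> bool {
    inputs
        .iter()
        .all(|b| decode_header_value_to_vec(b) == decode_header_value_borrowed(b))
}
```

Both shapes allocate on the fallback path. The saving comes from the common valid-UTF-8 path, where the borrow-first version skips the up-front `to_vec()` copy.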
- -Everything else in this diff is sound: the helper lives in the correct canonical location -(`impit/src/response_parsing/mod.rs`, re-exported via `impit::utils`), both bindings now call the -one shared implementation with no leftover duplicate `b as char` logic anywhere in the tree, no -file approaches the 1k-line ceiling (largest touched file is `impit-python/src/response.rs` at -604 lines), there is no new branching/spaghetti introduced into `response.rs` in either binding -(the call sites are direct one-line substitutions), and the unit tests plus the new Node UTF-8 -regression test are well-targeted and correctly assert the documented invariants (UTF-8 first, -latin-1 fallback, no `U+FFFD`, byte-reversibility). The only issue is the allocation/doc-accuracy -point above. diff --git a/.devforge/iter-2/review-staff-review.md b/.devforge/iter-2/review-staff-review.md deleted file mode 100644 index 59e71cb4..00000000 --- a/.devforge/iter-2/review-staff-review.md +++ /dev/null @@ -1,88 +0,0 @@ -VERDICT: PASS - -## Summary - -Reviewed the iter-2 diff fresh (both prior blockers claimed fixed by implementer) against -`_verified_task.md` and `2-design.md`, with independent local verification (rustc/rustfmt/rustdoc), -not by trusting the oracle output or implementer claims alone. - -## Verification performed - -1. **Algorithm correctness (`impit/src/response_parsing/mod.rs:159-162`)** - `decode_header_value` = `String::from_utf8(bytes.to_vec()).unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as char).collect())`. - - Confirmed `FromUtf8Error::into_bytes()` returns the **complete original buffer**, not a - truncated one, via standalone test (`[b'a', b'b', 0xE4, b'c', b'd']` round-trips fully through - the fallback) — this is the crux of correctness for #434/#430: any single invalid byte - anywhere in the header falls back to whole-buffer latin-1, not partial UTF-8 + partial mangle. 
- - Verified all four use cases directly by compiling and running equivalent Rust: - - UTF-8 (`naïve.pdf`, `héllo 世界 🎉`) decodes as UTF-8 (#479 fixed). - - Invalid-UTF-8 latin-1 (lone `0xE4`) falls back to `ä` byte-for-byte (#434 preserved). - - ASCII and empty-string pass through unchanged. - - No input (tried lone `0xE4`, `0xFF 0xFE 0x41`, truncated multi-byte `0xC3` at end of buffer) - ever produces `U+FFFD` (#430 non-crash/non-empty guarantee preserved). - - This is strictly better than `from_utf8_lossy` as the design claims — confirmed lossy would - introduce `U+FFFD` for the lone-`0xE4` case; the chosen implementation does not. - -2. **Doctest / byte-literal blocker (prior iteration's blocker #1)** - Extracted the exact doctest from `impit/src/response_parsing/mod.rs:150-155` into a standalone - crate and ran real `rustdoc --test` against it (compiled a `.rlib` and linked it properly, not - just `rustc` on a `fn main`). Result: **passes** (`test ... - response_parsing::decode_header_value - (line 10) ... ok`). The old broken single out-of-range byte literal is gone; `0xC3, 0xAF` are now - two separate valid `u8` array elements. This matches the oracle's `test-results.txt` doctest - result and is independently confirmed as a genuine fix, not just an oracle artifact. - -3. **rustfmt cleanliness (prior iteration's blocker #2)** - Ran `rustfmt --check` (not the oracle's cached result) on all four touched files, respecting the - project's actual `impit-node/rustfmt.toml` (`tab_spaces = 2`) by running from within - `impit-node/`: - - `impit/src/response_parsing/mod.rs` — clean - - `impit/src/lib.rs` — clean - - `impit-node/src/response.rs` — clean (2-space indent matches diff) - - `impit-python/src/response.rs` — clean - All exit 0. Matches oracle's "FMT CLEAN". - -4. 
**Binding call sites** - - Node (`impit-node/src/response.rs:3,94`): import adds `decode_header_value` alongside existing - `decode, ContentType`; call site `decode_header_value(v.as_bytes())` where `v: &HeaderValue` - (`.as_bytes()` returns `&[u8]`) — matches `fn decode_header_value(bytes: &[u8]) -> String`; - assigned into `Vec<(String, String)>` element — types match exactly what was there before. - - Python (`impit-python/src/response.rs:8-11,542-546`): import restructured into a multi-item - `use impit::{errors::ImpitError, utils::{decode_header_value, ContentType}};` — syntactically - valid Rust (verified structurally); call site inside `HashMap::from_iter(...)` closure, - `decode_header_value(v.as_bytes())` returns `String`, matching the `HashMap` - target type exactly as before. - - Confirmed no naming collision with the pre-existing unrelated `impit::utils::decode` (body - decoder) — `impit-python/src/response.rs:458` still calls fully-qualified `impit::utils::decode` - for body content, untouched. - - Confirmed no leftover duplicate inline `b as char` logic anywhere outside the shared helper - (`grep` for `as_bytes().iter()` in both bindings' `src/` returns nothing). - - Re-export chain verified: `impit/src/lib.rs:81` adds `pub use crate::response_parsing::decode_header_value;` - inside the existing `pub mod utils { ... }` block, consistent with how `decode`, - `determine_encoding`, `ContentType` are already re-exported. - -5. **Test coverage** - - Core unit tests (`impit/src/response_parsing/mod.rs:169-212`, oracle: 5/5 pass) cover ASCII, - empty, UTF-8, invalid-UTF-8-latin-1, and an explicit round-trip/no-replacement-char assertion. - Bytes are genuinely exercised as raw `&[u8]` / byte arrays, not derived from a `String` that - would mask the code path (e.g. `utf8_is_decoded_as_utf8` uses `"...".as_bytes()`, but since the - literal is valid UTF-8 source text this correctly represents the UTF-8-bytes-on-the-wire case). 
- - Node regression test (`impit-node/test/basics.test.ts:574-577` + `mock.server.ts:27-30,124-138`) - mirrors the existing latin-1 guard exactly: writes the raw HTTP response via a raw socket with - literal UTF-8 bytes for the header value (`Buffer.from(routes.utf8Header.headerValue, 'utf-8')`), - bypassing any client-side normalization, so it genuinely exercises the wire-bytes decode path - inside the Rust binding rather than a value Node's HTTP stack might re-encode. This is the - correct test shape (same pattern as the pre-existing `nonAsciiHeader` test) and satisfies - acceptance criterion #5 (UTF-8 regression test present in Node). - - Design explicitly defers a Python-side test as a nice-to-have (build unavailable locally); this - is an accepted, documented risk in `2-design.md`, not a gap introduced by this diff. - -6. **Scope check** - - No out-of-scope raw-header-bytes API was added (design explicitly deferred this). - - No stray edits outside the six touched files (`impit-node/src/response.rs`, - `impit-node/test/basics.test.ts`, `impit-node/test/mock.server.ts`, - `impit-python/src/response.rs`, `impit/src/lib.rs`, `impit/src/response_parsing/mod.rs`). - -No blockers, majors, minors, or nits found. Both prior-iteration blockers are genuinely fixed -(independently reproduced, not just re-reading the oracle log), the algorithm satisfies all four -acceptance criteria under adversarial edge-case testing, both binding call sites compile-shape -correctly, and test coverage is adequate and genuinely exercises the byte-level decode path. diff --git a/.devforge/iter-3-rev2/claim.md b/.devforge/iter-3-rev2/claim.md deleted file mode 100644 index b648fa82..00000000 --- a/.devforge/iter-3-rev2/claim.md +++ /dev/null @@ -1,29 +0,0 @@ -# Iteration 3 (rev 2, final-reopen round 1) — implementer claim - -Addressed all findings from both final reviewers (thermonuclear: 2; code-review: 3; one overlaps). - -## Fixed -1.
**`rawHeaders` dropped after `clone()`** (both reviewers) — the clone is a plain Fetch - `Response`. `index.wrapper.js` now copies `rawHeaders` onto the clone via `Object.defineProperty`. - Added a clone-preservation assertion to the JS test. -2. **`rawHeaders` missing from public TS surface / test needed `as unknown as`** (thermonuclear) — - added `get rawHeaders(): Array<[string, Uint8Array]>` to the `ImpitResponse` class in - `index.d.ts` (fetch returns `ImpitResponse`). The main test now accesses `response.rawHeaders` - without a cast (only `clone()`, typed `Response`, still casts — honest, since clone returns a - Fetch Response augmented at runtime). -3. **Overstated "wire order + original casing" claims** (code-review findings 2 & 3) — corrected. - Root cause: reqwest's `HeaderMap` normalizes header names to lowercase and discards original - cross-header wire order before impit sees the response, so true httpx-`.raw` parity is - impossible. Softened the docstrings (Node + Python), `2-design.md`, `_verified_task.md`, and - the PR ecosystem section to state: header **values** are exact bytes (the part that matters for - HMAC), duplicate values are preserved, but **names are lowercased and cross-header order is not - guaranteed**. No code change needed for values — they were already exact. - -## No compile-breaking issues -Both final reviewers verified (against published crate source) that the napi tuple return and -pyo3 per-method lifetime signatures compile; no code change there. - -## Oracle — green -Rust: `rustfmt --check` CLEAN; `rustc --test` 5/5; `rustdoc --test` 1/1. -Python: `ruff check` + `ruff format --check` clean; `py_compile` OK. -(Binding compilation + JS/Py test execution remain CI-gated — h2 egress block.) 
diff --git a/.devforge/iter-3-rev2/final-review-code-review.md b/.devforge/iter-3-rev2/final-review-code-review.md deleted file mode 100644 index 192a82fd..00000000 --- a/.devforge/iter-3-rev2/final-review-code-review.md +++ /dev/null @@ -1,76 +0,0 @@ -VERDICT: PASS - -## Verification of prior findings - -### Finding 1 — JS `rawHeaders` dropped after `clone()` — RESOLVED -`impit-node/index.wrapper.js` (`clone()`, ~line 483-487) now does: -```js -Object.defineProperty(clone, 'rawHeaders', { - value: this.rawHeaders, - enumerable: true, -}); -``` -`this` here is the native `ImpitResponse` (`originalResponse`), so `this.rawHeaders` invokes the -napi getter (`impit-node/src/response.rs:141-152`) and copies the resulting `Array<[string, -Uint8Array]>` onto the clone as a static value — correct, since `rawHeaders` is plain data, not a -stream that can be double-consumed. Covered by a new test in -`impit-node/test/basics.test.ts:574-595` (`'raw header bytes preserve the exact wire value while -the string stays ISO-8859-1 (Fetch-style)'`), which explicitly clones the response and asserts -`cloned.rawHeaders` still contains the exact bytes. Confirmed rustfmt-clean on -`impit-node/src/response.rs`. - -### Finding 2 — `HeaderMap::iter()` order claim — RESOLVED -All public-facing wording now correctly states the wire order is NOT preserved (removed the prior -overclaim of "in wire order"): -- `impit-node/index.d.ts:153-155` (`rawHeaders` getter doc) -- `impit-node/src/response.rs:138-140` (napi getter `///` doc) -- `impit-python/src/response.rs:456-459` (`raw_headers` `///` doc, "note two differences... 
the - original wire order is not preserved") -- `.devforge/2-design.md:30-34`, `.devforge/_verified_task.md:20-24` - -Verified against the actual `http` crate (v1.4.2, vendored at -`/root/.cargo/registry/src/.../http-1.4.2/src/header/map.rs:943`): `HeaderMap::iter()` docs state -"The iterator order is arbitrary, but consistent across platforms for the same crate version" — -i.e., not wire order and not even a stable/documented order guarantee beyond same-key insertion -order for duplicates. The corrected docs ("wire order not preserved") are accurate (if anything, -conservative, since the map's overall order isn't even loosely tied to wire order). - -### Finding 3 — Python `raw_headers` lowercases names vs. httpx `.raw` — RESOLVED -`impit-python/src/response.rs:456-459` no longer claims unqualified "httpx `Response.headers.raw` -equivalent." It now reads: "Similar to httpx's `Response.headers.raw`, but note two differences -imposed by the underlying HTTP client: header names are normalized to lowercase and the original -wire order is not preserved... Header *values* are the exact bytes received." This is accurate: -`k.as_str()` on an `http::HeaderName` always returns the lowercased form (confirmed prior round), -and the new doc no longer implies name-case parity. `.devforge/pr-ecosystem-section.md:22-27` and -`2-design.md` state the same caveat consistently. New test coverage: -`impit-python/test/async_client_test.py` (`test_header_value_decoding_and_raw_bytes`) asserts -`raw[b'x-utf8']` / `raw[b'x-latin1']` using lowercase keys, consistent with actual behavior; ruff -check passes clean on the touched test files. 
The unrelated constructor path -(`ImpitPyResponse::new`, used by `impit-python/test/response_test.py::test_response_raw_headers`) -builds `raw_headers` directly from the caller-supplied Python dict (no `HeaderMap` involved), so -it correctly preserves `Content-Type` casing there — consistent with the getter doc, which only -promises byte-exact values and flags the lowercasing caveat as a limitation "imposed by the -underlying HTTP client" (i.e., only applies to responses that actually went through reqwest). - -## Additional checks performed -- `rustfmt --check` on all touched Rust files (`impit-node/src/response.rs`, - `impit-python/src/response.rs`, `impit/src/response_parsing/mod.rs`, `impit/src/lib.rs`): clean. -- Standalone `rustc` compile of the `decode_header_value` logic: UTF-8 input decodes as UTF-8 - (`naïve.pdf`), invalid-UTF-8 single byte (`0xE4`) falls back to ISO-8859-1 (`März`) — matches - docstring and unit tests in `impit/src/response_parsing/mod.rs:454-497`. -- `ruff check` on `impit-python/test/async_client_test.py` and `impit-python/test/response_test.py`: - all checks passed. -- Working tree matches `diff.patch` exactly (`git status` clean); no drift to account for. -- No other file references stale "wire order preserved" / unqualified "httpx equivalent" wording - in any public doc, design doc, or verified-task doc. - -## Notes (not reported as findings, confidence < 80 / non-doc) -- `impit-node/src/response.rs:65` and `impit-python/src/response.rs:230` retain stale plain `//` - (non-doc) comments above the private `raw_header_pairs`/`raw_headers` struct fields ("in wire - order (duplicates preserved)", "httpx `Headers.raw` equivalent"). These are internal - implementation comments, not rendered rustdoc/public API documentation, and the actual `///` - getter docs immediately below are correct and consistent with the design. Cosmetic only; no - functional or user-facing documentation impact. - -No correctness or compatibility regressions found. 
Value bytes remain exact end-to-end in both -bindings. diff --git a/.devforge/iter-3-rev2/final-review-thermonuclear.md b/.devforge/iter-3-rev2/final-review-thermonuclear.md deleted file mode 100644 index 1f0d1423..00000000 --- a/.devforge/iter-3-rev2/final-review-thermonuclear.md +++ /dev/null @@ -1,73 +0,0 @@ -VERDICT: FAIL - -## Findings - -### Major — stale "in wire order" comments contradict the corrected public docstrings (docs accuracy regression not fully fixed) - -The rev-2 fix correctly rewrote the *public* docstrings for both `rawHeaders` (Node) and `raw_headers` -(Python) to state that header names are lowercased and the original wire order is **not** preserved -(only duplicate values for a given name are preserved). However, the private field comments sitting -right next to the struct definitions in the same files were left unchanged and still assert the -opposite: - -- `impit-node/src/response.rs:65` - ```rust - // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). - ``` - This directly contradicts the getter's own doc comment 70 lines below it - (`impit-node/src/response.rs:138`): *"the original wire order is not preserved."* - -- `impit-python/src/response.rs:230` - ```rust - // Raw, undecoded header name/value byte pairs, in wire order (duplicates preserved). - ``` - Same contradiction against `impit-python/src/response.rs:458`: *"the original wire order is not - preserved."* - -Concrete scenario: a maintainer skimming the struct definition (the first thing you read when -opening the file) sees "in wire order" and walks away with the exact overstated/incorrect claim the -prior review round flagged and that the design doc explicitly calls out as a reqwest `HeaderMap` -caveat. The public-facing docs were fixed, but the adjacent internal comments were not brought into -sync, leaving self-contradictory documentation in both `impit-node/src/response.rs` and -`impit-python/src/response.rs`. 
This is exactly the kind of leftover overstated claim the re-review -was asked to confirm is gone — it is not gone, it just moved one comment upward. - -Fix: reword both field comments to match the getter docs, e.g. "Raw, undecoded header name/value -byte pairs (names lowercased, cross-header wire order not preserved by the underlying `HeaderMap`; -duplicate values for the same name are preserved in insertion order)." - -## Verified as correctly fixed (no findings) - -- `impit-node/index.d.ts:157` now declares `get rawHeaders(): Array<[string, Uint8Array]>` on - `ImpitResponse`, in the same getter style as `get body()`. The main test's direct access - (`impit-node/test/basics.test.ts:582`, `response.rawHeaders.find(...)`) no longer casts. -- `impit-node/index.wrapper.js:483-487` propagates `rawHeaders` onto the `clone()`-created plain - `Response` object via `Object.defineProperty(clone, 'rawHeaders', { value: this.rawHeaders, ... })`, - reading from the original native `ImpitResponse` (`this` inside the `clone` function is - `originalResponse`, the patched native instance) — correct source of truth. The test - (`impit-node/test/basics.test.ts:591-594`) asserts the raw bytes survive `clone()` and match the - pre-clone bytes. - - Note: `clone()` is declared as `clone(): Response` (`impit-node/index.d.ts:250`), the global DOM - `Response` type, which has no `rawHeaders` member — so the test's - `response.clone() as unknown as { rawHeaders: ... }` cast at - `impit-node/test/basics.test.ts:591` is still present. This is a *different*, pre-existing typing - gap (the clone's declared return type was always the plain DOM `Response`, unrelated to this - fix) rather than a recurrence of the originally flagged issue, which was about accessing - `rawHeaders` directly on `ImpitResponse` before the getter existed. 
Not treated as a blocker since - the acceptance criteria only requires runtime survival across `clone()`, not a fully-typed clone - contract — but flagged here for visibility in case the intent was to eliminate all casts. -- Docstrings in `impit-node/index.d.ts:147-156`, `impit-node/src/response.rs:125-140`, and - `impit-python/src/response.rs:456-462` now accurately state: names lowercased, wire order not - preserved (reqwest `HeaderMap` caveat), duplicate values preserved, value bytes exact. Matches - `2-design.md`'s caveat section and `_verified_task.md` acceptance item 3. -- Header-pair construction (`impit-node/src/response.rs:99-106`, `impit-python/src/response.rs:574-579`) - correctly derives both the decoded-string headers and the raw byte pairs from the same - `response.headers().iter()` pass, so the "duplicates preserved, values exact" guarantee actually - holds against `HeaderMap::iter()` semantics — not just claimed. -- No new maintainability/structural issue introduced: the raw-header plumbing is a small, additive - field + getter in each binding, reusing the existing `decode_header_value` core helper where - applicable; no file crossed a size threshold, no new branching was bolted onto unrelated code - paths, and no bespoke duplicate-of-an-existing-helper was added. -- No concrete compile hazard spotted in the new Rust code (napi `Uint8Array` import, `getter`/ - `js_name`/`ts_return_type` attributes, and pyo3 `PyBytes` construction all follow existing patterns - already used elsewhere in the same files). diff --git a/.devforge/iter-3/claim.md b/.devforge/iter-3/claim.md deleted file mode 100644 index cc9a068f..00000000 --- a/.devforge/iter-3/claim.md +++ /dev/null @@ -1,28 +0,0 @@ -# Iteration 3 (final-reopen round 1) — implementer claim - -## Addressed the thermonuclear final-review finding -- **Needless allocation on the decode path** — FIXED. 
Rewrote `decode_header_value` to validate - against the borrow with `std::str::from_utf8(bytes)`: - ```rust - match std::str::from_utf8(bytes) { - Ok(valid) => valid.to_owned(), - Err(_) => bytes.iter().map(|&b| b as char).collect(), - } - ``` - This removes the unconditional `bytes.to_vec()` copy. The common UTF-8 path now allocates - exactly once (`to_owned`); the latin-1 fallback allocates exactly once (`collect`) instead of - twice. Byte semantics are identical (whole-buffer UTF-8 check, else full byte→codepoint map). -- **Inaccurate design-doc wording** ("single move / reuses the same buffer") — FIXED in - `2-design.md` to describe the borrow-validate approach accurately. - -## code-review final reviewer: PASS (no changes needed). - -## Oracle — green -- `rustfmt --check` (core + both bindings): CLEAN. -- `rustc --test` unit tests: 5/5 pass (unchanged behavior confirmed after rewrite). -- `rustdoc --test` doctest: 1/1 pass. - -## Scope -- Change is confined to the helper body + doc wording; no call-site, test, or public-signature - change. Behavior is byte-for-byte identical, so this is a final-reopen (final reviewers only). -- Full workspace build / JS-Py suites still deferred to CI (github.com/apify/h2 egress 403). diff --git a/.devforge/iter-3/final-review-code-review.md b/.devforge/iter-3/final-review-code-review.md deleted file mode 100644 index cea9a7c9..00000000 --- a/.devforge/iter-3/final-review-code-review.md +++ /dev/null @@ -1,22 +0,0 @@ -VERDICT: PASS - -## Scope of this round -Reviewed the single targeted change: `decode_header_value` in `impit/src/response_parsing/mod.rs:159-164` was rewritten from -`String::from_utf8(bytes.to_vec()).unwrap_or_else(|_| bytes.iter().map(|&b| b as char).collect())` -to -`match std::str::from_utf8(bytes) { Ok(valid) => valid.to_owned(), Err(_) => bytes.iter().map(|&b| b as char).collect() }`. 
-Call sites (`impit-node/src/response.rs:94`, `impit-python/src/response.rs:545`), the re-export (`impit/src/lib.rs:81`), the public signature, and tests are confirmed unchanged from the prior iteration. - -## Verification performed -- Extracted both the old and new implementations into a standalone Rust program and ran an exhaustive equivalence check with `rustc -O`: all 256 single-byte inputs, all 65,536 two-byte inputs, a curated set of UTF-8 edge cases (valid multi-byte sequences, truncated 2/3/4-byte sequences, overlong encodings, encoded surrogate halves, empty input), and 2,000 deterministic pseudo-random byte sequences (lengths 0-11). Result: 67,807/67,807 identical outputs between old and new — zero mismatches. This is expected since both paths use the same UTF-8 validator (`std::str::from_utf8` internally backs `String::from_utf8`); the change only avoids validating/copying via an intermediate `Vec`. -- Compiled and ran the two doc-comment examples (naïve UTF-8 case, März latin-1 fallback case) standalone — both pass. -- Compiled and ran the five `#[cfg(test)]` unit tests (`ascii_is_unchanged`, `empty_is_empty`, `utf8_is_decoded_as_utf8`, `invalid_utf8_falls_back_to_iso_8859_1`, `iso_8859_1_fallback_never_produces_replacement_char`) via standalone `rustc --test` — all 5 pass. -- Ran `cargo clippy --all-targets -- -D warnings` on the extracted function in an isolated crate — zero warnings (no `needless_collect`/allocation lints triggered). - -## Acceptance criteria re-confirmed -1. Valid-UTF-8 header bytes decode as UTF-8 (fixes #479 mojibake) — confirmed via unit test and doctest. -2. Lone `0xE4` still decodes byte-for-byte as latin-1 → `ä` (keeps #434) — confirmed via unit test and equivalence check. -3.
No `U+FFFD` ever introduced; function never panics or returns empty for non-empty invalid-UTF-8 input (keeps #430) — confirmed via `str::from_utf8` never causing a panic path here (only produces `Result`, no `.unwrap()`), and the fallback branch is a total function over `u8` (`b as char` is defined for all `0..=255`). - -## Findings with confidence >= 80 -None. No correctness, regression, or compilation-risk findings meet the threshold. diff --git a/.devforge/iter-3/final-review-thermonuclear.md b/.devforge/iter-3/final-review-thermonuclear.md deleted file mode 100644 index ed0d685c..00000000 --- a/.devforge/iter-3/final-review-thermonuclear.md +++ /dev/null @@ -1,58 +0,0 @@ -VERDICT: PASS - -## Method -Independent fresh review of `impit/src/response_parsing/mod.rs`, the binding call sites -(`impit-node/src/response.rs`, `impit-python/src/response.rs`), `impit/src/lib.rs`, and the -design-doc wording change in `.devforge/2-design.md`, per ENGINE.md's thermonuclear standards -(1k-line ceiling, spaghetti/branching growth, code-judo simplification, canonical-layer reuse). -Did not assume the fix was correct; independently derived and checked the expected helper body, -then diffed it against what's actually in the tree. - -## Verification performed -- Confirmed the current helper body is exactly: - ```rust - pub fn decode_header_value(bytes: &[u8]) -> String { - match std::str::from_utf8(bytes) { - Ok(valid) => valid.to_owned(), - Err(_) => bytes.iter().map(|&b| b as char).collect(), - } - } - ``` - (`impit/src/response_parsing/mod.rs:159-164`) — matches the prescribed fix. `std::str::from_utf8` - validates against the borrowed `&[u8]` with zero allocation; only the success arm (`to_owned`) - or the fallback arm (`collect`) allocates, exactly once each. The prior finding's `bytes.to_vec()` - unconditional pre-copy is gone; there is no remaining needless allocation on either path. 
-- Wrote a standalone rustc harness comparing the old body - (`String::from_utf8(bytes.to_vec()).unwrap_or_else(|e| e.into_bytes().iter().map(|&b| b as - char).collect())`) against the new body across: empty input, plain ASCII, valid multi-byte UTF-8 - (`naïve.pdf`), a lone invalid byte (`0xE4`), multiple invalid bytes, a valid-UTF-8 prefix followed - by an invalid trailing byte, a truncated multi-byte lead byte (`0xC3` alone), and all 256 - single-byte inputs individually. Result: `ALL EQUIVALENT` — byte-for-byte identical output in - every case. This confirms the rewrite is UTF-8-first / whole-buffer latin-1 fallback / never - emits `U+FFFD` (latin-1 fallback only ever maps `0x00..=0xFF` → `U+0000..=U+00FF`, structurally, - since there's no lossy call in either branch), i.e. semantics are unchanged from the - already-reviewed-and-accepted algorithm. -- Confirmed `.devforge/2-design.md:22-25` no longer contains the inaccurate "single move" / "reuses - the same owned buffer" claim; it now says `str::from_utf8` checks validity without copying and - each path allocates exactly once — accurate for the new code. -- Confirmed `.devforge/iter-3/test-results.txt` oracle output is green: `rustfmt --check` clean, - 5/5 unit tests pass unchanged, 1/1 doctest passes. -- Confirmed no other occurrence of the old inline `b as char` pattern or the old `to_vec()`-based - helper body remains anywhere in the tree (only the new helper's own fallback arm, which - necessarily uses `b as char` per the design's chosen algorithm). -- Confirmed both binding call sites are unchanged one-line substitutions (`impit-node/src/response.rs:94`, - `impit-python/src/response.rs:545`) — no new branching, no new call-site logic, nothing to review - there beyond what already passed prior rounds. 
-- File-size check: `impit/src/response_parsing/mod.rs` 250 lines, `impit-node/src/response.rs` 336 - lines, `impit-python/src/response.rs` 604 lines, `impit/src/lib.rs` 91 lines — all well under the - 1k-line ceiling; this iteration's diff is a 2-line body swap plus a doc-wording correction, so no - file-size or decomposition concern. -- No new conditionals, flags, wrappers, or abstractions were introduced — the diff strictly - replaces one expression with an equivalent, cheaper one inside an already-isolated pure - function. No spaghetti growth, no canonical-layer violation, no boundary/type churn. - -## Conclusion -The prior finding (needless `bytes.to_vec()` allocation before the UTF-8 check, and the design -doc's inaccurate "single move / reuses the buffer" description) is genuinely resolved: the code -now validates against the borrow first and allocates minimally on both paths, and the doc wording -was corrected to match. No new issue was introduced by the fix. Zero findings. diff --git a/.devforge/iter-4-rev2/claim.md b/.devforge/iter-4-rev2/claim.md deleted file mode 100644 index a2494ca2..00000000 --- a/.devforge/iter-4-rev2/claim.md +++ /dev/null @@ -1,17 +0,0 @@ -# Iteration 4 (rev 2, final-reopen round 2) — implementer claim - -## Addressed the remaining thermonuclear finding -- **Internal field comments still said "in wire order"** (`impit-node/src/response.rs:65`, - `impit-python/src/response.rs:230`), contradicting the corrected getter docstrings — FIXED. - Both field comments now state values are exact, names lowercased, order not the original wire - order, and point to the getter docs. Also dropped the stale "httpx `Headers.raw` equivalent" - phrasing from the Python field comment. -- Verified no remaining overstated "wire order" claim: the only surviving mentions are accurate - negations ("… not preserved"). - -## code-review: PASS last round (no further changes needed). 
- -## Oracle — green -Rust: `rustfmt --check` CLEAN; `rustc --test` 5/5; `rustdoc --test` 1/1. -Python: `ruff` + `ruff format --check` clean; `py_compile` OK. -(Comment-only change since iter-3-rev2; binding compile + JS/Py tests remain CI-gated.) diff --git a/.devforge/iter-4-rev2/final-review-code-review.md b/.devforge/iter-4-rev2/final-review-code-review.md deleted file mode 100644 index 27b959f9..00000000 --- a/.devforge/iter-4-rev2/final-review-code-review.md +++ /dev/null @@ -1,11 +0,0 @@ -VERDICT: PASS - -Reviewed the comment-only change to `impit-node/src/response.rs` (field comment above `raw_header_pairs`, ~line 65-66) and `impit-python/src/response.rs` (field comment above `raw_headers`, ~line 230-231), plus their corresponding public getter docstrings (`impit-node/src/response.rs:90-105` `rawHeaders`; `impit-python/src/response.rs:447-454` `raw_headers`). - -Confirmed: -- The leftover "in wire order" phrasing flagged in prior review rounds is gone from both internal field comments. -- Both field comments now read "values exact; names lowercased, order not the original wire order," which accurately matches the getter docstrings' claims (names lowercased, wire order not preserved, duplicate values kept, value bytes exact). -- No other lines in either file, or elsewhere in the diff, still contain the stale "in wire order" phrase (checked via full-repo grep). -- No other code changed — this iteration is comment-only, as expected; working tree is clean and matches `diff.patch` from the prior iteration plus this comment edit. - -No findings with confidence >= 80. Nothing regressed; no overclaim remains. diff --git a/.devforge/iter-4-rev2/final-review-thermonuclear.md b/.devforge/iter-4-rev2/final-review-thermonuclear.md deleted file mode 100644 index bf3bd580..00000000 --- a/.devforge/iter-4-rev2/final-review-thermonuclear.md +++ /dev/null @@ -1,26 +0,0 @@ -VERDICT: PASS - -No findings. 
The prior finding (internal field comments at `impit-node/src/response.rs:65` and -`impit-python/src/response.rs:230` still saying "in wire order", contradicting the corrected -getter docstrings) is fully resolved: - -- `impit-node/src/response.rs:65-66` now reads: "Raw, undecoded header name/value byte pairs - (values exact; names lowercased, order not the original wire order - see the `rawHeaders` - getter docs)." This matches the `rawHeaders` getter docstring (lines 125-140), which states - names are lowercased, wire order is not preserved, and duplicates are kept. -- `impit-python/src/response.rs:230-231` now reads the equivalent: "values exact; names - lowercased, order not the original wire order - see the `raw_headers` getter docs." This - matches the `raw_headers` getter docstring (lines 456-462), which states the same caveats and - that the getter is "similar to httpx's `Response.headers.raw`" (correctly qualified, not an - unqualified equivalence claim). - -Repo-wide scan for "wire order" / "httpx equivalent" / "httpx-like" wording turned up no other -unqualified overclaims in touched source, `.d.ts`, or test files. Every remaining mention (in -`impit-node/index.d.ts:153`, `impit-node/src/response.rs:138`, `impit-python/src/response.rs:458`, -and test comments in `async_client_test.py`/`response_test.py`) correctly states the order is -*not* preserved and/or qualifies the httpx comparison as approximate. - -Diff scope: the only change relative to the previous review round is the two comment edits -described above, in the two `.rs` files. No other lines in `impit-node/src/response.rs` or -`impit-python/src/response.rs` changed, and no other files in the diff were touched by this -round's fix — no regression introduced. 
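The old/new equivalence check described in the iter-3 code review can be sketched as follows. This is a minimal version under stated assumptions: `decode_old`, `decode_new`, and `check_equivalence` are hypothetical names, and only the exhaustive part of the check is reproduced (the empty input, all 256 single-byte inputs, and all 65,536 two-byte inputs). The curated edge cases and the 2,000 pseudo-random inputs from the review are omitted.

```rust
/// Previous body: copies the input into a Vec before validating it.
fn decode_old(bytes: &[u8]) -> String {
    String::from_utf8(bytes.to_vec())
        .unwrap_or_else(|_| bytes.iter().map(|&b| b as char).collect())
}

/// Current body: validates against the borrow and allocates once per path.
fn decode_new(bytes: &[u8]) -> String {
    match std::str::from_utf8(bytes) {
        Ok(valid) => valid.to_owned(),
        Err(_) => bytes.iter().map(|&b| b as char).collect(),
    }
}

/// Compares both bodies on every 0-, 1-, and 2-byte input and returns the
/// number of inputs checked. Panics on the first mismatch.
pub fn check_equivalence() -> usize {
    let mut checked = 0;

    // Empty input.
    assert_eq!(decode_old(&[]), decode_new(&[]));
    checked += 1;

    for a in 0..=u8::MAX {
        // Every single-byte input.
        assert_eq!(decode_old(&[a]), decode_new(&[a]), "mismatch on [{a:#04x}]");
        checked += 1;

        // Every two-byte input starting with `a`.
        for b in 0..=u8::MAX {
            let input = [a, b];
            assert_eq!(decode_old(&input), decode_new(&input), "mismatch on {input:02x?}");
            checked += 1;
        }
    }
    checked
}

#[cfg(test)]
mod tests {
    #[test]
    fn old_and_new_bodies_agree() {
        // 1 empty + 256 single-byte + 65,536 two-byte inputs.
        assert_eq!(super::check_equivalence(), 65_793);
    }
}
```

Because both bodies run the same UTF-8 validator and the same byte-to-code-point fallback, agreement on every short input is expected; the check mainly guards against a transcription slip during the rewrite.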
diff --git a/.devforge/pr-ecosystem-section.md b/.devforge/pr-ecosystem-section.md deleted file mode 100644 index cf742f72..00000000 --- a/.devforge/pr-ecosystem-section.md +++ /dev/null @@ -1,65 +0,0 @@ -## Consistency with ecosystem - -impit's two bindings each emulate a reference client, so header decoding is deliberately -**asymmetric** — and each side follows its reference's decoding rules. Both bindings additionally expose -the raw header bytes, following the byte-access pattern each ecosystem already relies on. - -### Python — matches `httpx` (which impit-python implements) - -impit-python advertises the httpx interface ("drop-in replacement for `httpx.AsyncClient`"), and -httpx decodes header values **UTF-8-first with an ISO-8859-1 fallback**, the same precedence this PR -applies via the shared `decode_header_value` helper: - -- httpx `Headers.encoding` tries `ascii`, then `utf-8`, then falls back to `iso-8859-1`, choosing one encoding for the whole header set (the helper decides per value, so a single invalid-UTF-8 value makes httpx decode every value as ISO-8859-1, whereas impit falls back only for that value): - [`httpx/_models.py` @ v0.28.1, `encoding` property](https://github.com/encode/httpx/blob/0.28.1/httpx/_models.py#L125-L145) - — *"Header encoding is mandated as ascii, but we allow fallbacks to utf-8 or iso-8859-1."* -- httpx exposes raw bytes via `Headers.raw: list[tuple[bytes, bytes]]`: - [`httpx/_models.py` @ v0.28.1, `raw` property](https://github.com/encode/httpx/blob/0.28.1/httpx/_models.py#L152-L156). - Our new `Response.raw_headers` returns the same `list[tuple[bytes, bytes]]` shape. - -So Python callers get the same decoded strings as httpx whenever every header value is ASCII or valid UTF-8, *and* a raw-bytes escape hatch like httpx's. - -> **Caveat vs. httpx `.raw`:** impit is built on `reqwest`, whose `HeaderMap` normalizes header -> names to lowercase and does not retain the original cross-header wire order — that information -> is gone before impit ever sees the response. So `raw_headers` (and JS `rawHeaders`) is -> httpx-`.raw`-*like* but not byte-identical: header **names** are lowercased and cross-header -> order is not guaranteed.
Header **values** — the bytes that matter for signature/HMAC -> verification — are exact, and duplicate values for a name are preserved. - -### JavaScript — matches the Fetch API / undici (which impit-node implements) - -impit-node is "API-compatible with the Fetch API `Response`". In Fetch, header values are a -**byte sequence** exposed to JS as a `ByteString`, i.e. via **isomorphic decode** — each byte -`0x00–0xFF` maps to the code point of equal value (ISO-8859-1). This PR keeps impit-node on that -exact behavior (`b as char`): - -- Fetch Standard: a header value is a [byte sequence](https://fetch.spec.whatwg.org/#concept-header-value), - and the `Headers` interface types names/values as - [`ByteString`](https://fetch.spec.whatwg.org/#headers-class) (`ByteString get(ByteString name)`). -- [WebIDL `ByteString`](https://webidl.spec.whatwg.org/#idl-ByteString) is the isomorphic - (byte ↔ code-point) mapping — i.e. ISO-8859-1. -- undici (Node's `fetch`) implements exactly this: [nodejs/undici#1560 "ByteString checks & - conversion in Headers"](https://github.com/nodejs/undici/pull/1560) and - [#1317](https://github.com/nodejs/undici/issues/1317) confirm header values are handled as - Latin-1 `ByteString`s. -- Node's core `http` parser likewise decodes header values as `latin1`/`binary` - ([nodejs/node#17390](https://github.com/nodejs/node/issues/17390), - [#58240](https://github.com/nodejs/node/issues/58240)); **axios** inherits this because its - Node adapter reads `http.IncomingMessage` headers and its browser adapter reads - `XMLHttpRequest`/Fetch headers. - -Because ISO-8859-1 is isomorphic, the JS string stays **byte-recoverable** — the standard Fetch -workaround `Buffer.from(value, 'latin1')` (or `Uint8Array.from(value, c => c.charCodeAt(0))` in -the browser) reproduces the exact wire bytes, so a UTF-8 header can be recovered with -`Buffer.from(value, 'latin1').toString('utf8')`. 
- -The Fetch `Headers` interface has **no** raw-byte accessor, so `response.rawHeaders` -(`Array<[string, Uint8Array]>`) is an explicit impit extension. It's justified because -signature/HMAC callers need the exact bytes without the manual round-trip, and it mirrors the -byte-pair access httpx already offers on the Python side. - -### Net effect on #479 - -- **Python**: fully fixed — UTF-8 header values decode correctly (httpx behavior). -- **JavaScript**: string values remain ISO-8859-1 **by design** (Fetch parity, byte-recoverable); - callers needing the decoded UTF-8 value read `response.rawHeaders` and decode with `TextDecoder`. diff --git a/.gitignore b/.gitignore index 43076265..80728873 100644 --- a/.gitignore +++ b/.gitignore @@ -1,6 +1,7 @@ /target /artifacts .idea +.devforge/ # Python __pycache__ From deac5577b085c9828083ba401afee083e18d1577 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 11:09:53 +0000 Subject: [PATCH 19/20] fix(python): declare raw_headers in the type stub mypy flagged `Response.raw_headers` as an unknown attribute because the .pyi stub did not declare the new getter. Adds it as a read-only property returning list[tuple[bytes, bytes]], matching the pyo3 getter. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- impit-python/python/impit/impit.pyi | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/impit-python/python/impit/impit.pyi b/impit-python/python/impit/impit.pyi index 7be1fbf1..06fb6dcd 100644 --- a/impit-python/python/impit/impit.pyi +++ b/impit-python/python/impit/impit.pyi @@ -180,6 +180,22 @@ class Response: print(response.headers) # {'content-type': 'text/html; charset=utf-8', ... } """ + @property + def raw_headers(self) -> list[tuple[bytes, bytes]]: + """Raw, undecoded header name/value pairs as ``(bytes, bytes)``. 
+ + Similar to httpx's ``Response.headers.raw``, but note two differences imposed by the + underlying HTTP client: header names are normalized to lowercase and the original wire + order is not preserved (duplicate values for a name are kept). Header *values* are the + exact bytes received - useful when a header carries UTF-8 or when verifying a header + signature/HMAC. + + .. code-block:: python + + response = await client.get("https://crawlee.dev") + print(response.raw_headers) # [(b'content-type', b'text/html; charset=utf-8'), ... ] + """ + text: str """Response body as text. Decoded from :attr:`content` using :attr:`encoding`. From 060101a22d78bbb6190affbf41ba0eee9bcb5e95 Mon Sep 17 00:00:00 2001 From: Claude Date: Wed, 1 Jul 2026 11:48:34 +0000 Subject: [PATCH 20/20] chore: drop .devforge from committed .gitignore Ignoring devforge's local run files belongs in a personal/local exclude, not the shared repo. The directory is now ignored locally via .git/info/exclude instead. Co-Authored-By: Claude Opus 4.8 Claude-Session: https://claude.ai/code/session_01VrUiE5CzcJ9TiRTqvqb1JE --- .gitignore | 1 - 1 file changed, 1 deletion(-) diff --git a/.gitignore b/.gitignore index 80728873..43076265 100644 --- a/.gitignore +++ b/.gitignore @@ -1,7 +1,6 @@ /target /artifacts .idea -.devforge/ # Python __pycache__
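The ecosystem section's byte-recoverability argument says that ISO-8859-1 is isomorphic, so `Buffer.from(value, 'latin1').toString('utf8')` recovers a UTF-8 header from its Fetch-style string. That argument can be checked with a small Rust sketch. The function names are hypothetical. Node's `Buffer#toString('utf8')` substitutes U+FFFD for invalid sequences, whereas `recover_utf8` here returns `None`.

```rust
/// Isomorphic (ISO-8859-1) decode: byte 0xNN becomes code point U+00NN,
/// which is what a Fetch `ByteString` header value amounts to.
pub fn isomorphic_decode(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Inverse mapping: returns the original bytes, or `None` if the string
/// contains a code point above U+00FF (so it cannot have come from
/// `isomorphic_decode`).
pub fn isomorphic_encode(s: &str) -> Option<Vec<u8>> {
    s.chars().map(|c| u8::try_from(u32::from(c)).ok()).collect()
}

/// Recovers a UTF-8 header value from its Latin-1 string form, like
/// `Buffer.from(value, 'latin1').toString('utf8')` in Node, but returns
/// `None` instead of inserting U+FFFD when the bytes are not valid UTF-8.
pub fn recover_utf8(latin1: &str) -> Option<String> {
    let bytes = isomorphic_encode(latin1)?;
    String::from_utf8(bytes).ok()
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn round_trip_recovers_utf8() {
        let wire = "naïve.pdf".as_bytes();
        let latin1 = isomorphic_decode(wire);
        // The #479 mojibake: 0xC3 0xAF shows up as two Latin-1 characters.
        assert_eq!(latin1, "naÃ¯ve.pdf");
        // The exact wire bytes are recoverable from the string form...
        assert_eq!(isomorphic_encode(&latin1).as_deref(), Some(wire));
        // ...and so is the UTF-8 value.
        assert_eq!(recover_utf8(&latin1).as_deref(), Some("naïve.pdf"));
    }
}
```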