Skip to content

optimize TextDecoder for legacy encodings#6295

Open
anonrig wants to merge 3 commits intomainfrom
yagiz/optimize-text-decoder
Open

optimize TextDecoder for legacy encodings#6295
anonrig wants to merge 3 commits intomainfrom
yagiz/optimize-text-decoder

Conversation

@anonrig
Copy link
Member

@anonrig anonrig commented Mar 11, 2026

This should optimize performance of TextDecoder for legacy encodings (as reported by @ChALkeR)

@anonrig anonrig requested a review from mikea March 11, 2026 02:21
@anonrig anonrig requested review from a team as code owners March 11, 2026 02:21
@anonrig anonrig requested review from guybedford and jasnell March 11, 2026 02:22
Copy link
Contributor

@ask-bonk ask-bonk bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm Bonk, and I've done a quick review of your PR.

This PR optimizes TextDecoder for legacy encodings by reusing a Rust-side output buffer across decode calls instead of allocating a new Vec<u16> each time, returning a raw pointer+length pair through the CXX bridge instead of moving the Vec.

Issues

1. [Critical] Trailing semicolon makes decode() fail to compile

Line 194 of lib.rs has }; (with a trailing semicolon) on the final ffi::DecodeResult expression. In Rust, this turns the tail expression into a statement, changing the function's implicit return from ffi::DecodeResult to (). This is a compile error — the function signature requires -> ffi::DecodeResult. (The early return on the Malformed path at line 156 is correct — it uses return explicitly.)

2. [Low] Unbounded buffer growth

The state.output buffer doubles on OutputFull but is never shrunk. A single large decode permanently inflates the allocation for the lifetime of the decoder. For streaming use cases (TextDecoderStream), this is probably fine since decoders are short-lived. For long-lived decoders that occasionally see a large chunk, you may want to shrink the buffer if it exceeds some threshold (e.g., if state.output.capacity() > 4 * state.output.len() { state.output.shrink_to(...) }). Not blocking — just something to be aware of.

@ask-bonk
Copy link
Contributor

ask-bonk bot commented Mar 11, 2026

I'm Bonk, and I've done a quick review of your PR.

The review has been posted on PR #6295. Here's a summary of what I found:

1. [Critical] Trailing semicolon on lib.rs:194 — The final ffi::DecodeResult { ... }; has a stray semicolon that turns the tail expression into a statement. This makes the function return () instead of ffi::DecodeResult, which is a compile error. I posted a suggestion to remove the semicolon.

2. [Low] Unbounded buffer growth — The reusable state.output buffer doubles on OutputFull but is never shrunk back. A single large decode permanently inflates memory for the decoder's lifetime. Not blocking, but noted as something to consider for long-lived decoders.

github run

@github-actions
Copy link

github-actions bot commented Mar 11, 2026

The generated output of @cloudflare/workers-types matches the snapshot in types/generated-snapshot 🎉

Co-authored-by: ask-bonk[bot] <249159057+ask-bonk[bot]@users.noreply.github.com>
@anonrig anonrig requested a review from mikea March 12, 2026 20:47
@anonrig anonrig force-pushed the yagiz/optimize-text-decoder branch from e99a6d4 to 39c25ea Compare March 16, 2026 13:37
@anonrig anonrig enabled auto-merge (squash) March 16, 2026 13:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants