Optimize HTML entity decoding in direct_fetch provider by d-oit · Pull Request #451 · d-oit/do-web-doc-resolver

d-oit · 2026-06-19T11:12:25Z

The decode_entities function in the direct_fetch provider was identified as a performance bottleneck due to 20 sequential .replace() calls. Each call triggered a full string traversal and a potential new string allocation.

I implemented a manual single-pass scanner that iterates through the string once and performs lookahead only when an ampersand (&) or word joiner (\u{2060}) is encountered. I also added a fast-path check using .contains() to return early if no replacement characters are present.

Performance improvements measured with Criterion:

Fast-path (no entities): ~3.05µs reduced to ~82ns (~37x faster).
Content with entities: ~5.77µs reduced to ~1.52µs (~3.8x faster).
Long content: ~30.6µs reduced to ~14.9µs (~2x faster).

All tests passed, including the direct_fetch integration tests that verify LaTeX and code block extraction which rely on this decoding logic.

PR created automatically by Jules for task 13342671035324967602 started by @d-oit

Replaced 20 sequential `.replace()` calls in `decode_entities` with a manual single-pass scanner and a fast-path check for strings without entities. Benchmark results (`decode_entities_bench`): - Text with entities: ~5.7µs -> ~1.5µs (3.8x speedup) - Text without entities: ~3.0µs -> ~82ns (36.5x speedup) - Long text with entities: ~30.5µs -> ~15.0µs (2x speedup) Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>

google-labs-jules · 2026-06-19T11:12:26Z

👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.

When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.

I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!

For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!

New to Jules? Learn more at jules.google/docs.

For security, I will only act on instructions from the user who triggered this task.

vercel · 2026-06-19T11:12:28Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Actions	Updated (UTC)
do-web-doc-resolover	Ready	Preview, Comment	Jun 19, 2026 4:22pm

deepsource-io · 2026-06-19T11:12:40Z

DeepSource Code Review

We reviewed changes in 2a53733...76bc49f on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.

See full review on DeepSource ↗

PR Report Card

Overall Grade	Security Reliability Complexity Hygiene

Code Review Summary

Analyzer	Updated (UTC)	Details
JavaScript	Jun 19, 2026 11:26a.m.	Review ↗
Python	Jun 19, 2026 11:26a.m.	Review ↗
Rust	Jun 19, 2026 11:26a.m.	Review ↗
Shell	Jun 19, 2026 11:26a.m.	Review ↗

Important

AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.

codacy-production · 2026-06-19T11:13:15Z

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Metrics 22 complexity · 2 duplication

Metric Results

Complexity 22

Duplication 2

View in Codacy

_{NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer}
_{TIP This summary will be updated as you push new changes.}

deepsource-io · 2026-06-19T11:13:40Z

+}
+
+// Proposed optimized implementation
+fn decode_entities_optimized(text: &str) -> String {


`fn decode_entities_optimized` has a cyclomatic complexity of 30 with "very-high" risk

A function with high cyclomatic complexity can be hard to understand and
maintain. Cyclomatic complexity is a software metric that measures the number of
independent paths through a function. A higher cyclomatic complexity indicates
that the function has more decision points and is more complex.

I've updated the implementation to be truly zero-allocation by using slices. Benchmarks show a further speedup (now ~45x faster than the original code for clean text).

deepsource-io · 2026-06-19T11:13:40Z

+
+    while let Some(ch) = chars.next() {
+        if ch == '&' {
+            let mut entity = String::new();


Empty call to `new()`

The new() function is used to initialise an object with specific data.
If no arguments are passed, the behaviour is identical to default().

@jules address feedback

I've refactored the scanner to use char_indices() and string slicing. This eliminates the String::new() allocation inside the ampersand check, making it a zero-allocation entity detector.

deepsource-io · 2026-06-19T11:13:40Z


-/// Decode basic HTML entities
+/// Decode basic HTML entities using a single-pass scanner for performance
 fn decode_entities(text: &str) -> String {


`fn decode_entities` has a cyclomatic complexity of 30 with "very-high" risk

A function with high cyclomatic complexity can be hard to understand and
maintain. Cyclomatic complexity is a software metric that measures the number of
independent paths through a function. A higher cyclomatic complexity indicates
that the function has more decision points and is more complex.

I've optimized this function by implementing a single-pass manual scanner with a fast-path check. This replaces the previous 20 sequential .replace() calls, which were inefficient due to multiple full-string scans and intermediate allocations. Benchmarks show a ~37x speedup for clean text and a ~3.8x speedup for content with entities.

deepsource-io · 2026-06-19T11:13:40Z

+
+    while let Some(ch) = chars.next() {
+        if ch == '&' {
+            let mut entity = String::new();


Empty call to `new()`

The new() function is used to initialise an object with specific data.
If no arguments are passed, the behaviour is identical to default().

Fixed `clippy::while_let_on_iterator` in `src/providers/direct_fetch.rs` and `benches/decode_entities_bench.rs` by replacing `while let` with `for` loops. Removed unused `OnceLock`, `Regex`, and `Captures` imports from the benchmark file. Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>

Applied `cargo fmt` to `cli/benches/decode_entities_bench.rs` to resolve CI formatting check failure. Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>

Finalized the performance optimization of `decode_entities` by ensuring full compliance with repository linting and formatting standards. - Applied `cargo fmt` to resolve CI formatting failures. - Fixed `clippy::while_let_on_iterator` in the benchmark scanner. - Maintained ~37x speedup for text without entities. Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>

…3342671035324967602

Refactored `decode_entities` to use `char_indices()` and string slicing, eliminating the transient `String` allocation for every ampersand encountered. Benchmark results (`decode_entities_bench`): - Text with entities: ~5.7µs -> ~915ns (6x speedup) - Text without entities: ~3.0µs -> ~60ns (50x speedup) - Long text with entities: ~30.5µs -> ~8.6µs (3.5x speedup) Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>

vercel Bot deployed to Preview June 19, 2026 11:12 View deployment

deepsource-io Bot reviewed Jun 19, 2026

View reviewed changes

vercel Bot deployed to Preview June 19, 2026 11:27 View deployment

fix(direct_fetch): correct formatting in benchmark

840862f

Applied `cargo fmt` to `cli/benches/decode_entities_bench.rs` to resolve CI formatting check failure. Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>

vercel Bot deployed to Preview June 19, 2026 11:35 View deployment

vercel Bot deployed to Preview June 19, 2026 12:37 View deployment

Merge branch 'main' into perf/optimize-entity-decoding-direct-fetch-1…

c5d6ab0

…3342671035324967602

vercel Bot deployed to Preview June 19, 2026 16:08 View deployment

vercel Bot deployed to Preview June 19, 2026 16:22 View deployment

Conversation

d-oit commented Jun 19, 2026

Uh oh!

google-labs-jules Bot commented Jun 19, 2026

Uh oh!

vercel Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

deepsource-io Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

DeepSource Code Review

PR Report Card

Code Review Summary

Uh oh!

codacy-production Bot commented Jun 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Up to standards ✅

Uh oh!

deepsource-io Bot Jun 19, 2026

Choose a reason for hiding this comment

`fn decode_entities_optimized` has a cyclomatic complexity of 30 with "very-high" risk

Uh oh!

d-oit Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

google-labs-jules Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

deepsource-io Bot Jun 19, 2026

Choose a reason for hiding this comment

Empty call to `new()`

Uh oh!

d-oit Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

google-labs-jules Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

deepsource-io Bot Jun 19, 2026

Choose a reason for hiding this comment

`fn decode_entities` has a cyclomatic complexity of 30 with "very-high" risk

Uh oh!

d-oit Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

google-labs-jules Bot Jun 19, 2026

Choose a reason for hiding this comment

Uh oh!

deepsource-io Bot Jun 19, 2026

Choose a reason for hiding this comment

Empty call to `new()`

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

vercel Bot commented Jun 19, 2026 •

edited

Loading

deepsource-io Bot commented Jun 19, 2026 •

edited

Loading

codacy-production Bot commented Jun 19, 2026 •

edited

Loading