You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The decode_entities function in the direct_fetch provider was identified as a performance bottleneck due to 20 sequential .replace() calls. Each call triggered a full string traversal and a potential new string allocation.
I implemented a manual single-pass scanner that iterates through the string once and performs lookahead only when an ampersand (&) or word joiner (\u{2060}) is encountered. I also added a fast-path check using .contains() to return early if no replacement characters are present.
Performance improvements measured with Criterion:
Fast-path (no entities): ~3.05µs reduced to ~82ns (~37x faster).
Content with entities: ~5.77µs reduced to ~1.52µs (~3.8x faster).
Long content: ~30.6µs reduced to ~14.9µs (~2x faster).
All tests passed, including the direct_fetch integration tests that verify LaTeX and code block extraction which rely on this decoding logic.
Replaced 20 sequential `.replace()` calls in `decode_entities` with a
manual single-pass scanner and a fast-path check for strings without
entities.
Benchmark results (`decode_entities_bench`):
- Text with entities: ~5.7µs -> ~1.5µs (3.8x speedup)
- Text without entities: ~3.0µs -> ~82ns (36.5x speedup)
- Long text with entities: ~30.5µs -> ~15.0µs (2x speedup)
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
👋 Jules, reporting for duty! I'm here to lend a hand with this pull request.
When you start a review, I'll add a 👀 emoji to each comment to let you know I've read it. I'll focus on feedback directed at me and will do my best to stay out of conversations between you and other bots or reviewers to keep the noise down.
I'll push a commit with your requested changes shortly after. Please note there might be a delay between these steps, but rest assured I'm on the job!
For more direct control, you can switch me to Reactive Mode. When this mode is on, I will only act on comments where you specifically mention me with @jules. You can find this option in the Pull Request section of your global Jules UI settings. You can always switch back!
We reviewed changes in 2a53733...76bc49f on this pull request. Below is the summary for the review, and you can see the individual issues we found as inline review comments.
AI Review is run only on demand for your team. We're only showing results of static analysis review right now. To trigger AI Review, comment @deepsourcebot review on this thread.
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer TIP This summary will be updated as you push new changes.
The reason will be displayed to describe this comment to others. Learn more.
`fn decode_entities_optimized` has a cyclomatic complexity of 30 with "very-high" risk
A function with high cyclomatic complexity can be hard to understand and
maintain. Cyclomatic complexity is a software metric that measures the number of
independent paths through a function. A higher cyclomatic complexity indicates
that the function has more decision points and is more complex.
The reason will be displayed to describe this comment to others. Learn more.
I've updated the implementation to be truly zero-allocation by using slices. Benchmarks show a further speedup (now ~45x faster than the original code for clean text).
The reason will be displayed to describe this comment to others. Learn more.
I've refactored the scanner to use char_indices() and string slicing. This eliminates the String::new() allocation inside the ampersand check, making it a zero-allocation entity detector.
The reason will be displayed to describe this comment to others. Learn more.
`fn decode_entities` has a cyclomatic complexity of 30 with "very-high" risk
A function with high cyclomatic complexity can be hard to understand and
maintain. Cyclomatic complexity is a software metric that measures the number of
independent paths through a function. A higher cyclomatic complexity indicates
that the function has more decision points and is more complex.
The reason will be displayed to describe this comment to others. Learn more.
I've optimized this function by implementing a single-pass manual scanner with a fast-path check. This replaces the previous 20 sequential .replace() calls, which were inefficient due to multiple full-string scans and intermediate allocations. Benchmarks show a ~37x speedup for clean text and a ~3.8x speedup for content with entities.
Fixed `clippy::while_let_on_iterator` in `src/providers/direct_fetch.rs`
and `benches/decode_entities_bench.rs` by replacing `while let` with
`for` loops. Removed unused `OnceLock`, `Regex`, and `Captures` imports
from the benchmark file.
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
Applied `cargo fmt` to `cli/benches/decode_entities_bench.rs` to resolve
CI formatting check failure.
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
Finalized the performance optimization of `decode_entities` by ensuring
full compliance with repository linting and formatting standards.
- Applied `cargo fmt` to resolve CI formatting failures.
- Fixed `clippy::while_let_on_iterator` in the benchmark scanner.
- Maintained ~37x speedup for text without entities.
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
Refactored `decode_entities` to use `char_indices()` and string slicing,
eliminating the transient `String` allocation for every ampersand
encountered.
Benchmark results (`decode_entities_bench`):
- Text with entities: ~5.7µs -> ~915ns (6x speedup)
- Text without entities: ~3.0µs -> ~60ns (50x speedup)
- Long text with entities: ~30.5µs -> ~8.6µs (3.5x speedup)
Co-authored-by: d-oit <6849456+d-oit@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
The
decode_entitiesfunction in thedirect_fetchprovider was identified as a performance bottleneck due to 20 sequential.replace()calls. Each call triggered a full string traversal and a potential new string allocation.I implemented a manual single-pass scanner that iterates through the string once and performs lookahead only when an ampersand (
&) or word joiner (\u{2060}) is encountered. I also added a fast-path check using.contains()to return early if no replacement characters are present.Performance improvements measured with Criterion:
All tests passed, including the
direct_fetchintegration tests that verify LaTeX and code block extraction which rely on this decoding logic.PR created automatically by Jules for task 13342671035324967602 started by @d-oit