refactor(es/minifier): Improve tpl to str#11415
Conversation
🦋 Changeset detectedLatest commit: 02a7aab The changes in this PR will be included in the next version bump. Not sure what this means? Click here to learn what changesets are. Click here if you're a maintainer who wants to add another changeset to this PR |
Binary Sizes
Commit: 58e014a |
7eb2654 to
e65312b
Compare
e65312b to
7c9954b
Compare
|
I'm not sure why we have
There are some encoding logic in |
400c7bb to
b94f214
Compare
b94f214 to
f5de72d
Compare
|
Does CI passes, if you remove it? |
|
I have changed some outputs here, but looks not a bug. |
|
@claude Review this PR. My concerns are
|
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
|
@claude Review this PR. My concerns are
You should do |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
kdy1
left a comment
There was a problem hiding this comment.
Review Summary
I've identified several critical issues that need to be addressed:
- CRITICAL: Compilation error in - calls removed function
- HIGH: Surrogate pair handling logic has multiple bugs in
- MEDIUM: Error handling uses instead of graceful degradation
- LOW: Missing test coverage for edge cases
Detailed comments are provided on each file. Please address the critical and high priority issues before merging.
kdy1
left a comment
There was a problem hiding this comment.
Review Summary
I have identified several critical issues that need to be addressed:
- CRITICAL: Compilation error in misc.rs line 1429 - calls removed function
- HIGH: Surrogate pair handling logic has multiple bugs in lit.rs
- MEDIUM: Error handling uses panic instead of graceful degradation
- LOW: Missing test coverage for edge cases
Detailed comments are provided on each file. Please address the critical and high priority issues before merging.
|
Correction on my first comment: You are absolutely right that my comment about line 1429 in misc.rs was incorrect. Looking at the diff more carefully, the change on that line is: - cur_cooked.push_wtf8(&convert_str_value_to_tpl_cooked(&s.value));
+ cur_cooked.push_wtf8(&Cow::Borrowed(&s.value));This shows you already fixed this correctly in the PR - the new version uses The other comments about surrogate pair handling, iterator advancement, and panic usage remain valid concerns for the |
|
I assumed the panic is unreachable code, I will add error handing later. |
|
So, Can I try to remove |
|
Yeap, it would be nice. Thank you! |
766606f to
627a9f4
Compare
|
Seems |
627a9f4 to
2a37002
Compare
|
Remove force_str_for_tpl is complicated, I would open another PR instead. |
|
I see, it sounds good. Thanks! |
|
OK, now force_str_for_tpl has been removed, and all tests passed |
|
@claude Review this PR |
|
Claude encountered an error —— View job I'll analyze this and get back to you. |
Change version from patch to major for swc_ecma_ast.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 4 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // We can be sure `prev_result` is less than 0xdc00 | ||
| buf.push(unsafe { | ||
| CodePoint::from_u32_unchecked(prev_result) | ||
| }); |
There was a problem hiding this comment.
After pushing the previous high surrogate at lines 369-371, prev_result should be reset to NO_PREV_RESULT to avoid pushing it again at line 396. Without this reset, if the current result is also a high surrogate but not followed by \u, the previous surrogate will be pushed twice. Add prev_result = NO_PREV_RESULT; after line 371.
| }); | |
| }); | |
| prev_result = NO_PREV_RESULT; |
| ) | ||
| } else { | ||
| break; | ||
| } | ||
| } |
There was a problem hiding this comment.
When count >= max_len and result > 0x10ffff, the code emits an error but doesn't break out of the loop, allowing it to continue parsing more hex digits beyond the maximum length. This should break after emitting the error to prevent parsing extra digits. Change line 350 to include a break statement or restructure the logic to break after the if-else at line 351-353.
| ) | |
| } else { | |
| break; | |
| } | |
| } | |
| ); | |
| } | |
| break; | |
| } |
| '\n' | '\u{2028}' | '\u{2029}' => {} | ||
| 'u' | 'x' => { | ||
| let mut count: u8 = 0; | ||
| // result is a 4 digit hex value |
There was a problem hiding this comment.
The comment "result is a 4 digit hex value" is misleading because the result can be 2 digits for \x escapes, 4 digits for \uXXXX escapes, or up to 6 digits for \u{...} escapes. Consider updating the comment to be more accurate, such as "result accumulates the hex digits".
| // result is a 4 digit hex value | |
| // result accumulates the hex digits of the escape sequence |
| iter.next(); | ||
| let mut cloned_iter = iter.clone(); | ||
| while let Some(cp) = cloned_iter.next() { | ||
| if let Some(ch) = cp.to_char() { | ||
| if ch == '}' { | ||
| iter = cloned_iter; | ||
| template_longer_count += 1; | ||
| break; | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
The logic for handling ${ sequences in string-to-template conversion is incorrect. When converting a string literal containing ${ to a template, only the ${ needs to be escaped as \${ (adding 1 character for the backslash). The current code incorrectly tries to find a matching } and only increments the count if found. This should be simplified to: when ${ is encountered, increment template_longer_count by 1 (for the added backslash) without looking for a closing brace.
| iter.next(); | |
| let mut cloned_iter = iter.clone(); | |
| while let Some(cp) = cloned_iter.next() { | |
| if let Some(ch) = cp.to_char() { | |
| if ch == '}' { | |
| iter = cloned_iter; | |
| template_longer_count += 1; | |
| break; | |
| } | |
| } | |
| } | |
| // When converting to a template, `${` becomes `\${`, | |
| // adding exactly one extra character (the backslash). | |
| iter.next(); // Consume the '{' | |
| template_longer_count += 1; |
kdy1
left a comment
There was a problem hiding this comment.
Opus is happy with it
The removal of convert_str_value_to_tpl_cooked appears to be correct. The function was doing an extra level of unescaping that was unnecessary because Str::value already contains the decoded string value. The direct use of Cow::Borrowed(&s.value) is semantically appropriate.
|
Thank you! |
… es5 (#11529) **Description:** When the target environment is below ES2015, and non-BMP characters (like emojis) are encountered, we stop the conversion. This is because: 1. Tpl: `🦀` (may output directly in source code or require minimal escaping) -> shorter 2. Str (in ES5 mode): `\uD83E\uDD80` (escape sequence for surrogate pair) -> extremely long **Related issue (if exists):** #11415 --------- Co-authored-by: claude[bot] <41898282+claude[bot]@users.noreply.github.com>
Description:
Pure::convert_tpl_to_strto prefercookedvalues as much as possible.Str::from_tpl_rawto returnWtf8Atominstead ofAtom.