Fix Lexer Issues - Unicode Identifiers and Escape Sequences #146
Summary
This PR fixes critical lexer issues in the PerlOnJava compiler, specifically addressing Unicode identifier parsing and escape sequence handling in strings.
Issues Fixed
✅ High Priority Issues Resolved
Unicode Identifier Parsing (Tests 67-79)
- Unicode characters in identifiers were lexed as ? tokens, causing parser failures
- Lexer.nextToken(): detect and handle surrogate pairs properly
- consumeIdentifier(): process Unicode code points correctly
- IdentifierParser.validateIdentifier(): use code points instead of char values
\Q Sequence Interpolation (Test 16)
- \Q sequences were not being processed correctly, resulting in literal \Q text in output
- Handling of \Q sequences in quotemeta mode
- parseEscapeSequence(): properly handle nested \Q sequences
Test Results
Before Fix
After Fix
Technical Changes
Lexer.java
- nextToken(): detect and handle surrogate pairs
- consumeIdentifier(): handle Unicode code points
IdentifierParser.java
- validateIdentifier(): use codePointAt() instead of charAt()
- codePointCount() instead of length()
StringDoubleQuoted.java
- parseEscapeSequence(): handle nested \Q sequences
- \Q escapes
Impact
This fix resolves fundamental lexer issues that were blocking proper parsing of:
The changes are backward compatible and significantly improve the robustness of the PerlOnJava compiler.
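For reviewers less familiar with the char versus code point distinction behind the Lexer.java and IdentifierParser.java changes, here is a minimal, self-contained sketch. It is not the PerlOnJava code: the class and method names (CodePointIdentifierSketch, scanIdentifier, isValidIdentifier, lengthInCodePoints) are hypothetical, and the identifier rule is an assumption built on java.lang.Character's Unicode identifier predicates, which only approximate Perl's actual rules.

```java
/** Hypothetical illustration only; names and rules are assumptions, not PerlOnJava code. */
final class CodePointIdentifierSketch {

    /**
     * Scans an identifier starting at char index `start` and returns the
     * exclusive end index (in char units). Returns `start` if no identifier
     * begins there. Reads whole code points, so a surrogate pair is treated
     * as one character instead of two unrelated chars.
     */
    static int scanIdentifier(String src, int start) {
        int i = start;
        while (i < src.length()) {
            int cp = src.codePointAt(i); // combines a high/low surrogate pair
            boolean ok = (i == start)
                    // '_' is not an identifier start for Character, so allow it explicitly
                    ? (cp == '_' || Character.isUnicodeIdentifierStart(cp))
                    : Character.isUnicodeIdentifierPart(cp);
            if (!ok) {
                break;
            }
            i += Character.charCount(cp); // advance 1 char (BMP) or 2 (supplementary)
        }
        return i;
    }

    /** True if the whole string is a single identifier under the assumed rule. */
    static boolean isValidIdentifier(String name) {
        return !name.isEmpty() && scanIdentifier(name, 0) == name.length();
    }

    /** Length in code points, as opposed to String.length(), which counts UTF-16 chars. */
    static int lengthInCodePoints(String name) {
        return name.codePointCount(0, name.length());
    }

    public static void main(String[] args) {
        // 'x' followed by U+1D49C, a letter outside the Basic Multilingual Plane
        String name = "x" + new String(Character.toChars(0x1D49C));
        System.out.println("chars:       " + name.length());
        System.out.println("code points: " + lengthInCodePoints(name));
        System.out.println("valid:       " + isValidIdentifier(name));
        // A char-by-char check sees the two surrogate halves separately
        System.out.println("charAt(1) is identifier part: "
                + Character.isUnicodeIdentifierPart(name.charAt(1)));
    }
}
```

The point of the sketch is the loop step: advancing by Character.charCount(cp) rather than by 1 keeps the index aligned on code point boundaries, which is the same idea as replacing charAt() with codePointAt() and length() with codePointCount().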
Files Changed
- src/main/java/org/perlonjava/lexer/Lexer.java
- src/main/java/org/perlonjava/parser/IdentifierParser.java
- src/main/java/org/perlonjava/parser/StringDoubleQuoted.java
Testing
Investigation of re/pat.t Test
Note: The re/pat.t test failures are pre-existing issues, not regressions caused by this PR.
Investigation Results:
Conclusion: The lexer fixes in this PR are working correctly and do not impact the pre-existing runtime issues in re/pat.t.