This adds the `!` prefix, which represents negative lookahead. It was included in the original PEG paper under the name "NOT"; I went with the more explicit "NegativeLookahead". It will be helpful in several productions that need this kind of exclusion. Similar syntax is common in regular expression engines, which usually write it as `(?!expr)`, and in many other PEG libraries. There is a small risk of confusion, since `!` is used for other purposes in other contexts. For example, Prolog uses `!` for its cut operator. I think this should be fine since it is common with PEG.
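For illustration, here is a minimal sketch of how `!` could behave in a toy PEG-style matcher. It is not the grammar tooling used by this change; the helper names (`literal`, `negative_lookahead`, `sequence`) and the sample rule are hypothetical.

```python
def literal(text):
    """Return a parser that matches `text` at `pos` and returns the new position."""
    def parse(src, pos):
        return pos + len(text) if src.startswith(text, pos) else None
    return parse


def negative_lookahead(parser):
    """`!e`: succeed without consuming input only when `parser` fails."""
    def parse(src, pos):
        return pos if parser(src, pos) is None else None
    return parse


def sequence(*parsers):
    """Run each parser in turn; fail as soon as one of them fails."""
    def parse(src, pos):
        for p in parsers:
            pos = p(src, pos)
            if pos is None:
                return None
        return pos
    return parse


# Hypothetical rule: `//` !`/`  (two slashes not followed by a third)
two_slashes = sequence(literal("//"), negative_lookahead(literal("/")))
```

The key property is that the lookahead returns the position it was given, so whatever follows in the sequence still sees the unconsumed input.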
This adds the ability to specify Unicode code points in a character range. This will be useful for defining some productions without using English, and perhaps to be a little clearer. This also extends the Unicode grammar to allow up to 6 characters for larger code points.
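As a rough sketch of what a tool consuming the grammar might do with such a range, the snippet below converts `U+xxxx` tokens to characters and tests membership. The 4 to 6 hexadecimal digit limit and the function names are assumptions for illustration, not taken from the grammar itself.

```python
import re

# Assumption: `U+` followed by 4 to 6 hexadecimal digits.
CODE_POINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})\Z")


def parse_code_point(token):
    """Convert a `U+xxxx` token into the character it names."""
    m = CODE_POINT.match(token)
    if m is None:
        raise ValueError(f"not a code point token: {token!r}")
    value = int(m.group(1), 16)
    if value > 0x10FFFF:
        raise ValueError(f"outside the Unicode range: {token!r}")
    return chr(value)


def in_range(ch, low, high):
    """True when `ch` falls in the inclusive range [low-high]."""
    return parse_code_point(low) <= ch <= parse_code_point(high)
```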
This replaces some suffixes and prose with the new negative lookahead syntax. The meaning of each affected rule should be unchanged.
This clarifies that bare `//` is explicitly meant to be either followed by LF or EOF. Otherwise it incorrectly matches other comment rules.
This fixes the BLOCK_COMMENT grammar so that it follows the rule that the first alternation that matches wins. The previous grammar would fail with the use of the cut operator to parse these two forms.
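To show why the order of alternatives matters once a cut is involved, here is a small sketch. It assumes cut semantics where a failure after the cut is fatal rather than a reason to try the next alternative. The `general` and `empty` rules are simplified stand-ins, not the actual BLOCK_COMMENT productions.

```python
class CutError(Exception):
    """Raised when a rule fails after passing a cut; no backtracking is allowed."""


def starts(text):
    def parse(src, pos):
        return pos + len(text) if src.startswith(text, pos) else None
    return parse


def committed(prefix, rest):
    """`prefix ^ rest`: once `prefix` matches, a failure in `rest` is fatal."""
    def parse(src, pos):
        mid = prefix(src, pos)
        if mid is None:
            return None  # ordinary failure; later alternatives may still run
        end = rest(src, mid)
        if end is None:
            raise CutError(f"committed at {pos}, failed at {mid}")
        return end
    return parse


def choice(*alternatives):
    """Ordered choice: the first alternative that matches wins."""
    def parse(src, pos):
        for alt in alternatives:
            end = alt(src, pos)
            if end is not None:
                return end
        return None
    return parse


def body(src, pos):
    # Simplified stand-in: one character that is not `*`, then text up to `*/`.
    if pos >= len(src) or src[pos] == "*":
        return None
    close = src.find("*/", pos)
    return None if close == -1 else close + 2


general = committed(starts("/*"), body)  # hypothetical general form
empty = starts("/**/")                   # hypothetical special form

bad = choice(general, empty)   # cut fires before `empty` is ever tried
good = choice(empty, general)  # special form first, then the committed form
```

With `bad`, the input `/**/` raises `CutError` because the general form commits on `/*` and then fails. With `good`, the same input matches the special form before the cut is reached.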
This fixes the doc comments so that they properly handle a carriage return by using the cut operator. Rustc will fail parsing if a doc comment contains a carriage return. This requires including (LF|EOF) at the end of the line so the cut operator has something to complete the line. This also removes the negative `/` from OUTER_LINE_DOC. It does not work correctly with the check for CR, and it is not needed because LINE_COMMENT already matches `////`. Later I plan to include a rule for comments that makes clear the order in which they are parsed. A negative lookahead is necessary in OUTER_BLOCK_DOC to prevent it from trying to parse what should be a BLOCK_COMMENT as an OUTER_BLOCK_DOC and failing due to the cut operator.
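A hand-written sketch of the intended OUTER_LINE_DOC behavior might look like the following. It assumes that any CR reaching this point is an error and that the terminating LF is consumed as part of the rule. The function name and error message are illustrative only.

```python
def outer_line_doc(src, pos):
    """Match `///` up to LF or EOF; a CR inside the comment is a hard error."""
    if not src.startswith("///", pos):
        return None  # not a doc comment; other rules may still apply
    i = pos + 3
    # Past this point the rule is committed (the cut): no backtracking.
    while i < len(src) and src[i] not in "\r\n":
        i += 1
    if i == len(src):
        return i      # EOF completes the line
    if src[i] == "\n":
        return i + 1  # LF completes the line
    raise SyntaxError(f"carriage return in doc comment at offset {i}")
```

Requiring LF or EOF at the end is what gives the committed rule a definite failure point: when the scan stops on a CR, neither terminator is present, so the rule fails after the cut.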
This is intended to indicate the order that the rules are expected to be processed (as defined in this grammar). Of course real parsers can take a different approach if they have the same results. This is roughly similar to the order that rustc takes, though [`block_comment`](https://github.com/rust-lang/rust/blob/d7daac06d87e1252d10eaa44960164faac46beff/compiler/rustc_lexer/src/lib.rs#L782-L817) roughly takes the approach of combining the `/*` prefix, and then deciding if it is an inner doc comment, outer doc comment, or else a regular block comment. LINE_COMMENT must be first so that it is not confused with a doc comment. BLOCK_COMMENT must be last so that its cut operator does not interfere with doc comments that start with `/*`. It could be moved up higher in the list if it had negative lookahead to disambiguate OUTER_BLOCK_DOC, but the expression for that is more complicated than the one in OUTER_BLOCK_DOC.
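The ordering can be sketched as a first-match classifier over comment prefixes. Only "LINE_COMMENT first, BLOCK_COMMENT last" comes from the description above; the order of the middle entries and the prefix patterns themselves are simplified assumptions.

```python
import re

# Ordered (name, prefix pattern) pairs; the first pattern that matches wins.
RULES = [
    # `//` followed by a non-doc character, `////`, or a bare `//` before LF/EOF
    ("LINE_COMMENT", re.compile(r"//(?:[^/!\n]|//|\n|\Z)")),
    ("INNER_LINE_DOC", re.compile(r"//!")),
    ("OUTER_LINE_DOC", re.compile(r"///")),
    ("INNER_BLOCK_DOC", re.compile(r"/\*!")),
    # The negative lookahead keeps `/**/` and `/***` out of the doc rule.
    ("OUTER_BLOCK_DOC", re.compile(r"/\*\*(?![*/])")),
    ("BLOCK_COMMENT", re.compile(r"/\*")),
]


def classify(src):
    """Return the name of the first rule whose prefix matches `src`."""
    for name, pattern in RULES:
        if pattern.match(src):
            return name
    return None
```

Because BLOCK_COMMENT is tried last, its prefix pattern can stay as plain `/*`; the disambiguation lives in the OUTER_BLOCK_DOC lookahead instead.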
rustc actually includes the spaces for doc comments.
This adds the `!` negative lookahead to the grammar to make it easier to express certain rules, and to remove some of the English-based rules.
This updates several rules to use `!`, and also fixes mistakes in several rules. See the individual commits for more details.
As part of this, it also adds the ability to specify `U+xxxx` Unicode values in character ranges, since it was needed to express some things without English rules.