Generated file missing tokens #44

@kaby76

I am trying to test grammars-v4/verilog/verilog/ using Grammarinator, but some of the generated output fails to parse. When I look at the output from Trees.print(), the tree and the generated text do not seem to contain the same tokens: tokens appear on one side that are missing from the other.

Here is the code that I am executing:

git clone https://github.com/antlr/grammars-v4.git
cd grammars-v4
git checkout ffecfeee601ffc75edbc52845c1509753d6dd4a1
cd verilog/verilog
# Already cloned and built grammarinator from sources.
grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
grammarinator-generate VerilogGenerator.VerilogGenerator  --sys-path . -d 15 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer --no-mutate --no-recombine
# Already built a standardized Antlr4 parser driver for the grammar.
for i in tests/test_*; do echo "$i"; ./Generated/bin/Debug/net5.0/Test.exe -file "$i"; status=$?; if [[ $status != 0 ]]; then break; fi; done

This loops through the various generated tests, parsing each, and stops the loop on a test file that does not parse.

I assumed that Grammarinator would construct a valid CST ("Unparser" tree) and output that. While most tests parse, some do not, and the failures only appear when -d 15 is specified. I've included --no-mutate and --no-recombine so that the tree is output unmodified.

To understand WHY the parse fails, I need to look at the CST constructed prior to serializing the token stream into a generated test. To do that, I modified generate.py after this line with this code:

    print("Index = ")
    print(index)
    tree.print()

I now rerun the grammarinator-generate command and save the human-readable parse trees, and rerun the parser.

Selecting a test that fails, I've noticed that the tree.print() output matches neither the generated text nor the tokens reported by the standardized Antlr parser.

For example,

  • Output from tree.print():

    ...
    VERTICAL_BAR
    DOLLAR_RANDOM
    COMMA
    COMMA
    SIMPLE_IDENTIFIER
    ...

  • Tokens recognized by parser:

    ...
    VERTICAL_BAR
    DOLLAR_RANDOM
    COMMA
    SIMPLE_IDENTIFIER
    ...

(Note, only one COMMA.)

  • Relevant sequence in generated file:

    | $random , J

(Note, only one COMMA.)
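
For anyone trying to reproduce this, the comparison between the two token lists can be done mechanically rather than by eye. Below is a minimal sketch using Python's standard difflib. It assumes the token names have already been copied into plain lists by hand; token_diff is a hypothetical helper of mine, not part of Grammarinator or Antlr.

```python
import difflib


def token_diff(tree_tokens, parser_tokens):
    """Return (tag, tree_slice, parser_slice) for each region where the lists differ.

    Hypothetical helper: both arguments are plain lists of token names.
    """
    matcher = difflib.SequenceMatcher(a=tree_tokens, b=parser_tokens, autojunk=False)
    return [
        (tag, tree_tokens[i1:i2], parser_tokens[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Token names copied from the excerpt above.
tree_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "COMMA", "SIMPLE_IDENTIFIER"]
parser_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "SIMPLE_IDENTIFIER"]

diffs = token_diff(tree_tokens, parser_tokens)
for tag, in_tree, in_parser in diffs:
    print(tag, "tree:", in_tree, "parser:", in_parser)
```

A "delete" entry means the tokens are present in the printed tree but absent from what the parser saw, which is the case shown here; an "insert" entry would mean the opposite.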

I have seen similar token differences at other times. It seems that Grammarinator records some tokens in the CST that are never output.

Incidentally, I tried to just save the tree using --keep-trees, but there is no tool to print the saved trees after reading them back. I tried something like this, but it did not work:

from pydoc import importfile
module = importfile('/full/path/to/trees.py')
module.Trees.print(module.Trees.load("/full/path/to/test_xxx.grt"))
