Generated file missing tokens #44

I am trying to test [grammars-v4/verilog/verilog/](https://github.com/antlr/grammars-v4/tree/ffecfeee601ffc75edbc52845c1509753d6dd4a1/verilog/verilog) using Grammarinator, but some of the generated output fails to parse. When I compare the output from Trees.print() with the generated text, the two do not match: the printed tree sometimes contains tokens that are missing from the generated file.

Here is the code that I am executing:

    git clone https://github.com/antlr/grammars-v4.git
    cd grammars-v4
    git checkout ffecfeee601ffc75edbc52845c1509753d6dd4a1
    cd verilog/verilog
    # Already cloned and built grammarinator from sources.
    grammarinator-process VerilogLexer.g4 VerilogParser.g4 -o .
    grammarinator-generate VerilogGenerator.VerilogGenerator  --sys-path . -d 15 -n 100 -r source_text --serializer grammarinator.runtime.simple_space_serializer --no-mutate --no-recombine
    # Already built a standardized Antlr4 parser driver for the grammar.
    for i in tests/test_*; do echo "$i"; ./Generated/bin/Debug/net5.0/Test.exe -file "$i"; status=$?; if [[ $status != 0 ]]; then break; fi; done

This loops through the various generated tests, parsing each, and stops the loop on a test file that does not parse.
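For anyone who prefers to drive this from Python, the same loop can be sketched roughly as follows. The function name `first_failing_test` is my own and not part of Grammarinator; the sketch only assumes that the parser driver accepts `-file <path>` and returns a non-zero exit status on a parse failure, as in the shell loop above.

```python
import subprocess
from pathlib import Path


def first_failing_test(test_dir, parser_cmd):
    """Run parser_cmd on each test_* file in test_dir, in sorted order.

    Returns the path of the first file for which the command exits with a
    non-zero status, or None if every file parses.
    (Hypothetical helper; not part of Grammarinator or the parser driver.)
    """
    for path in sorted(Path(test_dir).glob("test_*")):
        # Mirrors: ./Generated/bin/Debug/net5.0/Test.exe -file $i
        result = subprocess.run([*parser_cmd, "-file", str(path)])
        if result.returncode != 0:
            return path
    return None
```

Calling `first_failing_test("tests", ["./Generated/bin/Debug/net5.0/Test.exe"])` should then return the first test file that does not parse, or `None` if all of them do.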

I assumed that Grammarinator would construct a valid CST (an "Unparser" tree) and output it. While most tests parse, some do not, and the failures only appear when `-d 15` is specified. I've included `--no-mutate` and `--no-recombine` so that the tree is output unmodified.

To understand WHY the parse fails, I need to look at the CST constructed prior to serializing the token stream into a generated test. To do that, I modified generate.py [after this line](https://github.com/renatahodovan/grammarinator/blob/583ebca242b7f2d181b1151a9128fc178f00c4d3/grammarinator/generate.py#L130) with this code:

        print("Index = ")
        print(index)
        tree.print()

I now rerun the `grammarinator-generate` command and save the human-readable parse trees, and rerun the parser.

Selecting a test that fails, I noticed that the tree.print() output matches neither the generated text nor the tokens reported by the standardized Antlr parser.

For example,

* Output from tree.print():

    ...
    VERTICAL_BAR
    DOLLAR_RANDOM
    COMMA
    COMMA
    SIMPLE_IDENTIFIER
    ...

* Tokens recognized by parser:

    ...
    VERTICAL_BAR
    DOLLAR_RANDOM
    COMMA
    SIMPLE_IDENTIFIER
    ...

(Note, only one COMMA.)

* Relevant sequence in generated file:

    |  $random     , J 

(Note, only one COMMA.)

I have seen similar token differences in other cases. It seems that

***Grammarinator includes some tokens in the CST that are not written to the generated output.***
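To locate such differences without comparing the listings by eye, the two token-name sequences can be diffed. Below is a minimal sketch using Python's standard `difflib`; the helper name `token_diff` is hypothetical, and it assumes the token names have already been extracted into plain lists (here I typed in the sequences from the example above by hand).

```python
import difflib


def token_diff(tree_tokens, parser_tokens):
    """Return (op, tree_slice, parser_slice) for each non-matching region.

    op is one of difflib's opcode tags: 'delete' (only in the tree),
    'insert' (only in the parser output) or 'replace'.
    (Hypothetical helper for illustration only.)
    """
    matcher = difflib.SequenceMatcher(a=tree_tokens, b=parser_tokens, autojunk=False)
    return [
        (op, tree_tokens[i1:i2], parser_tokens[j1:j2])
        for op, i1, i2, j1, j2 in matcher.get_opcodes()
        if op != "equal"
    ]


# Token names copied from the tree.print() output and the parser's token list.
tree_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "COMMA", "SIMPLE_IDENTIFIER"]
parser_tokens = ["VERTICAL_BAR", "DOLLAR_RANDOM", "COMMA", "SIMPLE_IDENTIFIER"]

for op, in_tree, in_parser in token_diff(tree_tokens, parser_tokens):
    print(op, in_tree, in_parser)
```

For the sequences above this reports a single `delete` region containing one `COMMA`, i.e. a token that is present in the tree but absent from what the parser saw.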

Incidentally, I tried to just save the trees using `--keep-trees`, but there is no tool to print them out after loading. I tried something like this, but it did not work.

	from pydoc import importfile
	module = importfile('/full/path/to/trees.py')
	module.Trees.print(module.Trees.load("/full/path/to/test_xxx.grt"))


