Crates.io | grammar_tool |
lib.rs | grammar_tool |
version | 0.1.0 |
source | src |
created_at | 2020-02-20 23:40:45.341494 |
updated_at | 2020-02-20 23:40:45.341494 |
description | A tool for understanding LL(k) grammars |
homepage | |
repository | https://github.com/JustAPerson/denuocc/tree/master/tools/grammar_tool |
max_upload_size | |
id | 211056 |
size | 65,149 |
This tool aims to help in the development of predictive recursive descent parser.
There are four main subcommand for grammar_tool
:
print
shows and numbers every production in the grammar
$ cargo run -- print ./grammars/aho_ullman/example_5.3.yacc
start: S
terminals: a b
0 S : ;
1 S : a b A ;
2 A : S a a ;
3 A : b ;
It shows the start
nonterminal, which happens to be S
. It shows that the
terminals in this grammar are a
and b
. It then shows the 4 productions in
this grammar.
first
shows the first set for every nonterminal in the grammar
$ cargo run -- first -k2 ./grammars/aho_ullman/example_5.3.yacc
S :
S : a b
A : a a
A : a b
A : b
This means that the production S
can legally start with an empty string or the
token string "ab"
. Similarly, the nonterminal A
may begin with the token
strings "aa"
, "ab"
, or just "b"
. Notably, A
cannot be an empty string.
follow
shows the follow set for every nonterminal in a grammar
$ cargo run -- follow -k2 ./grammars/aho_ullman/example_5.3.yacc
S :
S : a a
A :
The S :
line with nothing after the colon means that is valid for the input
string to end after parsing an S
. The same goes for the line A :
. The
line S : a a
means that the token string "aa"
may sometimes legally
follow after parsing an S
.
test
will verify if a grammar if LL(k)
or explain why it is not
$ cargo run -- test -k1 ./grammars/aho_ullman/example_5.3.yacc
grammar is not LL(1)
$ cargo run -- test -k1 --explain ./grammars/aho_ullman/example_5.3.yacc
productions [0, 1] cause LL-conflicts: [["a"]]
production 0 S : ;
production 1 S : a b A;
conflicting suffix: ["a"]
grammar is not LL(1)
$ cargo run -- test -k2 ./grammars/aho_ullman/example_5.3.yacc
grammar is strong LL(2)
As the follow
command will show, an S
production may be followed by the
sequence a a
(which occurs in production #2 A : S a a
). Thus, it is
impossible to tell with only 1 token lookahead which of the two S
productions to choose.
grammar_tool
accepts a very simple grammar format similar to YACC. The input is
split into two parts: the header and the body, separated by a line of just %%
.
The header may contain a %token
or %start
line. The %token
line declares
identifiers that are terminals. The %start
line specifies which nonterminal is
the root of the grammar.
The body contains the definitions of every nonterminal. A definition may provide
multiple productions using the |
character. Any identifier referenced in a
production, if not declared by a %token
header, is assumed to be a
nonterminal. A quoted string in a production is also a terminal.
%token TERMINAL
%start S
%%
S : variant1
| variant2
;
variant1 : TERMINAL TERMINAL;
variant2 : "terminal" ;
This project is dual licensed under the terms of the MIT license and the Apache License Version 2.0 at your option. See ./LICENSE-MIT and ./LICENSE-APACHE for details.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.