Crates.io | tree-builder |
lib.rs | tree-builder |
version | 0.0.3 |
source | src |
created_at | 2023-07-14 14:44:12.402286 |
updated_at | 2023-07-25 21:26:26.319194 |
description | Parser Generator library which uses Nom and procedural macros to generate recursive-descent parsers. |
homepage | |
repository | https://github.com/edisnord/tree-builder |
max_upload_size | |
id | 916080 |
size | 81,294 |
TreeBuilder is a parser generator focused on the correct parsing of context-free grammars from ASCII input strings. It generates recursive-descent, backtracking parsers with as little code as possible and with convenient features. The parsers it generates aim to be very easy to use and to be compatable with Nom parsers if the user wishes to use Nom together with TreeBuilder. TreeBuilder will not be supporting left-recursive grammars in this version.
TreeBuilder's [Parser] trait shows the type signature of all TreeBuilder-generated parsers.
Example:
use tree_builder::{build_tree, Parser};
build_tree! {
// Parses a Hex color
HexColor #=> "#",
HexDigit,
HexDigit?, HexDigit?, HexDigit?, HexDigit?, HexDigit?;
// Parses a Hex digit
HexDigit #=> [0-9a-fA-F];
}
fn main() {
// Will pass
HexColor::parse("#abd325").unwrap();
HexColor::parse("#asd000").unwrap();
// Will fail
HexColor::parse("#whatever").unwrap();
}
TreeBuilder can either generate parsers whose output data structure is just a String, or parsers whose output data structure is a more complicated tuple type. If you use the arrow #=> to separate the rule name from it's definition, you have created a lexical rule which outputs String, but if you use =>, you have the option to specify what parts of a rule definition you wish to keep.
You specify the parts you wish to keep using the @ operator (called Include). Data structures for parsers are based on the includes you have made, from left to right.
Let's take for example the rules Arraya and ArrayElems which parse JSON arrays below:
build_tree!{
JValue => /* Definition of a JSON value */;
Array => "[", #s*, @ArrayElems?, #s*, "]";
ArrayElems => @JValue, #s*, @(",", #s*, @JValue, #s*)*;
}
The data structures generated for the rules above are:
struct Array(Optional<Box<ArrayElems>>);
struct ArrayElems(Box<JValue>, Vec<Box<JValue>>);
When you use alternations, the data structures generated are enums, whose variant names are either inferred when you only include a single non-terminal in an alternation, or need to be specified.
Alternation rule example:
build_tree!{
// Rule representing a JSON value
JValue => @JString
| @Number
| @Object
| @Array
| "true" <True>
| "false" <False>
| "null" <Null>;
}
And the data structure generated by this rule:
enum JValue {
JString(Box<JString>),
Number(Box<Number>),
Object(Box<Object>),
Array(Box<Array>),
True(),
False(),
Null(),
}
Keep in mind that in the rules denoted by =>, you cannot use alternations inside of groupings at the moment. This limitation will probably be fixed in the next update.
TreeBuilder allows for the specification of grammars which use terminals, non-terminals, metacharacters and groupings/sub-parsers. Different operators can be applied to the aforementioned elements which define how many times one of them may be repeated.
Terminals: Terminals are written just like strings in any C-like language. They are defined as a string of characters which can be either any character except ’\’ and ’”’, or a character escape sequence such as ’\n’, delimited by double quotes. The possible escape sequences are:
Some examples of terminals:
Metacharacters: A metacharacter is a special character that carries a specific meaning or functionality within a grammar specification. Unlike terminals, which parse their string equivalent, metacharacters have special interpretations and serve as building blocks for constructing powerful and elegant parser rules with as little code as possible. Here are the metacharacters and their meanings:
Nonterminals: Their syntax is exactly the same as ASCII Rust identifiers. A string be- ginning with either an alphabetic character or an underscore, followed by a string of either alphanumerics, an underscore, or both. Their use in the definition of grammars is to spec- ify that another rule is part of the rule you are currently specifying. So the parser which will be generated from a rule which uses a nonterminal will apply the parser related to the nonterminal at the point specified by the user.
Groupings: grouping refers to the act of enclosing a sequence of elements or subexpres- sions within parentheses. It allows for establishing precedence and controlling the evalu- ation order of elements within a grammar specification. These are grouped as one of the smaller elements of the language because other operators can be applied groupings just like they can be applied to the other terms.
Operator | Description |
---|---|
e | e | Alternation |
@e | Include |
e? | Optional |
e* | Zero-or-more |
e+ | One-or-more |
Alternations in TreeBuilder are ordered-choice ones, meaning that out of all the alternatives that you may define for a rule, only one of the alternatives can succeed. This occurs because TreeBuilder will try alternatives from the one defined first all the way to the last one in order, and if any of the alternatives successfully parses for an input string, the alternations after it are ignored completely.