Crates.io | regex-bnf |
lib.rs | regex-bnf |
version | 0.1.2 |
source | src |
created_at | 2023-02-01 07:10:03.814479 |
updated_at | 2023-02-01 09:15:03.294315 |
description | A deterministic parser for a BNF inspired syntax with regular expressions |
homepage | |
repository | https://github.com/arduano/regex-bnf/tree/master/regex-bnf |
max_upload_size | |
id | 773353 |
size | 12,938 |
A macro-based BNF style parser for easier grammar definition.
This approach is useful when you need to parse a complex grammar without tokenizing first, e.g. where tokens may contain spaces and newlines and have complex rules surrounding them.
Here is a simple example CSV parser:
use regex_bnf::*;

bnf! {
    Value = <!Eof> <!NewLine> val:r"([^,\r\n]|\\,)*" <?Comma>;
    Line = <!Eof> values:<[Value]> <LineEnd>;
    Document = lines:<[Line]^>;
    Comma = ",";
    NewLine = r"[\r\n]+";
    Eof = ^;
    enum LineEnd = [NewLine | Eof];
}
The above macro creates a struct for each token (Value, Line, Document, etc.) and an enum for LineEnd. Each struct and enum contains a parse function that takes in a StringParser and returns a Result with (parsed value, remaining string) or an error.
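To make the shape of such a parse function concrete, here is a minimal hand-written sketch of what a rule like Comma = ","; could look like. The Cursor and ParseError types are hypothetical stand-ins invented for this illustration; they are not the types exported by regex-bnf, and the real generated code may differ.

```rust
// Hypothetical stand-in for the parser state; NOT regex-bnf's StringParser.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Cursor<'a> {
    rest: &'a str, // the part of the input that has not been parsed yet
}

// Hypothetical error type, used only in this sketch.
#[derive(Debug, PartialEq)]
struct ParseError {
    expected: &'static str,
}

// Hand-written equivalent of the rule `Comma = ",";`.
#[derive(Debug, PartialEq)]
struct Comma;

impl Comma {
    // Same shape as described above:
    // Ok((parsed value, remaining input)) on success, Err otherwise.
    fn parse(input: Cursor<'_>) -> Result<(Comma, Cursor<'_>), ParseError> {
        match input.rest.strip_prefix(',') {
            Some(rest) => Ok((Comma, Cursor { rest })),
            None => Err(ParseError { expected: "," }),
        }
    }
}
```

Because the remaining input is returned rather than mutated in place, a caller can chain rules by passing the returned cursor to the next parse call, and can retry from the original cursor when a rule fails.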
This implementation is entirely deterministic, and it is susceptible to infinite loops and stack overflows. To debug a grammar, read it linearly: each struct parses its tokens in the order of declaration. In the CSV example above, LineEnd would first try to parse NewLine and then Eof. If it fails to parse either, it returns an error.
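The ordered behaviour of enum LineEnd = [NewLine | Eof]; can be sketched by hand as follows. This is an illustration on plain &str values, not the code the macro generates; the function names and the String error are assumptions made for the example.

```rust
// Hand-written sketch of ordered choice; not the macro's generated code.
#[derive(Debug, PartialEq)]
enum LineEnd<'a> {
    NewLine(&'a str),
    Eof,
}

// Matches one or more '\r' or '\n' characters, like the regex r"[\r\n]+".
fn parse_new_line(input: &str) -> Option<(&str, &str)> {
    let len = input
        .find(|c: char| c != '\r' && c != '\n')
        .unwrap_or(input.len());
    if len == 0 {
        None
    } else {
        Some(input.split_at(len))
    }
}

// Succeeds only when no input is left; consumes nothing.
fn parse_eof(input: &str) -> Option<((), &str)> {
    if input.is_empty() {
        Some(((), input))
    } else {
        None
    }
}

fn parse_line_end(input: &str) -> Result<(LineEnd<'_>, &str), String> {
    // Alternatives are tried strictly in declaration order: NewLine, then Eof.
    if let Some((text, rest)) = parse_new_line(input) {
        return Ok((LineEnd::NewLine(text), rest));
    }
    if let Some(((), rest)) = parse_eof(input) {
        return Ok((LineEnd::Eof, rest));
    }
    Err(format!("expected NewLine or Eof at {:?}", input))
}
```

With this sketch, parse_line_end("\r\nabc") yields the NewLine variant with "abc" remaining, an empty input yields Eof, and any other input is an error.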
The parser is very high performance: it has zero nondeterministic behavior and performs zero allocations other than boxed tokens, which are optional. All parsed strings are referenced as slices of the input, along with their location (index, line number, line column).
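One way to picture a parsed string that is referenced as a slice with its location is the sketch below. The Located struct, its field names, the zero-based counting, and the take helper are all assumptions made for illustration; the crate's own types may be laid out differently.

```rust
// Hypothetical location-tracking slice; names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Located<'a> {
    text: &'a str, // borrowed from the input, never copied
    index: usize,  // byte offset of `text` within the original input
    line: usize,   // zero-based line number (an assumption of this sketch)
    column: usize, // zero-based column, counted in characters
}

// Splits off the first `len` bytes and returns (taken, remainder).
// No allocation happens: both halves borrow from the same input.
// Panics if `len` is not on a UTF-8 character boundary.
fn take<'a>(input: Located<'a>, len: usize) -> (Located<'a>, Located<'a>) {
    let (head, tail) = input.text.split_at(len);
    let mut line = input.line;
    let mut column = input.column;
    // Advance the position past every character that was taken.
    for c in head.chars() {
        if c == '\n' {
            line += 1;
            column = 0;
        } else {
            column += 1;
        }
    }
    (
        Located { text: head, ..input },
        Located { text: tail, index: input.index + len, line, column },
    )
}
```

Taking three bytes from "ab\ncd" starting at position (0, 0, 0) leaves a remainder "cd" at index 3, line 1, column 0, while the taken slice keeps the starting position.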
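There are 2 types of declarations: tag declarations (Name = ...;), which generate a struct, and enum declarations (enum Name = [A | B];), which generate an enum.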
Within tag declarations, you can label the tokens with label:<token> to give them a field name within the generated struct; otherwise they are omitted.
Here are all the possible token types:
"" (string literal)
r"" (regex literal)
<Tag> (inline), <*Tag> (boxed, to avoid infinite size structs)
<?Tag> (optional)
<[Tag]*> (zero or more times), <[Tag]+> (one or more times), <[Tag]^> (until the end of the string)
<!Tag> (fails if the tag is parsed successfully)
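The behaviour of a few of these token types can be sketched with small hand-written helpers. The code below is an illustration only: literal, optional, not, Parens, and parens are names invented for this example, and it assumes that a negated or failed optional tag consumes no input. It also shows why a recursive rule needs boxing: without Box, the Parens struct would contain itself and have infinite size.

```rust
// Hand-written sketches of the modifiers; not the macro's generated code.
type Parsed<'a, T> = Option<(T, &'a str)>;

// A "" token: matches a literal prefix and returns it as a slice of the input.
fn literal<'a>(input: &'a str, lit: &str) -> Parsed<'a, &'a str> {
    input.strip_prefix(lit).map(|rest| (&input[..lit.len()], rest))
}

// <?Tag>: never fails; yields None and consumes nothing when Tag fails.
fn optional<'a, T>(
    input: &'a str,
    tag: impl Fn(&'a str) -> Parsed<'a, T>,
) -> (Option<T>, &'a str) {
    match tag(input) {
        Some((value, rest)) => (Some(value), rest),
        None => (None, input),
    }
}

// <!Tag>: succeeds only when Tag fails, and consumes nothing.
fn not<'a, T>(input: &'a str, tag: impl Fn(&'a str) -> Parsed<'a, T>) -> Parsed<'a, ()> {
    match tag(input) {
        Some(_) => None,
        None => Some(((), input)),
    }
}

// <*Tag>: a rule that refers to itself must be boxed,
// otherwise the struct would have infinite size.
#[derive(Debug, PartialEq)]
struct Parens {
    inner: Option<Box<Parens>>,
}

// Parses "(" followed by an optional nested Parens, followed by ")".
fn parens(input: &str) -> Parsed<'_, Parens> {
    let (_, rest) = literal(input, "(")?;
    let (inner, rest) = optional(rest, parens);
    let (_, rest) = literal(rest, ")")?;
    Some((Parens { inner: inner.map(Box::new) }, rest))
}
```

For instance, parens("(())!") produces one Parens nested inside another and leaves "!" unparsed, while parens("(()") fails because the closing parenthesis of the outer pair is missing.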