| Crates.io | nbnf_language |
| lib.rs | nbnf_language |
| version | 0.0.2 |
| created_at | 2025-03-05 01:53:33.805862+00 |
| updated_at | 2025-03-05 02:16:00.171604+00 |
| description | A parser for the NBNF language itself, and the parser generator |
| homepage | |
| repository | https://github.com/Yoplitein/nbnf |
| max_upload_size | |
| id | 1578053 |
| size | 45,155 |
A parser generator based on nom, with syntax inspired by EBNF and regex.
A grammar is a series of rules containing expressions. Whitespace is ignored, and each rule must end with a semicolon:
rule = ...;
rule2 =
...
...;
...
A rule generates a parser function as Rust code, and so its name must be a valid Rust identifier.
The output type of the generated function can be specified, defaulting to &str if omitted:
rule<Output> = ...;
Any valid Rust code denoting a type is permitted between the chevrons.
Expressions can invoke any parser function defined in Rust, with other rules simply being resolved as symbols in the same enclosing module:
top = inner external_rule nbnf::nom::combinator::eof;
inner = ...;
Rules can match literal chars, strings, or regex-like character ranges, and support Rust-like escapes:
top = 'a' "bc" [de-g] '\x2A' "\"\0\r\n\t\x7F\u{FF}";
Expressions can be grouped with parentheses, and alternatives are separated with a slash:
top = ('a' 'b') / ('c' 'd');
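To make the ordered-choice behaviour concrete, here is a hand-written sketch in plain Rust of what the rule above matches. It is not the code the generator emits: the helper `pair` is hypothetical, and the `(char, char)` output type is an assumption made for illustration.

```rust
/// Hypothetical helper: matches `first` followed by `second` at the start of `input`.
/// Returns `(rest, (first, second))` on success.
fn pair(input: &str, first: char, second: char) -> Option<(&str, (char, char))> {
    let rest = input.strip_prefix(first)?.strip_prefix(second)?;
    Some((rest, (first, second)))
}

/// Sketch of `top = ('a' 'b') / ('c' 'd');`.
/// The second alternative is only tried if the first one fails.
fn top(input: &str) -> Option<(&str, (char, char))> {
    pair(input, 'a', 'b').or_else(|| pair(input, 'c', 'd'))
}

fn main() {
    assert_eq!(top("abx"), Some(("x", ('a', 'b'))));
    assert_eq!(top("cde"), Some(("e", ('c', 'd'))));
    // Neither alternative matches "ad".
    assert_eq!(top("ad"), None);
}
```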
Expressions can be repeated with regex-like syntax:
r1 = 'a'?; // zero or one
r2 = 'b'*; // zero or more
r3 = 'c'+; // one or more
r4 = 'd'{2}; // exactly two
r5 = 'e'{2,}; // at least two
r6 = 'f'{,2}; // at most two
r7 = 'g'{2,4}; // between two and four
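The repetition bounds can be illustrated with a small plain-Rust sketch. The function `repeat_char` is hypothetical and greedy by assumption. For simplicity it returns the matched slice rather than whatever output type the generated parser produces, which this sketch does not try to reproduce.

```rust
/// Hypothetical helper: greedily matches between `min` and `max` copies of `c`
/// at the start of `input` (`max == None` means unbounded).
/// Returns `(rest, matched)` on success.
fn repeat_char(input: &str, c: char, min: usize, max: Option<usize>) -> Option<(&str, &str)> {
    let mut count = 0;
    let mut end = 0;
    for ch in input.chars() {
        // Stop at the first non-matching char or once the upper bound is reached.
        if ch != c || max.map_or(false, |m| count >= m) {
            break;
        }
        count += 1;
        end += ch.len_utf8();
    }
    if count < min {
        return None;
    }
    Some((&input[end..], &input[..end]))
}

fn main() {
    // 'g'{2,4}: at most four are consumed, the fifth is left over.
    assert_eq!(repeat_char("ggggg", 'g', 2, Some(4)), Some(("g", "gggg")));
    // 'e'{2,}: a single 'e' is not enough.
    assert_eq!(repeat_char("e", 'e', 2, None), None);
    // 'a'?: zero matches is still a success.
    assert_eq!(repeat_char("b", 'a', 0, Some(1)), Some(("b", "")));
}
```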
Expressions can be tagged with various modifiers, wrapping them in combinators:
!! (cut) prevents backtracking, e.g. when you know no other expressions can match:
json_object_pair<(String, Json)> = string !!(-':' json_value);
! (not) matches only when the expression does not match, consuming no input:
ident = -![0-9] ~[a-zA-Z0-9_]+;
~ (recognize) will discard the output and instead yield the portion of the input that was matched:
r1<(i32, f64)> = ...;
r2<&str> = ~r1;
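As a rough illustration of how the not and recognize modifiers combine, the following plain-Rust sketch hand-writes the `ident` rule from the list above. It only approximates the behaviour described there and is not the generated code.

```rust
/// Sketch of `ident = -![0-9] ~[a-zA-Z0-9_]+;`.
/// Returns `(rest, matched)` on success.
fn ident(input: &str) -> Option<(&str, &str)> {
    // `![0-9]`: fail if the next char is a digit; nothing is consumed either way.
    if input.chars().next().map_or(false, |c| c.is_ascii_digit()) {
        return None;
    }
    // `~[a-zA-Z0-9_]+`: take the longest run of identifier chars and
    // return the matched slice of the input.
    let end = input
        .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
        .unwrap_or(input.len());
    if end == 0 {
        return None; // `+` requires at least one match
    }
    Some((&input[end..], &input[..end]))
}

fn main() {
    assert_eq!(ident("foo_1 bar"), Some((" bar", "foo_1")));
    // A leading digit is rejected by the negative lookahead.
    assert_eq!(ident("1abc"), None);
}
```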
Expressions can be discarded from output by prefixing them with -:
string<&str> = -'"' ~(string_char+) -'"';
For this particular grammar, forgoing the discards would require a tuple as the return type, because the quote chars would be included in the output:
string<(char, &str, char)> = ...;
The empty string can be matched with &, allowing various interesting grammar constructs:
parens = ~('(' parens ')') / ~&;
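One way to read the `parens` rule is as a recursive function whose fallback branch always succeeds. The sketch below is a hand-written plain-Rust approximation under that reading, not the generated parser.

```rust
/// Sketch of `parens = ~('(' parens ')') / ~&;`.
/// Returns `(rest, matched)`. It never fails, because the second
/// alternative matches the empty string.
fn parens(input: &str) -> (&str, &str) {
    // First alternative: '(' parens ')'
    if let Some(after_open) = input.strip_prefix('(') {
        let (after_inner, _) = parens(after_open);
        if let Some(rest) = after_inner.strip_prefix(')') {
            let consumed = input.len() - rest.len();
            return (rest, &input[..consumed]);
        }
    }
    // Second alternative: `&` matches the empty string, consuming nothing.
    (input, &input[..0])
}

fn main() {
    assert_eq!(parens("(())x"), ("x", "(())"));
    // Unbalanced input falls back to the empty match.
    assert_eq!(parens("(()"), ("(()", ""));
}
```

Note that the rule only describes nested parentheses such as `((()))`. A sequence like `()()` would match just its first pair.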
Types and output values can be massaged in a few ways by passing any valid Rust expression:
@<...> (value) discards output and instead returns the given literal:
token<Token> =
... /
'/'@<Token::Slash> /
...;
|<...> (map) runs a mapping function over the output:
object<HashMap> =
-'{' object_pair+ -'}'
|<HashMap::from_iter>;
|?<...> (map_opt) runs a mapping function returning Option over the output:
even_int<i32> =
int
|?<|v| (v & 1 == 0).then_some(v)>;
|!<...> (map_res) runs a mapping function returning Result over the output:
number<i32> =
~([0-9]+)
|!<i32::from_str>;
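The map_res and map_opt examples above can be approximated by hand in plain Rust, which may help show where a failed conversion turns into a parse failure. This is only a sketch: the helper `digits` is hypothetical, and `int` is assumed to behave like `number`, since its definition is not given.

```rust
use std::str::FromStr;

/// Hypothetical helper standing in for `~([0-9]+)`: returns `(rest, digits)`.
fn digits(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        None
    } else {
        Some((&input[end..], &input[..end]))
    }
}

/// Sketch of `number<i32> = ~([0-9]+) |!<i32::from_str>;`.
/// An `Err` from the mapping function becomes a parse failure.
fn number(input: &str) -> Option<(&str, i32)> {
    let (rest, text) = digits(input)?;
    let value = i32::from_str(text).ok()?;
    Some((rest, value))
}

/// Sketch of `even_int<i32> = int |?<|v| (v & 1 == 0).then_some(v)>;`,
/// assuming `int` behaves like `number`. A `None` becomes a parse failure.
fn even_int(input: &str) -> Option<(&str, i32)> {
    let (rest, v) = number(input)?;
    let v = (v & 1 == 0).then_some(v)?;
    Some((rest, v))
}

fn main() {
    assert_eq!(number("42abc"), Some(("abc", 42)));
    // Digits that overflow i32 are rejected by `from_str`.
    assert_eq!(number("99999999999"), None);
    assert_eq!(even_int("8!"), Some(("!", 8)));
    assert_eq!(even_int("7"), None);
}
```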
The main entrypoint is nbnf::nbnf, a proc macro that expands to parsers generated from the given grammar.
Note that the grammar must be passed as a string (preferably a raw string), because some expressions that are valid in a grammar are not valid Rust tokens (e.g. the unbalanced quote in [^"]).
use nbnf::nbnf;
nbnf!(r#"
top = ~('a' top 'b') / ~&;
"#);
fn main() {
    let input = "aabbc";
    let (rest, output) = top.parse(input).unwrap();
    assert_eq!(rest, "c");
    assert_eq!(output, "aabb");
}