| Crates.io | lexigram |
| lib.rs | lexigram |
| version | 0.8.2 |
| created_at | 2025-12-24 19:42:30.317947+00 |
| updated_at | 2025-12-24 20:09:20.498187+00 |
| description | Lexer and LL(1) parser generator |
| homepage | https://github.com/blueglyph/lexigram |
| repository | https://github.com/blueglyph/lexigram |
| max_upload_size | |
| id | 2003762 |
| size | 24,445 |
Important note: This crate is still under development and shouldn't be considered stable yet (hence the 0.x version). I'm currently adding some documentation and examples, and refactoring the code.
lexigram CrateThis crate provides a lexer/parser generator, which parses a plain lexicon and grammar from a source file, and generates the Rust source code for the corresponding lexer and nonrecursive, predictive LL(1) parser.
lexigram is inspired by ANTLR's lexicon and grammar language, and by its listener interface, which is simple, direct, and efficient.
Its main particularities are:
Here is a lexicon and a grammar for a basic expression parser:
lexicon Calc;
Add : '+';
Div : '/';
Exp : '^';
Mul : '*';
Sub : '-';
Num : [0-9]+;
WhiteSpace : [ \n\r\t]+ -> skip;
grammar Calc;
expr:
Sub expr
| expr <R> Exp expr // <R>: right-associative
| expr (Mul | <P> Div) expr // <P>: Div has the same priority as Mul
| expr (Add | <P> Sub) expr
| Num
;
In such ambiguous rules,
<P> attribute marks a rule alternative that must have the same priority as the previous one.<R> attribute marks a right-associative operator.(Those attributes can be placed anywhere in the alternative.)
lexigram will create the tables for the lexer and parser, and all the wrapper code that interfaces the parser and the user listener. We show only the few illustrative bits below.
Each rule as an initialization method, called when the parser has predicted it was the next rule, and an exit method, called once the rule has been fully parsed.
pub trait CalcListener {
// ... (other general methods)
fn exit(&mut self, expr: SynExpr) {}
fn init_expr(&mut self) {}
fn exit_expr(&mut self, ctx: CtxExpr) -> SynExpr;
}
Here, the type SynExpr represents the expr nonterminal value, and it must be defined by the user (a template is provided).
The exit_expr method must return the value that has been calculated in function of the context, ctx, which contains the value of the terminals (tokens) and nonterminals (rules). Note that this is optional: nonterminals may hold no value, in which case the method has a default, empty definition.
The context reflects the original grammar rules, regardless of how they were tranformed to accommodate an LL(1) parser:
pub enum CtxExpr {
/// `expr -> "-" expr`
V1 { expr: SynExpr },
/// `expr -> expr <R> "^" expr`
V2 { expr: [SynExpr; 2] },
/// `expr -> expr "*" expr`
V3 { expr: [SynExpr; 2] },
/// `expr -> expr <P> "/" expr`
V4 { expr: [SynExpr; 2] },
/// `expr -> expr "+" expr`
V5 { expr: [SynExpr; 2] },
/// `expr -> expr <P> "-" expr`
V6 { expr: [SynExpr; 2] },
/// `expr -> Num`
V7 { num: String },
}
The user then creates a listener object, that will typically hold the AST, and implements the CtxExpr trait for that object. Here, we don't build an AST but just illustrate how the callback can be used for a basic calculator.
pub struct SynExpr(pub f64);
pub struct ExprListener {
value: Option<f64>,
}
impl TestListener for ExprListener {
// ...
fn exit(&mut self, expr: SynExpr) {
self.value = Some(expr.0);
}
fn exit_e(&mut self, ctx: CtxExpr) -> SynExpr {
SynExpr(match ctx {
CtxExpr::V1 { expr: SynExpr(left) } => - left,
CtxExpr::V2 { expr: [SynExpr(l), SynExpr(r)] } => l.powf(r),
CtxExpr::V3 { expr: [SynExpr(l), SynExpr(r)] } => l * r,
CtxExpr::V4 { expr: [SynExpr(l), SynExpr(r)] } => l / r,
CtxExpr::V5 { expr: [SynExpr(l), SynExpr(r)] } => l + r,
CtxExpr::V6 { expr: [SynExpr(l), SynExpr(r)] } => l - r,
CtxExpr::V7 { num } => f64::from_str(&num).unwrap(),
})
}
}
lexigram -h shows the usage summary.
In a nutshell, you must provide the location of the lexicon, the grammar, and specify where to generate the code.
The lexicon and the grammar are usually stored as stand-alone files, like "expr.l" and "expr.g". They may also be included between tags in any file.
The same stands for the output of lexigram: it can create stand-alone files or insert it between tags in an existing file.
For example, if you create a file "calc.rs" with the following content, everything will be in a single file:
/*!
```
// [lexicon]
lexicon Calc;
Add : '+';
Div : '/';
Exp : '^';
Mul : '*';
Sub : '-';
Num : [0-9]+;
WhiteSpace : [ \n\r\t]+ -> skip;
// [lexicon]
// [grammar]
grammar Calc;
expr:
Sub expr
| expr <R> Exp expr // <R>: right-associative
| expr (Mul | <P> Div) expr // <P>: Div has the same priority as Mul
| expr (Add | <P> Sub) expr
| Num
;
// [grammar]
```
*/
#[derive(Debug)]
pub struct SynExpr(pub f64);
mod lexer {
// [lexer_source]
// [lexer_source]
}
mod parser {
// [parser_source]
// [parser_source]
}
Launch the following command:
lexigram --indent 4 -x calc.rs tag lexicon -l calc.rs tag lexer_source \
-g calc.rs tag grammar -p calc.rs tag parser_source \
--lib "super::SynExpr"
It will insert the generated code of the lexer and parser inside the [lexer_source] and [parser_source] tags, respectively. Launching the same command will replace any existing code between the tags, so it can be used iteratively (useful if you need to fix or change the grammar!).
You'll just need to add a dependency to lexigram-core with the version that corresponds to the lexigram binary ("lexigram -V" will show its version).
Using the lexi-gram crate allows you to generate the code programmatically from Rust code, which is another option.
You can find some examples in the repository (https://github.com/blueglyph/lexigram)
lexigram itself uses a couple of lexers/parsers (that it generated itself) to parse the lexicon (Lexi) and the grammar (Gram)—hence the tool name, aside from the obvious pun. Lexi and Gram, however, are a little harder to trace in the source code because they're generated in two steps, to keep them from breaking and to regenerate them more easily when the code evolves.
Rudimentary documentation can be found in the GitHub project.
Some #[ignore] tests can be found in the code base to regenerate the multiple lexers and parsers, including in the examples.
The lexigram project includes the following crates:
lexigram-core, the minimum library required by the code generated by lexigramlexigram-lib, the full library used by lexi-gram and lexigramlexi-gram, the Lexi and Gram parsers, which parse the lexicon and grammar language for the lexigram toollexigram, the lexer/parser generator toolRELEASES.md keeps a log of all the releases (most are on the GitHub release page, too).
This code is licenced under either MIT License or Apache License 2.0, at your option.