| Crates.io | parlex-gen |
| lib.rs | parlex-gen |
| version | 0.3.0 |
| created_at | 2025-09-22 20:02:31.679296+00 |
| updated_at | 2025-10-23 10:16:39.443741+00 |
| description | Lexer generator ALEX and parser generator ASLR |
| repository | https://github.com/ikhomyakov/parlex.git |
| id | 1850589 |
| size | 124,960 |
Lexer generator ALEX and parser generator ASLR.
parlex-gen is the companion crate to parlex, providing the ALEX lexer generator and the ASLR parser generator. Together, these tools form the code generation component of the Parlex framework, enabling the automatic construction of efficient lexical analyzers and parsers in Rust.
The system is inspired by the classic lex (flex) and yacc (bison) utilities written for C, but provides a Rust-based implementation that is more composable and improves upon ambiguity resolution. Unlike lex and yacc, which mix custom user code with automatically generated code, Parlex cleanly separates the two: grammar rules and lexer definitions are explicitly named, and user code refers to them by name.
The ALEX lexer generator offers expressive power comparable to that of lex or flex. It leverages Rust’s standard regular expression libraries to construct deterministic finite automata (DFAs) that operate efficiently at runtime to recognize permitted lexical patterns. The system supports multiple lexical states, enabling context-sensitive tokenization.
The ASLR parser generator implements the SLR(1) parsing algorithm, which is somewhat less general than the LALR(1) method employed by yacc and bison. Nevertheless, ASLR introduces a significant enhancement: it supports dynamic runtime resolution of shift/reduce ambiguities, offering greater flexibility in domains such as Prolog, where operator definitions may be introduced or redefined at runtime.
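In standard Prolog, for example, a program can declare new operators with the op/3 directive and use them in the clauses that follow, so shift/reduce decisions around such operators can only be made correctly against the operator table as it exists at parse time.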
Lexers and parsers generated by the parlex-gen tools depend on the parlex core library, which provides the traits, data structures, and runtime support necessary for their execution. Users define their grammars and lexical rules declaratively, invoke ALEX and ASLR to generate Rust source code, and integrate the resulting components with application logic through the abstractions provided by parlex.
Add this to your Cargo.toml:
[build-dependencies]
parlex-gen = "0.3"
You'll also need the core library:
[dependencies]
parlex = "0.3"
alex and aslr
Define your lexer in lexer.alex and your grammar in parser.g, then run the ALEX and ASLR generators to produce the corresponding Rust source files.
A typical build.rs script might look like this:
// In your build.rs
use std::path::PathBuf;
use parlex_gen::{alex, aslr};
fn main() {
let manifest_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
let out_dir = PathBuf::from(std::env::var("OUT_DIR").unwrap());
// --- ALEX Lexer Generation ---
let input_file = PathBuf::from(&manifest_dir).join("src/lexer.alex");
println!("cargo:rerun-if-changed={}", input_file.display());
println!("cargo:warning=ALEX input file: {}", input_file.display());
println!("cargo:warning=ALEX output directory: {}", out_dir.display());
alex::generate(&input_file, &out_dir, "lexer_data", false).unwrap();
// --- ASLR Parser Generation ---
let input_file = PathBuf::from(&manifest_dir).join("src/parser.g");
println!("cargo:rerun-if-changed={}", input_file.display());
println!("cargo:warning=ASLR input file: {}", input_file.display());
println!("cargo:warning=ASLR output directory: {}", out_dir.display());
aslr::generate(&input_file, &out_dir, "parser_data", false).unwrap();
}
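Once the build script has run, the generated sources can be pulled into the crate from OUT_DIR, for example with include!. The file names below are an assumption based on the module names passed to the generators above ("lexer_data" and "parser_data"); check the files actually produced in OUT_DIR for your version of parlex-gen.

```rust
// In src/lib.rs: pull the generated tables into the crate.
// NOTE: the file names are an assumption based on the module names passed to
// alex::generate and aslr::generate in build.rs; inspect OUT_DIR to confirm
// what your version of parlex-gen actually emits.
include!(concat!(env!("OUT_DIR"), "/lexer_data.rs"));
include!(concat!(env!("OUT_DIR"), "/parser_data.rs"));
```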
The Alex specification defines lexical rules for recognizing the textual structure of a language before parsing. It describes how to match the components of tokens — such as identifiers, numbers, delimiters, operators, and string or block contents — using regular expressions and lexical states.
An Alex specification contains:
Macro definitions. Named regular expressions are declared as:
NAME = regex
Macros can be referenced with {{NAME}} inside other patterns.
They are used to build complex rules from smaller reusable fragments (e.g., {{DEC}}, {{ATOM}}, {{VAR}}).
Lexical rules. Each rule specifies what pattern to match and in which lexical states it applies:
RuleName: <State1, State2> pattern
These rules describe low-level recognition of language elements — not yet semantic tokens, but the raw lexical building blocks.
Lexical states. States define the contexts that control which rules are active at any time.
The lexer can switch states dynamically, allowing it to handle nested or context-dependent structures (for example, strings, comments, or embedded data blocks).
A * in the state list indicates that the corresponding regular expression rule is active in all lexical states. Here is an example specification:
WS = [ \t]
NL = \r?\n
IDENT = [a-z_][a-z_A-Z0-9]*
NUMBER = [0-9]+
Ident: <Expr> {{IDENT}}
Number: <Expr> {{NUMBER}}
Semicolon: <Expr> ;
Equals: <Expr> =
Plus: <Expr> \+
Minus: <Expr> -
Asterisk: <Expr> \*
Slash: <Expr> /
LeftParen: <Expr> \(
RightParen: <Expr> \)
CommentBegin: <Expr, Comment> /\*
CommentEnd: <Comment> \*/
CommentChar: <Comment> [^*\r\n]+
NewLine: <*> {{NL}}
WhiteSpace: <Expr> {{WS}}+
Error: <*> .
Note: The first lexical state encountered in the specification file is used as the starting lexer state (in this case, Expr).
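To illustrate, for the input x = 1 + 2 the rules above match, in order: Ident (x), WhiteSpace, Equals, WhiteSpace, Number (1), WhiteSpace, Plus, WhiteSpace, Number (2). A /* sequence is recognized by CommentBegin, after which user code would typically switch the lexer into the Comment state, where CommentChar and CommentEnd apply. Whether a given match is emitted as a token or skipped (as whitespace usually is) is decided by the user code that handles each named rule.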
An ASLR specification defines a context-free grammar for use with the aslr SLR(1) parser generator.
It consists of production rules, written in a simple, line-oriented format:
rule_name: Nonterminal -> Symbol Symbol ...
Rule names follow the pattern:
[a-z]([a-zA-Z0-9])*
Nonterminals use capitalized names (e.g., Expr, Term, Seq).
Terminals are written either as word-like tokens matching [a-z]([a-zA-Z0-9])*, or as one of the following punctuation characters, each of which has an associated name:

| . | dot |
| - | minus |
| ~ | tilde |
| ` | backtick |
| ! | exclamation |
| @ | at |
| # | hash |
| $ | dollar |
| % | percent |
| ^ | caret |
| & | ampersand |
| * | asterisk |
| + | plus |
| = | equals |
| \| | pipe |
| \\ | backslash |
| < | lessThan |
| > | greaterThan |
| ? | question |
| / | slash |
| ; | semicolon |
| ( | leftParen |
| ) | rightParen |
| [ | leftBrack |
| ] | rightBrack |
| { | leftBrace |
| } | rightBrace |
| , | comma |
| ' | singleQuote |
| " | doubleQuote |
| : | colon |

For example, the following is a small grammar for arithmetic expressions and assignment statements:
stat1: Stat ->
stat2: Stat -> Expr
stat3: Stat -> ident = Expr
expr1: Expr -> number
expr2: Expr -> ident
expr3: Expr -> Expr + Expr
expr4: Expr -> Expr - Expr
expr5: Expr -> Expr * Expr
expr6: Expr -> Expr / Expr
expr7: Expr -> - Expr
expr8: Expr -> ( Expr )
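Note that this grammar is ambiguous: expr3 through expr7 declare no precedence or associativity for +, -, *, / and unary minus, so an input such as 1 + 2 * 3 produces shift/reduce conflicts. These are exactly the kinds of conflicts that ASLR's dynamic runtime resolution, described above, can resolve at parse time.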
Copyright (c) 2005–2025 IKH Software, Inc.
Released under the terms of the GNU Lesser General Public License, version 3.0 or (at your option) any later version (LGPL-3.0-or-later).
See also:
parlex: core support library
arena-terms-parser: real-world example using ALEX and ASLR