Crates.io | lrpeg |
lib.rs | lrpeg |
version | 0.4.1 |
source | src |
created_at | 2021-03-27 20:26:17.452248 |
updated_at | 2021-10-11 17:37:36.474583 |
description | Left recursive PEG parser generator |
homepage | |
repository | https://github.com/seanyoung/lrpeg |
max_upload_size | |
id | 374398 |
size | 73,324 |
lrpeg allows left recursive rules, and uses ratpack parsing for speed. I wrote a blog post to introduce the ideas of lrpeg.
The existing PEG parser generators for rust do not allow left recursion, which makes it very awkward to write grammars. It is possible to write a PEG parser generator which allows for left recursion, just as python now uses.
See IRP Grammar for a complete lrpeg grammar and handling for IRP.
Add lrpeg to your Cargo.toml in build-dependencies:
[build-dependencies]
lrpeg = "0"
[dependencies]
regex = "1"
unicode-xid = "0.2"
Now add a build.rs
to the root of your project, containing:
use std::env;
use std::path::PathBuf;
fn main() {
let out_dir = env::var("OUT_DIR").unwrap();
lrpeg::process_files(&PathBuf::from("src"), &PathBuf::from(out_dir));
}
Write your peg grammar, and put it in a file which ends with .peg
, for example
src/calculator.peg:
calculator <- expr EOI;
expr <- expr ("+" / "-") WHITESPACE term
/ term;
term <- term ("*" / "/" / "%") WHITESPACE factor
/ factor;
factor <- "(" WHITESPACE expr ")" WHITESPACE
/ num;
num <- re#[0-9]+# WHITESPACE;
When your run cargo build
, calculator.rs
will be generated into your target/...
directory. You need
to include this module in your project, and then you can instantiate the PEG like so:
include!(concat!(env!("OUT_DIR"), "/calculator.rs"));
use calculator::{Node, Rule};
fn main() {
let mut parser = calculator::PEG::new();
let args: Vec<String> = std::env::args().collect();
if args.len() != 2 {
eprintln!("Usage {} EXPRESSION", &args[0]);
std::process::exit(2);
}
let input = &args[1];
println!("parsing: {}", input);
match parser.parse(input) {
Ok(node) => {
fn walk(node: &Node, input: &str) -> u64 {
match node.rule {
Rule::num => u64::from_str_radix(node.children[0].as_str(input), 10).unwrap(),
Rule::expr => {
if node.alternative == Some(0) {
let left = walk(&node.children[0], input);
let right = walk(&node.children[3], input);
match node.children[1].as_str(input) {
"+" => left + right,
"-" => left - right,
_ => unreachable!(),
}
} else {
walk(&node.children[0], input)
}
}
Rule::term => {
if node.alternative == Some(0) {
let left = walk(&node.children[0], input);
let right = walk(&node.children[3], input);
match node.children[1].as_str(input) {
"*" => left * right,
"/" => left / right,
"%" => left % right,
_ => unreachable!(),
}
} else {
walk(&node.children[0], input)
}
}
Rule::factor => {
if node.alternative == Some(0) {
walk(&node.children[2], input)
} else {
walk(&node.children[0], input)
}
}
Rule::calculator => walk(&node.children[0], input),
_ => {
unreachable!()
}
}
}
println!("result: {}", walk(&node, input));
}
Err((line_no, col_no)) => {
eprintln!("parser error at {}:{}", line_no, col_no);
}
}
}
This example is available in lrpeg-example.
PEG grammars are a set of rules. In lrpeg, each rule must end with a ";". Parsing starts at the top rule.
<-
and then the terms and terminated by ;
*
, +
or optional ?
.re#[0-9]+#
./
. For example foo <- "a" / "b";
WHITESPACE
, EOI
, and XID_IDENTIFIER
(for unicode identifiers)The file src/peg.peg
contains the grammar for the peg itself. To generate the parser, run cargo run src/peg.peg > src/peg.rs.new
and then mv src/peg.rs.new src/peg.rs
.