peggen-impl

Crates.iopeggen-impl
lib.rspeggen-impl
version0.2.6
sourcesrc
created_at2024-08-16 00:03:30.274409
updated_at2024-09-22 14:50:23.952389
descriptionGenerate recursive-descent & precedence climbing parsers. (extra implementations)
homepage
repositoryhttps://github.com/Y-jiji/peggen
max_upload_size
id1339769
size6,623
Andy Yang (Y-jiji)

documentation

README

Peggen

A parser generator for parsing expression grammar (PEG) that use inline macros to specify PEG operations.

How is it different from (...)?

/ Conceptual User Experience Performance Error Handling
PEST PEST only annotates text.
Peggen generates AST directly from your text.
In most cases, you still want rust enums for your AST, which is directly provided by Peggen, but you have to manually create enums from PEST rules. PEST use an optimizer to memorize your grammar rules, and use memorization for better performance; Peggen doesn't use memorization, arguably this gives better performance over memorization for most grammars. /
Chumsky Chumsky provides parser combinators. Peggen is a parser generator. Both Chumsky and Peggen provides ast directly. However, Peggen supports arena allocation. Chumsky deallocates successful sub-rules when a rule fails; Peggen uses a internal representation to eliminate deallocation. /
LALRPOP Peggen is PEG-based; LALRPOP uses LR(1) grammar. Peggen is more intuitive to use than LALRPOP; LR(1) grammar is hard to extend and debug. LALRPOP has better performance over Peggen. LR(1) grammar can report errors far away from normally percepted cause; Peggen allows you to capture errors from customary cause.

Performance

I roughly tested the peggen on a sample json file against chumsky.

CPU Model: Intel(R) Core(TM) i7-14700HX

Suprisingly, Peggen is faster than Chumsky.

Here are some numbers:

  • Peggen : 867913 ns/iter
  • Chumsky: 1555256 ns/iter

Example: Json Parser

You can write a json parser in the following several lines:

#[derive(Debug, ParseImpl, Space, Num, EnumAstImpl)]
pub enum Json {
    #[rule(r"null")]
    Null,
    #[rule(r"{0:`false|true`}")]
    Bool(bool),
    #[rule(r"{0:`-?(0|[1-9][0-9]*)\.([0-9]+)`}")]
    Flt(f32),
    #[rule("{0:`0|-?[1-9][0-9]*`}")]
    Num(i32),
    #[rule(r#""{0:`[^"]*`}""#)]
    Str(String),
    #[rule(r#"\{ [*0: "{0:`[^"]*`}" : {1} , ][?0: "{0:`[^"]*`}" : {1} ] \}"#)]
    Obj(RVec<(String, Json)>),
    #[rule(r"\[ [*0: {0} , ][?0: {0} ] \]")]
    Arr(RVec<Json>)
}

Roadmap

  • Optimizations:
    • Rule dispatch: filter rules by the first symbol, instead of trying each of them.
    • Thinner tag: currently each tag in internal representation is 3-pointers wide, I want to make them thinner.
  • Error Handling:
    • Custom error handlers when error handlers fail.
Commit count: 96

cargo fmt