# Peggen
A parser generator for parsing expression grammars (PEGs) that uses inline macros to specify PEG operations.
## How is it different from (...)?
| Compared with | Conceptual | User Experience | Performance | Error Handling |
| ---- | ---------- | --------------- | ----------- | -------------- |
| [PEST](https://pest.rs) | **PEST** only annotates text; **Peggen** generates an AST directly from your text. | In most cases you still want Rust `enum`s for your AST. **Peggen** provides them directly, but with **PEST** you have to create the `enum`s from its rules by hand. | **PEST** uses an optimizer and memoization for better performance; **Peggen** doesn't memoize, which arguably performs better for most grammars. | / |
| [Chumsky](https://crates.io/crates/chumsky) | **Chumsky** provides parser combinators; **Peggen** is a parser generator. | Both **Chumsky** and **Peggen** produce ASTs directly, but **Peggen** also supports arena allocation. | When a rule fails, **Chumsky** deallocates the results of its successful sub-rules; **Peggen** uses an internal representation that eliminates this deallocation. | / |
| [LALRPOP](https://lalrpop.github.io/lalrpop) | **Peggen** is PEG-based; **LALRPOP** uses **LR(1)** grammars. | **Peggen** is more intuitive to use than **LALRPOP**; **LR(1)** grammars are hard to extend and debug. | **LALRPOP** performs better than **Peggen**. | **LR(1)** parsers can report errors far away from the perceived cause; **Peggen** lets you capture errors from causes you define. |
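
The Chumsky row's point about avoiding deallocation can be illustrated with a general technique: record matches in one flat, append-only buffer and undo a failed alternative by truncating that buffer. The sketch below is hypothetical (`Tag`, `Parser`, and `sum` are made-up names, and it does not reproduce Peggen's actual internal representation); it only shows why backtracking then costs a length reset rather than freeing a tree of nodes.

```rust
/// One entry in a flat, append-only record of what the parser matched.
/// Hypothetical layout for illustration only; not Peggen's actual tag type.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Tag {
    Digit(u8),
    Plus,
}

/// Hypothetical toy parser for the PEG rule `sum <- digit '+' digit / digit`.
pub struct Parser<'a> {
    src: &'a [u8],
    pos: usize,
    tags: Vec<Tag>,
}

impl<'a> Parser<'a> {
    pub fn new(src: &'a str) -> Self {
        Parser { src: src.as_bytes(), pos: 0, tags: Vec::new() }
    }

    /// Match one ASCII digit and append a `Digit` tag.
    fn digit(&mut self) -> bool {
        match self.src.get(self.pos) {
            Some(&b) if b.is_ascii_digit() => {
                self.tags.push(Tag::Digit(b - b'0'));
                self.pos += 1;
                true
            }
            _ => false,
        }
    }

    /// Match a literal `+` and append a `Plus` tag.
    fn plus(&mut self) -> bool {
        if self.src.get(self.pos) == Some(&b'+') {
            self.tags.push(Tag::Plus);
            self.pos += 1;
            true
        } else {
            false
        }
    }

    /// Ordered choice: try `digit '+' digit` first, then fall back to `digit`.
    /// Undoing the failed first alternative only resets `pos` and truncates
    /// `tags`; no per-node allocation has to be freed.
    pub fn sum(&mut self) -> bool {
        let (saved_pos, saved_len) = (self.pos, self.tags.len());
        if self.digit() && self.plus() && self.digit() {
            return true;
        }
        self.pos = saved_pos;
        self.tags.truncate(saved_len);
        self.digit()
    }

    pub fn tags(&self) -> &[Tag] {
        &self.tags
    }

    pub fn pos(&self) -> usize {
        self.pos
    }
}
```

For the input `"1+"`, the first alternative pushes `Digit(1)` and `Plus` before failing; after the truncation and the fallback, the buffer holds only `Digit(1)` and the parser stops at position 1 with `+` unconsumed.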
## Performance
I roughly benchmarked Peggen against Chumsky on a sample JSON file, using an Intel(R) Core(TM) i7-14700HX CPU. Surprisingly, Peggen was faster than Chumsky on this benchmark. Here are the numbers (lower is better):
- Peggen: 867913 ns/iter
- Chumsky: 1555256 ns/iter
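
As a rough reading of these two numbers (one benchmark on one machine), the relative speed is simply their ratio:

$$
\frac{1\,555\,256\ \text{ns/iter (Chumsky)}}{867\,913\ \text{ns/iter (Peggen)}} \approx 1.79
$$

In other words, Peggen took about 56% of Chumsky's time per iteration on this JSON file.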
## Example: JSON Parser
You can write a JSON parser in just a few lines:
```rust
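// Assumed import path: the derive macros and `RVec` are taken to be
// re-exported from the `peggen` crate root.
use peggen::*;

#[derive(Debug, ParseImpl, Space, Num, EnumAstImpl)]
pub enum Json {
    // The literal `null`.
    #[rule(r"null")]
    Null,
    // `false` or `true`, captured as field 0.
    #[rule(r"{0:`false|true`}")]
    Bool(bool),
    // A number with a fractional part, e.g. `-0.5`.
    #[rule(r"{0:`-?(0|[1-9][0-9]*)\.([0-9]+)`}")]
    Flt(f32),
    // An integer without a fractional part.
    #[rule("{0:`0|-?[1-9][0-9]*`}")]
    Num(i32),
    // A double-quoted string (the regex `[^"]*` does not handle escaped quotes).
    #[rule(r#""{0:`[^"]*`}""#)]
    Str(String),
    // An object: comma-separated `"key": value` pairs in braces.
    #[rule(r#"\{ [*0: "{0:`[^"]*`}" : {1} , ][?0: "{0:`[^"]*`}" : {1} ] \}"#)]
    Obj(RVec<(String, Json)>),
    // An array: comma-separated values in brackets.
    #[rule(r"\[ [*0: {0} , ][?0: {0} ] \]")]
    Arr(RVec<Json>),
}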
```
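
To make the ordered-choice behaviour behind such rules concrete, here is a hypothetical hand-written parser for a subset of the same grammar (null, booleans, floats, integers, arrays), using only the standard library. `MiniJson`, `JsonParser`, and `int_len` are names made up for this sketch; it approximates the regexes above and is not the code Peggen derives.

```rust
/// Hypothetical subset of the `Json` enum above; not generated by Peggen.
#[derive(Debug, PartialEq)]
pub enum MiniJson {
    Null,
    Bool(bool),
    Flt(f32),
    Num(i32),
    Arr(Vec<MiniJson>),
}

/// Length of a prefix matching `-?(0|[1-9][0-9]*)`, if any.
fn int_len(s: &str) -> Option<usize> {
    let sign = usize::from(s.starts_with('-'));
    let body = &s[sign..];
    if body.starts_with('0') {
        return Some(sign + 1);
    }
    let n = body.bytes().take_while(|b| b.is_ascii_digit()).count();
    if n > 0 { Some(sign + n) } else { None }
}

pub struct JsonParser<'a> {
    src: &'a str,
    pos: usize,
}

impl<'a> JsonParser<'a> {
    pub fn new(src: &'a str) -> Self {
        JsonParser { src, pos: 0 }
    }

    fn rest(&self) -> &'a str {
        &self.src[self.pos..]
    }

    /// Skip leading whitespace.
    fn ws(&mut self) {
        self.pos = self.src.len() - self.rest().trim_start().len();
    }

    /// Skip whitespace, then consume `lit` if it comes next.
    fn lit(&mut self, lit: &str) -> bool {
        self.ws();
        let hit = self.rest().starts_with(lit);
        if hit { self.pos += lit.len(); }
        hit
    }

    /// Approximates `-?(0|[1-9][0-9]*)\.([0-9]+)`; consumes nothing on failure.
    fn flt(&mut self) -> Option<f32> {
        let r = self.rest();
        let int_end = int_len(r)?;
        let after = r[int_end..].strip_prefix('.')?;
        let frac = after.bytes().take_while(|b| b.is_ascii_digit()).count();
        if frac == 0 { return None; }
        let end = int_end + 1 + frac;
        let v: f32 = r[..end].parse().ok()?;
        self.pos += end;
        Some(v)
    }

    /// Approximates `0|-?[1-9][0-9]*`; consumes nothing on failure.
    fn num(&mut self) -> Option<i32> {
        let r = self.rest();
        let end = int_len(r)?;
        let v: i32 = r[..end].parse().ok()?;
        self.pos += end;
        Some(v)
    }

    /// Ordered choice in declaration order: the first alternative that
    /// matches wins, and the remaining ones are never tried.
    pub fn value(&mut self) -> Option<MiniJson> {
        self.ws();
        if self.lit("null") { return Some(MiniJson::Null); }
        if self.lit("false") { return Some(MiniJson::Bool(false)); }
        if self.lit("true") { return Some(MiniJson::Bool(true)); }
        if let Some(f) = self.flt() { return Some(MiniJson::Flt(f)); }
        if let Some(n) = self.num() { return Some(MiniJson::Num(n)); }
        if self.lit("[") {
            // Comma-separated values with an optional trailing comma
            // (this sketch's reading of the `Arr` rule).
            let mut items = Vec::new();
            loop {
                if self.lit("]") { return Some(MiniJson::Arr(items)); }
                items.push(self.value()?);
                if self.lit("]") { return Some(MiniJson::Arr(items)); }
                if !self.lit(",") { return None; }
            }
        }
        None
    }
}
```

Swapping the `flt` and `num` lines in `value` would make `1.5` parse as `Num(1)` with `.5` left unconsumed, because ordered choice commits to the first alternative that succeeds. The same consideration presumably explains why `Flt` precedes `Num` in the `Json` enum, assuming Peggen tries variants in declaration order.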
## Roadmap
- Optimizations:
  - Rule dispatch: filter rules by their first symbol instead of trying each of them.
  - Thinner tags: each tag in the internal representation is currently three pointers wide; I want to make tags thinner.
- Error Handling:
  - Custom error handlers for when error handlers fail.
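
Rule dispatch is still on the roadmap, so the following is only a sketch of the general idea (first-character lookahead), written against the `Json` grammar from the example. The `Rule` enum and the `candidates`/`dispatch` functions are hypothetical and say nothing about how Peggen will actually implement this.

```rust
/// Hypothetical rule identifiers, one per `Json` variant in the example.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Rule {
    Null,
    Bool,
    Flt,
    Num,
    Str,
    Obj,
    Arr,
}

/// Rules that could match input starting with byte `b`, kept in declaration
/// order so that ordered-choice semantics are unchanged.
pub fn candidates(b: u8) -> &'static [Rule] {
    match b {
        b'n' => &[Rule::Null],
        b'f' | b't' => &[Rule::Bool],
        b'-' | b'0'..=b'9' => &[Rule::Flt, Rule::Num],
        b'"' => &[Rule::Str],
        b'{' => &[Rule::Obj],
        b'[' => &[Rule::Arr],
        _ => &[],
    }
}

/// Skip leading whitespace, then dispatch on the first byte.
pub fn dispatch(input: &str) -> &'static [Rule] {
    match input.trim_start().bytes().next() {
        Some(b) => candidates(b),
        None => &[],
    }
}
```

With this table, `-3.5` is tried against `Flt` and then `Num` only, instead of against all seven rules, and input that cannot start any rule is rejected without trying any of them.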