Crates.io | aott |
lib.rs | aott |
version | 0.2.0 |
source | src |
created_at | 2023-08-23 13:21:43.553075 |
updated_at | 2023-08-31 11:56:05.656912 |
description | Aspect Of The Tokens - a parser combinator framework. |
homepage | |
repository | https://github.com/Implodent/AOTT |
max_upload_size | |
id | 952024 |
size | 152,029 |
AOTT is a parser combinator framework - a library that contains utilities for painlessly creating parsers.
It also includes a few built-in utilities, like parsers for working with text (see the text module; feature-gated under the builtin-text flag, enabled by default), and several handy decoders for working with bytes (see the bytes module; feature-gated under the builtin-bytes flag, not enabled by default).
See examples/brainfuck.rs for a full interpreter (cargo run --example brainfuck -- examples/sample.bf).
    use aott::prelude::*;

    #[derive(Clone, Debug)]
    enum Instruction {
        Left,
        Right,
        Increment,
        Decrement,
        Read,
        Write,
        Loop(Vec<Self>),
    }

    #[parser]
    fn parse(input: &str) -> Vec<Instruction> {
        choice((
            // Basic instructions are just single characters!
            just('<').to(Instruction::Left),
            just('>').to(Instruction::Right),
            just('+').to(Instruction::Increment),
            just('-').to(Instruction::Decrement),
            just(',').to(Instruction::Read),
            just('.').to(Instruction::Write),
            // Recursion is easy: just put in the function as is!
            delimited(just('['), parse, just(']')).map(Instruction::Loop),
        ))
        // Brainfuck is sequential, so we parse as many instructions as possible
        .repeated()
        .parse(input)
    }
Parser combinators are a technique for implementing parsers by defining them in terms of other parsers.
You construct small parsers first, for example one for a string literal, and then combine them into bigger and bigger parsers: expressions, then statements, then functions, and so on. Parsers built this way follow the recursive descent strategy: each parser calls the parsers for its sub-parts to consume the input.
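The small-to-big idea can be sketched without any library at all. The following is a minimal illustration only: the names Parser, digit, many1, number and sum are hypothetical and are not part of AOTT's API, and failure is modelled as a plain None rather than a real error type.

```rust
/// A parser is a plain function: it takes the input and returns the parsed
/// value plus the remaining input, or `None` on failure.
/// (Hypothetical type for illustration; not AOTT's API.)
type Parser<T> = fn(&str) -> Option<(T, &str)>;

/// Smallest parser: a single decimal digit.
fn digit(input: &str) -> Option<(u32, &str)> {
    let c = input.chars().next()?;
    let d = c.to_digit(10)?;
    Some((d, &input[c.len_utf8()..]))
}

/// Combinator: run `p` one or more times, collecting every result.
fn many1<T>(p: Parser<T>, mut input: &str) -> Option<(Vec<T>, &str)> {
    let mut items = Vec::new();
    while let Some((item, rest)) = p(input) {
        items.push(item);
        input = rest;
    }
    if items.is_empty() {
        None
    } else {
        Some((items, input))
    }
}

/// Bigger parser built from the smaller ones: an unsigned integer.
/// (No overflow handling; this is only a sketch.)
fn number(input: &str) -> Option<(u32, &str)> {
    let (digits, rest) = many1(digit, input)?;
    let value = digits.into_iter().fold(0, |acc, d| acc * 10 + d);
    Some((value, rest))
}

/// Bigger still: a sum such as `1+22+3`, built on top of `number`.
fn sum(input: &str) -> Option<(u32, &str)> {
    let (mut total, mut rest) = number(input)?;
    while let Some(after_plus) = rest.strip_prefix('+') {
        let (n, r) = number(after_plus)?;
        total += n;
        rest = r;
    }
    Some((total, rest))
}
```

Each level only knows about the level directly below it, so changing how digits are parsed does not require touching sum.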
Using parser combinators to define a parsing algorithm is similar to using Rust's Iterator trait to define iteration logic: the type-driven API of Iterator makes it much more difficult to make mistakes, and much easier to encode complicated iteration logic, than if one were to write the same code by hand. The same is true for parser combinators.
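To make the Iterator comparison concrete, here is the same small computation written twice using only the standard library. The function names are made up for this illustration.

```rust
/// Sum of the squares of the even numbers, with the loop written by hand.
/// The index bookkeeping is where off-by-one mistakes tend to creep in.
fn sum_even_squares_loop(values: &[u32]) -> u32 {
    let mut total = 0;
    let mut i = 0;
    while i < values.len() {
        if values[i] % 2 == 0 {
            total += values[i] * values[i];
        }
        i += 1;
    }
    total
}

/// The same computation as a chain of `Iterator` adapters.
/// Each step states *what* happens; the types check that the steps fit together.
fn sum_even_squares_iter(values: &[u32]) -> u32 {
    values
        .iter()
        .filter(|v| **v % 2 == 0)
        .map(|v| v * v)
        .sum()
}
```

A combinator chain such as choice(...).repeated() plays the same role for parsing that filter(...).map(...).sum() plays for iteration.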
Writing parsers with good error recovery gets much harder with every feature you add to the parser. It requires understanding the intricacies of the recursive descent algorithm, and then implementing recovery strategies on top of that.
If you are developing, say, a programming language, or even just starting out, you will find yourself making a lot of small changes very often. And if you're writing the parser by hand, every one of those changes means a slow and painful refactoring of the parser.
Parser combinators solve both problems by providing an ergonomic API that allows for rapidly iterating upon a syntax.
Parser combinators are also a great fit for domain-specific languages (DSLs) for which no parser exists yet. Writing a reliable, fault-tolerant parser for such a language can go from being a weeks-long task to a half-hour task if you employ a decent parser combinator library.
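For contrast, this is roughly what a hand-written recursive descent parser for the same Brainfuck instruction set could look like. It is a sketch under assumptions: parse_block and parse_by_hand are hypothetical names, errors are plain String values, and any character that is not an instruction is rejected. It is not how AOTT works internally.

```rust
use std::str::Chars;

#[derive(Clone, Debug, PartialEq)]
enum Instruction {
    Left,
    Right,
    Increment,
    Decrement,
    Read,
    Write,
    Loop(Vec<Instruction>),
}

/// Parse instructions until the input ends or a `]` closes the current loop.
/// `in_loop` records whether a closing `]` is expected at this nesting level.
fn parse_block(chars: &mut Chars<'_>, in_loop: bool) -> Result<Vec<Instruction>, String> {
    let mut out = Vec::new();
    while let Some(c) = chars.next() {
        let instruction = match c {
            '<' => Instruction::Left,
            '>' => Instruction::Right,
            '+' => Instruction::Increment,
            '-' => Instruction::Decrement,
            ',' => Instruction::Read,
            '.' => Instruction::Write,
            // Recursion has to be threaded through by hand.
            '[' => Instruction::Loop(parse_block(chars, true)?),
            ']' if in_loop => return Ok(out),
            other => return Err(format!("unexpected character {:?}", other)),
        };
        out.push(instruction);
    }
    if in_loop {
        Err("unclosed `[`".to_string())
    } else {
        Ok(out)
    }
}

/// Entry point: parse a whole program.
fn parse_by_hand(source: &str) -> Result<Vec<Instruction>, String> {
    parse_block(&mut source.chars(), false)
}
```

Even for a language this small, the loop state, the nesting flag and the error cases are all managed manually; each new piece of syntax adds more of that bookkeeping.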
AOTT doesn't have error recovery right now, but it will soon. There is already some code in place for recovery, but it is unused, undocumented, and untested.
AOTT is designed to be compatible with chumsky, so existing chumsky parsers will work as is, or with a minimal amount of changes (like removing lifetimes... "ME HATE LIFETIME ARGUMENT! LIFETIME ARGUMENT BAD!").
A list of regexes to fix the parsers will be provided soon (read: never).
About those lifetime arguments: I wanted to make a nearly lifetimeless chumsky with functions as the primary unit of parsing (like in nom!), and after hours of pain and agony (officially: hard work), I have this. A nearly lifetimeless chumsky with functions. Yay.
Aspect Of The Tokens is licensed under the MIT license (see the LICENSE file in the root of the repository).