Crates.io | manger |
lib.rs | manger |
version | 0.1.1 |
source | src |
created_at | 2020-12-30 14:59:54.097738 |
updated_at | 2020-12-30 15:05:16.117552 |
description | A performant, low-level, lightweight and intuitive combinatoric parser library |
homepage | https://github.com/coastalwhite/manger |
repository | https://github.com/coastalwhite/manger |
max_upload_size | |
id | 329273 |
size | 90,104 |
Manger allows for translation of the intuition developed for Rust's primitive and standard
library types into your intuition for using this library. Most of the behaviour is defined with
the Consumable
trait, which can be easily implemented using the consume_struct
and
consume_enum
macros.
This library is suited for deterministic regular languages. It is optimally used in addition to a predefined syntax. For example, if you have a predefined EBNF, it is really easy to implement the syntax within this crate.
To get started with implementing Consumable
on your own traits, I suggest taking a look at
the consume_struct
or consume_enum
documentation. Then you can come back here and look
at some common patterns.
Parsing and thus consuming has a lot of often used patterns. Ofcourse, these are very easily available here aswell.
Often we want to express that two patterns follow eachother in a source
string. For example,
you might want to express that every Line
is followed by a ';'
. In manger there are two
ways to do this.
The first way, and the preferred way, is with the consume_struct
or consume_enum
macros
where you can present sequential consume instructions. You can see in the following example that
we are first consuming a '('
, followed by a i32
,
followed by a closing ')'
.
use manger::{ Consumable, consume_struct };
struct EncasedInteger(i32);
consume_struct!(
EncasedInteger => [
> '(',
value: i32,
> ')';
(value)
]
);
Another way to represent the same concept is with the tuple type syntax. This can be done with
up to 10 types. Here we are again parsing the same (i32)
structure.
use manger::chars;
type EncasedInteger = (chars::OpenParenthese, i32, chars::CloseParenthese);
Most of the time you want to represent some kind of repetition. There are a lot of different way to represent repetition. Here there are two easy ways.
The easiest way to do repetition is with the Vec<T>
.
This will consume 0 or
more instances of type T
. Ofcourse, the type T
has have has Consumable
implemented.
Here you can see how what that looks like:
Since
Vec<T>
will consume instances of typeT
until it finds a error, it can never fail itself. You are therefore safe to unwrap the result.
use manger::{ Consumable, consume_struct };
struct EncasedInteger(i32);
consume_struct!(
EncasedInteger => [
> '[',
value: i32,
> ']';
(value)
]
);
let source = "[3][-4][5]";
let (encased_integers, _) = <Vec<EncasedInteger>>::consume_from(source)?;
let sum: i32 = encased_integers
.iter()
.map(|EncasedInteger(value)| value)
.sum();
assert_eq!(sum, 4);
The other easy way to do repetition is with OneOrMore<T>
. This allows for
consuming 1 or more instances of type T
. And again, type T
has to have Consumable
implemented. Here you can see what that looks like:
use manger::{ Consumable, consume_struct };
use manger::common::OneOrMore;
struct EncasedInteger(i32);
consume_struct!(
EncasedInteger => [
> '[',
value: i32,
> ']';
(value)
]
);
let source = "[3][-4][5]";
let (encased_integers, _) = <OneOrMore<EncasedInteger>>::consume_from(source)?;
let product: i32 = encased_integers
.into_iter()
.map(|EncasedInteger(value)| value)
.product();
assert_eq!(product, -60);
To express optional values you can use the Option<T>
standard rust
type. This will consume either 0 or 1 of type T
.
Since
Option<T>
will consume a instance of typeT
if it finds no error, it can never fail itself. You are therefore safe to unwrap the result.
use manger::consume_struct;
use manger::chars;
struct PossiblyEncasedInteger(i32);
consume_struct!(
PossiblyEncasedInteger => [
: Option<chars::OpenParenthese>,
value: i32,
: Option<chars::CloseParenthese>;
(value)
]
);
Another common pattern seen within combinatoric parsers is recursion. Since rust types need to
have a predefined since, we cannot do direct type recursion and we need to do heap allocation
with the Box<T>
type from the standard library. We can make a prefixed
math expression parser as followed:
use manger::consume_enum;
use manger::common::{OneOrMore, Whitespace};
enum Expression {
Times(Box<Expression>, Box<Expression>),
Plus(Box<Expression>, Box<Expression>),
Constant(u32),
}
consume_enum!(
Expression {
Times => [
> '*',
: OneOrMore<Whitespace>,
left: Box<Expression>,
: OneOrMore<Whitespace>,
right: Box<Expression>;
(left, right)
],
Plus => [
> '+',
: OneOrMore<Whitespace>,
left: Box<Expression>,
: OneOrMore<Whitespace>,
right: Box<Expression>;
(left, right)
],
Constant => [
value: u32;
(value)
]
}
);
For whitespace we can use the manger::common::Whitespace
struct. This will consume any
utf-8 character that is identified as a whitespace character by the char::is_whitespace
function.
If two possibilities are present for consuming there are two options to choose from. Both are valid in certain scenarios.
Using the consume_enum
you can create an struct which can be consuming in a number of
options and you can see which option was selected. If you need to see which of the different
options was selected, this should be your choice.
You can also use the Either<L, R>
type to represent the either
relationship. This option is preferred if we do not care about which option is selected.
See the open issues for a list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to be learn, inspire, and create. Any contributions you make are greatly appreciated.
git checkout -b feature/AmazingFeature
)git commit -m 'Add some AmazingFeature'
)git push origin feature/AmazingFeature
)Distributed under the MIT License. See LICENSE
for more information.