Crates.io | parsable |
lib.rs | parsable |
version | 1.0.0 |
source | src |
created_at | 2022-05-23 09:52:00.849109 |
updated_at | 2024-10-27 11:48:03.033278 |
description | A trait to easily parse data structures. |
homepage | |
repository | https://github.com/symil/parsable |
max_upload_size | |
id | 591686 |
size | 35,548 |
[dependencies]
parsable = "0.1"
Implementation of a basic arithmetic interpreter that only works with non-negative integers and ignores operator precedence (operations are applied from left to right).
use parsable::{parsable, Parsable, ParseOptions};

#[parsable]
enum Operator {
    Plus = "+",
    Minus = "-",
    Mult = "*",
    Div = "/",
    Mod = "%"
}

#[parsable]
struct NumberLiteral {
    #[parsable(regex=r"\d+")]
    value: String
}

impl NumberLiteral {
    fn process(&self) -> i32 {
        self.value.parse().unwrap()
    }
}

#[parsable]
enum Operand {
    Number(NumberLiteral),
    Wrapped(WrappedOperation)
}

impl Operand {
    fn process(&self) -> i32 {
        match self {
            Operand::Number(number) => number.process(),
            Operand::Wrapped(wrapped) => wrapped.process(),
        }
    }
}

#[parsable]
struct Operation {
    first_operand: Operand,
    other_operands: Vec<(Operator, Operand)>
}

impl Operation {
    // Applies each operator in turn, from left to right
    fn process(&self) -> i32 {
        let mut result = self.first_operand.process();

        for (operator, operand) in &self.other_operands {
            let value = operand.process();

            result = match operator {
                Operator::Plus => result + value,
                Operator::Minus => result - value,
                Operator::Mult => result * value,
                Operator::Div => result / value,
                Operator::Mod => result % value,
            }
        }

        result
    }
}

#[parsable]
struct WrappedOperation {
    #[parsable(brackets="()")]
    operation: Box<Operation>
}

impl WrappedOperation {
    fn process(&self) -> i32 {
        self.operation.process()
    }
}

fn main() {
    let operation_string = "3 + (4 * 5)".to_string();
    let parse_options = ParseOptions::default();

    match Operation::parse(operation_string, parse_options) {
        Ok(operation) => {
            println!("result: {}", operation.process());
        },
        Err(error) => {
            dbg!(error);
        }
    }
}
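To make the "no operator precedence" caveat concrete, here is a minimal standalone sketch of the left-to-right fold that `Operation::process` performs. It uses only the standard library, not the parsable crate; the `Op` enum and `eval_left_to_right` function are hypothetical names introduced for illustration.

```rust
/// Binary operators, mirroring the `Operator` enum of the example above.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum Op {
    Plus,
    Minus,
    Mult,
    Div,
    Mod,
}

/// Folds the operands from left to right, ignoring operator precedence.
fn eval_left_to_right(first: i32, rest: &[(Op, i32)]) -> i32 {
    rest.iter().fold(first, |acc, &(op, value)| match op {
        Op::Plus => acc + value,
        Op::Minus => acc - value,
        Op::Mult => acc * value,
        Op::Div => acc / value,
        Op::Mod => acc % value,
    })
}
```

With this evaluation order, an input equivalent to "3 + 4 * 5" gives (3 + 4) * 5 = 35 rather than 23, which is why the example wraps the multiplication in parentheses.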
The #[parsable] macro
Tagging a struct or enum with the #[parsable] macro implements the Parsable trait for the item, provided that all of its fields also implement the Parsable trait. It can also be applied to a field to tweak the way that field is parsed.
#[parsable]
enum MyOperation {
    BinaryOperation(NumberLiteral, Operator, NumberLiteral),
    Number(NumberLiteral),
    Zero = "zero"
}
// If the first two variants are swapped, the parsing will never reach the `BinaryOperation` variant.
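The ordering remark above can be pictured with a small standalone sketch of "first match wins" selection. It assumes variants are tried in declaration order, as the comment implies, and uses only the standard library; `first_match` is a hypothetical helper, not part of the crate.

```rust
/// Returns the first alternative that is a prefix of `input`.
/// Alternatives are tried in the order given, like enum variants in declaration order.
fn first_match<'a>(input: &str, alternatives: &[&'a str]) -> Option<&'a str> {
    alternatives
        .iter()
        .copied()
        .find(|alternative| input.starts_with(*alternative))
}
```

Calling `first_match("1+2", &["1", "1+2"])` selects "1", so the longer alternative is never reached; listing "1+2" first selects it instead. The same reasoning explains why `BinaryOperation` has to come before `Number`.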
String
A String field must be tagged with the #[parsable(regex="<pattern>")] or #[parsable(value="<string>")] macro option to specify how to parse it.
// Matches at least one digit
#[parsable]
struct NumberLiteral {
    #[parsable(regex=r"\d+")]
    value: String
}

// Only matches the string "+"
#[parsable]
struct PlusSign {
    #[parsable(value="+")]
    value: String
}
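As a rough picture of what these two options accept, the following standalone sketch matches a fixed value or a run of digits at the start of an input. It uses only the standard library and is not the crate's implementation: `match_value` and `match_digits` are hypothetical names, and `match_digits` only handles ASCII digits, whereas `\d` in the regex crate also matches other Unicode digits by default.

```rust
/// Sketch of `value="<string>"`: succeeds only if `input` starts with `value`.
/// Returns the matched text and the remaining input.
fn match_value<'a>(input: &'a str, value: &str) -> Option<(&'a str, &'a str)> {
    if input.starts_with(value) {
        Some(input.split_at(value.len()))
    } else {
        None
    }
}

/// Sketch of `regex=r"\d+"`, restricted to ASCII: consumes one or more leading digits.
fn match_digits(input: &str) -> Option<(&str, &str)> {
    let end = input
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(input.len());
    if end == 0 {
        None // `+` requires at least one digit
    } else {
        Some(input.split_at(end))
    }
}
```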
Option<T>
Matches T. If it fails, returns None but the parsing of the field is still considered successful.
#[parsable]
enum Sign {
    Plus = "+",
    Minus = "-"
}

// Matches a number with an optional sign.
#[parsable]
struct NumberLiteral {
    sign: Option<Sign>,
    #[parsable(regex=r"\d+")]
    value: String
}
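The difference between an optional and a mandatory field can be sketched without the crate. In the hypothetical `parse_signed` function below (standard library only, ASCII digits only), a missing sign yields `None` for that part but does not fail the parse, while missing digits do.

```rust
/// Parses an optional sign followed by at least one ASCII digit.
/// Returns the sign (if any) and the digits, or `None` if no digit is found.
fn parse_signed(input: &str) -> Option<(Option<char>, &str)> {
    // Optional part: a missing sign yields `None` but is not an error.
    let (sign, rest) = match input.chars().next() {
        Some(c @ ('+' | '-')) => (Some(c), &input[1..]),
        _ => (None, input),
    };
    // Mandatory part: without digits the whole parse fails.
    let end = rest
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(rest.len());
    if end == 0 {
        None
    } else {
        Some((sign, &rest[..end]))
    }
}
```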
Vec<T>
Matches as many T as possible successively. The following options can be specified:
- min=X: the parsing is only valid if at least X items are parsed
- separator=<string>: after each item, the parser will attempt to consume the separator. The parsing fails if no separator is found.

// Matches a non-empty list of numbers separated by a comma
#[parsable]
struct NumberList {
    #[parsable(separator=",", min=1)]
    numbers: Vec<NumberLiteral>
}
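One possible reading of the `min` and `separator` options is sketched below with the standard library only. This is an illustration under stated assumptions, not the crate's algorithm: `parse_number_list` is a hypothetical function, it assumes the list simply ends when no separator follows an item, and it does not skip blank characters.

```rust
/// Parses ASCII-digit numbers separated by `separator`.
/// Returns `None` when fewer than `min` items are found.
fn parse_number_list(input: &str, separator: &str, min: usize) -> Option<Vec<u32>> {
    let mut items: Vec<u32> = Vec::new();
    let mut rest = input;
    loop {
        let end = rest
            .find(|c: char| !c.is_ascii_digit())
            .unwrap_or(rest.len());
        if end == 0 {
            break; // no further item
        }
        // A number too large for `u32` makes the whole parse fail.
        items.push(rest[..end].parse::<u32>().ok()?);
        rest = &rest[end..];
        // Assumption: the list ends when no separator follows an item.
        match rest.strip_prefix(separator) {
            Some(after_separator) => rest = after_separator,
            None => break,
        }
    }
    if items.len() >= min {
        Some(items)
    } else {
        None
    }
}
```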
Other supported types:
- (): matches nothing, is always successful.
- (T, U): matches T, then U.
- Box<T>: matches T.

The Parsable trait provides the parse() method that takes two arguments:
- content (String): the string to parse
- options (ParseOptions): parse options

The ParseOptions type has the following fields:
- comment_start (Option<&'static str>): when the specified pattern is matched, the rest of the line is ignored. Common instances are "//" or "#".
- file_path (Option<String>): file path of the string being parsed.
- package_root_path (Option<String>): root path of the package or module containing the file being parsed.

The file_path and package_root_path fields are forwarded to the FileInfo struct and are never actually used by the library.
Blank characters (spaces, new lines and tabulations) are always ignored during parsing.
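The combined effect of blank skipping and the comment_start option can be sketched as follows. This is a standalone illustration using only the standard library; `skip_blanks_and_comments` is a hypothetical name, and treating carriage returns as blanks is an assumption.

```rust
/// Skips leading blanks and, when `comment_start` is set, line comments.
/// Returns the remaining input, starting at the first significant character.
fn skip_blanks_and_comments<'a>(input: &'a str, comment_start: Option<&str>) -> &'a str {
    let mut rest = input;
    loop {
        // Spaces, tabulations and new lines (plus `\r`, an assumption) are ignored.
        rest = rest.trim_start_matches(|c: char| matches!(c, ' ' | '\t' | '\n' | '\r'));
        match comment_start {
            Some(marker) if !marker.is_empty() && rest.starts_with(marker) => {
                // Drop everything up to the end of the current line.
                let end = rest.find('\n').unwrap_or(rest.len());
                rest = &rest[end..];
            }
            _ => return rest,
        }
    }
}
```

Note that a comment marker appearing after significant text is left alone here; it would only be skipped once the parser reaches it.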
The FileInfo structure is used across the library. It has the following fields:
- content (String): the string being parsed
- path (String): the path of the file being parsed, as specified in ParseOptions
- package_root_path (String): the path of the package containing the file, as specified in ParseOptions

It also provides the following methods:
- get_line_col(index: usize) -> Option<(usize, usize)>: returns the line and column numbers (starting at 1) associated with the specified character index. This method assumes one byte per character and therefore does not work properly when the file contains non-ASCII characters.

Tagging a struct with #[parsable] adds a location field of type ItemLocation with the following fields and methods:
- file (Rc<FileInfo>): information on the file containing the item
- start (usize): starting index of the item in the file
- end (usize): ending index of the item in the file
- get_start_line_col() -> (usize, usize): returns the line and column numbers (starting at 1) of the location start

The Parsable trait also provides a location() method:
- on a struct, it returns the location field
- on an enum, it returns the result of the location() method of the variant that was matched
- calling location() on a variant with no field panics

A way to prevent the panic is to wrap enums with unit variants in a structure:
use parsable::{parsable, Parsable, ParseOptions};

#[parsable]
enum Operator {
    Plus = "+",
    Minus = "-",
    Mult = "*",
    Div = "/",
    Mod = "%"
}

// The wrapper struct gets a `location` field, unlike the unit variants above
#[parsable]
struct WrappedOperator {
    operator: Operator
}

fn main() {
    let string = "+".to_string();
    let options = ParseOptions::default();
    let result = WrappedOperator::parse(string, options).unwrap();

    dbg!(result.location()); // It works!
}
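For reference, the index-to-position conversion described for get_line_col() and get_start_line_col() can be sketched with the standard library alone. The `line_col` function below is a hypothetical stand-in, not the crate's code; like the documented method it counts bytes rather than characters, and accepting an index equal to the content length is an assumption.

```rust
/// Converts a byte index into 1-based (line, column) numbers.
/// Assumes one byte per character, so non-ASCII content gives wrong columns.
/// Returns `None` if `index` is past the end of `content`.
fn line_col(content: &str, index: usize) -> Option<(usize, usize)> {
    if index > content.len() {
        return None;
    }
    let mut line = 1;
    let mut col = 1;
    for &byte in &content.as_bytes()[..index] {
        if byte == b'\n' {
            // A new line resets the column counter.
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }
    Some((line, col))
}
```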
On failure, Parsable::parse() returns Err(ParseError). This structure has the following fields:
- file (Rc<FileInfo>): the file where the error occurred.
- index (usize): the index at which the error occurred.
- expected (Vec<String>): a list of item names that were expected at this index.

The #[parsable] macro accepts the following options on a struct or enum:
- located=<bool>: on a structure, indicates whether or not the location field should be generated. Default: true.
- cascade=<bool>: if true on a structure, indicates that if an Option field is not matched, then the parser should not attempt to match other Option fields. It does not invalidate the overall struct parsing. Default: false.
- name=<string>: indicates the name of the struct or enum, which is used when a parsing error occurs. Default: the name of the struct or enum.

#[parsable(located=false)] // The `location` field will not be added
struct Operation {
    first_operand: Operand,
    other_operands: Vec<(Operator, Operand)>
}
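The cascade option is easier to picture with a toy model. The sketch below is standalone (standard library only) and rests on one interpretation of the description above: once an optional field fails to match, the optional fields declared after it are skipped. `parse_optionals` is a hypothetical helper in which each token plays the role of one Option field.

```rust
/// Tries to match each optional token in order at the start of `input`.
/// With `cascade` enabled, the first miss disables all remaining optional tokens.
fn parse_optionals<'a>(input: &str, tokens: &[&'a str], cascade: bool) -> Vec<Option<&'a str>> {
    let mut rest = input;
    let mut skip_remaining = false;
    let mut result = Vec::with_capacity(tokens.len());
    for &token in tokens {
        let matched = if skip_remaining {
            None
        } else {
            rest.strip_prefix(token)
        };
        match matched {
            Some(after) => {
                rest = after;
                result.push(Some(token));
            }
            None => {
                // A miss never fails the overall parse; it only affects later tokens.
                skip_remaining = cascade;
                result.push(None);
            }
        }
    }
    result
}
```

With the input "bc" and the tokens "a", "b", "c", the non-cascading call still matches "b" and "c", while the cascading call leaves all three fields empty.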
The following options can be specified on a field:
- prefix=<string>: attempt to parse the specified string before parsing the field. If the prefix parsing fails, then the field parsing fails.
- suffix=<string>: attempt to parse the specified string after parsing the field. If the suffix parsing fails, then the field parsing fails.
- brackets=<string>: shortcut to specify both a prefix and a suffix using the first two characters of the specified string.
- exclude=<string>: indicates that the parsing is only valid if the item does not match the specified regex.
- followed_by=<string>: indicates that the parsing is only valid if the item is followed by the specified regex.
- not_followed_by=<string>: indicates that the parsing is only valid if the item is not followed by the specified regex.
- value=<string>: on a String field, indicates that the field only matches the specified string.
- regex=<string>: on a String field, indicates that the field only matches the regex with the specified pattern (using the regex crate).
- separator=<string>: on a Vec field, specifies the separator between items.
- min=<integer>: on a Vec field, specifies the minimum number of items for the parsing to be valid.
- cascade=false: indicates that this field ignores the root cascade option.

Implementing the Parsable trait manually
Sometimes #[parsable] is not enough and you want to implement your own parsing mechanism. This is done by implementing the parse_item and get_item_name methods, and optionally the location method.
use parsable::{ItemLocation, Parsable, ParseOptions, StringReader};

struct MyInteger {
    value: u32,
    location: ItemLocation,
}

impl Parsable for MyInteger {
    fn parse_item(reader: &mut StringReader) -> Option<Self> {
        // Remember where the item starts in order to build its location
        let start = reader.get_index();

        match reader.read_regex(r"\d+") {
            Some(string) => Some(MyInteger {
                value: string.parse().unwrap(),
                location: reader.get_item_location(start),
            }),
            // Nothing was consumed, so the index is unchanged
            None => None,
        }
    }

    // Only used in errors
    fn get_item_name() -> String {
        "integer".to_string()
    }

    // Not required, but convenient
    fn location(&self) -> &ItemLocation {
        &self.location
    }
}

fn main() {
    let number_string = "56";
    let number = MyInteger::parse(number_string.to_string(), ParseOptions::default()).unwrap();

    println!("{}", number.value);
}
StringReader wraps the string being parsed with an index that increases as the parsing goes on. It has the following methods:
- content() -> &str: returns the whole string
- get_index() -> usize: returns the current index in the string
- set_index(index: usize) -> usize: sets the current index in the string
- as_str() -> &str: returns the part of the string that has not been parsed yet (same as &self.content()[self.get_index()..])
- as_char() -> char: returns the current character (same as self.content().as_bytes()[self.get_index()] as char)
- is_finished() -> bool: indicates whether the end of the string has been reached
- advance(length: usize) -> Option<&str>: advances the current index by length and returns the corresponding substring. If length is 0, returns None
- eat_spaces(): advances the current index until a non-blank and non-comment character is reached
- read_string(string: &str) -> Option<&str>: if the remaining string starts with string, advances the current index by its length and returns it, otherwise returns None
- read_regex(pattern: &'static str) -> Option<&str>: if the remaining string starts with a match for the specified regex pattern, advances the current index by the length of the match and returns it, otherwise returns None
- peek_regex(pattern: &'static str) -> bool: indicates whether the remaining string starts with a match for the specified regex pattern, without advancing the current index

If parse_item returns None, it must ensure that the index is the same when the function exits as it was when it started.
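The index-restoring rule matters as soon as a custom parser reads more than one token. The sketch below shows the pattern with a tiny stand-in reader built on the standard library; `MiniReader` and `parse_angle_pair` are hypothetical and only mimic the read_string behaviour described above, they are not the crate's StringReader.

```rust
/// Minimal stand-in for a string reader: a string plus a current index.
struct MiniReader<'a> {
    content: &'a str,
    index: usize,
}

impl<'a> MiniReader<'a> {
    /// Consumes `string` if the remaining input starts with it.
    fn read_string(&mut self, string: &str) -> Option<&'a str> {
        let content: &'a str = self.content;
        let rest = &content[self.index..];
        if rest.starts_with(string) {
            self.index += string.len();
            Some(&rest[..string.len()])
        } else {
            None
        }
    }
}

/// Parses "<" then ">" as one item; restores the index if the item is incomplete.
fn parse_angle_pair(reader: &mut MiniReader<'_>) -> Option<()> {
    let start = reader.index;
    if reader.read_string("<").is_some() && reader.read_string(">").is_some() {
        Some(())
    } else {
        reader.index = start; // undo the partial progress before returning None
        None
    }
}
```

On the input "<x", the first read succeeds and moves the index to 1, so without the reset the caller would resume parsing in the middle of a rejected item.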
License: MIT