## Installation

```
[dependencies]
parsable = "0.1"
```

## Example

Implementation of a basic operation interpreter that only works with positive integers and does not handle operator precedence (operators are applied from left to right).

```rust
use parsable::{parsable, Parsable, ParseOptions};

#[parsable]
enum Operator {
    Plus = "+",
    Minus = "-",
    Mult = "*",
    Div = "/",
    Mod = "%"
}

#[parsable]
struct NumberLiteral {
    #[parsable(regex=r"\d+")]
    value: String
}

impl NumberLiteral {
    fn process(&self) -> i32 {
        self.value.parse().unwrap()
    }
}

#[parsable]
enum Operand {
    Number(NumberLiteral),
    Wrapped(WrappedOperation)
}

impl Operand {
    fn process(&self) -> i32 {
        match self {
            Operand::Number(number) => number.process(),
            Operand::Wrapped(wrapped) => wrapped.process(),
        }
    }
}

#[parsable]
struct Operation {
    first_operand: Operand,
    other_operands: Vec<(Operator, Operand)>
}

impl Operation {
    fn process(&self) -> i32 {
        let mut result = self.first_operand.process();

        for (operator, operand) in &self.other_operands {
            let value = operand.process();

            result = match operator {
                Operator::Plus => result + value,
                Operator::Minus => result - value,
                Operator::Mult => result * value,
                Operator::Div => result / value,
                Operator::Mod => result % value,
            }
        }

        result
    }
}

#[parsable]
struct WrappedOperation {
    #[parsable(brackets="()")]
    operation: Box<Operation>
}

impl WrappedOperation {
    fn process(&self) -> i32 {
        self.operation.process()
    }
}

fn main() {
    let operation_string = "3 + (4 * 5)".to_string();
    let parse_options = ParseOptions::default();

    match Operation::parse(operation_string, parse_options) {
        Ok(operation) => {
            println!("result: {}", operation.process());
        },
        Err(error) => {
            dbg!(error);
        }
    }
}
```

## The `#[parsable]` macro

Tagging a struct or enum with the `#[parsable]` macro implements the `Parsable` trait for the item, on the condition that all of its fields also implement the `Parsable` trait. The macro can also be applied to a field to tweak the way it is parsed.

### Struct

- All fields are parsed one after the other.
The parsing is only successful if all fields are successfully parsed.

### Enum

- The parsing stops on the first variant that is successfully parsed.
- If a variant contains multiple fields, they are parsed successively and must all be successful for the variant to be matched.
- If a variant contains no field, a string must be specified to indicate how to parse it.

```rust
#[parsable]
enum MyOperation {
    BinaryOperation(NumberLiteral, Operator, NumberLiteral),
    Number(NumberLiteral),
    Zero = "zero"
}

// If the first two variants are swapped, the parsing will never reach the `BinaryOperation` variant.
```

## Builtin types

### `String`

A string field must be tagged with the `#[parsable(regex="<pattern>")]` or `#[parsable(value="<string>")]` macro option to specify how to parse it.

```rust
// Matches at least one digit
#[parsable]
struct NumberLiteral {
    #[parsable(regex=r"\d+")]
    value: String
}
```

```rust
// Only matches the string "+"
#[parsable]
struct PlusSign {
    #[parsable(value="+")]
    value: String
}
```

### `Option<T>`

Matches `T`. If it fails, returns `None` but the parsing of the field is still considered successful.

```rust
#[parsable]
enum Sign {
    Plus = "+",
    Minus = "-"
}

// Matches a number with an optional sign.
#[parsable]
struct NumberLiteral {
    sign: Option<Sign>,
    #[parsable(regex=r"\d+")]
    value: String
}
```

### `Vec<T>`

Matches as many `T` as possible successively. The following options can be specified:

- `min=X`: the parsing is only valid if at least X items are parsed.
- `separator=<string>`: after each item, the parser will attempt to consume the separator. The parsing fails if no separator is found.

```rust
// Matches a non-empty list of numbers separated by a comma
#[parsable]
struct NumberList {
    #[parsable(separator=",", min=1)]
    numbers: Vec<NumberLiteral>
}
```

### Other types

- `()`: matches nothing, is always successful.
- `(T, U)`: matches `T`, then `U`.
- `Box<T>`: matches `T`.
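The `Option<T>` and `Vec<T>` rules described above can be pictured with a small hand-written sketch that does not use the crate at all. Everything in it is hypothetical (`Cursor`, `read_number`, `read_optional_number`, `read_numbers`); it only mirrors the documented behaviour ("a failed optional item still counts as a success", "take as many items as possible, then check `min`") and is not how the library is implemented. The `separator` option is deliberately left out.

```rust
// Hypothetical stand-in for a reader: a string plus a byte index.
struct Cursor<'a> {
    input: &'a str,
    index: usize,
}

impl<'a> Cursor<'a> {
    fn new(input: &'a str) -> Self {
        Cursor { input, index: 0 }
    }

    // Reads one run of ASCII digits, skipping leading blanks.
    // The index only moves when a number was actually read.
    fn read_number(&mut self) -> Option<u32> {
        let input = self.input;
        let rest = input[self.index..].trim_start();
        let digits = rest.bytes().take_while(|b| b.is_ascii_digit()).count();

        match rest[..digits].parse::<u32>() {
            Ok(value) => {
                // `rest` is a suffix of `input`, so this is its start offset.
                self.index = input.len() - rest.len() + digits;
                Some(value)
            }
            Err(_) => None,
        }
    }
}

// `Option<T>`-like rule: a failed inner parse gives `None`,
// but the optional field itself is still considered parsed.
fn read_optional_number(cursor: &mut Cursor) -> Option<Option<u32>> {
    Some(cursor.read_number())
}

// `Vec<T>`-like rule with `min`: read as many items as possible,
// then reject the whole list if fewer than `min` were found.
fn read_numbers(cursor: &mut Cursor, min: usize) -> Option<Vec<u32>> {
    let start = cursor.index;
    let mut items = Vec::new();

    while let Some(value) = cursor.read_number() {
        items.push(value);
    }

    if items.len() >= min {
        Some(items)
    } else {
        cursor.index = start; // leave the cursor where it started
        None
    }
}
```

With this sketch, `read_numbers` on `"1 2 3 x"` with `min` set to 1 collects the three numbers and stops before `x`, while the same call on `"x"` fails and leaves the index untouched.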
## Running the parser

The `Parsable` trait provides the `parse()` method that takes two arguments:

- `content: String`: the string to parse
- `options: ParseOptions`: parse options

The `ParseOptions` type has the following fields:

- `comment_start: Option<&'static str>`: when the specified pattern is matched, the rest of the line is ignored. Common instances are `"//"` or `"#"`.
- `file_path: Option<String>`: file path of the string being parsed.
- `package_root_path: Option<String>`: root path of the package or module containing the file being parsed.

The `file_path` and `package_root_path` fields are forwarded to the `FileInfo` struct and are never actually used by the library.

Blank characters (spaces, new lines and tabulations) are always ignored during parsing.

## FileInfo

The `FileInfo` structure is used across the library. It has the following fields:

- `content: String`: the string being parsed
- `path: String`: the path of the file being parsed, as specified in `ParseOptions`
- `package_root_path: String`: the path of the package containing the file, as specified in `ParseOptions`

It also provides the following methods:

- `get_line_col(index: usize) -> Option<(usize, usize)>`: returns the line and column numbers (starting at 1) associated with the specified character index. This method assumes 1 character per byte and therefore does not work properly when the file contains non-ASCII characters.
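To illustrate the "1 character per byte" limitation, here is a standalone sketch of a byte-based line/column lookup next to a character-aware variant. Both `line_col` and `line_col_chars` are hypothetical helpers written for this README, not the crate's code, and accepting `index == content.len()` as a valid position is an assumption of the sketch.

```rust
// Byte-based lookup: every byte before `index` counts as one column.
// Returns 1-based (line, column), or `None` if `index` is out of range.
fn line_col(content: &str, index: usize) -> Option<(usize, usize)> {
    if index > content.len() {
        return None;
    }

    let (mut line, mut col) = (1, 1);

    for byte in content.bytes().take(index) {
        if byte == b'\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }

    Some((line, col))
}

// Character-aware variant: counts Unicode scalar values instead of bytes.
// Returns `None` if `index` is out of range or falls inside a character.
fn line_col_chars(content: &str, index: usize) -> Option<(usize, usize)> {
    if !content.is_char_boundary(index) {
        return None;
    }

    let (mut line, mut col) = (1, 1);

    for ch in content[..index].chars() {
        if ch == '\n' {
            line += 1;
            col = 1;
        } else {
            col += 1;
        }
    }

    Some((line, col))
}
```

On ASCII input both helpers agree. On `"éx"`, where `x` sits at byte index 2, the byte-based version reports column 3 while the character-aware one reports column 2, which is the kind of drift the note above warns about.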
## ItemLocation

Tagging a struct with `#[parsable]` adds a `location` field of type `ItemLocation` with the following fields and methods:

- `file: Rc<FileInfo>`: information on the file containing the item
- `start: usize`: starting index of the item in the file
- `end: usize`: ending index of the item in the file
- `get_start_line_col() -> (usize, usize)`: get the line and column numbers (starting at 1) of the location start

The `Parsable` trait also provides a `location()` method:

- on a structure, it returns its `location` field
- on an enum, it returns the `location()` of the variant that was matched
- calling `location()` on a variant with no field panics

A way to prevent the panic is to wrap enums with unit variants in a structure:

```rust
use parsable::{parsable, Parsable, ParseOptions};

#[parsable]
enum Operator {
    Plus = "+",
    Minus = "-",
    Mult = "*",
    Div = "/",
    Mod = "%"
}

#[parsable]
struct WrappedOperator {
    operator: Operator
}

fn main() {
    let string = "+".to_string();
    let options = ParseOptions::default();
    let result = WrappedOperator::parse(string, options).unwrap();

    dbg!(result.location()); // It works!
}
```

## ParseError

On failure, `Parsable::parse()` returns `Err(ParseError)`. This structure has the following fields:

- `file: Rc<FileInfo>`: the file where the error occurred.
- `index: usize`: the index at which the error occurred.
- `expected: Vec<String>`: a list of item names that were expected at this index.

## Macro options

### Root attributes

- `located=<bool>`: on a structure, indicates whether or not the `location` field should be generated. Default: `true`.
- `cascade=<bool>`: if `true` on a structure, indicates that if an `Option` field is not matched, then the parser should not attempt to match other `Option` fields. It does not invalidate the overall struct parsing. Default: `false`.
- `name=<string>`: indicates the name of the struct or enum, which is used when a parsing error occurs. Default: the name of the struct or enum.
```rust
#[parsable(located=false)] // The `location` field will not be added
struct Operation {
    first_operand: Operand,
    other_operands: Vec<(Operator, Operand)>
}
```

### Field attributes

- `prefix=<string>`: attempt to parse the specified string before parsing the field. If the prefix parsing fails, then the field parsing fails.
- `suffix=<string>`: attempt to parse the specified string after parsing the field. If the suffix parsing fails, then the field parsing fails.
- `brackets=<string>`: shortcut to specify both a prefix and a suffix using the first two characters of the specified string.
- `exclude=<string>`: indicates that the parsing is only valid if the item does not match the specified regex.
- `followed_by=<string>`: indicates that the parsing is only valid if the item is followed by the specified regex.
- `not_followed_by=<string>`: indicates that the parsing is only valid if the item is not followed by the specified regex.
- `value=<string>`: on a `String` field, indicates that the field only matches the specified string.
- `regex=<string>`: on a `String` field, indicates that the field only matches the regex with the specified pattern (using the [`regex`](https://docs.rs/regex/latest/regex/) crate).
- `separator=<string>`: on a `Vec` field, specifies the separator between items.
- `min=<integer>`: on a `Vec` field, specifies the minimum number of items for the parsing to be valid.
- `cascade=false`: indicates that this field ignores the root `cascade` option.

## Manually implementing the `Parsable` trait

Sometimes `#[parsable]` is not enough and you want to implement your own parsing mechanism. This is done by implementing the `parse_item`, `get_item_name` and `location` methods.
```rust
use parsable::{ItemLocation, Parsable, ParseOptions, StringReader};

struct MyInteger {
    value: u32,
    location: ItemLocation,
}

impl Parsable for MyInteger {
    fn parse_item(reader: &mut StringReader) -> Option<Self> {
        // Remember where the item starts to build its location
        let start = reader.get_index();

        match reader.read_regex(r"\d+") {
            Some(string) => Some(MyInteger {
                value: string.parse().unwrap(),
                location: reader.get_item_location(start),
            }),
            None => None,
        }
    }

    // Only used in errors
    fn get_item_name() -> String {
        "integer".to_string()
    }

    // Not required, but convenient
    fn location(&self) -> &ItemLocation {
        &self.location
    }
}

fn main() {
    let number_string = "56";
    let number = MyInteger::parse(number_string.to_string(), ParseOptions::default()).unwrap();

    println!("{}", number.value);
}
```

`StringReader` wraps the string being parsed with an index that increases as the parsing goes on. It has the following methods:

- `content() -> &str`: returns the whole string
- `get_index() -> usize`: returns the current index in the string
- `set_index(index: usize) -> usize`: sets the current index in the string
- `as_str() -> &str`: returns the part of the string that has not been parsed yet (same as `&self.content()[self.get_index()..]`)
- `as_char() -> char`: returns the current character (same as `&self.content().as_bytes()[self.get_index()]`)
- `is_finished() -> bool`: indicates whether the end of the string has been reached
- `advance(length: usize) -> Option<&str>`: advances the current index by `length` and returns the corresponding substring.
If `length` is `0`, returns `None`
- `eat_spaces()`: advances the current index until a non-blank and non-comment character is reached
- `read_string(string: &str) -> Option<&str>`: if the string starts with `string`, advances the current index by `string`'s length and returns it, otherwise returns `None`
- `read_regex(pattern: &'static str) -> Option<&str>`: if the string starts with the specified regex pattern, advances the current index by the length of the matched string and returns it, otherwise returns `None`
- `peek_regex(pattern: &'static str) -> bool`: indicates if the string starts with the specified regex pattern, without advancing the current index

If `parse_item` returns `None`, it must ensure that the index is the same when the function exits as it was when it started.

## License

MIT