| Crates.io | sipp |
| lib.rs | sipp |
| version | 0.2.1 |
| created_at | 2024-02-19 22:31:00.682006+00 |
| updated_at | 2025-01-05 16:36:10.643521+00 |
| description | Simple parser package |
| homepage | |
| repository | https://codeberg.org/Bobulous/sipp |
| max_upload_size | |
| id | 1145749 |
| size | 152,726 |
This package provides a Parser which allows you to peek at characters to see what's coming
next in a character stream, and to read expected characters. For example, methods include:
- peek() - returns the next character from the stream without removing it;
- require(&str) - returns an error if the given sequence is not found next in the stream;
- skip_while(Fn(char)->bool) - keeps removing characters while the predicate is satisfied;
- read_up_to(char) - takes characters from the stream up to (but not including) the given character;
- accept(char) - skips the given character if it is found next in the stream.

All fallible methods return a Result and no method in this package should ever panic.
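To make the semantics of these methods concrete, here is a minimal sketch built only on the standard library's Peekable iterator. MiniParser and its signatures are hypothetical and exist purely for illustration; they are not sipp's implementation, and sipp's own methods return Result values that this sketch simplifies away.

```rust
use std::iter::Peekable;
use std::str::Chars;

/// Hypothetical stand-in for illustration only; this is NOT sipp's Parser.
struct MiniParser<'a> {
    chars: Peekable<Chars<'a>>,
}

impl<'a> MiniParser<'a> {
    fn new(input: &'a str) -> Self {
        MiniParser { chars: input.chars().peekable() }
    }

    /// Return the next character without consuming it.
    fn peek(&mut self) -> Option<char> {
        self.chars.peek().copied()
    }

    /// Consume `expected` if it comes next; otherwise consume nothing
    /// and return an error.
    fn require(&mut self, expected: &str) -> Result<(), String> {
        // Probe on a clone so that a failed match leaves the stream untouched.
        let mut probe = self.chars.clone();
        for want in expected.chars() {
            if probe.next() != Some(want) {
                return Err(format!("expected {expected:?}"));
            }
        }
        self.chars = probe;
        Ok(())
    }

    /// Consume characters for as long as the predicate holds.
    fn skip_while(&mut self, predicate: impl Fn(char) -> bool) {
        while let Some(c) = self.peek() {
            if !predicate(c) {
                break;
            }
            self.chars.next();
        }
    }

    /// Collect characters up to (but not including) `stop`.
    fn read_up_to(&mut self, stop: char) -> String {
        let mut out = String::new();
        while let Some(c) = self.peek() {
            if c == stop {
                break;
            }
            out.push(c);
            self.chars.next();
        }
        out
    }

    /// Consume `expected` if it is next; report whether it was consumed.
    fn accept(&mut self, expected: char) -> bool {
        if self.peek() == Some(expected) {
            self.chars.next();
            true
        } else {
            false
        }
    }
}

fn main() {
    let mut p = MiniParser::new("LEFT   3\nEND");
    let word = p.read_up_to(' ');
    p.skip_while(|c| c == ' ');
    let amount = p.read_up_to('\n');
    let saw_newline = p.accept('\n');
    let end_ok = p.require("END").is_ok();
    println!("{word} {amount} {saw_newline} {end_ok} {:?}", p.peek());
}
```

The point of the sketch is that every operation is defined in terms of a one-character lookahead: nothing is consumed unless the peeked character matches what the caller asked for.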
Parsing relies on a ByteBuffer which wraps around a byte stream, and on a decoder such as
Utf8Decoder which wraps around a ByteBuffer and decodes bytes into characters. Finally a
Parser is created by wrapping a decoder. The end result is a Parser which lets you peek and
read characters.
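The layering described above (byte stream, then decoder, then parser) can be sketched with standard library types alone. In this sketch BufReader plays the role of the byte buffer, the hypothetical MiniUtf8Decoder turns bytes into characters, and Peekable supplies the lookahead a parser needs. None of these names or signatures come from sipp; they are assumptions made for illustration.

```rust
use std::io::{BufReader, Bytes, Read};

/// Hypothetical decoder layer (not sipp's Utf8Decoder): pulls bytes from a
/// buffered reader and yields one decoded character at a time.
struct MiniUtf8Decoder<R: Read> {
    bytes: Bytes<BufReader<R>>,
}

impl<R: Read> MiniUtf8Decoder<R> {
    /// Layer 1 (buffering) is created here; layer 2 (decoding) is `next`.
    fn wrap(reader: R) -> Self {
        MiniUtf8Decoder { bytes: BufReader::new(reader).bytes() }
    }
}

impl<R: Read> Iterator for MiniUtf8Decoder<R> {
    type Item = Result<char, String>;

    fn next(&mut self) -> Option<Self::Item> {
        let first = match self.bytes.next()? {
            Ok(b) => b,
            Err(e) => return Some(Err(e.to_string())),
        };
        // The lead byte tells us how many bytes the sequence occupies.
        let len = match first {
            0x00..=0x7F => 1,
            0xC0..=0xDF => 2,
            0xE0..=0xEF => 3,
            0xF0..=0xF7 => 4,
            _ => return Some(Err(format!("invalid lead byte {first:#04x}"))),
        };
        let mut buf = [first, 0, 0, 0];
        for slot in buf.iter_mut().take(len).skip(1) {
            match self.bytes.next() {
                Some(Ok(b)) => *slot = b,
                Some(Err(e)) => return Some(Err(e.to_string())),
                None => return Some(Err("truncated UTF-8 sequence".to_string())),
            }
        }
        // Let the standard library do the strict validation.
        match std::str::from_utf8(&buf[..len]) {
            Ok(s) => s.chars().next().map(Ok),
            Err(e) => Some(Err(e.to_string())),
        }
    }
}

fn main() {
    // Layer 3: Peekable gives the one-character lookahead a parser relies on.
    let decoder = MiniUtf8Decoder::wrap("héllo".as_bytes());
    let mut parser = decoder.peekable();
    if let Some(Ok(c)) = parser.peek() {
        println!("next character (not consumed): {c}");
    }
    let text: Result<String, String> = parser.collect();
    println!("{text:?}");
}
```

Keeping decoding in its own layer means the lookahead logic never has to know how many bytes a character occupies, which is presumably why the package separates the decoder from the Parser.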
Suppose you want to parse a simplified set of Acornsoft Logo instructions in which only the "FORWARD", "LEFT", and "RIGHT" instructions are accepted. Each instruction sits on a line of its own (lines are separated by a newline character), and each instruction name is followed by one or more space characters and then a numeric amount. Example input might look like this:
FORWARD 10
RIGHT 45
FORWARD 20
RIGHT 10
FORWARD 5
LEFT 3
You could use sipp to parse these instructions using code like this:
    let input =
        "FORWARD 10\nRIGHT 45\nFORWARD 20\nRIGHT 10\nFORWARD 5\nLEFT 3";
    // We know that Rust strings are UTF-8 encoded, so wrap the input
    // bytes with a Utf8Decoder.
    let decoder = Utf8Decoder::wrap(input.as_bytes());
    // Now wrap the decoder with a Parser to give us useful methods
    // for reading through the input.
    let mut parser = Parser::wrap(decoder);
    // Keep reading while there is still input available.
    while parser.has_more()? {
        // Read the command by reading everything up to (but not
        // including) the next space.
        let command = parser.read_up_to(' ')?;
        // Skip past the (one or more) space characters.
        parser.skip_while(|c| c == ' ')?;
        // Read until the next newline (or the end of input, whichever
        // comes first).
        let number = parser.read_up_to('\n')?;
        // Now either there is no further input, or the next character
        // must be a newline. If the next character is a newline, skip
        // past it.
        parser.accept('\n')?;
    }
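The loop above reads a command and a number on each pass but does not yet interpret them. One possible next step, sketched here with the standard library only, is to convert each pair into a typed value. The Instruction enum and the to_instruction function are hypothetical names introduced for this illustration, and the sketch assumes that amounts are unsigned integers and that the two pieces of text have already been extracted as string slices.

```rust
/// Hypothetical typed form of the three accepted Logo instructions.
#[derive(Debug, PartialEq)]
enum Instruction {
    Forward(u32),
    Left(u32),
    Right(u32),
}

/// Convert an already-extracted command name and amount into an Instruction.
/// Assumes the amount is an unsigned integer; anything else is an error.
fn to_instruction(command: &str, number: &str) -> Result<Instruction, String> {
    let amount = number
        .trim()
        .parse::<u32>()
        .map_err(|e| format!("bad amount {number:?}: {e}"))?;
    match command {
        "FORWARD" => Ok(Instruction::Forward(amount)),
        "LEFT" => Ok(Instruction::Left(amount)),
        "RIGHT" => Ok(Instruction::Right(amount)),
        other => Err(format!("unknown instruction {other:?}")),
    }
}

fn main() {
    println!("{:?}", to_instruction("FORWARD", "10"));
    println!("{:?}", to_instruction("JUMP", "1"));
    println!("{:?}", to_instruction("LEFT", "abc"));
}
```

Returning a Result for both the unknown command and the malformed amount keeps this step in line with the package's stated aim that failures are reported as errors and never as panics.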
Given a hardcoded string which represents a comma-separated list, you could use this package to parse it like so:
    let input = "first value,second value,third,fourth,fifth,etc";
    let buffer = ByteBuffer::wrap(input.as_bytes());
    let decoder = Utf8Decoder::wrap_buffer(buffer);
    let mut parser = Parser::wrap(decoder);
    let mut value_list = Vec::new();
    // Keep reading while input is available.
    while parser.has_more()? {
        // Read up to the next comma, or until the end of input
        // (whichever comes first).
        let value = parser.read_up_to(',')?;
        value_list.push(value);
        // Now either there is no further input, or the next character
        // must be a comma. If the next character is a comma, skip
        // past it.
        parser.accept(',')?;
    }
    assert_eq!(
        value_list
            .iter()
            .map(|s| s.to_string())
            .collect::<Vec<String>>(),
        vec![
            "first value",
            "second value",
            "third",
            "fourth",
            "fifth",
            "etc"
        ]
    );
Initial release.

Added has_more method to Parser.

Altered return type of public method Parser.read_up_to(char) so that it now returns None instead of an empty String. Adjusted examples and unit tests accordingly.