| | |
|---|---|
| Crates.io | tokenise |
| lib.rs | tokenise |
| version | 0.1.0 |
| created_at | 2025-03-22 08:40:38.703257+00 |
| updated_at | 2025-03-22 08:40:38.703257+00 |
| description | A flexible tokeniser library for parsing text |
| homepage | |
| repository | https://github.com/HaineSensei/tokenise |
| max_upload_size | |
| id | 1601668 |
| size | 56,453 |
A flexible lexical analyser (tokeniser) for parsing text into configurable token types.

`tokenise` splits text into tokens based on customisable rules for special characters, delimiter pairs, and comments. It's designed to be flexible enough to handle a variety of syntax styles while remaining simple to configure.
Add this to your `Cargo.toml`:

```toml
[dependencies]
tokenise = "0.1.0"
```
```rust
use tokenise::{Tokeniser, TokenState};

fn main() {
    // Create a new tokeniser
    let mut tokeniser = Tokeniser::new();

    // Configure the tokeniser with rules
    tokeniser.add_specials(".,;:!?");
    tokeniser.add_delimiter_pairs(&vec!["()", "[]", "{}"]).unwrap();
    tokeniser.add_balanced_delimiter("\"").unwrap();
    tokeniser.set_sl_comment("//").unwrap();
    tokeniser.set_ml_comment("/*", "*/").unwrap();

    // Tokenise some source text
    let source = "let x = 42; // The answer\nprint(\"Hello world!\");";
    let tokens = tokeniser.tokenise(source).unwrap();

    // Work with the resulting tokens
    for token in tokens {
        println!("{:?}: '{}'", token.get_state(), token.value());
    }
}
```
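To make the underlying idea concrete without depending on the crate, here is a self-contained sketch of the single-pass scanning technique this kind of tokeniser builds on: classify each character, then group consecutive characters of the same class into tokens. The `Class` enum and `split` function are illustrative names only, not part of the `tokenise` API, and the sketch ignores delimiter pairs and comments.

```rust
// Minimal character-class tokenisation sketch (illustration only; not the
// crate's actual implementation).

#[derive(Debug, PartialEq)]
enum Class {
    Word,
    Symbol,
    Space,
}

// Classify a single character against a user-supplied set of specials.
fn classify(c: char, specials: &str) -> Class {
    if c.is_whitespace() {
        Class::Space
    } else if specials.contains(c) {
        Class::Symbol
    } else {
        Class::Word
    }
}

// Scan the input once, extending the current token while the character
// class stays the same and starting a new token when it changes.
fn split(source: &str, specials: &str) -> Vec<(Class, String)> {
    let mut tokens: Vec<(Class, String)> = Vec::new();
    for c in source.chars() {
        let class = classify(c, specials);
        match tokens.last_mut() {
            Some((last, text)) if *last == class => text.push(c),
            _ => tokens.push((class, c.to_string())),
        }
    }
    tokens
}

fn main() {
    for (class, text) in split("let x = 42;", ".,;:!?=") {
        println!("{:?}: '{}'", class, text);
    }
}
```

The real `Tokeniser` layers delimiter matching and comment rules on top of this kind of scan, which is why its output distinguishes many more states than the three shown here.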
The tokeniser recognises several token types, represented by the `TokenState` enum:

- `Word`: non-special character sequences
- `LDelimiter`/`RDelimiter`: left/right delimiters of a pair (e.g., `(`, `)`)
- `BDelimiter`: balanced delimiters (e.g., quotation marks)
- `SymbolString`: special characters
- `NewLine`: line breaks
- `WhiteSpace`: spaces, tabs, etc.
- `SLComment`: single-line comments
- `MLComment`: multi-line comments

This project is licensed under the MIT License - see the LICENSE file for details.