tinytoken

Crates.io: tinytoken
lib.rs: tinytoken
version: 0.1.4
source: src
created_at: 2024-11-09 14:56:36.429111
updated_at: 2024-11-12 13:39:47.151007
description: Library for tokenizing text into words, numbers, symbols, and more, with customizable parsing options.
homepage: https://github.com/luxluth/tinytoken#readme
repository: https://github.com/luxluth/tinytoken
max_upload_size:
id: 1442134
size: 28,213
0x7C00 (luxluth)

tinytoken

This library provides a tokenizer for parsing and categorizing different types of tokens, such as words, numbers, strings, characters, symbols, and operators. It includes configurable options to handle various tokenization rules and formats, enabling fine-grained control over how text input is parsed.

Example

use tinytoken::{Tokenizer, TokenizerBuilder, Choice};

fn main() {
    let tokenizer = TokenizerBuilder::new()
        .parse_char_as_string(true)
        .allow_digit_separator(Choice::Yes('_'))
        .add_symbol('$')
        .add_operators(&['+', '-'])
        .build("let x = 123_456 + 0xFF");

    match tokenizer.tokenize() {
        Ok(tokens) => {
            for token in tokens {
                println!("{:?}", token);
            }
        }
        Err(err) => {
            eprintln!("Tokenization error: {err}");
        }
    }
}
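
To make the token categories and the digit-separator option more concrete, here is a minimal standalone sketch of how such a classifier might work. It does not use tinytoken and is not its implementation: the `Tok` enum and the `sketch_tokenize` function are hypothetical names invented for this illustration, and the sketch ignores strings, characters, and hex literals such as `0xFF`.

```rust
/// Hypothetical token categories for illustration only;
/// tinytoken's real types may look different.
#[derive(Debug, PartialEq)]
enum Tok {
    Word(String),
    Number(String),
    Operator(char),
    Symbol(char),
}

/// Splits `input` into tokens. If `sep` is set, that character is
/// skipped whenever it appears inside a run of digits.
fn sketch_tokenize(input: &str, sep: Option<char>, operators: &[char]) -> Vec<Tok> {
    let chars: Vec<char> = input.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let c = chars[i];
        if c.is_whitespace() {
            // Whitespace only separates tokens.
            i += 1;
        } else if c.is_ascii_digit() {
            // Collect digits, dropping the separator. Simplification:
            // a trailing separator is silently consumed as well.
            let mut digits = String::new();
            while i < chars.len() && (chars[i].is_ascii_digit() || Some(chars[i]) == sep) {
                if chars[i].is_ascii_digit() {
                    digits.push(chars[i]);
                }
                i += 1;
            }
            out.push(Tok::Number(digits));
        } else if c.is_alphabetic() || c == '_' {
            // Words start with a letter or underscore.
            let mut word = String::new();
            while i < chars.len() && (chars[i].is_alphanumeric() || chars[i] == '_') {
                word.push(chars[i]);
                i += 1;
            }
            out.push(Tok::Word(word));
        } else if operators.contains(&c) {
            out.push(Tok::Operator(c));
            i += 1;
        } else {
            // Anything else is treated as a symbol in this sketch.
            out.push(Tok::Symbol(c));
            i += 1;
        }
    }
    out
}

fn main() {
    for tok in sketch_tokenize("let x = 123_456 + 7", Some('_'), &['+', '-']) {
        println!("{:?}", tok);
    }
}
```

In this sketch `=` falls through to the symbol branch because only `+` and `-` were registered as operators, which mirrors the idea of configuring the operator set up front.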

Contributions

Feel free to send a PR to improve or extend the tool's capabilities.

Commit count: 28
