| Crates.io | regex-tokenizer |
| lib.rs | regex-tokenizer |
| version | 0.1.1 |
| created_at | 2023-03-22 18:59:01.889224+00 |
| updated_at | 2023-03-22 19:08:48.423678+00 |
| description | A regex tokenizer |
| homepage | https://github.com/cmargiotta/regex-tokenizer |
| repository | https://github.com/cmargiotta/regex-tokenizer |
| max_upload_size | |
| id | 817401 |
| size | 11,517 |
A regex-based tokenizer with a minimal DSL to define it!
```rust
tokenizer! {
    SimpleTokenizer
    r"[a-zA-Z]\w*" => Identifier
    r"\d+" => Number
    r"\s+" => _
}
```
And, in a function:

```rust
// ...
let tokenizer = SimpleTokenizer::new();
// ...
```
The macro generates an enum named `SimpleTokenizer_types`, containing `Identifier` and `Number`. Regexes with `_` as their class are ignored; when a substring matches none of the specified regexes, the tokenization fails. When multiple non-ignored regexes match the input, priority goes to the one defined first.

Calling `tokenizer.tokenize(...)` returns an iterator that extracts tokens from the query.
A token is formed by:

```rust
{
    value: String,
    position: usize,
    type_: SimpleTokenizer_types,
}
```
`position` is the index of the token's first character within the query. A call to `.next()` returns `None` when there are no more tokens to extract.
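To make the semantics above concrete, here is a minimal, self-contained sketch of the same behavior in plain Rust, with hand-rolled matchers standing in for the regexes (the real crate expands the `tokenizer!` DSL instead; the names `Token`, `TokenType`, and `tokenize` here are illustrative, not the crate's API). It shows declaration-order priority, ignored (`_`) rules, and failure when no rule matches.

```rust
// Sketch of the tokenizer semantics described above, using only std.
// Each matcher returns the length (in bytes) of the match at the start
// of its input, or 0 if it does not match there.

#[derive(Debug, PartialEq, Clone, Copy)]
enum TokenType {
    Identifier,
    Number,
}

#[derive(Debug, PartialEq)]
struct Token {
    value: String,
    position: usize,
    type_: TokenType,
}

// Stand-in for r"[a-zA-Z]\w*" (ASCII only, for simplicity).
fn match_identifier(s: &str) -> usize {
    let b = s.as_bytes();
    if b.is_empty() || !b[0].is_ascii_alphabetic() {
        return 0;
    }
    let mut n = 1;
    while n < b.len() && (b[n].is_ascii_alphanumeric() || b[n] == b'_') {
        n += 1;
    }
    n
}

// Stand-in for r"\d+".
fn match_number(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_digit()).count()
}

// Stand-in for r"\s+".
fn match_whitespace(s: &str) -> usize {
    s.bytes().take_while(|b| b.is_ascii_whitespace()).count()
}

// Returns None if some part of the input matches no rule (failed tokenization).
fn tokenize(input: &str) -> Option<Vec<Token>> {
    // (matcher, token type); a None type marks an ignored rule, like `_` in the DSL.
    let rules: [(fn(&str) -> usize, Option<TokenType>); 3] = [
        (match_identifier, Some(TokenType::Identifier)),
        (match_number, Some(TokenType::Number)),
        (match_whitespace, None),
    ];
    let mut pos = 0;
    let mut out = Vec::new();
    while pos < input.len() {
        let rest = &input[pos..];
        // Declaration-order priority: the first rule that matches wins.
        let (len, ty) = rules.iter().find_map(|(m, ty)| {
            let n = m(rest);
            if n > 0 { Some((n, *ty)) } else { None }
        })?; // no rule matched here: the whole tokenization fails
        if let Some(type_) = ty {
            out.push(Token {
                value: rest[..len].to_string(),
                position: pos,
                type_,
            });
        }
        pos += len;
    }
    Some(out)
}

fn main() {
    // Whitespace is matched but ignored; positions index into the query.
    let tokens = tokenize("foo 42 bar").unwrap();
    for t in &tokens {
        println!("{:?}", t);
    }
}
```

Note how the ignored rule still consumes input (so `position` keeps advancing) but produces no token, and how `?` aborts the whole pass on the first unmatched substring, matching the "tokenization is considered failed" behavior described above.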