| Crates.io | scnr2_macro |
| lib.rs | scnr2_macro |
| version | 0.3.3 |
| created_at | 2025-06-19 14:22:43.945328+00 |
| updated_at | 2025-09-19 16:01:04.548684+00 |
| description | Scanner/Lexer with regex patterns and multiple modes |
| repository | https://github.com/jsinger67/scnr2 |
| id | 1718417 |
| size | 9,989 |
scnr2 is a high-performance Rust crate for building custom scanners and lexers with advanced regular expression support, multi-mode state management, and compile-time code generation. Designed for simplicity, speed, and flexibility, scnr2 empowers developers to create robust tokenizers for complex parsing tasks with minimal runtime overhead.
Mode transitions (set, push, pop).
use scnr2::scanner;
scanner! {
    MyScanner {
        mode INITIAL {
            token r"\d+" => 1; // Numbers
            token r"[a-zA-Z_][a-zA-Z0-9_]*" => 2; // Identifiers
        }
    }
}

fn main() {
    use my_scanner::MyScanner;
    let scanner = MyScanner::new();
    let input = "abc 123";
    for m in scanner.find_matches(input, 0) {
        println!("{}: '{}'", m.token_type, &input[m.span]);
    }
}
This example demonstrates a scanner with multiple modes and transitions, handling strings both inside and outside comments.
use scnr2::scanner;

scanner! {
    StringsInCommentsScanner {
        mode INITIAL {
            token r"\r\n|\r|\n" => 1; // "Newline"
            token r"[\s--\r\n]+" => 2; // "Whitespace other than newline"
            token r#"""# => 5; // "StringDelimiter"
            token r"/\*" => 6; // "CommentStart"
            token r"[a-zA-Z_][a-zA-Z0-9_]*" => 9; // "Identifier"
            on 5 push STRING;
            on 6 enter COMMENT;
        }
        mode STRING {
            token r#"""# => 5; // "StringDelimiter"
            token r#"([^"\\]|\\.)*"# => 10; // "StringContent"
            on 5 pop;
        }
        mode COMMENT {
            token r#"""# => 5; // "StringDelimiter"
            token r"\*/" => 7; // "CommentEnd"
            token r#"([^*"]|\*[^\/])*"# => 8; // "CommentText"
            on 5 push STRING;
            on 7 enter INITIAL;
        }
    }
}

const INPUT: &str = r#"Id
"Text with escaped End\""
/* Comment "String in Comment" and "String again" */"#;

fn main() {
    use strings_in_comments_scanner::StringsInCommentsScanner;
    let scanner = StringsInCommentsScanner::new();
    let tokens = scanner
        .find_matches_with_position(INPUT, 0)
        .collect::<Vec<_>>();
    println!("Tokens found: {}", tokens.len());
    for token in &tokens {
        println!(
            "{}: '{}'",
            token,
            INPUT[token.span.clone()].escape_default()
        );
    }
}
Sample output:
Tokens found: 17
[0..2] tok 9 at 1:1-1:3: 'Id'
[2..3] tok 1 at 2:0-2:1: '\n'
[3..4] tok 5 at 2:1-2:2: '\"'
[4..27] tok 10 at 2:2-2:25: 'Text with escaped End\\\"'
[27..28] tok 5 at 2:25-2:26: '\"'
[28..29] tok 1 at 3:0-3:1: '\n'
[29..31] tok 6 at 3:1-3:3: '/*'
[31..40] tok 8 at 3:3-3:12: ' Comment '
[40..41] tok 5 at 3:12-3:13: '\"'
[41..58] tok 10 at 3:13-3:30: 'String in Comment'
[58..59] tok 5 at 3:30-3:31: '\"'
[59..64] tok 8 at 3:31-3:36: ' and '
[64..65] tok 5 at 3:36-3:37: '\"'
[65..77] tok 10 at 3:37-3:49: 'String again'
[77..78] tok 5 at 3:49-3:50: '\"'
[78..79] tok 8 at 3:50-3:51: ' '
[79..81] tok 7 at 3:51-3:53: '*/'
Switch modes with push, enter, and pop transitions.
Use "followed by" and "not followed by" for context-sensitive tokens.
Import the macro: use scnr2::scanner;
Create a scanner: let scanner = MyScanner::new();
scanner.find_matches(input, 0) yields an iterator of matches.
scanner.find_matches_with_position(input, 0) provides line/column data.
How do I skip whitespace?
Define a token for whitespace and ignore it when processing matches, or simply omit the whitespace token; unmatched text is skipped.
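The first option, ignoring a whitespace token, could be sketched as a filter over the match iterator. This is a minimal std-only sketch, not scnr2 code: the `Match` struct here is a stand-in that only mirrors the `token_type` and `span` fields used in the examples above, and `WHITESPACE` and `skip_whitespace` are hypothetical names chosen for illustration.

```rust
use std::ops::Range;

/// Stand-in for a scanner match (assumption: only the `token_type` and
/// `span` fields from the examples above are modelled, not scnr2's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct Match {
    pub token_type: usize,
    pub span: Range<usize>,
}

/// Hypothetical token number assigned to the whitespace token.
pub const WHITESPACE: usize = 2;

/// Drops every match whose token type is `WHITESPACE` and passes the rest on.
pub fn skip_whitespace<I>(matches: I) -> impl Iterator<Item = Match>
where
    I: IntoIterator<Item = Match>,
{
    matches.into_iter().filter(|m| m.token_type != WHITESPACE)
}
```

With a real scanner, the same `filter` call would presumably be applied directly to the iterator returned by `find_matches`.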
How do I use multiple scanner modes?
Define multiple mode blocks and use push, enter, or pop transitions.
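To make the difference between the three transitions concrete, here is a conceptual model of the bookkeeping they imply, inferred from the strings-in-comments example above (a string opened inside a comment returns to the comment on pop). It is a sketch under that assumption, not scnr2's actual implementation; `Mode` and `ModeStack` are hypothetical names.

```rust
/// Modes from the strings-in-comments example above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Initial,
    String,
    Comment,
}

/// Conceptual mode bookkeeping (assumption: not scnr2's real internals).
#[derive(Debug)]
pub struct ModeStack {
    current: Mode,
    saved: Vec<Mode>,
}

impl ModeStack {
    /// Starts in the initial mode with nothing saved.
    pub fn new() -> Self {
        Self { current: Mode::Initial, saved: Vec::new() }
    }

    /// Returns the mode whose tokens would currently be active.
    pub fn current(&self) -> Mode {
        self.current
    }

    /// `enter`: switch modes without remembering the previous one.
    pub fn enter(&mut self, mode: Mode) {
        self.current = mode;
    }

    /// `push`: remember the current mode, then switch.
    pub fn push(&mut self, mode: Mode) {
        self.saved.push(self.current);
        self.current = mode;
    }

    /// `pop`: return to the most recently remembered mode, if there is one.
    pub fn pop(&mut self) {
        if let Some(mode) = self.saved.pop() {
            self.current = mode;
        }
    }
}
```

In this model, push/pop suits nested constructs that can start in several modes (a string inside or outside a comment), while enter suits one-way switches such as comment start and end.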
How do I detect unmatched input?
Add a catch-all token at the end of your mode's token list (e.g., r".") and handle it as an error.
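Handling the catch-all token as an error might look like the following. This is a std-only sketch rather than scnr2 code: `Match` is a stand-in mirroring the `token_type` and `span` fields used in the examples above, and `UNMATCHED`, `UnmatchedInput`, and `check_matches` are hypothetical names introduced for illustration.

```rust
use std::ops::Range;

/// Stand-in for a scanner match (assumption: not scnr2's real type).
#[derive(Debug, Clone, PartialEq)]
pub struct Match {
    pub token_type: usize,
    pub span: Range<usize>,
}

/// Hypothetical token number given to the catch-all token r".".
pub const UNMATCHED: usize = 99;

/// Describes the first piece of input that only the catch-all token matched.
#[derive(Debug, PartialEq)]
pub struct UnmatchedInput {
    pub offset: usize,
    pub text: String,
}

/// Returns an error for the first catch-all match, or `Ok(())` if there is none.
pub fn check_matches(input: &str, matches: &[Match]) -> Result<(), UnmatchedInput> {
    match matches.iter().find(|m| m.token_type == UNMATCHED) {
        Some(m) => Err(UnmatchedInput {
            offset: m.span.start,
            text: input[m.span.clone()].to_string(),
        }),
        None => Ok(()),
    }
}
```

Reporting only the first offending span keeps the sketch short; collecting all catch-all matches would be an equally reasonable design.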
We welcome contributions! Whether you want to add features, improve documentation, or report issues, your input helps make scnr2 better for everyone.
For more examples and API details, see the docs.rs documentation.