| Crates.io | monster-regex |
|---|---|
| lib.rs | monster-regex |
| version | 0.2.2 |
| created_at | 2025-12-31 06:17:15.793536+00 |
| updated_at | 2026-01-09 04:19:58.664197+00 |
| description | A custom regex spec |
| homepage | |
| repository | https://github.com/monster0506/monster-regex |
| max_upload_size | |
| id | 2013988 |
| size | 1,165,053 |
This document outlines the regular expression syntax and features supported by Rift's search engine.
Add monster-regex to your Cargo.toml:
```toml
[dependencies]
monster-regex = "0.2.2"
```
By default, Regex::new uses the BacktrackingRegexEngine. This engine supports advanced features like lookarounds and backreferences but may have exponential runtime on pathological patterns.
```rust
use monster_regex::{Regex, Flags};

fn main() {
    // Compile using the default backtracking engine
    let re = Regex::new(r"\w+", Flags::default()).unwrap();
    assert!(re.is_match("hello"));

    // Find a match
    if let Some(m) = re.find("hello world") {
        println!("Found match at {}-{}", m.start, m.end); // 0-5
    }
}
```
For performance-critical code where O(n) guarantees are required, use the LinearRegexEngine (based on PikeVM). Note that this engine does not support lookarounds or backreferences.
```rust
use monster_regex::{Regex, Flags};

fn main() {
    // Explicit constructor for the linear engine
    let re = Regex::new_linear(r"a.*b", Flags::default()).unwrap();
    assert!(re.is_match("abbb"));
}
```
You can switch between engines at runtime using AnyRegexEngine. This allows you to choose the best engine for the pattern or use case.
```rust
use monster_regex::engine::{
    AnyRegexEngine, RegexEngine, CompiledRegex,
    backtracking::BacktrackingRegexEngine,
    linear::LinearRegexEngine,
};
use monster_regex::Flags;

fn main() {
    let use_linear = true;
    let flags = Flags::default();
    let pattern = "abc";

    // Type-erased engine trait object
    let engine: Box<dyn RegexEngine<Regex = Box<dyn CompiledRegex>>> = if use_linear {
        Box::new(AnyRegexEngine(LinearRegexEngine))
    } else {
        Box::new(AnyRegexEngine(BacktrackingRegexEngine))
    };

    // Compile returns a Box<dyn CompiledRegex>
    let regex = engine.compile(pattern, flags).unwrap();
    assert!(regex.is_match("abc"));
}
```
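The choice can also be made per pattern. The sketch below is a hypothetical heuristic (`choose_engine` and `EngineChoice` are not part of monster-regex): it picks the backtracking engine only when the pattern appears to use a lookaround or a backreference, the two features this README says the linear engine lacks. It scans the raw pattern text without parsing it, so it ignores character-class context and any other engine-specific restrictions.

```rust
/// Which engine a pattern should use (hypothetical, not a monster-regex type).
#[derive(Debug, PartialEq)]
enum EngineChoice {
    Linear,
    Backtracking,
}

/// Heuristic: use backtracking only when the pattern seems to contain a
/// backreference (\1..\9) or a lookaround ((?<=, (?<!, (?>=, (?>!).
fn choose_engine(pattern: &str) -> EngineChoice {
    let chars: Vec<char> = pattern.chars().collect();
    let mut i = 0;
    while i < chars.len() {
        match chars[i] {
            '\\' => {
                // \1..\9 is a backreference; any other escape (\d, \\, ...) is skipped.
                if let Some(c) = chars.get(i + 1) {
                    if ('1'..='9').contains(c) {
                        return EngineChoice::Backtracking;
                    }
                }
                i += 2;
                continue;
            }
            '(' => {
                // (?<name>...) is a named group, so only (?<= and (?<! count.
                let next: String = chars[i + 1..].iter().take(3).collect();
                if ["?<=", "?<!", "?>=", "?>!"].contains(&next.as_str()) {
                    return EngineChoice::Backtracking;
                }
            }
            _ => {}
        }
        i += 1;
    }
    EngineChoice::Linear
}
```

A false positive only costs performance (the backtracking engine supports everything the linear one does), which is why the heuristic errs on the side of choosing backtracking.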
monster-regex exposes two key traits for compiled regexes:

- `CompiledRegex`: an object-safe trait containing the core methods (`is_match`, `find`, `captures`, `replace`), usable with `&str`. This is the return type when using dynamic dispatch.
- `CompiledRegexHaystack`: a generic trait extending `CompiledRegex` with streaming support via the `Haystack` trait. It is not object-safe.

When using dynamic dispatch (`Box<dyn CompiledRegex>`), you are limited to the string-based methods in `CompiledRegex` and cannot use the streaming `Haystack` API directly on the trait object.
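The split follows from Rust's object-safety rules: a trait with a generic method cannot be used as `dyn Trait`. The sketch below models the design with hypothetical, simplified traits (`Compiled`, `CompiledChunks`, and a toy `Literal` matcher); it is not the crate's code, and the chunk-based method only stands in for the `Haystack` API.

```rust
/// Object-safe core trait: every method takes concrete types,
/// so Box<dyn Compiled> is allowed.
trait Compiled {
    fn is_match(&self, text: &str) -> bool;
}

/// Extension trait with a generic method. A generic method cannot be placed
/// in a vtable, so dyn CompiledChunks would be rejected by the compiler;
/// keeping it separate leaves Compiled usable as a trait object.
trait CompiledChunks: Compiled {
    fn is_match_chunks<'a, I>(&self, chunks: I) -> bool
    where
        I: IntoIterator<Item = &'a str>;
}

/// Toy "regex" that only matches a fixed substring.
struct Literal(String);

impl Compiled for Literal {
    fn is_match(&self, text: &str) -> bool {
        text.contains(self.0.as_str())
    }
}

impl CompiledChunks for Literal {
    fn is_match_chunks<'a, I>(&self, chunks: I) -> bool
    where
        I: IntoIterator<Item = &'a str>,
    {
        // Joining allocates; a real streaming API would search the chunks in place.
        let joined: String = chunks.into_iter().collect();
        self.is_match(&joined)
    }
}
```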
You can configure behavior using Flags:
```rust
use monster_regex::{Regex, Flags};

fn main() {
    let mut flags = Flags::default();
    flags.ignore_case = Some(true); // Case insensitive
    flags.multiline = true;         // ^ and $ match line boundaries

    let re = Regex::new(r"^hello", flags).unwrap();
    assert!(re.is_match("HELLO\nworld"));
}
```
You can also parse patterns in the pattern/flags format used by Rift:
```rust
use monster_regex::parse_rift_format;
use monster_regex::Regex;

fn main() {
    let (pattern, flags) = parse_rift_format("abc/i").unwrap();
    let re = Regex::new(&pattern, flags).unwrap();
    assert!(re.is_match("ABC"));
}
```
To iterate over every match, use find_all:

```rust
use monster_regex::{Regex, Flags};

fn main() {
    let re = Regex::new(r"\d+", Flags::default()).unwrap();
    let text = "123 abc 456";

    for m in re.find_all(text) {
        println!("Match: {}", &text[m.start..m.end]);
    }
}
```
To replace matches, use replace (first occurrence only) or replace_all (every occurrence):

```rust
use monster_regex::{Regex, Flags};

fn main() {
    let re = Regex::new(r"foo", Flags::default()).unwrap();

    // Replace first occurrence only
    let result = re.replace("foo bar foo", "baz");
    assert_eq!(result, "baz bar foo");

    // Replace all occurrences
    let result = re.replace_all("foo bar foo", "baz");
    assert_eq!(result, "baz bar baz");
}
```
To extract capture groups from every match, use captures_all:

```rust
use monster_regex::{Regex, Flags};

fn main() {
    let re = Regex::new(r"(\w+)@(\w+)", Flags::default()).unwrap();
    let text = "alice@home bob@work";

    for caps in re.captures_all(text) {
        println!("Full match: {:?}", caps.full_match);
        println!("Groups: {:?}", caps.groups);
    }
}
```
A compiled Regex also exposes the pattern and flags it was built with:

```rust
use monster_regex::{Regex, Flags};

fn main() {
    let mut flags = Flags::default();
    flags.ignore_case = Some(true);
    let re = Regex::new(r"hello", flags).unwrap();

    // Access the original pattern
    assert_eq!(re.pattern(), "hello");

    // Access the flags used during compilation
    assert_eq!(re.flags().ignore_case, Some(true));
}
```
For advanced use cases like searching non-contiguous memory (ropes, gap buffers) without allocation, implement the Haystack trait:
```rust
use monster_regex::{Regex, Haystack};

#[derive(Copy, Clone)]
struct MyRope<'a> {
    // ... custom internal structure
    phantom: std::marker::PhantomData<&'a ()>,
}

impl<'a> Haystack for MyRope<'a> {
    // Replace each todo!() with the logic for your own data structure.
    fn len(&self) -> usize { todo!() }
    fn char_at(&self, pos: usize) -> Option<(char, usize)> { todo!() }
    fn char_before(&self, pos: usize) -> Option<char> { todo!() }
    fn matches_range(&self, pos: usize, other_start: usize, other_end: usize) -> bool { todo!() }
    fn starts_with(&self, pos: usize, literal: &str) -> bool { todo!() }
}

fn main() {
    let rope = MyRope { phantom: std::marker::PhantomData };
    let re = Regex::new("pattern", Default::default()).unwrap();

    // Check if pattern matches anywhere
    if re.is_match_from(rope) {
        println!("Found a match!");
    }

    // Find first match
    if let Some(m) = re.find_from(rope) {
        println!("Match at {}-{}", m.start, m.end);
    }

    // Find match starting at a specific offset
    if let Some(m) = re.find_from_at(rope, 10) {
        println!("Match starting from offset 10: {}-{}", m.start, m.end);
    }

    // Iterate all matches
    for m in re.find_all_from(rope) {
        println!("Match at {}-{}", m.start, m.end);
    }
}
```
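For reference, here is a self-contained sketch of what such an implementation might look like for a two-chunk, gap-buffer-style text. It uses a local trait, `HaystackLike`, with the same method signatures as above so it compiles without the crate. The meaning of each method (byte offsets, `char_at` returning the character and its UTF-8 length, `matches_range` comparing against an earlier span for backreferences) is an assumption inferred from the names, not confirmed by this document.

```rust
/// Local stand-in for Haystack: same signatures, assumed semantics.
trait HaystackLike {
    fn len(&self) -> usize;
    /// Assumed: the char starting at byte `pos` and its UTF-8 length.
    fn char_at(&self, pos: usize) -> Option<(char, usize)>;
    /// Assumed: the char ending at byte `pos`.
    fn char_before(&self, pos: usize) -> Option<char>;
    /// Assumed: does the text at `pos` repeat `other_start..other_end`?
    fn matches_range(&self, pos: usize, other_start: usize, other_end: usize) -> bool;
    /// Assumed: does `literal` occur at `pos`?
    fn starts_with(&self, pos: usize, literal: &str) -> bool;
}

/// Logical text is `head` followed by `tail`; the two slices are never joined.
#[derive(Copy, Clone)]
struct TwoChunks<'a> {
    head: &'a str,
    tail: &'a str,
}

impl<'a> HaystackLike for TwoChunks<'a> {
    fn len(&self) -> usize {
        self.head.len() + self.tail.len()
    }

    fn char_at(&self, pos: usize) -> Option<(char, usize)> {
        // Pick the chunk containing `pos`; get() returns None off a char boundary.
        let rest = if pos < self.head.len() {
            self.head.get(pos..)?
        } else {
            self.tail.get(pos - self.head.len()..)?
        };
        rest.chars().next().map(|c| (c, c.len_utf8()))
    }

    fn char_before(&self, pos: usize) -> Option<char> {
        // At pos == head.len(), the previous char is the last one in `head`.
        let prefix = if pos <= self.head.len() {
            self.head.get(..pos)?
        } else {
            self.tail.get(..pos - self.head.len())?
        };
        prefix.chars().next_back()
    }

    fn matches_range(&self, pos: usize, other_start: usize, other_end: usize) -> bool {
        // Walk both positions char by char, crossing the chunk seam transparently.
        let (mut a, mut b) = (other_start, pos);
        while a < other_end {
            match (self.char_at(a), self.char_at(b)) {
                (Some((c1, n1)), Some((c2, n2))) if c1 == c2 => {
                    a += n1;
                    b += n2;
                }
                _ => return false,
            }
        }
        true
    }

    fn starts_with(&self, pos: usize, literal: &str) -> bool {
        let mut p = pos;
        for expected in literal.chars() {
            match self.char_at(p) {
                Some((c, n)) if c == expected => p += n,
                _ => return false,
            }
        }
        true
    }
}
```

If the crate's actual semantics match these assumptions, the same method bodies could be moved into an `impl Haystack for ...` block and the value passed to `find_from`.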
Search patterns are entered in the format:
pattern/flags
The following characters have special meaning and must be escaped with \ to be matched literally:
. * + ? ^ $ | ( ) [ ] { } \
All other characters match themselves literally.
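When a user-supplied string should be searched for literally, each character in the list above needs a backslash. A minimal sketch, assuming that escaping exactly these characters is sufficient; `escape_literal` is a hypothetical helper, not a monster-regex function, and in verbose mode spaces and `#` would need escaping as well.

```rust
/// Backslash-escape every special character so `text` is matched literally.
fn escape_literal(text: &str) -> String {
    // The special characters listed in this document.
    const SPECIAL: &[char] = &[
        '.', '*', '+', '?', '^', '$', '|', '(', ')', '[', ']', '{', '}', '\\',
    ];
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        if SPECIAL.contains(&c) {
            out.push('\\');
        }
        out.push(c);
    }
    out
}
```

Only the listed characters are escaped: escaping a letter would change its meaning (for example, `d` would become the digit class `\d`).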
Note on Dot (.):
By default, . matches any character except newline. Use the s (dotall) flag to make . match newlines.
Case sensitivity follows smartcase by default; override it with the i (ignore-case) or c (case-sensitive) flags.

Quantifiers specify how many times the preceding atom (character, group, or character class) should match.

| Quantifier | Meaning | Greedy? | Example |
|---|---|---|---|
| `*` | 0 or more | Yes | `a*` matches "", "a", "aa", ... |
| `+` | 1 or more | Yes | `a+` matches "a", "aa", ... |
| `?` | 0 or 1 | Yes (prefers 1) | `a?` matches "" or "a", preferring "a" |
| `{n}` | Exactly n | N/A | `a{3}` matches "aaa" |
| `{n,m}` | n to m | Yes | `a{2,4}` matches "aa", "aaa", "aaaa" |
| `{n,}` | n or more | Yes | `a{2,}` matches "aa", "aaa", ... |
| `{,m}` | 0 to m | Yes | `a{,3}` matches "", "a", "aa", "aaa" |
| `*?` | 0 or more | No | `a*?` matches as few characters as possible |
| `+?` | 1 or more | No | `a+?` matches as few characters as possible (at least one) |
| `??` | 0 or 1 | No | `a??` prefers 0 matches |
| `{n,m}?` | n to m | No | `a{2,4}?` prefers "aa" over "aaa" |
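The brace forms in the table differ only in which bounds are given. The hypothetical helpers below (not part of monster-regex) normalize them to a `(min, max)` pair and list the order in which a backtracking matcher would try repetition counts for a bounded quantifier: greedy from most to fewest, lazy from fewest to most, which is why `a{2,4}?` prefers "aa".

```rust
/// Parse the inside of a brace quantifier ("n", "n,m", "n,", ",m") into
/// (min, max), where max == None means unbounded.
fn parse_bounds(inner: &str) -> Option<(usize, Option<usize>)> {
    let (min, max) = match inner.split_once(',') {
        // {n}: exactly n
        None => {
            let n = inner.parse::<usize>().ok()?;
            (n, Some(n))
        }
        Some((lo, hi)) => {
            // {,m}: an empty lower bound means 0
            let min = if lo.is_empty() { 0 } else { lo.parse::<usize>().ok()? };
            // {n,}: an empty upper bound means "no limit"
            let max = if hi.is_empty() { None } else { Some(hi.parse::<usize>().ok()?) };
            (min, max)
        }
    };
    // Reject inverted ranges such as {4,2}.
    match max {
        Some(m) if m < min => None,
        _ => Some((min, max)),
    }
}

/// Order in which repetition counts are tried for a bounded quantifier.
fn try_order(min: usize, max: usize, greedy: bool) -> Vec<usize> {
    let counts: Vec<usize> = (min..=max).collect();
    if greedy {
        counts.into_iter().rev().collect()
    } else {
        counts
    }
}
```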

| Class | Matches |
|---|---|
| `\d` | Digit `[0-9]` |
| `\D` | Non-digit |
| `\w` | Word character `[a-zA-Z0-9_]` (ASCII by default) |
| `\W` | Non-word character |
| `\s` | Whitespace `[ \t\r\n\f\v]` |
| `\S` | Non-whitespace |

Additional shorthand classes:

| Class | Matches |
|---|---|
| `\l` | Lowercase character |
| `\L` | Non-lowercase character |
| `\u` | Uppercase character |
| `\U` | Non-uppercase character |
| `\x` | Hexadecimal digit |
| `\X` | Non-hexadecimal digit |
| `\o` | Octal digit |
| `\O` | Non-octal digit |
| `\h` | Head of word character (start of a word) |
| `\H` | Non-head of word character |
| `\p` | Punctuation: any printable ASCII character that is not a letter, digit, or space |
| `\P` | Non-punctuation |
| `\a` | Alphanumeric `[a-zA-Z0-9]` |
| `\A` | Non-alphanumeric |

- By default, `\w`, `\d`, `\s`, and `\h` match ASCII characters only.
- With the `u` flag, these classes include Unicode characters (e.g., `\w` matches accented characters).
- Custom character sets and ranges (e.g., `[a-z]`, `[^0-9]`) are supported.
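The ASCII-mode behavior of most shorthand classes maps directly onto Rust's `char` predicates. This is a sketch of the tables above, not the crate's implementation: `ascii_class` is hypothetical, `\h`/`\H` are left out because the table does not give their exact character set, and the `u` flag would widen the classes it applies to.

```rust
/// Does `c` belong to the shorthand class `\<class>` (ASCII mode)?
/// Returns None for class letters this sketch does not cover.
fn ascii_class(class: char, c: char) -> Option<bool> {
    let positive = match class.to_ascii_lowercase() {
        'd' => c.is_ascii_digit(),
        'w' => c.is_ascii_alphanumeric() || c == '_',
        // [ \t\r\n\f\v]; \f is U+000C and \v is U+000B
        's' => matches!(c, ' ' | '\t' | '\r' | '\n' | '\u{0C}' | '\u{0B}'),
        'l' => c.is_ascii_lowercase(),
        'u' => c.is_ascii_uppercase(),
        'x' => c.is_ascii_hexdigit(),
        'o' => ('0'..='7').contains(&c),
        'a' => c.is_ascii_alphanumeric(),
        'p' => c.is_ascii_punctuation(),
        _ => return None,
    };
    // An uppercase class letter (\D, \W, \S, \L, ...) negates its lowercase form.
    Some(if class.is_ascii_uppercase() { !positive } else { positive })
}
```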
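Note on Escaping in Character Classes:
Inside a character class, special characters follow different rules, but `\` still escapes the next character. For example, `[\]]` matches a literal `]`, and `[a\-z]` matches `a`, `-`, or `z` (the escaped `-` is a literal hyphen, not a range).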
Anchors assert a position without matching characters (zero-width).

| Anchor | Meaning |
|---|---|
| `^` | Start of string (or start of line in multiline mode) |
| `$` | End of string (or end of line in multiline mode) |
| `\<` | Start of word |
| `\>` | End of word |
| `\b` | Word boundary (matches at `\<` or `\>`) |
| `\zs` | Sets the start of the match (everything before it is excluded from the result) |
| `\ze` | Sets the end of the match (everything after it is excluded from the result) |
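`\zs` and `\ze` narrow the reported span without changing what must match. The sketch below emulates `before\zsbody\zeafter` for plain literal strings using `str::find`; `find_with_marks` is a hypothetical helper for illustration only, not how the engine implements these anchors.

```rust
/// Emulate `before\zsbody\zeafter` for literals: all three parts must match
/// in sequence, but only the span of `body` is reported.
fn find_with_marks(text: &str, before: &str, body: &str, after: &str) -> Option<(usize, usize)> {
    let whole = format!("{before}{body}{after}");
    let start = text.find(whole.as_str())?;
    let zs = start + before.len(); // position where \zs was crossed
    let ze = zs + body.len(); // position where \ze was crossed
    Some((zs, ze))
}
```

For example, `find_with_marks("say foobarbaz", "foo", "bar", "baz")` reports only the span of "bar", even though "foo" and "baz" had to be present.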
These anchors match at a specific position in the buffer. They are zero-width assertions and do not consume characters.

| Anchor | Meaning | Example |
|---|---|---|
| `\%nl` | Matches anywhere on line n (1-indexed). | `\%5lfoo` matches "foo" only if it appears on line 5. |
| `\%nc` | Matches at column n (1-indexed). | `\%5cfoo` matches "foo" starting at column 5. |
| `\%#` | Matches at the current cursor position. | `\%#foo` matches "foo" starting exactly under the cursor. |

Note: `\%nl` is not implemented in the parser; clients must handle line-based matching.
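Since line-based matching is left to clients, a client needs to map match offsets to line and column numbers. A minimal sketch, assuming byte offsets into the buffer and columns counted in characters (the table does not say whether columns count bytes or characters); `line_col` and `on_line` are hypothetical helpers:

```rust
/// Convert a byte offset into a 1-indexed (line, column) pair.
/// Returns None if `pos` is past the end or not on a char boundary.
fn line_col(text: &str, pos: usize) -> Option<(usize, usize)> {
    let before = text.get(..pos)?;
    let line = before.matches('\n').count() + 1;
    // The current line starts just after the last newline (or at 0).
    let line_start = before.rfind('\n').map_or(0, |i| i + 1);
    let col = before[line_start..].chars().count() + 1;
    Some((line, col))
}

/// Client-side emulation of \%nl: keep a match only if it starts on line `n`.
fn on_line(text: &str, match_start: usize, n: usize) -> bool {
    line_col(text, match_start).map_or(false, |(line, _)| line == n)
}
```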

- `\<`: Matches the position where a word starts (preceded by a non-word character, followed by a word character).
- `\>`: Matches the position where a word ends (preceded by a word character, followed by a non-word character).
- `\b`: Matches at either `\<` or `\>`.

Word boundaries `\<` and `\>` use the same character definition as `\w` (`[a-zA-Z0-9_]`). With the `u` flag, both adapt to Unicode.
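The three word-boundary rules above can be written as predicates over a `&str`. This sketch uses the default ASCII word-character definition and assumes the start and end of the text behave like non-word characters; the function names are hypothetical.

```rust
/// Default (ASCII) word character, as used by \w, \< and \>.
fn is_word(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Characters on either side of byte offset `pos` (must be a char boundary).
fn neighbors(text: &str, pos: usize) -> (Option<char>, Option<char>) {
    (text[..pos].chars().next_back(), text[pos..].chars().next())
}

/// \< : not preceded by a word char, followed by a word char.
fn at_word_start(text: &str, pos: usize) -> bool {
    let (before, after) = neighbors(text, pos);
    !before.map_or(false, is_word) && after.map_or(false, is_word)
}

/// \> : preceded by a word char, not followed by a word char.
fn at_word_end(text: &str, pos: usize) -> bool {
    let (before, after) = neighbors(text, pos);
    before.map_or(false, is_word) && !after.map_or(false, is_word)
}

/// \b : either of the two.
fn at_word_boundary(text: &str, pos: usize) -> bool {
    at_word_start(text, pos) || at_word_end(text, pos)
}
```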
Flags are appended after the pattern delimiter (e.g., pattern/flags).

| Flag | Name | Description |
|---|---|---|
| `i` | ignore-case | Case-insensitive matching (overrides smartcase). |
| `c` | case-sensitive | Case-sensitive matching (overrides smartcase). |
| `m` | multiline | `^` and `$` match at line boundaries (`\n`), not just at the start/end of the entire buffer. |
| `s` | dotall | `.` matches newlines (including end-of-line). |
| `x` | verbose | Whitespace and `#` comments in the pattern are ignored. Literal spaces must be escaped (e.g., `\ ` or `[ ]`). |
| `g` | global | Match all occurrences (used for find-all or replace operations). |
| `u` | unicode | Enables Unicode support for character classes (`\w`, `\d`, etc.). |
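The table can be modeled as a small flag parser. The sketch below is hypothetical (`ParsedFlags` and `parse_flag_letters` mirror the table, not monster-regex's `Flags` or `parse_rift_format`). It assumes that if both `i` and `c` appear, the later one wins, and that smartcase follows the Vim convention: case-insensitive unless the pattern contains an uppercase letter, ignoring escaped letters such as `\W`.

```rust
/// Flags parsed from the letters after the final `/` (hypothetical mirror of the table).
#[derive(Debug, Default, PartialEq)]
struct ParsedFlags {
    ignore_case: Option<bool>, // None: smartcase decides
    multiline: bool,
    dotall: bool,
    verbose: bool,
    global: bool,
    unicode: bool,
}

/// Parse flag letters; an unknown letter is returned as the error.
fn parse_flag_letters(letters: &str) -> Result<ParsedFlags, char> {
    let mut f = ParsedFlags::default();
    for c in letters.chars() {
        match c {
            'i' => f.ignore_case = Some(true), // assumed: the last of i/c wins
            'c' => f.ignore_case = Some(false),
            'm' => f.multiline = true,
            's' => f.dotall = true,
            'x' => f.verbose = true,
            'g' => f.global = true,
            'u' => f.unicode = true,
            other => return Err(other),
        }
    }
    Ok(f)
}

/// True if the pattern has an uppercase letter outside an escape.
fn has_literal_uppercase(pattern: &str) -> bool {
    let mut chars = pattern.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            chars.next(); // skip the escaped character (e.g. \W, \S)
        } else if c.is_uppercase() {
            return true;
        }
    }
    false
}

/// Explicit i/c wins; otherwise apply the assumed smartcase rule.
fn effective_ignore_case(pattern: &str, ignore_case: Option<bool>) -> bool {
    ignore_case.unwrap_or_else(|| !has_literal_uppercase(pattern))
}
```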
Verbose Mode Examples (x flag):

- `/foo bar/x` matches "foobar" (the space is ignored).
- `/foo\ bar/x` matches "foo bar" (the space is escaped).
- `/foo[ ]bar/x` matches "foo bar" (the space is inside brackets).

Escape sequences:

| Sequence | Matches |
|---|---|
| `\n` | Newline (LF) |
| `\t` | Tab |
| `\r` | Carriage return (CR) |
| `\f` | Form feed |
| `\v` | Vertical tab |
| `\\` | Literal backslash |
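The verbose-mode examples above follow from a simple preprocessing pass over the pattern. A sketch, assuming that `#` comments run to the end of the line and that escapes and bracket expressions are copied unchanged; `strip_verbose` is hypothetical, not the crate's parser.

```rust
/// Remove verbose-mode (x flag) whitespace and # comments from a pattern,
/// keeping escaped characters and everything inside [...] as-is.
fn strip_verbose(pattern: &str) -> String {
    let mut out = String::new();
    let mut chars = pattern.chars();
    let mut in_class = false;
    while let Some(c) = chars.next() {
        match c {
            // Copy an escape and the character it escapes (e.g. "\ ", "\#", "\]").
            '\\' => {
                out.push(c);
                if let Some(next) = chars.next() {
                    out.push(next);
                }
            }
            '[' if !in_class => {
                in_class = true;
                out.push(c);
            }
            ']' if in_class => {
                in_class = false;
                out.push(c);
            }
            // Inside brackets, spaces and # are literal.
            _ if in_class => out.push(c),
            // Comment: skip everything up to and including the newline.
            '#' => {
                for rest in chars.by_ref() {
                    if rest == '\n' {
                        break;
                    }
                }
            }
            c if c.is_whitespace() => {}
            _ => out.push(c),
        }
    }
    out
}
```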

- `pattern1|pattern2` matches either `pattern1` or `pattern2`.
- `(pattern)` groups part of the regex and captures it.
- `(?<name>pattern)` captures the group under a specific name.
- `(?:pattern)` groups without capturing.
- `\1` through `\9` refer to captured groups 1 to 9; `\0` refers to the entire match.

Lookarounds assert that what follows or precedes the current position matches a pattern, without including it in the match result.

| Assertion | Type | Meaning |
|---|---|---|
| `(?>=foo)` | Positive lookahead | Matches if followed by "foo". |
| `(?>!foo)` | Negative lookahead | Matches if not followed by "foo". |
| `(?<=foo)` | Positive lookbehind | Matches if preceded by "foo". |
| `(?<!foo)` | Negative lookbehind | Matches if not preceded by "foo". |
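For literal operands, lookaround semantics can be checked by hand: the surrounding text is tested but never included in the reported span. A sketch with hypothetical helpers, equivalent to `(?<=foo)bar` and `foo(?>!x)` in this document's syntax when the operands are plain strings:

```rust
/// Spans of `needle` that are immediately preceded by `behind`,
/// like (?<=behind)needle. `behind` is checked but not included.
fn find_preceded_by(text: &str, behind: &str, needle: &str) -> Vec<(usize, usize)> {
    text.match_indices(needle)
        .filter(|&(start, _)| text[..start].ends_with(behind))
        .map(|(start, m)| (start, start + m.len()))
        .collect()
}

/// Spans of `needle` that are NOT immediately followed by `ahead`,
/// like needle(?>!ahead).
fn find_not_followed_by(text: &str, needle: &str, ahead: &str) -> Vec<(usize, usize)> {
    text.match_indices(needle)
        .filter(|&(start, m)| !text[start + m.len()..].starts_with(ahead))
        .map(|(start, m)| (start, start + m.len()))
        .collect()
}
```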