rexile

Version: 0.4.6
Created: 2026-01-23 06:05:07.51379+00
Updated: 2026-01-25 16:39:48.591328+00
Description: A blazing-fast regex engine with 10-100x faster compilation and competitive matching performance - now with dot wildcard, non-greedy quantifiers, DOTALL mode, and non-capturing groups
Homepage: https://github.com/KSD-CO/rexile
Repository: https://github.com/KSD-CO/rexile
Owner: tonthatvu
Documentation: https://docs.rs/rexile

README

ReXile 🦎

License: MIT OR Apache-2.0

A blazing-fast regex engine with 10-100x faster compilation speed

ReXile is a lightweight regex alternative that achieves exceptional compilation speed while maintaining competitive matching performance:

  • ⚡ 10-100x faster compilation - Load patterns instantly
  • 🚀 Competitive matching - 1.4-1.9x faster on simple patterns
  • 🎯 Dot wildcard support - Full ., .*, .+ implementation with backtracking
  • 📦 Only 2 dependencies - memchr and aho-corasick for SIMD primitives
  • 🧠 Smart backtracking - Handles complex patterns with quantifiers
  • 🔧 Perfect for parsers - Ideal for GRL, DSL, and rule engines

Key Features:

  • ✅ Literal searches with SIMD acceleration
  • ✅ Multi-pattern matching (alternations)
  • ✅ Character classes with negation
  • ✅ Quantifiers (*, +, ?)
  • ✅ Non-greedy quantifiers (*?, +?, ??) - NEW in v0.2.1
  • ✅ Dot wildcard (., .*, .+) with backtracking
  • ✅ DOTALL mode ((?s)) - Dot matches newlines - NEW in v0.2.1
  • ✅ Non-capturing groups ((?:...)) with alternations - NEW in v0.2.1
  • ✅ Escape sequences (\d, \w, \s, etc.)
  • ✅ Sequences and groups
  • ✅ Word boundaries (\b, \B)
  • ✅ Anchoring (^, $)
  • ✅ Capturing groups - Auto-detection and extraction

🎯 Purpose

ReXile is a high-performance regex engine optimized for fast compilation:

  • 🚀 Lightning-fast compilation - 10-100x faster than regex crate
  • ⚡ Competitive matching - Faster on simple patterns, acceptable on complex
  • 🎯 Ideal for parsers - GRL, DSL, rule engines with dynamic patterns
  • 📦 Minimal dependencies - Only memchr + aho-corasick for SIMD primitives
  • Memory efficient - 15x less compilation memory
  • 🔧 Full control - Custom optimizations for specific use cases

Performance Highlights

Compilation Speed (vs regex crate):

  • Pattern [a-zA-Z_]\w*: 104.7x faster 🚀
  • Pattern \d+: 46.5x faster 🚀
  • Pattern (\w+)\s*(>=|<=|==|!=|>|<)\s*(.+): 40.7x faster 🚀
  • Pattern .*test.*: 15.3x faster
  • Average: 10-100x faster compilation

Matching Speed:

  • Simple patterns (\d+, \w+): 1.4-1.9x faster ✅
  • Complex patterns with backtracking: 2-10x slower (acceptable for non-hot-path)
  • Perfect trade-off for parsers and rule engines

Use Case Example (Load 1000 GRL rules):

  • regex crate: ~2 seconds compilation
  • rexile: ~0.02 seconds (100x faster startup!)

Memory Comparison:

  • Compilation: 15x less memory (128 KB vs 1920 KB)
  • Peak memory: 5x less in stress tests (0.12 MB vs 0.62 MB)
  • Search operations: Equal memory efficiency

When to Use ReXile:

  • ✅ Parsers & lexers (fast token matching + instant startup)
  • ✅ Rule engines with dynamic patterns (100x faster rule loading)
  • ✅ DSL compilers (GRL, business rules)
  • ✅ Applications with many patterns (instant initialization)
  • ✅ Memory-constrained environments (15x less memory)
  • ✅ Non-hot-path matching (acceptable trade-off for 100x faster compilation)

🚀 Quick Start

use rexile::Pattern;

// Literal matching with SIMD acceleration
let pattern = Pattern::new("hello").unwrap();
assert!(pattern.is_match("hello world"));
assert_eq!(pattern.find("say hello"), Some((4, 9)));

// Multi-pattern matching (aho-corasick fast path)
let multi = Pattern::new("foo|bar|baz").unwrap();
assert!(multi.is_match("the bar is open"));

// Dot wildcard matching (with backtracking)
let dot = Pattern::new("a.c").unwrap();
assert!(dot.is_match("abc"));  // . matches 'b'
assert!(dot.is_match("a_c"));  // . matches '_'

// Greedy quantifiers with dot
let greedy = Pattern::new("a.*c").unwrap();
assert!(greedy.is_match("abc"));       // .* matches 'b'
assert!(greedy.is_match("a12345c"));   // .* matches '12345'

let plus = Pattern::new("a.+c").unwrap();
assert!(plus.is_match("abc"));         // .+ matches 'b' (requires at least one char)
assert!(!plus.is_match("ac"));         // .+ needs at least 1 character

// Non-greedy quantifiers (NEW in v0.2.1)
let lazy = Pattern::new(r"start\{.*?\}").unwrap();
assert_eq!(lazy.find("start{abc}end{xyz}"), Some((0, 10))); // Matches "start{abc}", not greedy

// DOTALL mode - dot matches newlines (NEW in v0.2.1)
let dotall = Pattern::new(r"(?s)rule\s+.*?\}").unwrap();
let multiline = "rule test {\n  content\n}";
assert!(dotall.is_match(multiline));    // (?s) makes .* match across newlines

// Non-capturing groups with alternation (NEW in v0.2.1)
let group = Pattern::new(r#"(?:"test"|foo)"#).unwrap();
assert!(group.is_match("\"test\""));    // Matches quoted "test"
assert!(group.is_match("foo"));         // Or matches foo

// Digit matching (DigitRun fast path - 1.4-1.9x faster than regex!)
let digits = Pattern::new("\\d+").unwrap();
let matches = digits.find_all("Order #12345 costs $67.89");
// Returns: [(7, 12), (20, 22), (23, 25)]

// Identifier matching (IdentifierRun fast path)
let ident = Pattern::new("[a-zA-Z_]\\w*").unwrap();
assert!(ident.is_match("variable_name_123"));

// Quoted strings (QuotedString fast path - 1.4-1.9x faster!)
let quoted = Pattern::new("\"[^\"]+\"").unwrap();
assert!(quoted.is_match("say \"hello world\""));

// Word boundaries
let word = Pattern::new("\\btest\\b").unwrap();
assert!(word.is_match("this is a test"));
assert!(!word.is_match("testing"));

// Anchors
let exact = Pattern::new("^hello$").unwrap();
assert!(exact.is_match("hello"));
assert!(!exact.is_match("hello world"));
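The `find_all` result shown for `\d+` above can be reproduced with plain byte scanning, which is roughly the idea a digit-run fast path relies on. The sketch below is a standalone illustration using only the standard library; `find_digit_runs` is a hypothetical helper, not part of the rexile API, and it assumes ASCII digits only.

```rust
/// Returns the (start, end) byte ranges of every maximal run of ASCII digits.
/// Hypothetical helper for illustration; not rexile's implementation.
fn find_digit_runs(text: &str) -> Vec<(usize, usize)> {
    let bytes = text.as_bytes();
    let mut runs = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i].is_ascii_digit() {
            let start = i;
            // Extend the run while the current byte is still a digit.
            while i < bytes.len() && bytes[i].is_ascii_digit() {
                i += 1;
            }
            // `end` is exclusive, matching the (start, end) pairs used above.
            runs.push((start, i));
        } else {
            i += 1;
        }
    }
    runs
}
```

Calling `find_digit_runs("Order #12345 costs $67.89")` yields `[(7, 12), (20, 22), (23, 25)]`, the same ranges listed in the Quick Start comment.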

Cached API (Recommended for Hot Paths)

For patterns used repeatedly in hot loops:

use rexile;

// Automatically cached - compile once, reuse forever
assert!(rexile::is_match("test", "this is a test").unwrap());
assert_eq!(rexile::find("world", "hello world").unwrap(), Some((6, 11)));

// Perfect for parsers and lexers
// (sample input so the loop below is self-contained)
let log_lines = ["INFO boot ok", "ERROR disk full"];
for line in log_lines {
    if rexile::is_match("ERROR", line).unwrap() {
        // handle error
    }
}
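To make the "compile once, reuse" behaviour concrete, here is a minimal sketch of a global pattern cache keyed by the pattern string. It uses only the standard library, and everything in it (`CompiledLiteral`, `cached`, `cached_is_match`, `compile_count`) is a hypothetical stand-in: how rexile actually stores cached patterns is not described here, so treat this as one plausible design rather than the crate's implementation.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex, OnceLock};

/// Stand-in for a compiled pattern (here just a literal substring).
struct CompiledLiteral {
    needle: String,
}

impl CompiledLiteral {
    fn is_match(&self, text: &str) -> bool {
        text.contains(&self.needle)
    }
}

/// Counts how many times the "compile" step actually ran.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Global cache keyed by the pattern source string.
static CACHE: OnceLock<Mutex<HashMap<String, Arc<CompiledLiteral>>>> = OnceLock::new();

/// Returns the cached pattern, compiling it only on the first request.
fn cached(pattern: &str) -> Arc<CompiledLiteral> {
    let cache = CACHE.get_or_init(|| Mutex::new(HashMap::new()));
    let mut map = cache.lock().unwrap();
    if let Some(hit) = map.get(pattern) {
        return Arc::clone(hit);
    }
    COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
    let compiled = Arc::new(CompiledLiteral { needle: pattern.to_string() });
    map.insert(pattern.to_string(), Arc::clone(&compiled));
    compiled
}

fn cached_is_match(pattern: &str, text: &str) -> bool {
    cached(pattern).is_match(text)
}

fn compile_count() -> usize {
    COMPILE_COUNT.load(Ordering::SeqCst)
}
```

With this design, repeated calls such as `cached_is_match("keyword", ...)` pay the compile cost once and a hash lookup plus a lock afterwards.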

✨ Supported Features

Fast Path Optimizations (10 Types)

ReXile uses JIT-style specialized implementations for common patterns:

Fast Path                Pattern Example   Performance vs regex
Literal                  "hello"           Competitive (SIMD)
LiteralPlusWhitespace    "rule "           Competitive
DigitRun                 \d+               1.4-1.9x faster ✨
IdentifierRun            [a-zA-Z_]\w*      104.7x faster compilation
QuotedString             "[^"]+"           1.4-1.9x faster ✨
WordRun                  \w+               Competitive
DotWildcard              ., .*, .+         With backtracking
Alternation              foo|bar|baz       2x slower (acceptable)
LiteralWhitespaceQuoted  Complex           Competitive
LiteralWhitespaceDigits  Complex           Competitive

Regex Features

Feature                 Example                        Status
Literal strings         hello, world                   ✅ Supported
Alternation             foo|bar|baz                    ✅ Supported (aho-corasick)
Start anchor            ^start                         ✅ Supported
End anchor              end$                           ✅ Supported
Exact match             ^exact$                        ✅ Supported
Character classes       [a-z], [0-9], [^abc]           ✅ Supported
Quantifiers             *, +, ?                        ✅ Supported
Non-greedy quantifiers  .*?, +?, ??                    ✅ Supported (v0.2.1)
Dot wildcard            ., .*, .+                      ✅ Supported (v0.2.0)
DOTALL mode             (?s) - dot matches newlines    ✅ Supported (v0.2.1)
Escape sequences        \d, \w, \s, \., \n, \t         ✅ Supported
Sequences               ab+c*, \d+\w*                  ✅ Supported
Non-capturing groups    (?:abc|def)                    ✅ Supported (v0.2.1)
Capturing groups        Extract (group)                ✅ Supported (v0.2.0)
Word boundaries         \b, \B                         ✅ Supported
Bounded quantifiers     {n}, {n,m}                     🚧 Planned
Lookahead/lookbehind    (?=...), (?<=...)              🚧 Planned
Backreferences          \1, \2                         🚧 Planned

📊 Performance Benchmarks

Compilation Speed (Primary Advantage)

Pattern Compilation Benchmark (vs regex crate):

Pattern                             rexile    regex     Speedup
[a-zA-Z_]\w*                        95.2 ns   9.97 µs   104.7x faster 🚀
\d+                                 86.7 ns   4.03 µs   46.5x faster 🚀
(\w+)\s*(>=|<=|==|!=|>|<)\s*(.+)    471 ns    19.2 µs   40.7x faster 🚀
.*test.*                            148 ns    2.27 µs   15.3x faster 🚀

Average: 10-100x faster compilation - Perfect for dynamic patterns!

Matching Speed

Simple Patterns (Fast paths):

  • Pattern \d+ on "12345": 1.4-1.9x faster ✅
  • Pattern \w+ on "variable": 1.4-1.9x faster ✅
  • Pattern "[^"]+" on quoted strings: Competitive ✅

Complex Patterns (Backtracking):

  • Pattern a.+c on "abc": 2-5x slower (acceptable)
  • Pattern .*test.* on long strings: 2-10x slower (acceptable)
  • Trade-off: 100x faster compilation vs slightly slower complex matching
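To illustrate where the backtracking cost comes from, the following is a small generic backtracking matcher, in the spirit of the classic textbook design. It is a sketch under stated assumptions and not rexile's engine: it works on bytes, supports only literal bytes, `.`, and a `*` or `+` following a single atom, and its `.` matches any byte (including newlines). All function names are hypothetical.

```rust
/// Returns true if `pattern` matches anywhere in `text`.
fn is_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    // Try every start offset, including the empty suffix.
    (0..=t.len()).any(|start| match_here(p, &t[start..]))
}

/// `.` matches any byte; anything else must match exactly.
fn atom_matches(atom: u8, byte: u8) -> bool {
    atom == b'.' || atom == byte
}

/// Matches `p` against the beginning of `t`.
fn match_here(p: &[u8], t: &[u8]) -> bool {
    if p.is_empty() {
        return true;
    }
    match p.get(1).copied() {
        Some(b'*') => match_repeat(p[0], &p[2..], t, 0),
        Some(b'+') => match_repeat(p[0], &p[2..], t, 1),
        _ => !t.is_empty() && atom_matches(p[0], t[0]) && match_here(&p[1..], &t[1..]),
    }
}

/// Greedy repetition: take as many `atom`s as possible, then give bytes
/// back one at a time until the rest of the pattern matches.
fn match_repeat(atom: u8, rest: &[u8], t: &[u8], min: usize) -> bool {
    let max = t.iter().take_while(|&&b| atom_matches(atom, b)).count();
    (min..=max).rev().any(|n| match_here(rest, &t[n..]))
}
```

For a pattern such as `.*test.*`, `match_repeat` first consumes the whole input and then retries the remainder of the pattern at each shorter length, which is why long inputs make this style of matching slower than an automaton-based engine.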

Use Case Performance

Loading 1000 GRL Rules:

  • regex crate: ~2 seconds (2ms per pattern)
  • rexile: ~0.02 seconds (20µs per pattern)
  • Result: 100x faster startup! Perfect for parsers and rule engines.
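A startup figure like the one above can be measured with a small timing harness. The sketch below uses only the standard library and takes the compile step as a closure, so it does not assume any particular engine; `time_compilation` and `per_pattern_micros` are hypothetical helper names.

```rust
use std::time::{Duration, Instant};

/// Compiles every pattern with the supplied `compile` function and returns
/// the compiled values together with the total wall-clock time.
fn time_compilation<T>(patterns: &[&str], compile: impl Fn(&str) -> T) -> (Vec<T>, Duration) {
    let start = Instant::now();
    let compiled: Vec<T> = patterns.iter().map(|p| compile(p)).collect();
    (compiled, start.elapsed())
}

/// Average time per pattern, in microseconds.
fn per_pattern_micros(total: Duration, count: usize) -> f64 {
    total.as_secs_f64() * 1e6 / count as f64
}
```

With rexile in scope, the closure could be `|p| rexile::Pattern::new(p).unwrap()`. As a sanity check on the arithmetic, a total of 2 seconds over 1000 patterns gives 2000 µs (2 ms) per pattern.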

Memory Comparison

Test 1: Pattern Compilation (10 patterns):

  • regex: 1920 KB in 7.89ms
  • ReXile: 128 KB in 370µs
  • Result: 15x less memory, 21x faster ✨

Test 2: Search Operations (5 patterns × 139KB corpus):

  • Both: 0 bytes memory delta
  • Result: Equal efficiency ✅

Test 3: Stress Test (50 patterns × 500KB corpus):

  • regex: 0.62 MB peak in 46ms
  • ReXile: 0.12 MB peak in 27ms
  • Result: 5x less peak memory, 1.7x faster ✨
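The headline ratios follow directly from the raw figures quoted in Tests 1 and 3. The short sketch below simply recomputes them; `times_less` and `memory_ratios` are hypothetical names, and the inputs are the numbers listed above with times converted to a common unit.

```rust
/// Ratio of baseline to candidate, rounded to one decimal place.
fn times_less(baseline: f64, candidate: f64) -> f64 {
    (baseline / candidate * 10.0).round() / 10.0
}

/// Recomputes the headline ratios from the figures quoted above.
fn memory_ratios() -> [(&'static str, f64); 4] {
    [
        ("Test 1 memory (KB)", times_less(1920.0, 128.0)),
        ("Test 1 time (µs)", times_less(7890.0, 370.0)),
        ("Test 3 peak memory (MB)", times_less(0.62, 0.12)),
        ("Test 3 time (ms)", times_less(46.0, 27.0)),
    ]
}
```

The four results are about 15, 21.3, 5.2 and 1.7, which the summary lines round to 15x, 21x, 5x and 1.7x.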

When ReXile Wins

  • ✅ Simple patterns (\d+, \w+) - 1.4-1.9x faster matching
  • ✅ Fast compilation - 10-100x faster pattern compilation (huge win!)
  • ✅ Identifiers ([a-zA-Z_]\w*) - 104.7x faster compilation
  • ✅ Memory efficiency - 15x less for compilation, 5x less peak
  • ✅ Instant startup - Load 1000 patterns in 0.02s vs 2s (100x faster)
  • ✅ Dot wildcards - Full ., .*, .+ support with backtracking

When regex Wins

  • ⚠️ Complex patterns with backtracking - ReXile 2-10x slower (acceptable trade-off)
  • ⚠️ Alternations (when|then) - ReXile 2x slower
  • ⚠️ Hot-path matching - For performance-critical matching, regex may be better

Architecture

Pattern → Parser → AST → Fast Path Detection → Specialized Matcher
                                                ↓
                                     DigitRun (memchr SIMD scanning)
                                     IdentifierRun (direct byte scanning)
                                     QuotedString (memchr + validation)
                                     Alternation (aho-corasick automaton)
                                     Literal (memchr SIMD)
                                     ... 5 more fast paths
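The "Fast Path Detection" step in the diagram can be pictured as a classifier that picks a specialized matcher before falling back to a general one. The sketch below is a deliberate simplification: it inspects the pattern string directly instead of an AST, covers only a handful of the fast paths named above, and uses hypothetical names (`FastPath`, `detect_fast_path`, `is_plain_literal`) that are not taken from the rexile source.

```rust
/// A few of the specialized matchers named in the diagram above.
#[derive(Debug, PartialEq)]
enum FastPath {
    Literal,
    Alternation,
    DigitRun,
    WordRun,
    IdentifierRun,
    /// No fast path applies; use the general backtracking matcher.
    General,
}

/// True if `s` contains no regex metacharacters (simplified check).
fn is_plain_literal(s: &str) -> bool {
    !s.is_empty() && s.chars().all(|c| c.is_alphanumeric() || c == ' ' || c == '_')
}

/// Picks a matcher for `pattern`, checking the most specific shapes first.
fn detect_fast_path(pattern: &str) -> FastPath {
    match pattern {
        r"\d+" => FastPath::DigitRun,
        r"\w+" => FastPath::WordRun,
        r"[a-zA-Z_]\w*" => FastPath::IdentifierRun,
        _ if is_plain_literal(pattern) => FastPath::Literal,
        // Every branch of the alternation is itself a plain literal.
        _ if pattern.split('|').all(is_plain_literal) => FastPath::Alternation,
        _ => FastPath::General,
    }
}
```

The point of this design is that detection happens once, at compile time, so each later match call can jump straight to the specialized routine.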

Run benchmarks yourself:

cargo run --release --example per_file_grl_benchmark
cargo run --release --example memory_comparison

📦 Installation

Add to your Cargo.toml:

[dependencies]
rexile = "0.2"

🎓 Examples

Literal Search

let p = Pattern::new("needle").unwrap();
assert!(p.is_match("needle in a haystack"));
assert_eq!(p.find("where is the needle?"), Some((13, 19)));

// Find all occurrences
let matches = p.find_all("needle and needle");
assert_eq!(matches, vec![(0, 6), (11, 17)]);

Multi-Pattern (Alternation)

// Fast multi-pattern search using aho-corasick
let keywords = Pattern::new("import|export|function|class").unwrap();
assert!(keywords.is_match("export default function"));

Anchored Patterns

// Must start with pattern
let starts = Pattern::new("^Hello").unwrap();
assert!(starts.is_match("Hello World"));
assert!(!starts.is_match("Say Hello"));

// Must end with pattern
let ends = Pattern::new("World$").unwrap();
assert!(ends.is_match("Hello World"));
assert!(!ends.is_match("World Peace"));

// Exact match
let exact = Pattern::new("^exact$").unwrap();
assert!(exact.is_match("exact"));
assert!(!exact.is_match("not exact"));

Cached API (Best for Repeated Patterns)

// First call compiles and caches
rexile::is_match("keyword", "find keyword here").unwrap();

// Subsequent calls reuse cached pattern (zero compile cost)
rexile::is_match("keyword", "another keyword").unwrap();
rexile::is_match("keyword", "more keyword text").unwrap();

📚 More examples: see the examples/ directory.

Run examples with:

cargo run --example basic_usage
cargo run --example log_processing

🔧 Use Cases

ReXile is production-ready for:

✅ Ideal Use Cases

  • Parsers and lexers - 21x faster pattern compilation, competitive matching
  • Rule engines - Simple pattern matching in business rules (original use case!)
  • Log processing - Fast keyword and pattern extraction
  • Dynamic patterns - Applications that compile patterns at runtime
  • Memory-constrained environments - 15x less compilation memory
  • Low-latency applications - Predictable performance, no JIT warmup

🎯 Perfect Patterns for ReXile

  • Fast compilation: All patterns compile 10-100x faster
  • Simple matching: \d+, \w+ (1.4-1.9x faster matching)
  • Identifiers: [a-zA-Z_]\w* (104.7x faster compilation!)
  • Dot wildcards: ., .*, .+ with proper backtracking
  • Keyword search: rule\s+, function\s+
  • Many patterns: Load 1000 patterns instantly (100x faster startup)

⚠️ Consider regex crate for

  • Complex alternations (ReXile 2x slower)
  • Very sparse patterns (ReXile up to 1.44x slower)
  • Unicode properties (\p{L} - not yet supported)
  • Advanced features (lookahead, backreferences - not yet supported)

🤝 Contributing

Contributions welcome! ReXile is actively maintained and evolving.

Current focus:

  • ✅ Core regex features complete
  • ✅ Dot wildcard (., .*, .+) with backtracking - v0.2.0
  • ✅ Capturing groups - Auto-detection and extraction - v0.2.0
  • ✅ Non-greedy quantifiers (.*?, +?, ??) - v0.2.1
  • ✅ DOTALL mode ((?s)) for multiline matching - v0.2.1
  • ✅ Non-capturing groups ((?:...)) with alternations - v0.2.1
  • ✅ 10-100x faster compilation
  • 🔄 Advanced features: bounded quantifiers {n,m}, lookahead, Unicode support

How to contribute:

  1. Check issues for open tasks
  2. Run tests: cargo test
  3. Run benchmarks: cargo run --release --example per_file_grl_benchmark
  4. Submit PR with benchmarks showing performance impact

Priority areas:

  • 📋 Bounded quantifiers ({n}, {n,m})
  • 📋 More fast path patterns
  • 📋 Unicode support
  • 📋 Documentation improvements

📜 License

Licensed under either of the MIT license or the Apache License, Version 2.0, at your option.

🙏 Credits

Built on top of:

  • memchr by Andrew Gallant - SIMD-accelerated substring search
  • aho-corasick by Andrew Gallant - Multi-pattern matching automaton

Developed for the rust-rule-engine project, providing fast pattern matching for GRL (Grule Rule Language) parsing and business rule evaluation.

Performance Philosophy: ReXile achieves competitive performance through intelligent specialization rather than complex JIT compilation:

  • 10 hand-optimized fast paths for common patterns
  • SIMD acceleration via memchr
  • Pre-built automatons for alternations
  • Zero-copy iterator design
  • Minimal metadata overhead

Status: ✅ Production Ready (v0.2.1)

  • ✅ Compilation Speed: 10-100x faster than regex crate
  • ✅ Matching Speed: 1.4-1.9x faster on simple patterns
  • ✅ Memory: 15x less compilation, 5x less peak
  • ✅ Features: Core regex + dot wildcard + capturing groups + non-greedy + DOTALL + non-capturing groups
  • ✅ Testing: 84 unit tests + 13 group integration tests passing
  • ✅ Real-world validated: GRL parsing, rule engines, DSL compilers
