| Crates.io | hyperscan-tokio |
| lib.rs | hyperscan-tokio |
| version | 0.1.0 |
| created_at | 2025-06-18 00:24:51.310858+00 |
| updated_at | 2025-06-18 00:24:51.310858+00 |
| description | High-performance async regex scanning with VectorScan |
| homepage | https://github.com/mikeyasaservice/hyperscan-tokio |
| repository | https://github.com/mikeyasaservice/hyperscan-tokio |
| max_upload_size | |
| id | 1716417 |
| size | 342,340 |
Blazing-fast multi-pattern regex matching for Rust, bringing the power of Vectorscan (Intel Hyperscan's modern fork) to the async ecosystem.
Bytes integration
scanning_throughput/1024      time:   [2.1 µs 2.2 µs 2.3 µs]
                              thrpt:  [434 MiB/s 455 MiB/s 476 MiB/s]
scanning_throughput/1048576   time:   [45.2 µs 45.8 µs 46.4 µs]
                              thrpt:  [21.5 GiB/s 21.9 GiB/s 22.1 GiB/s]
vs_alternatives/hyperscan_tokio   time:   [12.3 µs 12.5 µs 12.7 µs]
vs_alternatives/rust_regex        time:   [891.2 µs 895.7 µs 900.1 µs]
71.6x faster than regex crate for multi-pattern matching
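The speedup figure follows from the two median timings in the `vs_alternatives` benchmark. As a quick sanity check, the sketch below divides one by the other; the function name `speedup` is only illustrative and is not part of the crate.

```rust
/// Ratio of two timings expressed in the same unit.
fn speedup(baseline_us: f64, candidate_us: f64) -> f64 {
    baseline_us / candidate_us
}

fn main() {
    // Median timings copied from the benchmark output above (microseconds).
    let rust_regex_us = 895.7;
    let hyperscan_tokio_us = 12.5;
    println!("speedup: {:.1}x", speedup(rust_regex_us, hyperscan_tokio_us));
}
```

The ratio works out to about 71.66, in line with the figure quoted above.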
use hyperscan_tokio::prelude::*;
use std::sync::Arc;

#[tokio::main]
async fn main() -> Result<()> {
    // Compile patterns into a database
    let db = DatabaseBuilder::new()
        .add_pattern(Pattern::new(r"\b\d{3}-\d{2}-\d{4}\b").id(1)) // SSN
        .add_pattern(Pattern::new(r"\b4\d{15}\b").id(2)) // Credit card
        .add_pattern(Pattern::new(r"(?i)password:\s*\S+").id(3)) // Passwords
        .build()?;

    // Create scanner
    let scanner = Scanner::new(db)?;

    // Scan data
    let data = b"SSN: 123-45-6789, CC: 4111111111111111";
    let matches = scanner.scan_bytes(data).await?;
    for m in matches {
        println!("Pattern {} matched at [{}, {})", m.pattern_id, m.start, m.end);
    }
    Ok(())
}
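The `[{}, {})` format in the example suggests that `start` and `end` form a half-open byte range into the scanned buffer. Under that assumption, the matched text can be recovered by slicing the input. The sketch below uses a hypothetical `Match` struct as a stand-in for the crate's match type, and the offsets are written by hand rather than produced by a real scan.

```rust
/// Hypothetical stand-in for the crate's match type; only the three fields
/// used in the example above are modelled.
struct Match {
    pattern_id: u32,
    start: usize,
    end: usize,
}

/// Returns the bytes covered by the half-open range `[start, end)`,
/// or `None` if the range does not fit inside `data`.
fn matched_bytes<'a>(data: &'a [u8], m: &Match) -> Option<&'a [u8]> {
    data.get(m.start..m.end)
}

fn main() {
    let data = b"SSN: 123-45-6789, CC: 4111111111111111";
    // Offsets written by hand for illustration, not taken from a scan.
    let matches = [
        Match { pattern_id: 1, start: 5, end: 16 },
        Match { pattern_id: 2, start: 22, end: 38 },
    ];
    for m in &matches {
        if let Some(bytes) = matched_bytes(data, m) {
            println!("Pattern {}: {}", m.pattern_id, String::from_utf8_lossy(bytes));
        }
    }
}
```

Using `slice::get` instead of direct indexing avoids a panic if an offset ever falls outside the buffer.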
hyperscan-tokio offers two ways to use PCRE-compatible patterns with capture groups:
Uses the regex crate internally, no special system dependencies:
hyperscan-tokio = { version = "0.1", features = ["chimera"] }
Requires VectorScan built with Chimera support:
# Build from source automatically
hyperscan-tokio = { version = "0.1", features = ["vendored", "chimera-ffi"] }
# Or use system VectorScan with Chimera (requires manual build)
hyperscan-tokio = { version = "0.1", features = ["system", "chimera-ffi"] }
⚠️ Important: Standard Homebrew/package manager installations of VectorScan do NOT include Chimera. See CHIMERA_SETUP.md for detailed setup instructions.
let pool = WorkerPool::builder()
    .num_workers(16)
    .core_affinity(true) // Pin workers to CPU cores
    .build(db)?;

// Process millions of log lines
let jobs: Vec<ScanJob> = log_lines.into_iter()
    .map(|line| ScanJob { id: line.id, data: line.into() })
    .collect();
let results = pool.scan_batch(jobs).await?;
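To make the fan-out idea behind batch scanning concrete, here is a minimal standard-library sketch. It is not the crate's implementation: `ScanJob` and `ScanResult` are hypothetical stand-ins, a naive literal search replaces the compiled pattern database, and plain scoped threads replace the async worker pool.

```rust
use std::thread;

/// Hypothetical stand-in for the crate's `ScanJob`.
struct ScanJob {
    id: u64,
    data: Vec<u8>,
}

/// Hypothetical result type: the job id plus the offsets where the needle starts.
#[derive(Debug, PartialEq)]
struct ScanResult {
    id: u64,
    offsets: Vec<usize>,
}

/// Naive literal search standing in for a real pattern database.
fn find_all(haystack: &[u8], needle: &[u8]) -> Vec<usize> {
    if needle.is_empty() || haystack.len() < needle.len() {
        return Vec::new();
    }
    haystack
        .windows(needle.len())
        .enumerate()
        .filter(|(_, window)| *window == needle)
        .map(|(offset, _)| offset)
        .collect()
}

/// Splits the jobs into contiguous chunks, scans each chunk on its own
/// thread, and returns the results in the original job order.
fn scan_batch(jobs: &[ScanJob], needle: &[u8], num_workers: usize) -> Vec<ScanResult> {
    let workers = num_workers.max(1);
    let chunk_size = ((jobs.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        let handles: Vec<_> = jobs
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .map(|job| ScanResult { id: job.id, offsets: find_all(&job.data, needle) })
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|handle| handle.join().expect("worker thread panicked"))
            .collect()
    })
}

fn main() {
    let jobs = vec![
        ScanJob { id: 1, data: b"error: disk full".to_vec() },
        ScanJob { id: 2, data: b"all good".to_vec() },
    ];
    for result in scan_batch(&jobs, b"error", 2) {
        println!("job {} -> {:?}", result.id, result.offsets);
    }
}
```

Because the pattern data is read-only during scanning, every worker can share it by reference, which is what makes this kind of batch processing scale with the number of workers.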
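let reloadable = ReloadableDatabase::new(db);

// In another task, reload patterns without stopping scanning
tokio::spawn(async move {
    let new_db = load_new_patterns().await?;
    reloadable.reload(new_db).await?;
    // Name the error type so that `?` compiles inside the async block
    Ok::<(), Box<dyn std::error::Error + Send + Sync>>(())
});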
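One common way to get reload-without-stopping behaviour is to keep the current database behind a swappable shared pointer: readers take a cheap snapshot, and a reload replaces the pointer for later readers. The sketch below shows that pattern with the standard library only. The `Reloadable` type is hypothetical and is not claimed to match how `ReloadableDatabase` works internally.

```rust
use std::sync::{Arc, RwLock};

/// Hypothetical holder for a value that can be swapped while readers keep
/// using the snapshot they already obtained.
struct Reloadable<T> {
    current: RwLock<Arc<T>>,
}

impl<T> Reloadable<T> {
    fn new(value: T) -> Self {
        Self { current: RwLock::new(Arc::new(value)) }
    }

    /// Takes a snapshot; the lock is held only while the `Arc` is cloned.
    fn load(&self) -> Arc<T> {
        let guard = self.current.read().expect("lock poisoned");
        Arc::clone(&*guard)
    }

    /// Publishes a new value; existing snapshots stay valid until dropped.
    fn reload(&self, value: T) {
        let mut guard = self.current.write().expect("lock poisoned");
        *guard = Arc::new(value);
    }
}

fn main() {
    let patterns = Reloadable::new(vec!["password".to_string()]);
    let in_flight = patterns.load(); // a scan that started before the reload
    patterns.reload(vec!["password".to_string(), "secret".to_string()]);
    println!("in-flight scan sees {} pattern(s)", in_flight.len());
    println!("new scans see {} pattern(s)", patterns.load().len());
}
```

Scans that are already running finish against the old snapshot, so a reload never invalidates data that is in use.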
let stream_scanner = StreamScanner::new(db)?;

// Scan data as it arrives
// (`next()` borrows the stream mutably, hence `let mut`)
let mut match_stream = stream_scanner
    .scan_stream(tcp_stream)
    .await?;
while let Some(m) = match_stream.next().await {
    process_match(m?);
}
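What distinguishes streaming mode from scanning each chunk separately is that a match may straddle a chunk boundary. The sketch below illustrates this with a single literal needle and standard-library code only. `StreamMatcher` is a hypothetical type; a real streaming engine keeps compact automaton state rather than a byte tail.

```rust
/// Hypothetical streaming matcher for one literal needle. It reports the
/// absolute end offset of every match, counted from the start of the stream.
struct StreamMatcher {
    needle: Vec<u8>,
    tail: Vec<u8>,   // last `needle.len() - 1` bytes seen so far
    consumed: usize, // total bytes fed before the current chunk
}

impl StreamMatcher {
    fn new(needle: &[u8]) -> Self {
        assert!(!needle.is_empty(), "needle must not be empty");
        Self { needle: needle.to_vec(), tail: Vec::new(), consumed: 0 }
    }

    /// Feeds the next chunk and returns the end offsets of matches it completes.
    fn feed(&mut self, chunk: &[u8]) -> Vec<usize> {
        let n = self.needle.len();
        // Prepend the carried-over tail so boundary-spanning matches are visible.
        let mut buf = std::mem::take(&mut self.tail);
        let tail_len = buf.len();
        buf.extend_from_slice(chunk);
        let base = self.consumed - tail_len; // stream offset of buf[0]

        let mut ends = Vec::new();
        if buf.len() >= n {
            for (i, window) in buf.windows(n).enumerate() {
                if window == self.needle.as_slice() {
                    ends.push(base + i + n);
                }
            }
        }

        self.consumed += chunk.len();
        // The tail is shorter than the needle, so it can never hold a
        // complete match and nothing is reported twice.
        let keep = buf.len().min(n - 1);
        self.tail = buf[buf.len() - keep..].to_vec();
        ends
    }
}

fn main() {
    let mut matcher = StreamMatcher::new(b"password");
    for chunk in [&b"user pass"[..], &b"word: x"[..]] {
        println!("{:?}", matcher.feed(chunk));
    }
}
```

Here the needle is split across the two chunks, and the match is reported only once the second chunk arrives.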
Requires VectorScan development files:
# Ubuntu/Debian
sudo apt-get install libhyperscan-dev cmake
# macOS
brew install vectorscan cmake
# Build
cargo build --release
hyperscan-tokio
├── hyperscan-tokio-sys/    # Low-level FFI bindings
│   └── Safe wrappers around VectorScan C API
└── src/                    # High-level async Rust API
    ├── scanner.rs          # Core scanning functionality
    ├── database.rs         # Pattern compilation
    ├── worker_pool.rs      # Parallel scanning
    └── stream.rs           # Streaming mode
Run the benchmarks:
cargo bench
Contributions are welcome! Please read our Contributing Guide for details.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Built with ❤️ for the Rust community. Making enterprise-grade regex performance accessible to everyone.