async-regex

Crates.ioasync-regex
lib.rsasync-regex
version0.1.1
created_at2025-09-12 18:54:24.214125+00
updated_at2025-09-12 19:23:46.266163+00
descriptionEmpower regex with streaming capabilities - high-performance async streaming pattern search using regex for multi-byte pattern matching in data streams
homepagehttps://gitlab.com/Efimster/slib/-/tree/master/async-regex
repositoryhttps://gitlab.com/Efimster/slib
max_upload_size
id1836064
size82,106
Efimster (Efimster)

documentation

https://docs.rs/async-regex

README

async-regex

Empower regex with streaming capabilities!

A high-performance library that brings the power of regex pattern matching to streaming data. This crate extends the standard read_until functionality to support multi-byte patterns using regex, making it perfect for parsing protocols, log files, and other structured data streams.

Why async-regex? This crate empowers regex with streaming capabilities - bringing the robust pattern matching of the regex crate to streaming data processing!

โœจ Features

  • ๐Ÿ” Regex-Powered: Built on the robust regex crate for reliable pattern matching
  • ๐ŸŒŠ Streaming Support: Process data as it arrives without loading everything into memory
  • โšก High Performance: Optimized implementations with comprehensive benchmarks
  • ๐Ÿฆ€ Pure Rust Implementation: Entirely written in safe Rust with zero unsafe code
  • ๐Ÿงช Well Tested: Extensive test coverage
  • ๐Ÿ“š Well Documented: Comprehensive documentation and examples
  • ๐Ÿ’พ Memory Efficient: Zero-copy parsing and minimal allocations
  • ๐Ÿ”„ Async & Sync APIs: Both async and synchronous versions available
  • ๐Ÿš€ Multi-byte Patterns: Unlike standard read_until which only supports single bytes
  • ๐ŸŽฏ Protocol Parsing: Perfect for HTTP, custom protocols, and structured data streams

๐ŸŽฏ Use Cases

Perfect for:

  • HTTP Protocol Parsing: Find headers like "Content-Length:" or "Authorization:" in streaming HTTP data
  • Log File Processing: Parse structured logs with regex patterns as they're being written
  • Network Protocol Parsing: Handle custom protocols with complex pattern matching
  • Data Pipeline Processing: Process large files without loading everything into memory
  • Real-time Data Analysis: Find patterns in streaming sensor data or metrics
  • Async Web Applications: Parse request/response data efficiently
  • File Format Parsing: Parse structured files like CSV, JSON, or custom formats
  • Any streaming scenario where you need regex pattern matching on data that arrives incrementally

๐Ÿš€ Quick Start

Async Regex Pattern Search

use async_regex::read_until_pattern_async;
use futures::io::Cursor;
use tokio::runtime::Runtime;

let rt = Runtime::new().unwrap();
rt.block_on(async {
    let mut reader = Cursor::new(b"HTTP/1.1 200 OK\r\nContent-Length: 42\r\n\r\n");
    let mut buffer = Vec::new();

    // Find HTTP status line using regex
    let (matched, size) = read_until_pattern_async(
        &mut reader,
        r"HTTP/\d\.\d \d+",
        &mut buffer
    ).await.unwrap();

    assert_eq!(matched, b"HTTP/1.1 200");
    assert_eq!(buffer, b"HTTP/1.1 200");
});

Complex Regex Pattern Matching

use async_regex::read_until_pattern_async;
use futures::io::Cursor;
use tokio::runtime::Runtime;

let rt = Runtime::new().unwrap();
rt.block_on(async {
    let mut reader = Cursor::new(b"user@example.com and admin@company.org");
    let mut buffer = Vec::new();

    // Find email addresses using regex
    let (matched, size) = read_until_pattern_async(
        &mut reader,
        r"\w+@\w+\.\w+",
        &mut buffer
    ).await.unwrap();

    assert_eq!(matched, b"user@example.com");
    assert_eq!(buffer, b"user@example.com");
});

Sync Regex Pattern Search

use async_regex::read_until_pattern;
use std::io::Cursor;

let mut reader = Cursor::new(b"2024-01-15 10:30:45 INFO: Application started");
let mut buffer = Vec::new();

// Find timestamp using regex
let (matched, size) = read_until_pattern(
    &mut reader, 
    r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}", 
    &mut buffer
).unwrap();

assert_eq!(matched, b"2024-01-15 10:30:45");
assert_eq!(buffer, b"2024-01-15 10:30:45");

๐Ÿ“Š Performance

This crate is optimized for high-performance streaming pattern search with regex:

Streaming Performance Benefits

  • Memory Efficient: Process large files without loading everything into memory
  • Regex-Powered: Leverages the robust and fast regex crate for pattern matching
  • Async Optimized: Minimal overhead for async operations (~10% compared to sync)
  • Zero-Copy Operations: Efficient data handling with minimal allocations

Performance Characteristics

Benchmarks run on MacBook Pro (2019) with 8-Core Intel Core i9 @ 2.4GHz, 32GB RAM

Simple Pattern Matching

  • Small data (500 bytes): ~9.3ยตs per operation (async), ~9.1ยตs (sync)
  • Medium data (5KB): ~9.4ยตs per operation (async), ~9.1ยตs (sync)
  • Large data (50KB): ~10.3ยตs per operation (async), ~10.1ยตs (sync)

Regex Pattern Matching

  • Small data (500 bytes): ~481ยตs per operation (regex patterns)
  • Medium data (5KB): ~519ยตs per operation (regex patterns)
  • Large data (50KB): ~835ยตs per operation (regex patterns)

Complex Pattern Matching

  • Small data (500 bytes): ~428ยตs per operation (complex regex)
  • Medium data (5KB): ~431ยตs per operation (complex regex)
  • Large data (50KB): ~468ยตs per operation (complex regex)

Pattern Position Performance

  • Pattern at start: ~7.1ยตs per operation
  • Pattern at middle: ~7.3ยตs per operation
  • Pattern at end: ~7.3ยตs per operation

Performance Notes

  • Memory usage: Constant memory usage regardless of input size
  • Pattern complexity: Performance scales with regex complexity, not input size
  • Async overhead: ~10% performance cost for async operations vs sync
  • Consistent performance: Pattern position has minimal impact on performance

Why Streaming Matters

  • Large Files: Process multi-gigabyte files without memory issues
  • Real-time Data: Handle continuous data streams efficiently
  • Network Protocols: Parse data as it arrives over the network
  • Resource Efficiency: Lower memory footprint and better resource utilization

๐Ÿš€ Empowering Regex with Streaming

This crate bridges the gap between regex and streaming data processing!

The Problem:

  • regex crate: Powerful pattern matching, but requires complete in-memory data
  • tokio::io::AsyncBufRead::read_until: Great for streaming, but only single-byte delimiters
  • Standard libraries: No built-in way to use regex patterns on streaming data

Our Solution:

  • Regex-powered streaming: Use any regex pattern on streaming data
  • Multi-byte patterns: Find complex patterns like "HTTP/1.1" or email addresses
  • Memory efficient: Process data as it arrives, not all at once
  • Async & sync: Both streaming paradigms supported

Perfect for:

  • Protocol parsing: HTTP headers, custom protocols, structured data
  • Log processing: Parse logs as they're written with regex patterns
  • Data pipelines: Process large files with complex pattern matching
  • Real-time systems: Handle streaming data with regex power

When to Use Our Solution vs Other Libraries

Use Case Our Solution regex crate tokio::io::AsyncBufRead
Regex patterns on streaming data โœ… Perfect! โŒ In-memory only โŒ Single-byte only
Multi-byte pattern matching โœ… Regex-powered โœ… Full regex support โŒ Single-byte only
Streaming data processing โœ… Memory efficient โŒ Loads all data โœ… Memory efficient
Complex pattern matching โœ… Full regex support โœ… Full regex support โŒ Single-byte only
Async I/O โœ… Native async โŒ Sync only โœ… Native async
Large file processing โœ… Streaming โŒ Memory intensive โš ๏ธ Limited patterns
Protocol parsing โœ… Perfect โŒ Not suitable โš ๏ธ Limited patterns

๐Ÿ’ก Key Insight: This crate combines the power of regex with the efficiency of streaming, making it perfect for processing large files or continuous data streams with complex pattern matching requirements.

API Reference

Async Functions (Regex-Powered Streaming)

  • read_until_pattern_async<R>(reader: &mut R, pattern: &str, to: &mut Vec<u8>) -> Result<(Vec<u8>, usize)>
    • Find regex pattern in async stream, returns matched substring and total bytes read
    • Where R: AsyncBufRead + Unpin
  • read_while_any_async<R>(reader: &mut R, check_set: &[u8], to: &mut Vec<u8>) -> Result<(u8, usize)>
    • Read while any byte in check_set matches, returns stop byte and count
    • Where R: AsyncBufRead + Unpin

Sync Functions (Regex-Powered Streaming)

  • read_until_pattern<R>(reader: &mut R, pattern: &str, to: &mut Vec<u8>) -> Result<(Vec<u8>, usize)>
    • Find regex pattern in sync stream, returns matched substring and total bytes read
    • Where R: BufRead
  • read_while_any<R>(reader: &mut R, check_set: &[u8], to: &mut Vec<u8>) -> Result<(u8, usize)>
    • Read while any byte in check_set matches, returns stop byte and count
    • Where R: BufRead

Utility Functions

  • find_pattern(haystack: &[u8], needle: &Regex) -> Option<(usize, usize)>
    • Direct regex pattern search in byte slice, returns (start, length)
    • Uses compiled regex for maximum performance

Testing

Run tests:

cargo test

Run benchmarks:

cargo bench

๐Ÿค Contributing

Contributions are welcome! This crate aims to make regex pattern matching accessible for streaming data. Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

๐Ÿ“„ License

This project is licensed under the MIT License. See the LICENSE file for details.

๐ŸŽฏ Summary

async-regex empowers the powerful regex crate with streaming capabilities, making it possible to use complex regex patterns on data streams without loading everything into memory. Perfect for protocol parsing, log processing, and any scenario where you need regex power on streaming data.

Commit count: 48

cargo fmt