| Crates.io | tyrx |
| lib.rs | tyrx |
| version | 0.1.2 |
| created_at | 2026-01-14 12:09:17.766855+00 |
| updated_at | 2026-01-15 12:14:33.801505+00 |
| description | Typed, ergonomic regular expression library |
| homepage | |
| repository | https://github.com/H2CO3/tyrx |
| max_upload_size | |
| id | 2042679 |
| size | 68,855 |
TyRx attempts to bring the strong typing and excellent domain modeling capabilities of Rust into the world of regular expressions.
It provides traits, types, and macros for quickly building types that know how to parse themselves from a string by compiling and matching a regular expression.
The crate name is pronounced "tee-rex", like the dinosaur.
As a trivial example, when you need to parse a string with a list of numbers:
use tyrx::{Result, TyRx};

fn main() -> Result<()> {
    let string = "13.37 69.67 -137 +42 -2.718281829";
    let numbers: Vec<_> = f64::iter_from_str(string).collect::<Result<_>>()?;
    assert_eq!(numbers, [
        13.37,
        69.67,
        -137.0,
        42.0,
        -2.718281829,
    ]);
    Ok(())
}
A slightly more complicated example shows off more of the crate's capabilities. Let's say there's a file with each line in the format:
ident1: 3.14, SomeText
ident2: -137.42, OtherStringContent
so the first part before the : is an identifier of the record, while the
rest of the line is a comma-separated pair of values (a fractional number
and some alphanumeric text), represented by a nested type.
You can use the following piece of code to represent the outer and the inner type, specify the subpatterns necessary for matching each field, and have the library generate all the parsing boilerplate:
use tyrx::{
    RegexPattern, FromMatch, ErasedLifetime, TyRx,
    builder::{Char, Ignore},
};

#[derive(PartialEq, Debug, RegexPattern, FromMatch, ErasedLifetime)]
struct Outer {
    #[tyrx(pattern = r"(?<Outer.prefix>[[:alnum:]]+)")]
    prefix: String,
    colon: Char<':'>,
    #[tyrx(pattern = r"(?<Outer.space>\s+)")]
    space: Ignore<String>,
    /// nested type implementing `RegexPattern` and `FromMatch`
    inner: Inner,
}

#[derive(PartialEq, Debug, RegexPattern, FromMatch, ErasedLifetime)]
struct Inner {
    number_value: f64,
    #[tyrx(pattern = r"(?<Inner.separator>,\s*)")]
    separator: tyrx::builder::Ignore<String>,
    #[tyrx(pattern = r"(?<Inner.text_content>[[:alnum:]]+)")]
    text_content: String,
}

fn main() -> tyrx::Result<()> {
    let text = r#"
ident1: 3.14, SomeText
ident2: -137.42, OtherStringContent
"#;
    let matches: Vec<_> = Outer::iter_from_str(text).collect::<tyrx::Result<_>>()?;
    assert_eq!(matches, [
        Outer {
            prefix: String::from("ident1"),
            colon: Char::default(),
            space: Ignore::default(),
            inner: Inner {
                number_value: 3.14,
                separator: Ignore::default(),
                text_content: String::from("SomeText"),
            },
        },
        Outer {
            prefix: String::from("ident2"),
            colon: Char::default(),
            space: Ignore::default(),
            inner: Inner {
                number_value: -137.42,
                separator: Ignore::default(),
                text_content: String::from("OtherStringContent"),
            },
        },
    ]);
    Ok(())
}
The main entry point of the crate is the [TyRx] trait. This is automatically
implemented (by means of a blanket impl) for types that also implement the
[RegexPattern], [FromMatch], and [ErasedLifetime] traits, all of which
can be automatically #[derive]'d.
The [RegexPattern] trait is implemented by types that represent a regular
expression pattern. They supply this pattern to the regex engine by writing
it into the provided formatter in the [RegexPattern::fmt_pattern()] method.
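The idea of writing a pattern into a formatter can be sketched with the standard library alone. The following is an illustration under stated assumptions, not the crate's actual definitions: the trait `PatternSketch`, the adapter `PatternDisplay`, and the helper `pattern_of` are hypothetical names, and the real [RegexPattern::fmt_pattern()] signature may differ. It shows how a composite type can build its pattern by writing its parts' patterns one after another.

```rust
use std::fmt;
use std::marker::PhantomData;

/// Hypothetical trait for illustration; the crate's real signature may differ.
trait PatternSketch {
    fn fmt_pattern(f: &mut fmt::Formatter<'_>) -> fmt::Result;
}

/// Adapter that makes a type's pattern usable with `format!` and friends.
struct PatternDisplay<T>(PhantomData<T>);

impl<T: PatternSketch> fmt::Display for PatternDisplay<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        T::fmt_pattern(f)
    }
}

/// Renders the pattern of `T` into an owned string.
fn pattern_of<T: PatternSketch>() -> String {
    PatternDisplay::<T>(PhantomData).to_string()
}

/// A leaf type with a fixed pattern.
struct Digits;

impl PatternSketch for Digits {
    fn fmt_pattern(f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("[0-9]+")
    }
}

/// A composite type that concatenates the patterns of its parts.
struct Version;

impl PatternSketch for Version {
    fn fmt_pattern(f: &mut fmt::Formatter<'_>) -> fmt::Result {
        Digits::fmt_pattern(f)?;
        f.write_str(r"\.")?;
        Digits::fmt_pattern(f)
    }
}

fn main() {
    assert_eq!(pattern_of::<Digits>(), "[0-9]+");
    assert_eq!(pattern_of::<Version>(), r"[0-9]+\.[0-9]+");
}
```

Writing into a formatter rather than returning a `String` lets nested patterns be composed without intermediate allocations; only the outermost caller decides where the text ends up.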
The derive macro accepts the following attributes:
Top-level (struct and enum) attributes:
#[tyrx(rename = identifier)]: changes the top-level type name in capture
group names.
#[tyrx(flag(case_insensitive, unicode = false, ignore_whitespace = true, ...))]:
turns on or off the specified flags, as recognized by the regex crate. See
its documentation for the exact meaning of each flag. The names of the flags
are identical to the names of the corresponding RegexBuilder methods.
The current list is: case_insensitive, multi_line, crlf, dot_matches_new_line,
unicode, swap_greed, ignore_whitespace.
Specifying the name of a flag or assigning it the value true turns it on.
Assigning it the value false turns the flag off.
Struct field and variant field attributes:
#[tyrx(rename = identifier)]: causes the field name part of the capture
group in the generated pattern to be replaced by the specified literal
identifier.
#[tyrx(pattern = "regex pattern string or other Display-able value")]:
causes the field's portion of the generated pattern to be replaced by the
supplied sub-pattern. By default, the field's sub-pattern is derived from
its type. You may re-use this default sub-pattern in the custom pattern by
using e.g. format_args!() and interpolating [RegexPattern::pattern_display()],
forwarded to the field type.
Enum variant attributes:
#[tyrx(rename = identifier)]: similar to the rename attribute on struct
fields, except that it replaces the variant name part of the capture group
name. When applied to a unit variant, it also changes the literal pattern to
be matched.
#[tyrx(flag(multi_line = true, dot_matches_new_line = false, swap_greed, ...))]:
sets or clears flags; carries the same meaning as the top-level struct or
enum attribute (see the section above for the precise list of flags).
The [FromMatch] trait represents a type that can parse itself from a match
or a set of matched capture groups.
The derive macro accepts all attributes accepted by the [RegexPattern]
derive, and some more:
#[tyrx(lifetime = 'lt)]: changes the lifetime parameter of the trait
from the default, fresh lifetime to the specified parameter. The specified
lifetime must already exist as a parameter of the type, as it will not be
added to the generic parameter declaration list of the generated impl.
The [ErasedLifetime] trait is a technical necessity, arising out of storing
compiled regular expressions in a global cache. For a detailed explanation, see
the relevant section below.
Enums are represented as a choice between their variants. Choices are ordered: the variants are tried one after another, in sequence. This matters when some patterns overlap (i.e., when more than one variant can match the same haystack).
Variants are treated identically to structs, with one exception: unit variants, unlike unit structs, match their own literal name. For example:
use tyrx::{TyRx, RegexPattern, FromMatch, ErasedLifetime};

#[derive(Clone, PartialEq, Debug, RegexPattern, FromMatch, ErasedLifetime)]
enum MyChoice {
    /// Struct variants
    Ratio {
        numerator: f64,
        slash: tyrx::builder::Char<'/'>,
        denominator: f64,
    },
    /// Unit variants match themselves, except when renamed
    #[tyrx(rename = literal_one)]
    LiteralOne,
    /// Raw identifiers work correctly, too
    r#LiteralTwo,
    /// Tuple variants
    Identifier(
        #[tyrx(pattern = "(?<MyChoice.Identifier.foo>[a-zA-Z_][a-zA-Z0-9_]*)", rename = r#foo)]
        String,
    ),
}

fn main() -> tyrx::Result<()> {
    let haystack = "42/-13.37 +8./1.0 arbitrary literal_one -69/42 Some LiteralTwo OTHER";
    let enum_matches: Vec<_> = MyChoice::iter_from_str(haystack).collect::<tyrx::Result<_>>()?;
    assert_eq!(enum_matches, [
        MyChoice::Ratio {
            numerator: 42.0,
            slash: Default::default(),
            denominator: -13.37,
        },
        MyChoice::Ratio {
            numerator: 8.0,
            slash: Default::default(),
            denominator: 1.0,
        },
        MyChoice::Identifier("arbitrary".into()),
        MyChoice::LiteralOne,
        MyChoice::Ratio {
            numerator: -69.0,
            slash: Default::default(),
            denominator: 42.0,
        },
        MyChoice::Identifier("Some".into()),
        MyChoice::LiteralTwo,
        MyChoice::Identifier("OTHER".into()),
    ]);
    Ok(())
}
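Why the order of alternatives matters can be illustrated without the crate at all. The following is a conceptual sketch using only the standard library, not TyRx's implementation: the `Token` enum and the `classify` helper are hypothetical names, and hand-written predicates stand in for regex sub-patterns. The word `literal_one` satisfies both the literal rule and the identifier rule, so in this sketch the rule that is tried first decides the outcome.

```rust
/// Hypothetical stand-in for the variants of an enum (not a TyRx type).
#[derive(Clone, Copy, PartialEq, Debug)]
enum Token {
    LiteralOne,
    Identifier,
}

/// Stand-in for a unit variant's literal pattern.
fn is_literal_one(word: &str) -> bool {
    word == "literal_one"
}

/// Stand-in for the pattern `[a-zA-Z_][a-zA-Z0-9_]*`.
fn is_identifier(word: &str) -> bool {
    let mut chars = word.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Ordered choice: try each rule in sequence and return the first that matches.
fn classify(word: &str, rules: &[(Token, fn(&str) -> bool)]) -> Option<Token> {
    rules
        .iter()
        .find(|(_, rule)| rule(word))
        .map(|(token, _)| *token)
}

fn main() {
    let literal_first: [(Token, fn(&str) -> bool); 2] = [
        (Token::LiteralOne, is_literal_one),
        (Token::Identifier, is_identifier),
    ];
    let identifier_first: [(Token, fn(&str) -> bool); 2] = [
        (Token::Identifier, is_identifier),
        (Token::LiteralOne, is_literal_one),
    ];

    // Same input, different order of alternatives, different result.
    assert_eq!(classify("literal_one", &literal_first), Some(Token::LiteralOne));
    assert_eq!(classify("literal_one", &identifier_first), Some(Token::Identifier));
    // Inputs matched by only one rule are unaffected by the order.
    assert_eq!(classify("OTHER", &literal_first), Some(Token::Identifier));
    assert_eq!(classify("42", &literal_first), None);
}
```

This mirrors the declaration order in `MyChoice`, where the more specific unit variants come before the catch-all `Identifier` variant.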
Borrowed string-like types (including &str, Cow<'_, str>, etc.) can also be
deserialized from the haystack without copying or allocation. The following example
demonstrates this:
use std::borrow::Cow;
use tyrx::{TyRx, RegexPattern, FromMatch, ErasedLifetime};

#[derive(Clone, PartialEq, Debug, RegexPattern, FromMatch, ErasedLifetime)]
struct Borrowing<'a> {
    #[tyrx(pattern = r"(?<Borrowing.first>[0-9]+)\s+")]
    first: &'a str,
    #[tyrx(pattern = r"(?<Borrowing.last>[a-zA-Z]+)")]
    last: Cow<'a, str>,
}

fn main() -> tyrx::Result<()> {
    // make this a local instead of a &'static str
    let haystack = String::from("123 abc 99 defghi 9876543 foobar");
    let borrowed_matches: Vec<_> = Borrowing::iter_from_str(&haystack).collect::<tyrx::Result<_>>()?;
    assert_eq!(borrowed_matches, [
        Borrowing { first: "123", last: Cow::Borrowed("abc") },
        Borrowing { first: "99", last: Cow::Borrowed("defghi") },
        Borrowing { first: "9876543", last: Cow::Borrowed("foobar") },
    ]);
    Ok(())
}
This example also demonstrates that the automatically-added bounds should usually
suffice. However, if you need precise control over the lifetime argument of the
[FromMatch] impl, you can use the #[tyrx(lifetime = 'a)] annotation with
the #[derive] macro.
To avoid re-compiling the regex each time a type is parsed, the crate
maintains a global cache of compiled regular expressions. The TypeId of each
type is used as its key in the cache.
On its own, this would preclude non-'static types from being used with the
library, because a TypeId can only be obtained for 'static types. That would
be a pretty big loss, as borrowing from the matched string (as opposed
to cloning its substrings) is an important performance optimization. To solve this
problem, the [ErasedLifetime] trait is defined with the sole purpose of providing
the [ErasedLifetime::Erased] associated type. When automatically derived, this
associated type is set to the Self type but with all lifetime parameters (if any)
replaced with the 'static lifetime. TypeId can then be applied to the
lifetime-erased type, which allows borrowed types to work with the library, too.
Compiling and caching a regular expression can be performed explicitly by calling
the [build_regex()] function.
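The caching scheme described above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the crate's implementation: `ErasedLifetimeSketch`, `cache`, and `cached_pattern` are hypothetical names, and a plain `String` stands in for a compiled regex. It shows how a `'static` twin of a borrowing type can supply the `TypeId` that the borrowing type itself cannot.

```rust
use std::any::TypeId;
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Hypothetical stand-in for the crate's trait: names a `'static` twin of `Self`.
trait ErasedLifetimeSketch {
    type Erased: 'static;
}

/// A borrowing type, so `Borrowing<'a>` is not `'static` in general.
#[allow(dead_code)]
struct Borrowing<'a> {
    first: &'a str,
}

/// What a derive would plausibly generate: every lifetime replaced by `'static`.
impl<'a> ErasedLifetimeSketch for Borrowing<'a> {
    type Erased = Borrowing<'static>;
}

/// Global cache keyed by the `TypeId` of the erased type. A `String` stands in
/// for the compiled regex to keep the sketch dependency-free.
fn cache() -> &'static Mutex<HashMap<TypeId, String>> {
    static CACHE: OnceLock<Mutex<HashMap<TypeId, String>>> = OnceLock::new();
    CACHE.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Looks up the cached entry for `T`, running `compile` only on a cache miss.
/// `TypeId::of::<T>()` would require `T: 'static`; going through `T::Erased`
/// sidesteps that bound.
fn cached_pattern<T: ErasedLifetimeSketch>(compile: impl FnOnce() -> String) -> String {
    let key = TypeId::of::<T::Erased>();
    let mut map = cache().lock().unwrap();
    map.entry(key).or_insert_with(compile).clone()
}

fn main() {
    let haystack = String::from("123 abc");
    // A value borrowing from a local string: its type is not `'static`.
    let _value = Borrowing { first: &haystack[..3] };

    let first = cached_pattern::<Borrowing<'_>>(|| String::from("[0-9]+"));
    // The second lookup hits the cache, so this closure never runs.
    let second = cached_pattern::<Borrowing<'_>>(|| unreachable!("already cached"));
    assert_eq!(first, second);
}
```

All instantiations of `Borrowing<'a>` share one cache entry, which is the desired behavior: the pattern depends on the shape of the type, not on how long a particular haystack lives.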
The [Spanned] type allows one to preserve the byte range of each match.
This is a transparent newtype wrapper which simply forwards its [RegexPattern]
and [FromMatch] impls to the underlying type, while storing the byte span of
the specific match it came from.
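What preserving a byte range buys you can be shown with a small standard-library sketch. The type `SpannedSketch` and the function `numbers_with_spans` are hypothetical and only mimic the idea; the fields and accessors of the crate's [Spanned] type are not described here and may differ. The sketch records where each parsed number was found, so the original text can be sliced back out of the haystack later (for example, for error reporting).

```rust
use std::ops::Range;

/// Hypothetical analogue of a span-preserving wrapper (not the crate's type).
#[derive(Clone, PartialEq, Debug)]
struct SpannedSketch<T> {
    value: T,
    span: Range<usize>,
}

/// Parses whitespace-separated numbers, recording the byte range of each one.
/// Words that are not numbers are skipped.
fn numbers_with_spans(haystack: &str) -> Vec<SpannedSketch<f64>> {
    let mut out = Vec::new();
    let mut offset = 0;
    for word in haystack.split_whitespace() {
        // Searching the unconsumed tail yields the position of this word.
        let start = offset
            + haystack[offset..]
                .find(word)
                .expect("word was taken from the haystack");
        let end = start + word.len();
        offset = end;
        if let Ok(value) = word.parse::<f64>() {
            out.push(SpannedSketch { value, span: start..end });
        }
    }
    out
}

fn main() {
    let haystack = "13.37 x -137";
    let found = numbers_with_spans(haystack);

    assert_eq!(found.len(), 2);
    assert_eq!(found[0], SpannedSketch { value: 13.37, span: 0..5 });
    assert_eq!(found[1].span, 8..12);
    // The span maps the parsed value back to its source text.
    assert_eq!(&haystack[found[1].span.clone()], "-137");
}
```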
When using #[tyrx(pattern = "...")], the derive macro makes a best-effort attempt
at ensuring that the specified pattern contains the corresponding, appropriately
named capture group. However, this only works when the pattern expression is a
literal or a sufficiently simple expression (e.g., a block, a parenthesized group,
a typecast expression, a reference or dereference) that can be naively determined
to be a literal. If the expression contains more complex subexpressions, then the
macro gives up and lets the code compile, even if the required capture group is
missing.
FromStr impls
Many types have an implementation of the standard FromStr
trait as a way of naturally parsing a value from a string. If you have such a
type, you can automatically adapt it to have [RegexPattern] and [FromMatch]
impls by wrapping it in a [MatchFromStr].
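For reference, a FromStr implementation of the kind this section refers to might look like the following. The sketch uses only the standard library; the `Rgb` type and its `ParseRgbError` error type are hypothetical, and the step of wrapping the type in [MatchFromStr] is deliberately not shown, since its exact usage is not covered here.

```rust
use std::fmt;
use std::str::FromStr;

/// Hypothetical type used only for illustration: an `#rrggbb` color.
#[derive(Clone, Copy, PartialEq, Debug)]
struct Rgb {
    r: u8,
    g: u8,
    b: u8,
}

/// Hypothetical error type carrying the rejected input.
#[derive(Clone, PartialEq, Debug)]
struct ParseRgbError(String);

impl fmt::Display for ParseRgbError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid color: {}", self.0)
    }
}

impl std::error::Error for ParseRgbError {}

impl FromStr for Rgb {
    type Err = ParseRgbError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let err = || ParseRgbError(s.to_owned());
        // Expect a leading '#' followed by exactly six ASCII hex digits.
        let hex = s.strip_prefix('#').ok_or_else(&err)?;
        if hex.len() != 6 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
            return Err(err());
        }
        // Each channel is two hex digits; slicing is safe because the input is ASCII.
        let channel = |i: usize| u8::from_str_radix(&hex[i..i + 2], 16).map_err(|_| err());
        Ok(Rgb {
            r: channel(0)?,
            g: channel(2)?,
            b: channel(4)?,
        })
    }
}

fn main() {
    let color: Rgb = "#ff8000".parse().unwrap();
    assert_eq!(color, Rgb { r: 255, g: 128, b: 0 });
    assert!("ff8000".parse::<Rgb>().is_err());
    assert!("#ff80".parse::<Rgb>().is_err());
}
```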
[builder::Ignore].
TODO(H2CO3): describe this in detail.
The [builder] module contains helper types for composing regexes in
frequently-used ways. For example:
[builder::Char]
[builder::CharRange]
[builder::CharClass]
[builder::Repeat]
[builder::Alternation]
[builder::Ignore]
TODO(H2CO3): describe each of these in detail.