regex-specificity

Crates.ioregex-specificity
lib.rsregex-specificity
version0.1.0
created_at2025-12-29 14:24:12.386658+00
updated_at2025-12-31 09:38:14.444599+00
descriptionA heuristic-based crate to calculate the specificity of a regular expression pattern against a specific string.
homepagehttps://github.com/KKW557/regex-specificity
repositoryhttps://github.com/KKW557/regex-specificity
max_upload_size
id2010669
size14,269
557 (KKW557)

documentation

https://docs.rs/regex-specificity

README

Regex Specificity

Latest version Github Actions Docs Downloads

A heuristic-based crate to calculate the specificity of a regular expression pattern against a specific string.

Concept

Specificity measures how "precise" a match is. For example, the pattern abc is more specific to the string "abc" than the pattern a.c or .*.

The calculation follows these principles:

  1. Positional Weighting: Earlier matches contribute more to the total specificity than later ones.
  2. Certainty: Literals (exact characters) specificity higher than character classes or wildcards.
  3. Information Density: Narrower character classes (e.g., [a-z]) specificity higher than broader ones (e.g., .).
  4. Branching Penalty: Patterns with many alternatives (alternations) are penalized as they are less specific.

Usage

The get function assumes that the string provided is already a full match for the pattern. If the pattern does not match the string, the resulting specificity will be mathematically inconsistent and meaningless for comparison purposes.

let string = "abc";
let high = get(string, "abc").unwrap();
let low = get(string, ".*").unwrap();

assert!(high > low);

Counterintuitive

Since this crate uses a greedy heuristic based on the HIR (High-level Intermediate Representation), certain patterns may yield the same specificity even if they look different. A common example is when a wildcard .* "swallows" the entire string before other parts of the pattern can be evaluated.

let string = "cat";

let pattern1 = r".*";
let pattern2 = r".*a.*";

assert_eq!(
    get(string, pattern1).unwrap(),
    get(string, pattern2).unwrap()
)

Recommendation

If you need to distinguish between patterns with identical specificity, we recommend using the pattern length as a secondary tie-breaker:

  • Mathematical: A longer pattern is often less specific because it requires more redundant components to describe the same set.
  • Intuitive: You may prefer the pattern with more literals.
if result_a == result_b {
    return pattern_a.len().cmp(&pattern_b.len()); 
}

License

This project is licensed under the MIT License © 2025 557.

Commit count: 0

cargo fmt