Crates.io | fm |
lib.rs | fm |
version | 0.4.0 |
source | src |
created_at | 2020-07-02 10:56:30.238919 |
updated_at | 2024-08-30 09:44:26.52241 |
description | Non-backtracking fuzzy text matcher |
homepage | |
repository | https://github.com/softdevteam/fm/ |
max_upload_size | |
id | 260582 |
size | 60,923 |
fm
is a simple limited backtracking fuzzy text matcher useful for matching
multi-line patterns and literal text. Wildcard operators can be used to
match parts of a line and to skip multiple lines of text. For example this
pattern:
...A
...
D...
will successfully match against literals such as:
xyzA
B
C
Dxyz
The intraline wildcard operator ...
can appear at the start and/or end of a
line. ...X...
matches any literal line that contains "X"; ...X
matches any
literal line that ends with "X"; and X...
matches any literal line that
starts with "X". ......
matches exactly one literal line (i.e. the contents
of the literal line are irrelevant but this will not match against the end
of the literal text).
There are two interline wildcard operators that match zero or more literal lines until a match for the next item is found, at which point the search is anchored (i.e. backtracking will not occur before the anchor). An item is either:
The interline wildcards are:
...
searches until it finds a match for the
line immediately after the interline operator ("the prefix"), at which
point the search is anchored. This wildcard does not backtrack...~
searches until it finds a match for the
next group, at which point the search is anchored. This wildcard
backtracks, though never further than one group.Interline wildcards cannot directly follow each other i.e. ...\n...?
is an
invalid pattern. Interline wildcards can appear at the beginning or end of
a pattern: at the end of a pattern, both interline wildcards have identical
semantics to each other.
Consider this pattern:
A
...
B
C
...
This will match successfully against the literal:
A
D
B
C
E
but fail to match against the literal:
A
D
B
B
C
E
because the ...
matches against the first "B", which anchors the search, then
immediately fails to match against the second "B".
In contrast the pattern:
A
..~
B
C
...
does match the literal because ..~
backtracks on the second "B".
There are two reasons why you should default to using ...
rather than ..~
.
Most obviously ...
does not backtrack and has linear performance. Less
obviously ...
is a more rigorous test, since it cannot skip prefix matches
(i.e. the next line after the ...
in the pattern) in the literal.
use fm::FMatcher;
assert!(FMatcher::new("a").unwrap().matches("a").is_ok());
assert!(FMatcher::new(" a ").unwrap().matches("a").is_ok());
assert!(FMatcher::new("a").unwrap().matches("b").is_err());
assert!(FMatcher::new("a\n...\nb").unwrap().matches("a\na\nb").is_ok());
assert!(FMatcher::new("a\n...\nb").unwrap().matches("a\na\nb\nb").is_err());
When a match fails, the matcher returns an error indicating the line number at
which the match failed. The error can be formatted for human comprehension
using the provided Display
implementation.
If you want to use non-default options, you will first need to use FMBuilder
before having access to an FMatcher
. For example, to use "name matching"
(where you specify that the same chunk of text must appear at multiple points
in the text, but without specifying exactly what the chunk must contain) you
can set options as follows:
use {fm::FMBuilder, regex::Regex};
let ptn_re = Regex::new(r"\$[0-9]+?\b").unwrap();
let text_re = Regex::new(r"[a-z]+?\b").unwrap();
let matcher = FMBuilder::new("$1 $1")
.unwrap()
.name_matcher(ptn_re, text_re)
.build()
.unwrap();
assert!(matcher.matches("a a").is_ok());
assert!(matcher.matches("a b").is_err());