The README has a quick introduction to the performance of this crate. This will look at some examples and compare them to the Oniguruma engine. ## Catastrophic backtracking Backtracking engines can have worst-case performance when the regular expression forces the engine to consider an exponentially increasing number of sub-cases. For a good explanation of that, read [Runaway Regular Expressions: Catastrophic Backtracking][]. Let's look at the regex from the README again: (a|b|ab)*bc And the input text: ababababababababababababababababababababababababababababac Python's engine has exponential runtime. The regex crate and fancy-regex however have no problem with it. ## Oniguruma [Oniguruma][] implements a backtracking engine. So we'd expect it to have a problem with the above regex too. However, in the above case, it quickly finds that there's no match. How is that possible? The answer is that it has optimizations which sometimes help it avoid having to do any matching at all: In the pattern `(a|b|ab)*bc`, you might notice that if the input doesn't contain `bc`, the pattern will never match. Oniguruma detects that and, before it tries to do any matching, tries to find `bc` in the input. But what happens if we add `bc` at the end of the input, like this: ababababababababababababababababababababababababababababacbc Now the optimization doesn't help anymore, and Oniguruma is slow too. ## fancy-regex For `(a|b|ab)*bc` fancy-regex is fast in all cases because it can delegate to the regex crate which matches it in linear runtime. Let's look at another regex, one that makes use of a "fancy" look-ahead: (a|b|ab)*(?=c) When fancy-regex matches it against this input: abababababababababababababababababababababababababababab It's slow! The reason is that `(?=c)` is not supported by the regex crate, so we need to handle it with backtracking. And because `(a|b|ab)*` is before it, we need to do it with backtracking as well. Oniguruma doesn't have a problem with this particular case because its optimization saves it again: It checks if there's a `c` in the input before doing any matching. There's nothing preventing fancy-regex from adding similar optimizations in the future, but it's not done yet. Note that how much fancy-regex can do without backtracking depends on the structure of the regex. For example, with `(?=(a|b|ab)*bc)`, the inner part of the look-ahead can be delegated to regex entirely. ### Summary * If the regex doesn't use fancy features, fancy-regex should have linear runtime compared to Oniguruma's exponential worst-case. * Even if the regex doesn't use any fancy features, Oniguruma can be faster because it is a mature and highly optimized engine. * With fancy features, Oniguruma can be faster because of optimizations. [Runaway Regular Expressions: Catastrophic Backtracking]: https://www.regular-expressions.info/catastrophic.html [Oniguruma]: https://github.com/kkos/oniguruma