[![Github CI](https://github.com/cmccomb/human_regex/actions/workflows/tests.yml/badge.svg)](https://github.com/cmccomb/human_regex/actions)
[![Crates.io](https://img.shields.io/crates/v/human_regex.svg)](https://crates.io/crates/human_regex)
[![docs.rs](https://img.shields.io/docsrs/human_regex/latest?logo=rust)](https://docs.rs/human_regex)

# Regex for Humans
The goal of this crate is simple: give everybody the power of regular expressions without having 
to learn the complicated syntax. It is inspired by [ReadableRegex.jl](https://github.com/jkrumbiegel/ReadableRegex.jl).
This crate is a wrapper around the [core Rust regex library](https://crates.io/crates/regex). 

# Example usage
If you want to match a date of the format `2021-10-30`, you could use the following code to generate a regex:
```rust
use human_regex::{beginning, digit, exactly, text, end};
let regex_string = beginning()
    + exactly(4, digit())
    + text("-")
    + exactly(2, digit())
    + text("-")
    + exactly(2, digit())
    + end();
assert!(regex_string.to_regex().is_match("2014-01-01"));
```
The `to_regex()` method returns a [standard Rust regex](https://docs.rs/regex/1.5.4/regex/struct.Regex.html). The same pattern can be written with less repetition by building the repeated "hyphen plus two digits" part once and reusing it:
```rust
use human_regex::{beginning, digit, exactly, text, end};
let first_regex_string = text("-") + exactly(2, digit());
let second_regex_string = beginning()
    + exactly(4, digit())
    + exactly(2, first_regex_string)
    + end();
assert!(second_regex_string.to_regex().is_match("2014-01-01"));
```
For a more extensive set of examples, please see [The Cookbook](crate::cookbook).

# Features
This crate currently supports the vast majority of syntax available in the [core Rust regex library](https://crates.io/crates/regex) through a human-readable API. 

## Single Character

| Implemented?                                | Expression          | Description                                                   |
|:-------------------------------------------:|:-------------------:|:--------------------------------------------------------------|
| `any()`                                     |         `.`         | any character except new line (includes new line with s flag) |
| `digit()`                                   |        `\d`         | digit (`\p{Nd}`)                                              |
| `non_digit()`                               |        `\D`         | not digit                                                     |
| `unicode_category(UnicodeCategory)`         |        `\p{L}`      | Unicode non-script category                                   |
| `unicode_script(UnicodeScript)`             |     `\p{Greek}`     | Unicode script category                                       |
| `non_unicode_category(UnicodeCategory)`     |        `\P{L}`      | negated Unicode non-script category                           |
| `non_unicode_script(UnicodeScript)`         |     `\P{Greek}`     | negated Unicode script category                               |

## Character Classes

|      Implemented?           |   Expression   | Description                                                                         |
|:---------------------------:|:--------------:|:------------------------------------------------------------------------------------|
|  `or(&['x', 'y', 'z']) `    |    `[xyz]`     | A character class matching either x, y or z (union).                                |
|  `nor(&['x', 'y', 'z'])`    |    `[^xyz]`    | A character class matching any character except x, y and z.                         |
|`within('a'..='z')`          |    `[a-z]`     | A character class matching any character in range a-z.                              |
|`without('a'..='z')`         |    `[^a-z]`    | A character class matching any character outside range a-z.                         |
|       See below             | `[[:alpha:]]`  | ASCII character class (`[A-Za-z]`)                                                  |                
|  `non_alphabetic()`         | `[[:^alpha:]]` | Negated ASCII character class (`[^A-Za-z]`)                                         |
|         `or()`              |  `[x[^xyz]]`   | Nested/grouping character class (matching any character except y and z)             |
|      `and(&[])`/`&`         |  `[a-y&&xyz]`  | Intersection (a-y AND xyz = xy)                                                     |             
| `(or[1,2,3,4] & nor(3))`    | `[0-9&&[^4]]`  | Subtraction using intersection and negation (matching 0-9 except 4)                 |    
|    `subtract(&[],&[])`      |   `[0-9--4]`   | Direct subtraction (matching 0-9 except 4). Use `.collect::<Vec<char>>()` to pass ranges. |
|      `xor(&[],&[])`         |  `[a-g~~b-h]`  | Symmetric difference (matching `a` and `h` only). Requires .collect() for ranges.   |          
|`or(&escape_all(&['[',']']))`|    `[\[\]]`    | Escaping in character classes (matching `[` or `]`)                                 |         
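
The set operations in the table above can be checked by hand with ordinary sets. The following is a minimal sketch using only the standard library; the helper names (`class`, `intersection`, `difference`, `symmetric_difference`) are made up for illustration and are not part of `human_regex`. It models each character class as a `BTreeSet<char>` and reproduces the results stated in the descriptions.

```rust
use std::collections::BTreeSet;

/// Collects a range or an explicit list of chars into an ordered set.
/// (Hypothetical helper, not part of `human_regex`.)
fn class<I: IntoIterator<Item = char>>(chars: I) -> BTreeSet<char> {
    chars.into_iter().collect()
}

/// Models `[a&&b]`: characters present in both classes.
fn intersection(a: &BTreeSet<char>, b: &BTreeSet<char>) -> String {
    a.intersection(b).collect()
}

/// Models `[a--b]`: characters in `a` that are not in `b`.
fn difference(a: &BTreeSet<char>, b: &BTreeSet<char>) -> String {
    a.difference(b).collect()
}

/// Models `[a~~b]`: characters in exactly one of the two classes.
fn symmetric_difference(a: &BTreeSet<char>, b: &BTreeSet<char>) -> String {
    a.symmetric_difference(b).collect()
}
```

With these helpers, `intersection(&class('a'..='y'), &class(['x', 'y', 'z']))` yields `"xy"`, and `symmetric_difference(&class('a'..='g'), &class('b'..='h'))` yields `"ah"`, matching the `[a-y&&xyz]` and `[a-g~~b-h]` rows.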

## Perl Character Classes

|    Implemented?    | Expression | Description                                                                |
|:------------------:| :--------: |:---------------------------------------------------------------------------|
|     `digit()`      |   `\d`     | digit (`\p{Nd}`)                                                           |
|   `non_digit()`    |   `\D`     | not digit                                                                  |
|   `whitespace()`   |   `\s`     | whitespace (`\p{White_Space}`)                                             |
| `non_whitespace()` |   `\S`     | not whitespace                                                             |
|      `word()`      |   `\w`     | word character (`\p{Alphabetic} + \p{M} + \d + \p{Pc} + \p{Join_Control}`) |
|      `non_word()`  |   `\W`     | not word character                                                         |

## ASCII Character Classes

|   Implemented?   |   Expression   | Description                       |
|:----------------:|:--------------:|:----------------------------------|
| `alphanumeric()` | `[[:alnum:]]`  | alphanumeric (`[0-9A-Za-z]`)      |
|  `alphabetic()`  | `[[:alpha:]]`  | alphabetic (`[A-Za-z]`)           |
|    `ascii()`     | `[[:ascii:]]`  | ASCII (`[\x00-\x7F]`)             |
|    `blank()`     | `[[:blank:]]`  | blank (`[\t ]`)                   |
|   `control()`    | `[[:cntrl:]]`  | control (`[\x00-\x1F\x7F]`)       |
|    `digit()`     | `[[:digit:]]`  | digits (`[0-9]`)                  |
|  `graphical()`   | `[[:graph:]]`  | graphical (`[!-~]`)               |
|  `lowercase()`   | `[[:lower:]]`  | lower case (`[a-z]`)              |
|  `printable()`   | `[[:print:]]`  | printable (`[ -~]`)               |
| `punctuation()`  | `[[:punct:]]`  | punctuation (``[!-/:-@\[-`{-~]``) |
|  `whitespace()`  | `[[:space:]]`  | whitespace (`[\t\n\v\f\r ]`)      |
|  `uppercase()`   | `[[:upper:]]`  | upper case (`[A-Z]`)              |
|     `word()`     |  `[[:word:]]`  | word characters (`[0-9A-Za-z_]`)  |
|   `hexdigit()`   | `[[:xdigit:]]` | hex digit (`[0-9A-Fa-f]`)         |

## Repetitions

|       Implemented?        | Expression | Description                                  |
|:-------------------------:|:----------:|:---------------------------------------------|
|     `zero_or_more(x)`     |    `x*`    | zero or more of x (greedy)                   |
|     `one_or_more(x)`      |    `x+`    | one or more of x (greedy)                    |
|     `zero_or_one(x)`      |    `x?`    | zero or one of x (greedy)                    |
| `zero_or_more(x).lazy()`  |   `x*?`    | zero or more of x (ungreedy/lazy)            |
|  `one_or_more(x).lazy()`  |   `x+?`    | one or more of x (ungreedy/lazy)             |
| `zero_or_one(x).lazy()`   |   `x??`    | zero or one of x (ungreedy/lazy)             |
|    `between(n, m, x)`     |  `x{n,m}`  | at least n x and at most m x (greedy)        |
|     `at_least(n, x)`      |  `x{n,}`   | at least n x (greedy)                        |
|      `exactly(n, x)`      |   `x{n}`   | exactly n x                                  |
| `between(n, m, x).lazy()` | `x{n,m}?`  | at least n x and at most m x (ungreedy/lazy) |
|  `at_least(n, x).lazy()`  |  `x{n,}?`  | at least n x (ungreedy/lazy)                 |
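
To make the greedy/lazy distinction concrete, here is a rough sketch that emulates the two patterns `<.+>` (greedy) and `<.+?>` (lazy) by hand, using only the standard library. The functions `greedy_tag` and `lazy_tag` are hypothetical names for this illustration, not part of `human_regex`, and the sketch assumes single-line input (it does not model `.` refusing to match a newline).

```rust
/// Emulates the greedy pattern `<.+>`: start at the first `<` and
/// extend the match to the LAST `>` that leaves at least one character between.
fn greedy_tag(s: &str) -> Option<&str> {
    let start = s.find('<')?;
    let rest = &s[start + 1..];
    // `.+` needs at least one character, so the first char of `rest` is skipped.
    let (end, _) = rest
        .char_indices()
        .skip(1)
        .filter(|&(_, c)| c == '>')
        .last()?;
    Some(&s[start..=start + 1 + end])
}

/// Emulates the lazy pattern `<.+?>`: start at the first `<` and
/// stop at the FIRST `>` that leaves at least one character between.
fn lazy_tag(s: &str) -> Option<&str> {
    let start = s.find('<')?;
    let rest = &s[start + 1..];
    let (end, _) = rest.char_indices().skip(1).find(|&(_, c)| c == '>')?;
    Some(&s[start..=start + 1 + end])
}
```

On the input `"<a><b>"`, `greedy_tag` returns the whole string `"<a><b>"`, while `lazy_tag` returns only `"<a>"`. Calling `.lazy()` on a repetition selects the second behavior.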

## Composites

| Implemented? | Expression | Description                     |
|:------------:|:----------:|:--------------------------------|
|      `+`     |  `xy`      | concatenation (x followed by y) |
|    `or()`    |    `x\|y`  | alternation (x or y, prefer x)  |

## Empty matches

|     Implemented?      | Expression | Description                                                         |
|:---------------------:|:----------:|:--------------------------------------------------------------------|
|     `beginning()`     |    `^`     | the beginning of text (or start-of-line with multi-line mode)       |
|        `end()`        |    `$`     | the end of text (or end-of-line with multi-line mode)               |
| `beginning_of_text()` |    `\A`    | only the beginning of text (even with multi-line mode enabled)      |
|    `end_of_text()`    |    `\z`    | only the end of text (even with multi-line mode enabled)            |
|   `word_boundary()`   |    `\b`    | a Unicode word boundary (\w on one side and \W, \A, or \z on other) |
| `non_word_boundary()` |    `\B`    | not a Unicode word boundary                                         |

## Groupings 

|                   Implemented?                    |   Expression    | Description                                             |
|:-------------------------------------------------:|:---------------:|:--------------------------------------------------------|
|                  `capture(exp)`                   |     `(exp)`     | numbered capture group (indexed by opening parenthesis) |
|            `named_capture(exp, name)`             | `(?P<name>exp)` | named (also numbered) capture group                     |
| Handled implicitly through functional composition |    `(?:exp)`    | non-capturing group                                     |
|                     See below                     |   `(?flags)`    | set flags within current group                          |
|                     See below                     | `(?flags:exp)`  | set flags for exp (non-capturing)                       |
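
As a rough illustration of why non-capturing groups can be handled implicitly, the sketch below models a composable pattern as a plain string wrapper. Everything here (`Pattern` and the stand-in functions `text`, `digit`, and `exactly`) is a hypothetical toy model written for this README, not the crate's actual implementation. The point it shows is that a quantifier must wrap its operand in `(?:...)` so that it applies to the whole sub-pattern rather than only to the last item.

```rust
use std::ops::Add;

/// Toy stand-in for a composable pattern (NOT the real `human_regex` type).
#[derive(Debug, Clone, PartialEq)]
struct Pattern(String);

/// Concatenation: `x + y` produces the pattern `xy`.
impl Add for Pattern {
    type Output = Pattern;
    fn add(self, rhs: Pattern) -> Pattern {
        Pattern(format!("{}{}", self.0, rhs.0))
    }
}

/// Literal text; this toy version assumes the input has no regex metacharacters.
fn text(s: &str) -> Pattern {
    Pattern(s.to_string())
}

/// A single digit, `\d`.
fn digit() -> Pattern {
    Pattern(r"\d".to_string())
}

/// Repeats `p` exactly `n` times. The operand is wrapped in a
/// non-capturing group so `{n}` binds to all of `p`.
fn exactly(n: u32, p: Pattern) -> Pattern {
    Pattern(format!("(?:{}){{{}}}", p.0, n))
}
```

In this model, `exactly(2, text("-") + exactly(2, digit()))` produces `(?:-(?:\d){2}){2}`. Without the grouping, the outer `{2}` would attach only to the preceding `\d{2}` instead of the whole `-\d{2}` sequence.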
   
## Flags 
    
|            Implemented?             | Expression | Description                                                   |
|:-----------------------------------:|:----------:|:--------------------------------------------------------------|
|       `case_insensitive(exp)`       |    `i`     | case-insensitive: letters match both upper and lower case     |
|       `multi_line_mode(exp)`        |    `m`     | multi-line mode: `^` and `$` match begin/end of line          |
|   `dot_matches_newline_too(exp)`    |    `s`     | allow `.` to match `\n`                                       |
| will not be implemented<sup>1</sup> |    `U`     | swap the meaning of `x*` and `x*?`                            |
|       `disable_unicode(exp)`        |    `u`     | Unicode support (enabled by default)                          |
| will not be implemented<sup>2</sup> |    `x`     | ignore whitespace and allow line comments (starting with `#`) |

1. With the declarative nature of this library, use of this flag would just obfuscate meaning.
2. When using `human_regex`, comments should be added in source code rather than in the regex string.