[![crates.io](https://img.shields.io/crates/v/gnurx-sys.svg)](https://crates.io/crates/gnurx-sys)
[![docs.rs](https://docs.rs/gnurx-sys/badge.svg)](https://docs.rs/gnurx-sys)
[![license](https://img.shields.io/github/license/koutheir/gnurx-sys?color=black)](https://raw.githubusercontent.com/koutheir/gnurx-sys/master/LICENSE.txt)

# `gnurx-sys`: Unsafe Rust bindings for `libgnurx`

This is the regex functionality from `glibc` extracted into a separate library, for Win32.
See the [`README`](libgnurx/README) of the C library.

[`regcomp()`], [`regexec()`], [`regerror()`] and [`regfree()`] are POSIX regex functions.
They conform to `POSIX.1-2001` and `POSIX.1-2008` standards.
They are defined as follows:

```rust
extern "C" {
    pub fn regcomp(
        preg: *mut regex_t,
        pattern: *const c_char,
        cflags: c_int,
    ) -> c_int;

    pub fn regexec(
        preg: *const regex_t,
        string: *const c_char,
        nmatch: usize,
        pmatch: *mut regmatch_t,
        eflags: c_int,
    ) -> c_int;

    pub fn regerror(
        errcode: c_int,
        preg: *const regex_t,
        errbuf: *mut c_char,
        errbuf_size: usize,
    ) -> usize;

    pub fn regfree(preg: *mut regex_t);
}
```

[`regcomp()`]: fn.regcomp.html
[`regexec()`]: fn.regexec.html
[`regerror()`]: fn.regerror.html
[`regfree()`]: fn.regfree.html

## POSIX regex compiling

`regcomp()` is used to compile a regular expression into a form that is suitable for subsequent `regexec()` searches.

`regcomp()` is supplied with `preg`, a pointer to a pattern buffer storage area; `pattern`, a pointer to the null-terminated string and `cflags`, flags used to determine the type of compilation.

All regular expression searching must be done via a compiled pattern buffer, thus `regexec()` must always be supplied with the address of a `regcomp()` initialized pattern buffer.

`cflags` may be the bitwise-or of zero or more of the following:

- `REG_EXTENDED`: Use POSIX Extended Regular Expression syntax when interpreting regex.
  If not set, POSIX Basic Regular Expression syntax is used.
- `REG_ICASE`: Do not differentiate case.
  Subsequent `regexec()` searches using this pattern buffer will be case insensitive.
- `REG_NOSUB`: Do not report position of matches.
  The `nmatch` and `pmatch` arguments to `regexec()` are ignored if the pattern buffer supplied was compiled with this flag set.
- `REG_NEWLINE`: Match-any-character operators don't match a newline.
  - A nonmatching list (`[^...]`) not containing a newline does not match a newline.
  - Match-beginning-of-line operator (`^`) matches the empty string immediately after a newline, regardless of whether `eflags`, the execution flags of `regexec()`, contains `REG_NOTBOL`.
  - Match-end-of-line operator (`$`) matches the empty string immediately before a newline, regardless of whether `eflags` contains `REG_NOTEOL`.

## POSIX regex matching

`regexec()` is used to match a null-terminated string against the precompiled pattern buffer, `preg`.
`nmatch` and `pmatch` are used to provide information regarding the location of any matches.
`eflags` may be the bitwise-or of one or both of `REG_NOTBOL` and `REG_NOTEOL` which cause changes in matching behavior described below:

- `REG_NOTBOL`: The match-beginning-of-line operator always fails to match (but see the compilation flag `REG_NEWLINE` above).
  This flag may be used when different portions of a string are passed to `regexec()` and the beginning of the string should not be interpreted as the beginning of the line.
- `REG_NOTEOL`: The match-end-of-line operator always fails to match (but see the compilation flag `REG_NEWLINE` above).

## Byte offsets

Unless `REG_NOSUB` was set for the compilation of the pattern buffer, it is possible to obtain match addressing information.
`pmatch` must be dimensioned to have at least `nmatch` elements.
These are filled in by `regexec()` with substring match addresses.
The offsets of the subexpression starting at the `i`th open parenthesis are stored in `pmatch[i]`.
The entire regular expression's match addresses are stored in `pmatch[0]`.
(Note that to return the offsets of `N` subexpression matches, `nmatch` must be at least `N + 1`.)
Any unused structure elements will contain the value `-1`.

The [`regmatch_t`] structure which is the type of `pmatch` is defined as:

```rust
pub struct regmatch_t {
    pub rm_so: regoff_t,
    pub rm_eo: regoff_t,
}
```

Each `rm_so` element that is not `-1` indicates the start offset of the next largest substring match within the string.
The corresponding `rm_eo` element indicates the end offset of the match, which is the offset of the first character after the matching text.

[`regmatch_t`]: struct.regmatch_t.html

## POSIX error reporting

`regerror()` is used to turn the error codes that can be returned by both `regcomp()` and `regexec()` into error message strings.

`regerror()` is passed the error code, `errcode`, the pattern buffer, `preg`, a pointer to a character string buffer, `errbuf`, and the size of the string buffer, `errbuf_size`.
It returns the size of the `errbuf` required to contain the null-terminated error message string.
If both `errbuf` and `errbuf_size` are nonzero, `errbuf` is filled in with the first `errbuf_size - 1` characters of the error message and a terminating null byte (`\0`).

## POSIX pattern buffer freeing

Supplying `regfree()` with a precompiled pattern buffer, `preg`, frees the memory that `regcomp()` allocated for the pattern buffer during compilation.

## Return values and errors

`regcomp()` returns zero for a successful compilation.
On failure, it returns one of the following errors (see [`reg_errcode_t`]):

- `REG_BADBR`: Invalid use of back reference operator.
- `REG_BADPAT`: Invalid use of pattern operators such as group or list.
- `REG_BADRPT`: Invalid use of repetition operators such as using `*` as the first character.
- `REG_EBRACE`: Unmatched brace interval operators.
- `REG_EBRACK`: Unmatched bracket list operators.
- `REG_ECOLLATE`: Invalid collating element.
- `REG_ECTYPE`: Unknown character class name.
- `REG_EEND`: Nonspecific error.
  This is not defined by POSIX.2.
- `REG_EESCAPE`: Trailing backslash.
- `REG_EPAREN`: Unmatched parenthesis group operators.
- `REG_ERANGE`: Invalid use of the range operator; for example, the ending point of the range occurs prior to the starting point.
- `REG_ESIZE`: Compiled regular expression requires a pattern buffer larger than 64 KB.
  This is not defined by POSIX.2.
- `REG_ESPACE`: The regex routines ran out of memory.
- `REG_ESUBREG`: Invalid back reference to a subexpression.

`regexec()` returns zero for a successful match or `REG_NOMATCH` for failure.

[`reg_errcode_t`]: reg_errcode_t/index.html

## Thread safety

- `regcomp()` and `regexec()` are thread-safe only if the process locale is not modified during the call.
- `regerror()` is thread-safe only if the process environment is not modified during the call.
- `regfree()` is thread-safe.

## Supported environment variables

This crate depends on some environment variables, and *variants* of those.
For each environment variable (*e.g.,* `CC`), the following are the accepted variants of it:

- `<var>_<target>`, *e.g.,* `CC_x86_64-pc-windows-gnu`.
- `<var>_<target_with_underscores>`, *e.g.,* `CC_x86_64_pc_windows_gnu`.
- `TARGET_<var>`, *e.g.,* `TARGET_CC`.
- `<var>`, *e.g.,* `CC`.

The following environment variables (and their variants) affect how this crate is built:

- `GNURX_LIB_DIR_PREFIX`
- `SYSROOT`
- `CC`
- `CFLAGS`
- `AR`
- `ARFLAGS`

## Linking options

By default, this crate builds `libgnurx` from sources and links statically against it.

To change this behavior and have this crate link dynamically against an externally built `libgnurx-0.dll` library, define the environment variable `GNURX_LIB_DIR_PREFIX` (or any of its variants) when building.

The value of `GNURX_LIB_DIR_PREFIX` needs to be the absolute prefix path where the library is installed.
The `libgnurx` header files are expected to reside in `<prefix>/include/`, and the shared library should reside in `<prefix>/bin/`.
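
The variant naming scheme listed under "Supported environment variables" can be illustrated with a small sketch. This is not the crate's build script: the helper names `variant_names` and `resolve` are hypothetical, and the sketch assumes the variants are tried in the order they are listed, which this document does not state.

```rust
/// Candidate names for `var` on `target`, in the order the list gives them.
/// (Hypothetical helper, for illustration only.)
fn variant_names(var: &str, target: &str) -> Vec<String> {
    vec![
        // <var>_<target>
        format!("{}_{}", var, target),
        // <var>_<target_with_underscores>: dashes in the triple become underscores
        format!("{}_{}", var, target.replace('-', "_")),
        // TARGET_<var>
        format!("TARGET_{}", var),
        // <var>
        var.to_string(),
    ]
}

/// Returns the value of the first variant for which `get` yields a value.
/// The lookup order is an assumption of this sketch.
fn resolve<F>(var: &str, target: &str, get: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    variant_names(var, target)
        .iter()
        .find_map(|name| get(name.as_str()))
}
```

Passing `|name| std::env::var(name).ok()` as `get` would consult the process environment, for example `resolve("GNURX_LIB_DIR_PREFIX", "x86_64-pc-windows-gnu", |name| std::env::var(name).ok())`. Taking the lookup as a closure keeps the sketch testable without modifying the environment.
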

## Depending on this crate

This crate provides the following variables to other crates that depend on it:

- `DEP_GNURX_INCLUDE`: Path of the directory where library C header files reside.
- `DEP_GNURX_LIB`: Path of the directory where the library binary resides.

## Platform-specific notes

This crate supports only the following target platforms:

- `x86_64-pc-windows-gnu`.
- `i686-pc-windows-gnu`.

This is due to the nature of the `libgnurx` library.

## Versioning

This project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
The `CHANGELOG.md` file details notable changes over time.

## License

Copyright (c) 2020-2023 Koutheir Attouchi.

See the `LICENSE.txt` file at the top-level directory of this distribution.

Licensed under the **LGPL version 2.1 license, or any later version thereof**.
This file may not be copied, modified, or distributed except according to those terms.