utf8-ranges

Crates.io	utf8-ranges
lib.rs	utf8-ranges
version	1.0.5
created_at	2015-10-16 01:21:55.954778+00
updated_at	2022-04-04 18:56:49.704446+00
description	DEPRECATED. Use regex-syntax::utf8 submodule instead.
homepage	https://github.com/BurntSushi/utf8-ranges
repository	https://github.com/BurntSushi/utf8-ranges
max_upload_size
id	3231
size	25,017

Andrew Gallant (BurntSushi)

documentation

https://docs.rs/utf8-ranges

README

DEPRECATED: This crate has been folded into the regex-syntax crate and is now deprecated.

utf8-ranges

This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte ranges. This is useful when constructing byte-based automata from Unicode. Stated differently, it lets one embed UTF-8 decoding as part of one's automaton.

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/utf8-ranges

Example

This shows how to convert a range of scalar values (e.g., the Basic Multilingual Plane) to a sequence of byte-based character classes.

extern crate utf8_ranges;

use utf8_ranges::Utf8Sequences;

fn main() {
    for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
        println!("{:?}", range);
    }
}

The output:

[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]

These ranges can then be used to build an automaton. Namely:

Any sequence of bytes matches either exactly one of the sequences of ranges or none of them.
Every matching sequence of bytes is guaranteed to be valid UTF-8. (Erroneous encodings of surrogate codepoints in UTF-8 cannot match any of the byte ranges above.)
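
The two properties above can be checked by hand with a minimal sketch. It does not use this crate: the six sequences from the output above are hardcoded in a table, and the names `ByteRange`, `BMP_SEQUENCES` and `matching_sequence` are made up for illustration. A real automaton would share states between sequences rather than scan them one by one.

```rust
/// Inclusive byte range, mirroring one `[lo-hi]` class from the output above.
/// (Hypothetical alias for this sketch; not a type from the crate.)
type ByteRange = (u8, u8);

/// The six sequences printed above for U+0000..=U+FFFF, hardcoded by hand.
const BMP_SEQUENCES: &[&[ByteRange]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// Returns the index of the sequence that `bytes` matches in full, if any.
/// A match requires the same length and every byte inside its range.
fn matching_sequence(bytes: &[u8]) -> Option<usize> {
    BMP_SEQUENCES.iter().position(|seq| {
        seq.len() == bytes.len()
            && seq.iter().zip(bytes).all(|(&(lo, hi), &b)| lo <= b && b <= hi)
    })
}

fn main() {
    // "é" (U+00E9) encodes as C3 A9 and matches the two-byte sequence.
    assert_eq!(matching_sequence("é".as_bytes()), Some(1));
    // ED A0 80 would be an encoded surrogate (U+D800); no sequence accepts it.
    assert_eq!(matching_sequence(&[0xED, 0xA0, 0x80]), None);
    println!("ok");
}
```

Because the sequences do not overlap, returning the first match loses nothing: at most one sequence can accept a given input.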

Commit count: 36
