Crates.io | utf8-ranges |
lib.rs | utf8-ranges |
version | 1.0.5 |
source | src |
created_at | 2015-10-16 01:21:55.954778 |
updated_at | 2022-04-04 18:56:49.704446 |
description | DEPRECATED. Use regex-syntax::utf8 submodule instead. |
homepage | https://github.com/BurntSushi/utf8-ranges |
repository | https://github.com/BurntSushi/utf8-ranges |
id | 3231 |
size | 25,017 |
DEPRECATED: This crate has been folded into the regex-syntax crate (as its utf8 submodule) and is now deprecated.
This crate converts contiguous ranges of Unicode scalar values to UTF-8 byte ranges. This is useful when constructing byte-based automata from Unicode. Stated differently, it lets one embed UTF-8 decoding directly in an automaton.
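One reason a single scalar value range expands into several byte-range sequences is that the UTF-8 encoded length changes at U+0080, U+0800, and U+10000. The following is a minimal sketch that uses only the standard library, not this crate; the helper name `utf8_bytes` is hypothetical and exists only for illustration.

```rust
/// Hypothetical helper (not part of utf8-ranges): returns the UTF-8
/// encoding of `c` as a byte vector.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4];
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

fn main() {
    // Scalar values on either side of each encoded-length boundary.
    let samples = ['\u{7F}', '\u{80}', '\u{7FF}', '\u{800}', '\u{FFFF}', '\u{10000}'];
    for &c in samples.iter() {
        println!("U+{:04X} -> {:X?}", c as u32, utf8_bytes(c));
    }
}
```

Each boundary forces a new sequence of byte ranges, because a byte-based automaton needs a different number of transitions on either side of it.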
Dual-licensed under MIT or the UNLICENSE.
This example shows how to convert a scalar value range (e.g., the basic multilingual plane) to a sequence of byte-based character classes.
    extern crate utf8_ranges;

    use utf8_ranges::Utf8Sequences;

    fn main() {
        for range in Utf8Sequences::new('\u{0}', '\u{FFFF}') {
            println!("{:?}", range);
        }
    }
The output:
[0-7F]
[C2-DF][80-BF]
[E0][A0-BF][80-BF]
[E1-EC][80-BF][80-BF]
[ED][80-9F][80-BF]
[EE-EF][80-BF][80-BF]
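To make the output concrete, here is a minimal sketch of matching bytes against these six sequences. It does not use this crate: the table is transcribed by hand from the output above, and the names `ByteRange`, `BMP_SEQUENCES`, and `matches_bmp` are hypothetical. A real automaton would compile the ranges into states rather than scan a table.

```rust
/// An inclusive byte range, e.g. (0x80, 0xBF) for [80-BF].
type ByteRange = (u8, u8);

/// The six sequences listed above, transcribed by hand (an assumption of
/// this sketch; normally they would come from `Utf8Sequences`).
const BMP_SEQUENCES: &[&[ByteRange]] = &[
    &[(0x00, 0x7F)],
    &[(0xC2, 0xDF), (0x80, 0xBF)],
    &[(0xE0, 0xE0), (0xA0, 0xBF), (0x80, 0xBF)],
    &[(0xE1, 0xEC), (0x80, 0xBF), (0x80, 0xBF)],
    &[(0xED, 0xED), (0x80, 0x9F), (0x80, 0xBF)],
    &[(0xEE, 0xEF), (0x80, 0xBF), (0x80, 0xBF)],
];

/// Returns true if `bytes` has the same length as one of the sequences
/// and every byte falls inside the corresponding range.
fn matches_bmp(bytes: &[u8]) -> bool {
    BMP_SEQUENCES.iter().any(|seq| {
        seq.len() == bytes.len()
            && seq.iter().zip(bytes).all(|(&(lo, hi), &b)| lo <= b && b <= hi)
    })
}

fn main() {
    // U+20AC encodes as E2 82 AC, which fits [E1-EC][80-BF][80-BF].
    println!("{}", matches_bmp("\u{20AC}".as_bytes()));
    // ED A0 80 would be the encoding of the surrogate U+D800; the [ED]
    // sequence only allows 80-9F as the second byte, so it is rejected.
    println!("{}", matches_bmp(&[0xED, 0xA0, 0x80]));
}
```

Because the sequences skip the surrogate gap and start two-byte encodings at C2, inputs such as `ED A0 80` or the overlong `C0 80` match none of them.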
These ranges can then be used to build an automaton. Namely: