Crates.io | unicode-intervals |
lib.rs | unicode-intervals |
version | 0.2.0 |
source | src |
created_at | 2023-04-23 17:41:54.17069 |
updated_at | 2023-04-24 22:32:46.474807 |
description | Search for Unicode code points intervals by including/excluding categories, ranges, and custom characters sets. |
homepage | https://github.com/Stranger6667/unicode-intervals |
repository | https://github.com/Stranger6667/unicode-intervals |
id | 846706 |
size | 679,045 |
This library provides a way to search for Unicode code point intervals by categories, ranges, and custom character sets.
The main purpose of unicode-intervals is to simplify generating strings that match specific criteria.
[dependencies]
unicode-intervals = "0.2"
The example below produces code point intervals for uppercase and lowercase letters whose code points do not exceed 128, and also includes the ☃ character.
use unicode_intervals::UnicodeCategory;

let intervals = unicode_intervals::query()
    .include_categories(
        UnicodeCategory::UPPERCASE_LETTER |
        UnicodeCategory::LOWERCASE_LETTER
    )
    .max_codepoint(128)
    .include_characters("☃")
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(65, 90), (97, 122), (9731, 9731)]);
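Since the stated goal is string generation, it may help to see how a list of inclusive `(start, end)` intervals can be expanded into characters. The following is a minimal std-only sketch, not part of unicode-intervals; the helper names `chars_in_intervals` and `string_from_indices` are hypothetical, and the indices stand in for whatever source of randomness a real generator would use.

```rust
/// Hypothetical helper: expand inclusive `(start, end)` code point intervals
/// into the characters they cover. Values that are not valid `char`s
/// (for example surrogates) are skipped.
fn chars_in_intervals(intervals: &[(u32, u32)]) -> Vec<char> {
    intervals
        .iter()
        .flat_map(|&(start, end)| start..=end)
        .filter_map(char::from_u32)
        .collect()
}

/// Hypothetical helper: build a string by picking characters from `alphabet`
/// at the given indices, wrapping around when an index is out of range.
fn string_from_indices(alphabet: &[char], indices: &[usize]) -> String {
    if alphabet.is_empty() {
        return String::new();
    }
    indices.iter().map(|&i| alphabet[i % alphabet.len()]).collect()
}
```

With the intervals `[(65, 90), (97, 122), (9731, 9731)]`, the alphabet holds 53 characters, and the indices `[0, 26, 52]` select `A`, `a`, and `☃`.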
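Use IntervalSet for index-like access to the underlying code points:
use unicode_intervals::UnicodeCategory;

let interval_set = unicode_intervals::query()
    // Restrict the set to uppercase letters so that index 0 maps to 'A'
    .include_categories(UnicodeCategory::UPPERCASE_LETTER)
    .max_codepoint(128)
    .interval_set()
    .expect("Invalid query input");
// Get the code point at index 10 in this interval set
assert_eq!(interval_set.codepoint_at(10), Some('K' as u32));
assert_eq!(interval_set.index_of('K'), Some(10));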
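To illustrate what index-like access over intervals means, here is a minimal std-only sketch that treats a sorted list of inclusive intervals as one flat sequence of code points. It is an assumption about how such lookups can work in principle, not the crate's actual implementation; the free functions below are hypothetical and merely reuse the method names from the example.

```rust
/// Hypothetical sketch: return the code point at position `index` when the
/// inclusive intervals are laid out end to end.
fn codepoint_at(intervals: &[(u32, u32)], index: u32) -> Option<u32> {
    let mut remaining = index;
    for &(start, end) in intervals {
        let size = end - start + 1;
        if remaining < size {
            return Some(start + remaining);
        }
        // Skip this interval and keep counting in the next one.
        remaining -= size;
    }
    None
}

/// Hypothetical sketch: return the position of `codepoint` in the flattened
/// sequence, or `None` if no interval contains it.
fn index_of(intervals: &[(u32, u32)], codepoint: u32) -> Option<u32> {
    let mut offset = 0;
    for &(start, end) in intervals {
        if (start..=end).contains(&codepoint) {
            return Some(offset + (codepoint - start));
        }
        offset += end - start + 1;
    }
    None
}
```

For the hand-written interval list `[(65, 90), (97, 122)]`, index 10 maps to code point 75 (`K`), and index 26 crosses into the second interval and maps to 97 (`a`).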
Query a specific Unicode version:
use unicode_intervals::UnicodeVersion;

let intervals = UnicodeVersion::V11_0_0.query()
    .max_codepoint(128)
    .include_characters("☃")
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(0, 128), (9731, 9731)]);
Restrict the output to code points within a certain range:
let intervals = unicode_intervals::query()
    .min_codepoint(65)
    .max_codepoint(128)
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(65, 128)]);
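The effect of a minimum and maximum bound can be pictured as clamping each interval to the requested range and dropping intervals that fall entirely outside it. The sketch below is std-only and assumes inclusive bounds, as the `(65, 128)` result suggests; `clamp_intervals` is a hypothetical helper, not an API of the crate.

```rust
/// Hypothetical sketch: restrict inclusive intervals to `[min, max]`.
/// Intervals that do not overlap the range are removed; partially
/// overlapping ones are trimmed.
fn clamp_intervals(intervals: &[(u32, u32)], min: u32, max: u32) -> Vec<(u32, u32)> {
    intervals
        .iter()
        .filter_map(|&(start, end)| {
            let lo = start.max(min);
            let hi = end.min(max);
            if lo <= hi {
                Some((lo, hi))
            } else {
                None
            }
        })
        .collect()
}
```

Clamping `[(0, 128)]` to the range 65 to 128 gives `[(65, 128)]`, and clamping `[(65, 90), (97, 122), (9731, 9731)]` to the range 80 to 100 gives `[(80, 90), (97, 100)]`.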
Include or exclude specific characters:
use unicode_intervals::UnicodeCategory;

let intervals = unicode_intervals::query()
    .include_categories(UnicodeCategory::PARAGRAPH_SEPARATOR)
    .include_characters("-123")
    .intervals()
    .expect("Invalid query input");
assert_eq!(intervals, &[(45, 45), (49, 51), (8233, 8233)]);
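The result shows the characters `1`, `2`, and `3` collapsed into the single interval `(49, 51)`, while `-` stays on its own as `(45, 45)`. One way to picture this is to sort the code points and merge consecutive runs. The following std-only sketch does that; `intervals_from_chars` is a hypothetical helper and makes no claim about how the crate performs the merge internally.

```rust
/// Hypothetical sketch: convert a set of characters into sorted, inclusive
/// code point intervals, merging runs of consecutive code points.
fn intervals_from_chars(chars: &str) -> Vec<(u32, u32)> {
    let mut codepoints: Vec<u32> = chars.chars().map(|c| c as u32).collect();
    codepoints.sort_unstable();
    codepoints.dedup();

    let mut intervals: Vec<(u32, u32)> = Vec::new();
    for cp in codepoints {
        if let Some(last) = intervals.last_mut() {
            // Extend the current interval when the code point is adjacent to it.
            if last.1 + 1 == cp {
                last.1 = cp;
                continue;
            }
        }
        intervals.push((cp, cp));
    }
    intervals
}
```

Calling `intervals_from_chars("-123")` returns `[(45, 45), (49, 51)]`; the `(8233, 8233)` entry in the example above comes from the paragraph separator category rather than from the custom characters.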
unicode-intervals supports Unicode versions 9.0.0 through 15.0.0.