Crates.io | char-ranges |
lib.rs | char-ranges |
version | 0.1.2 |
source | src |
created_at | 2023-06-04 14:07:33.859411 |
updated_at | 2024-04-01 07:18:02.984774 |
description | Iterate chars and their start and end byte positions |
homepage | |
repository | https://github.com/vallentin/char-ranges |
max_upload_size | |
id | 882200 |
size | 33,902 |
Similar to the standard library's .char_indices(), but instead of only
producing the start byte position, this library implements .char_ranges(),
which produces both the start and end byte positions.
Note that simply using .char_indices() and creating a range by mapping the
returned index i to i..(i + 1) is not guaranteed to be valid, given that
some UTF-8 characters can be up to 4 bytes.
Char | Bytes | Range |
---|---|---|
'O' | 1 | 0..1 |
'Ø' | 2 | 0..2 |
'∈' | 3 | 0..3 |
'🌏' | 4 | 0..4 |
The byte counts above assume the text is encoded in UTF-8.
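To make the pitfall concrete, here is a minimal std-only sketch. It is not this crate's implementation, and the helper names std_char_ranges and naive_range_is_valid are hypothetical. It derives each range from .char_indices() plus char::len_utf8(), and checks whether the naive i..(i + 1) range lands on char boundaries.

```rust
use std::ops::Range;

/// Hypothetical helper (not part of this crate): builds byte ranges from the
/// standard library's `char_indices()` by adding each char's UTF-8 length.
pub fn std_char_ranges(text: &str) -> Vec<(Range<usize>, char)> {
    text.char_indices()
        .map(|(i, c)| (i..i + c.len_utf8(), c))
        .collect()
}

/// Hypothetical helper: returns true if the naive range `i..(i + 1)` can be
/// used to slice `text`, i.e. it is in bounds and both ends are char boundaries.
pub fn naive_range_is_valid(text: &str, i: usize) -> bool {
    // `str::get` returns `None` instead of panicking on an invalid range.
    text.get(i..i + 1).is_some()
}
```

With these helpers, the naive range is valid for "O" at index 0 but not for "Ø" at index 0, because 0..1 would end in the middle of a 2-byte character.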
The implementation specializes last(), nth(), next_back(), and nth_back(),
so that the length of intermediate characters is not wastefully calculated.
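The idea behind such a specialization can be sketched with the standard library alone. The following is an illustration under assumptions, not the crate's actual code, and both function names are hypothetical. To find the last char and its range, only the final char needs decoding, since the end of its range is the byte length of the string.

```rust
use std::ops::Range;

/// Hypothetical sketch: decodes only the final char. The end of its range is
/// the string's byte length, so no earlier char has to be measured.
pub fn last_char_range(text: &str) -> Option<(Range<usize>, char)> {
    text.char_indices()
        .next_back()
        .map(|(i, c)| (i..text.len(), c))
}

/// Naive equivalent for comparison: builds a range for every char in the
/// string and keeps only the last one.
pub fn last_char_range_naive(text: &str) -> Option<(Range<usize>, char)> {
    text.char_indices()
        .map(|(i, c)| (i..i + c.len_utf8(), c))
        .last()
}
```

Both functions return the same result; the first simply avoids computing ranges that are immediately discarded.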
use char_ranges::CharRangesExt;
let text = "Hello 🗻∈🌏";
let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "Hello 🗻∈🌏");
assert_eq!(chars.next(), Some((0..1, 'H'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((1..2, 'e')));
assert_eq!(chars.next(), Some((2..3, 'l')));
assert_eq!(chars.next(), Some((3..4, 'l')));
assert_eq!(chars.next(), Some((4..5, 'o')));
assert_eq!(chars.next(), Some((5..6, ' ')));
// Get the remaining substring
assert_eq!(chars.as_str(), "🗻∈🌏");
assert_eq!(chars.next(), Some((6..10, '🗻'))); // This char is 4 bytes
assert_eq!(chars.next(), Some((10..13, '∈'))); // This char is 3 bytes
assert_eq!(chars.next(), Some((13..17, '🌏'))); // This char is 4 bytes
assert_eq!(chars.next(), None);
DoubleEndedIterator
CharRanges also implements DoubleEndedIterator, making it possible to
iterate backwards.
use char_ranges::CharRangesExt;
let text = "ABCDE";
let mut chars = text.char_ranges();
assert_eq!(chars.as_str(), "ABCDE");
assert_eq!(chars.next(), Some((0..1, 'A')));
assert_eq!(chars.next_back(), Some((4..5, 'E')));
assert_eq!(chars.as_str(), "BCD");
assert_eq!(chars.next_back(), Some((3..4, 'D')));
assert_eq!(chars.next(), Some((1..2, 'B')));
assert_eq!(chars.as_str(), "C");
assert_eq!(chars.next(), Some((2..3, 'C')));
assert_eq!(chars.as_str(), "");
assert_eq!(chars.next(), None);
If the input text is a substring of some original text, and the produced
ranges should be offset so that they are relative to the original text
rather than the substring, then instead of .char_ranges() use
.char_ranges_offset(offset) or .char_ranges().offset(offset).
use char_ranges::CharRangesExt;
let text = "Hello 👋 World 🌏";
let start = 11; // Start index of 'W'
let text = &text[start..]; // "World 🌏"
let mut chars = text.char_ranges_offset(start);
// or
// let mut chars = text.char_ranges().offset(start);
assert_eq!(chars.next(), Some((11..12, 'W'))); // These chars are 1 byte
assert_eq!(chars.next(), Some((12..13, 'o')));
assert_eq!(chars.next(), Some((13..14, 'r')));
assert_eq!(chars.next_back(), Some((17..21, '🌏'))); // This char is 4 bytes
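The reason for offsetting is that the resulting ranges can then be used to slice the original text directly. The following std-only sketch shows this under stated assumptions: offset_char_ranges and ranges_index_into are hypothetical helpers written for illustration, not functions provided by this crate.

```rust
use std::ops::Range;

/// Hypothetical helper: computes the byte range of every char in `sub`,
/// shifted by `offset` (the byte position of `sub` within the original text).
pub fn offset_char_ranges(sub: &str, offset: usize) -> Vec<(Range<usize>, char)> {
    sub.char_indices()
        .map(|(i, c)| {
            let start = offset + i;
            (start..start + c.len_utf8(), c)
        })
        .collect()
}

/// Hypothetical helper: checks that every range, when used to slice
/// `original`, yields exactly the char it was paired with.
pub fn ranges_index_into(original: &str, ranges: &[(Range<usize>, char)]) -> bool {
    ranges.iter().all(|(range, c)| {
        original
            .get(range.clone())
            .map_or(false, |s| s.chars().eq(std::iter::once(*c)))
    })
}
```

For "Hello 👋 World 🌏" and the substring starting at byte 11, the shifted ranges pass the check against the original text, whereas ranges computed with an offset of 0 do not.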