Crates.io | packed-char |
lib.rs | packed-char |
version | 0.1.2 |
source | src |
created_at | 2024-03-19 01:28:54.987797 |
updated_at | 2024-09-05 21:25:51.90401 |
description | Stores a char or a 22-bit integer in 32 bits |
homepage | |
repository | https://github.com/tim-harding/packed-char |
max_upload_size | |
id | 1178657 |
size | 12,151 |
Allows either a char or a 22-bit integer to be stored in 32 bits, the same size as a char.
packed-char takes advantage of the valid ranges for a char to determine what type of data is stored. These ranges are 0..=0xD7FF and 0xE000..=0x10FFFF (see the documentation for char). The range 0xD800..=0xDFFF contains surrogate code points, which are not valid Unicode scalar values and so can never be a char. chars are stored unmodified. To store a u22 (an unsigned 22-bit integer) without overlapping the valid char ranges, it is first split into two 11-bit chunks. The left chunk is stored in the leading bits, which chars never overlap with. The right chunk is stored in the trailing bits, which do overlap the bits used by chars. To make this work, take note of the bit pattern in the surrogate range:
1101100000000000 // Start
1101111111111111 // End
^^^^^
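The constant prefix can be checked with a single shift. The following is a minimal sketch, not code taken from the crate: the names SURROGATE_MASK and is_surrogate are hypothetical and chosen only for illustration.

```rust
/// The five leading bits shared by every 16-bit surrogate code point
/// (hypothetical constant name, used here for illustration only).
const SURROGATE_MASK: u32 = 0b11011;

/// Returns true if `value` lies in the surrogate range 0xD800..=0xDFFF.
/// Shifting away the 11 trailing bits leaves only the constant prefix,
/// so a single comparison covers the whole range.
fn is_surrogate(value: u32) -> bool {
    value >> 11 == SURROGATE_MASK
}
```

Because the standard library's char::from_u32 rejects exactly these values (along with anything above char::MAX), any 32-bit pattern carrying this prefix in that position is free to hold other data.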
The leading 5 bits are constant in this range. Referred to here as the surrogate mask, they serve as a signature for u22 values. They are set along with the left and right 11-bit chunks:
11111111111 | 00000  | 11011          | 11111111111
left chunk  | unused | surrogate mask | right chunk
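The layout above could be produced with a few shifts and masks. This is a sketch under stated assumptions rather than the crate's actual implementation: the function pack_u22 and all constant names are hypothetical, and the 22-bit value is carried in a plain u32 because Rust has no built-in u22 type.

```rust
// All names below are hypothetical and exist only for this illustration.
const SURROGATE_MASK: u32 = 0b11011;
const CHUNK_BITS: u32 = 11;
const CHUNK_MASK: u32 = (1 << CHUNK_BITS) - 1;
const U22_MAX: u32 = (1 << 22) - 1;

/// Packs a 22-bit value into 32 bits using the layout
/// left chunk (11) | unused (5) | surrogate mask (5) | right chunk (11).
/// Returns None if `value` does not fit in 22 bits.
fn pack_u22(value: u32) -> Option<u32> {
    if value > U22_MAX {
        return None;
    }
    let left = value >> CHUNK_BITS; // upper 11 bits of the value
    let right = value & CHUNK_MASK; // lower 11 bits of the value
    // The left chunk occupies bits 21..=31, the signature bits 11..=15,
    // and the right chunk bits 0..=10. Bits 16..=20 stay zero.
    Some((left << 21) | (SURROGATE_MASK << 11) | right)
}
```

Packing the largest 22-bit value, 0x3FFFFF, yields 0xFFE0DFFF, which is the bit pattern shown in the diagram.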
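Now we have two cases:
- The left chunk is zero. The value then falls inside the surrogate range 0xD800..=0xDFFF, so it is not a valid char.
- The left chunk is nonzero. The value is then at least 0x200000, which is greater than char::MAX.
Thus, char and u22 values are disambiguated.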
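Reading a stored value back follows directly from this argument. The sketch below is an assumption about how such a decoder might look, not the crate's API: the Contents enum and the unpack function are hypothetical names, and the 22-bit value is returned as a u32.

```rust
// Hypothetical names throughout; this is an illustration, not the crate's API.
const SURROGATE_MASK: u32 = 0b11011;

#[derive(Debug, PartialEq)]
enum Contents {
    Char(char),
    U22(u32),
}

/// Decodes 32 stored bits into either a char or a 22-bit integer.
/// Returns None for patterns that are neither a valid char nor a
/// value carrying the surrogate signature with zeroed unused bits.
fn unpack(bits: u32) -> Option<Contents> {
    // Valid chars are stored unmodified, so try that interpretation first.
    if let Some(c) = char::from_u32(bits) {
        return Some(Contents::Char(c));
    }
    let signature = (bits >> 11) & 0b11111; // bits 11..=15
    let unused = (bits >> 16) & 0b11111; // bits 16..=20
    if signature != SURROGATE_MASK || unused != 0 {
        return None;
    }
    let left = bits >> 21; // leading 11-bit chunk
    let right = bits & 0x7FF; // trailing 11-bit chunk
    Some(Contents::U22((left << 11) | right))
}
```

Trying char::from_u32 first is safe because, as argued above, no pattern carrying the signature is a valid char: it is either a surrogate or larger than char::MAX.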