| Crates.io | enumerated_latin |
| lib.rs | enumerated_latin |
| version | 1.0.0 |
| created_at | 2025-04-18 16:17:45.995217+00 |
| updated_at | 2025-04-18 16:17:45.995217+00 |
| description | Encodes short strings as numeric values |
| homepage | |
| repository | https://codeberg.org/unobtanium/enumerated_latin |
| max_upload_size | |
| id | 1639619 |
| size | 60,386 |
Enumerated Latin is a crate to map strings made of the 26 letters a to z or A to Z (case insensitive) to a continuous space of integers by treating the text like a base26 encoded number plus an end marker.
Example:
use enumerated_latin::EnumeratedLatinEncode;
use enumerated_latin::EnumeratedLatinDecode;
let encoded: u64 = "Example".enumerated_latin_encode().unwrap();
assert_eq!(encoded, 9540966270);
let decoded_again = encoded.enumerated_latin_decode_lowercase().unwrap();
assert_eq!(decoded_again, "example".to_string());
The intended use is to generate numeric identifiers for short pieces of text, while still allowing range comparisons between values of the same length.
This arises, for example, when working with ISO codes for languages, scripts, countries, etc. Preserving the order within the same length helps with efficiently checking against private-use and similar ranges.
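As an illustration of such a range check, here is a minimal sketch that does not use the crate itself. It assumes the encoding described further down (a b prefix, then base26) and uses the ISO 639-2 local-use range qaa to qtz as the example range. The helpers `code3` and `is_local_use_language` are hypothetical names made up for this sketch.

```rust
/// Hypothetical helper (not part of the crate): encodes a three-letter
/// code by starting with the `b` prefix and reading the letters as base 26.
fn code3(code: &str) -> Option<u16> {
    let bytes = code.as_bytes();
    if bytes.len() != 3 {
        return None;
    }
    // Start at 1, the value of the implicit leading `b`.
    let mut value: u16 = 1;
    for &byte in bytes {
        if !byte.is_ascii_alphabetic() {
            return None;
        }
        // Map a/A to 0 ... z/Z to 25.
        let digit = (byte.to_ascii_lowercase() - b'a') as u16;
        // Three letters never exceed 35151, so this cannot overflow a u16.
        value = value * 26 + digit;
    }
    Some(value)
}

/// Hypothetical helper: true if `code` lies in the local-use range `qaa` to `qtz`.
fn is_local_use_language(code: &str) -> bool {
    // All three values have the same length, so numeric order matches
    // alphabetical order and one range check is enough.
    match (code3(code), code3("qaa"), code3("qtz")) {
        (Some(value), Some(low), Some(high)) => (low..=high).contains(&value),
        _ => false,
    }
}
```

With this sketch, `is_local_use_language("qmz")` is true and `is_local_use_language("eng")` is false. The comparison is only meaningful because both bounds and the input have the same length.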
The intended area of use is the backend of applications, where the difference between a string and a number actually matters.
For frontends it is recommended to prefer readability over performance whenever possible.
In short: the string is prefixed with a b and then parsed as a base26 number, most significant digit first (the same order as everyday numbers), where a maps to 0 and z to 25.
Example: az would be encoded as baz: (26^2)*1 + (26^1)*0 + (26^0)*25 = 701
use enumerated_latin::EnumeratedLatinEncode;
assert_eq!("az".enumerated_latin_encode(), Ok(701 as u16))
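To make the rule concrete, the following is a minimal standalone sketch of the encoding as described above. It is not the crate's actual implementation, and `encode_sketch` is a hypothetical name. Edge cases such as the empty string may be handled differently by the crate.

```rust
/// Sketch of the encoding rule; not the crate's actual implementation.
/// Returns `None` for non-letter input or when the value overflows `u64`.
fn encode_sketch(text: &str) -> Option<u64> {
    // Start at 1: this is the value of the implicit leading `b`.
    let mut value: u64 = 1;
    for ch in text.chars() {
        if !ch.is_ascii_alphabetic() {
            return None;
        }
        // Map a/A to 0 ... z/Z to 25.
        let digit = u64::from(ch.to_ascii_lowercase() as u8 - b'a');
        // Shift the existing digits up by one base-26 place, then add the new one.
        value = value.checked_mul(26)?.checked_add(digit)?;
    }
    Some(value)
}
```

Under these assumptions, `encode_sketch("az")` yields `Some(701)` and `encode_sketch("Example")` yields `Some(9540966270)`, matching the values shown in the examples above.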
The b at the start is needed because a maps to zero: leading a's act like leading 0s in everyday base10 numbers, so the numeric value alone cannot tell how many of them were present. The leading b ensures that one can always deduce the original length from the numeric value.
The everyday base10 equivalent of prepending the b would be prepending a 1, i.e. 000 becomes 1000 and 00 becomes 100.
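The role of the leading b becomes visible when decoding. The sketch below is a standalone illustration, not the crate's implementation, and `decode_sketch` is a hypothetical name. It peels off base26 digits until only the prefix remains, which is how it knows when to stop.

```rust
/// Sketch of decoding; not the crate's actual implementation.
/// Returns `None` if the value does not start with the `b` prefix.
fn decode_sketch(mut value: u64) -> Option<String> {
    let mut letters = Vec::new();
    // Peel off base-26 digits from the least significant end.
    while value >= 26 {
        letters.push(b'a' + (value % 26) as u8);
        value /= 26;
    }
    // What remains is the most significant digit; it must be the `b` (1).
    if value != 1 {
        return None;
    }
    // Digits were collected last-to-first, so restore the original order.
    letters.reverse();
    String::from_utf8(letters).ok()
}
```

In this sketch 26 decodes to "a" and 676 decodes to "aa", so strings that differ only in the number of leading a's stay distinguishable. A value such as 52 is rejected because its leading digit is not the prefix.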
This results in the following facts about the encoding:
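- The smallest value is 1, which corresponds to the b prefix alone with no letters after it.
- The first one-letter string is a with a value of 26.
- For strings of length l, the first value is 26^l and the last one is ((26^l)*2)-1.

Encoding each letter takes roughly 5 bits of information, plus one bit for the end cap. You can use this to roughly estimate which datatype you'll need.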
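The range formula and the bit estimate can be checked with a small standalone sketch. The function names `value_range` and `bits_needed` are hypothetical and not part of the crate.

```rust
/// First and last encoded value for strings of exactly `len` letters.
/// Returns `None` if the range does not fit into a `u128`.
fn value_range(len: u32) -> Option<(u128, u128)> {
    // `b` followed by `len` times `a` is 26^len.
    let first = 26u128.checked_pow(len)?;
    // `b` followed by `len` times `z` is (26^len) * 2 - 1.
    let last = first.checked_mul(2)?.checked_sub(1)?;
    Some((first, last))
}

/// Number of bits needed to store the largest value of the given length.
fn bits_needed(len: u32) -> Option<u32> {
    let (_, last) = value_range(len)?;
    Some(u128::BITS - last.leading_zeros())
}
```

For three letters this gives the range 17576 to 35151 and 16 bits, which is why a u16 can hold three letters while an i16, with only 15 value bits, stops at two.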
Valid encoding target types are:
| Type | supported length |
|---|---|
| u8 | 1 |
| i16 | 2 |
| u16 | 3 |
| i32 | 6 |
| u32 | 6 |
| i64 | 13 |
| u64 | 13 |
| i128 | 26 |
| u128 | 26 |
enumerated_latin is licensed under LGPL-3.0-only and is REUSE 3.3 compliant.
When contributing, add yourself as a copyright holder to the files you modified.