| Crates.io | rfc9839 |
| lib.rs | rfc9839 |
| version | 0.2.0 |
| created_at | 2025-08-23 15:34:21.987554+00 |
| updated_at | 2025-08-23 16:50:48.79033+00 |
| description | Implementation of the RFC 9839 specification |
| homepage | |
| repository | https://github.com/ryanfowler/rfc9839-rs |
| max_upload_size | |
| id | 1807611 |
| size | 36,843 |
Validation of RFC 9839 Unicode subsets in Rust.
RFC 9839 defines three nested subsets of Unicode characters for use in text protocols:
- Unicode Scalars: in Rust, char is already a scalar value; checks are included for completeness and for raw byte validation.
- XML Characters: { TAB, LF, CR } ∪ [0x20–0xD7FF] ∪ [0xE000–0xFFFD] ∪ [0x10000–0x10FFFF].
- Unicode Assignables: this is the XML “Char” production with legacy controls and noncharacters excluded.

- Per-character checks: is_unicode_scalar_char, is_xml_char, is_unicode_assignable_char
- String checks: is_unicode_scalar, is_xml_chars, is_unicode_assignable
- Byte-slice checks: is_unicode_scalar_bytes, is_xml_chars_bytes, is_unicode_assignable_bytes
- chars() only after the first non-ASCII byte

use rfc9839::*;
// Scalars (always true for safe Rust strings)
assert!(is_unicode_scalar("hello 🌍"));
// XML Characters
assert!(is_xml_chars("ok\tline\n"));
assert!(!is_xml_chars("\u{0000}")); // NUL is disallowed
// Unicode Assignables
assert!(is_unicode_assignable("emoji 👍"));
assert!(!is_unicode_assignable("\u{007F}")); // DEL is excluded
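
To make the XML character ranges and the "chars() only after the first non-ASCII byte" note more concrete, here is a minimal standalone sketch of how such a byte-level check could work. It is not the crate's source: the names xml_char_sketch and xml_chars_bytes_sketch are hypothetical, and the treatment of invalid UTF-8 (rejected outright) is an assumption made for illustration.

```rust
/// Hypothetical helper (not part of the rfc9839 crate): membership test for
/// { TAB, LF, CR } ∪ [0x20–0xD7FF] ∪ [0xE000–0xFFFD] ∪ [0x10000–0x10FFFF].
fn xml_char_sketch(c: char) -> bool {
    matches!(
        c,
        '\t' | '\n' | '\r'
            | '\u{20}'..='\u{D7FF}'
            | '\u{E000}'..='\u{FFFD}'
            | '\u{10000}'..='\u{10FFFF}'
    )
}

/// Hypothetical helper (not part of the rfc9839 crate): validate raw bytes,
/// scanning the leading ASCII run byte by byte and decoding with `chars()`
/// only from the first non-ASCII byte onward.
fn xml_chars_bytes_sketch(bytes: &[u8]) -> bool {
    // Length of the leading run of ASCII bytes.
    let ascii_len = bytes
        .iter()
        .position(|b| !b.is_ascii())
        .unwrap_or(bytes.len());
    let (ascii, rest) = bytes.split_at(ascii_len);

    // Within ASCII, the set allows TAB, LF, CR and 0x20..=0x7F.
    if !ascii
        .iter()
        .all(|&b| matches!(b, b'\t' | b'\n' | b'\r' | 0x20..=0x7F))
    {
        return false;
    }

    // Assumption: invalid UTF-8 is rejected. The remainder (which may again
    // contain ASCII) is checked one character at a time.
    match std::str::from_utf8(rest) {
        Ok(s) => s.chars().all(xml_char_sketch),
        Err(_) => false,
    }
}
```

Under these assumptions, xml_chars_bytes_sketch accepts the bytes of "ok\tline\n" and of "hello 🌍", and rejects both a lone NUL byte and the invalid UTF-8 byte 0xFF. The point of the split is that pure-ASCII input never pays for UTF-8 decoding.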