rfc9839

Crates.io: rfc9839
lib.rs: rfc9839
version: 0.2.0
created_at: 2025-08-23 15:34:21.987554+00
updated_at: 2025-08-23 16:50:48.79033+00
description: Implementation of the RFC 9839 specification
repository: https://github.com/ryanfowler/rfc9839-rs
id: 1807611
size: 36,843
Ryan Fowler (ryanfowler)

rfc9839

Validation of RFC 9839 Unicode subsets in Rust.

RFC 9839 defines three nested subsets of Unicode characters for use in text protocols:

  • Unicode Scalars – all code points except UTF-16 surrogates. Every Rust char is already a scalar value; checks are included for completeness and for raw byte validation.
  • XML Characters – { TAB, LF, CR } ∪ [0x20–0xD7FF] ∪ [0xE000–0xFFFD] ∪ [0x10000–0x10FFFF]. This is the XML “Char” production, which excludes the other C0 controls, the surrogates, and the noncharacters U+FFFE and U+FFFF.
  • Unicode Assignables – “not problematic” characters: the useful controls (TAB, LF, CR), printable ASCII (U+0020–U+007E), and every scalar from U+00A0 upward that is or could be assigned. This excludes DEL, the C1 controls, and the standardized noncharacters (…FFFE/FFFF in each plane and U+FDD0–FDEF).
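
The two narrower subsets can be expressed as plain range checks on a code point. The following is a minimal sketch written from the ranges described above, not the crate's actual source; the function names `sketch_xml_char` and `sketch_assignable_char` are hypothetical and chosen only for illustration.

```rust
/// Hypothetical sketch of the XML Characters subset:
/// { TAB, LF, CR } plus the three ranges listed above.
fn sketch_xml_char(c: char) -> bool {
    matches!(
        c as u32,
        0x09 | 0x0A | 0x0D | 0x20..=0xD7FF | 0xE000..=0xFFFD | 0x10000..=0x10FFFF
    )
}

/// Hypothetical sketch of the Unicode Assignables subset.
fn sketch_assignable_char(c: char) -> bool {
    let cp = c as u32;
    match cp {
        // Useful controls: TAB, LF, CR.
        0x09 | 0x0A | 0x0D => true,
        // Printable ASCII; DEL (0x7F) and the C1 controls (0x80..=0x9F) fall through.
        0x20..=0x7E => true,
        // Rest of the BMP below the surrogate block.
        0xA0..=0xD7FF => true,
        // BMP above the surrogates, skipping U+FDD0..=U+FDEF and U+FFFE/U+FFFF.
        0xE000..=0xFDCF | 0xFDF0..=0xFFFD => true,
        // Supplementary planes: drop the last two code points of each plane.
        0x10000..=0x10FFFF => (cp & 0xFFFF) < 0xFFFE,
        _ => false,
    }
}
```

Because a Rust `char` can never hold a surrogate, neither function needs an explicit surrogate check. Under this sketch DEL (`'\u{7F}'`) passes the XML check but fails the assignable check, which illustrates how the subsets nest.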

Features

  • Character-level APIs: is_unicode_scalar_char, is_xml_char, is_unicode_assignable_char
  • String-level APIs: is_unicode_scalar, is_xml_chars, is_unicode_assignable
  • Byte-level APIs: is_unicode_scalar_bytes, is_xml_chars_bytes, is_unicode_assignable_bytes
  • ASCII fast-path: tight loops for ASCII data, falling back to chars() only after the first non-ASCII byte
  • Zero allocations, no lookup tables
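
To illustrate the ASCII fast-path idea, here is a rough sketch of how a string-level and a byte-level check could be layered, using the XML Characters subset as the example. It is an assumption about the general technique, not a copy of the crate's code: the names `xml_char_sketch`, `xml_chars_fast_path_sketch`, and `xml_chars_bytes_sketch` are hypothetical, and treating invalid UTF-8 as a failed check is likewise an assumption.

```rust
/// Hypothetical per-character predicate for the XML Characters subset.
fn xml_char_sketch(c: char) -> bool {
    matches!(
        c as u32,
        0x09 | 0x0A | 0x0D | 0x20..=0xD7FF | 0xE000..=0xFFFD | 0x10000..=0x10FFFF
    )
}

/// Scan ASCII bytes in a tight loop, then decode the remainder with `chars()`.
fn xml_chars_fast_path_sketch(s: &str) -> bool {
    let bytes = s.as_bytes();
    let mut i = 0;
    // Fast path: stop at the first non-ASCII byte.
    while i < bytes.len() && bytes[i] < 0x80 {
        let b = bytes[i];
        // Within ASCII, only TAB, LF, CR, and 0x20..=0x7F are XML characters.
        if !(b >= 0x20 || b == b'\t' || b == b'\n' || b == b'\r') {
            return false;
        }
        i += 1;
    }
    // Every byte before `i` was ASCII, so `i` is a valid char boundary.
    s[i..].chars().all(xml_char_sketch)
}

/// Byte-level variant: validate UTF-8 first (no allocation), then reuse the
/// string check. Rejecting invalid UTF-8 is an assumption of this sketch.
fn xml_chars_bytes_sketch(bytes: &[u8]) -> bool {
    match std::str::from_utf8(bytes) {
        Ok(s) => xml_chars_fast_path_sketch(s),
        Err(_) => false,
    }
}
```

For pure-ASCII input the loop never decodes a multi-byte sequence, and neither function allocates or consults a table, which matches the properties listed above.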

Example

use rfc9839::*;

// Scalars (always true for safe Rust strings)
assert!(is_unicode_scalar("hello 🌍"));

// XML Characters
assert!(is_xml_chars("ok\tline\n"));
assert!(!is_xml_chars("\u{0000}")); // NUL is disallowed

// Unicode Assignables
assert!(is_unicode_assignable("emoji 👍"));
assert!(!is_unicode_assignable("\u{007F}")); // DEL is excluded