icu_normalizer

Crates.io	icu_normalizer
lib.rs	icu_normalizer
version	2.0.0
created_at	2021-04-29 20:11:06.762323+00
updated_at	2025-05-07 20:58:37.983516+00
description	API for normalizing text into Unicode Normalization Forms
homepage	https://icu4x.unicode.org
repository	https://github.com/unicode-org/icu4x
id	391255
size	347,832

icu4x-release (github:unicode-org:icu4x-release)

icu_normalizer

Normalizing text into Unicode Normalization Forms.

This module is published as its own crate (icu_normalizer) and as part of the icu crate. See the latter for more details on the ICU4X project.

Functionality

The top level of the crate provides normalization of input into the four normalization forms defined in UAX #15: Unicode Normalization Forms: NFC, NFD, NFKC, and NFKD.

Three kinds of contiguous inputs are supported: known-well-formed UTF-8 (&str), potentially-not-well-formed UTF-8, and potentially-not-well-formed UTF-16. Additionally, an iterator over char can be wrapped in a normalizing iterator.

The uts46 module provides the combination of mapping and normalization operations for UTS #46: Unicode IDNA Compatibility Processing. This functionality is not meant to be used by applications directly. Instead, it is meant as a building block for a full implementation of UTS #46, such as the idna crate.

The properties module provides the non-recursive canonical decomposition operation on a per-char basis and the canonical composition operation given two chars. It also provides access to the Canonical Combining Class property. These operations are primarily meant for HarfBuzz via the icu_harfbuzz crate.

Notably, this normalizer does not provide the normalization “quick check” that can result in “maybe” in addition to “yes” and “no”. The normalization checks provided by this crate always give a definitive non-“maybe” answer.

Examples

let nfc = icu_normalizer::ComposingNormalizerBorrowed::new_nfc();
assert_eq!(nfc.normalize("a\u{0308}"), "ä");
assert!(nfc.is_normalized("ä"));

let nfd = icu_normalizer::DecomposingNormalizerBorrowed::new_nfd();
assert_eq!(nfd.normalize("ä"), "a\u{0308}");
assert!(!nfd.is_normalized("ä"));

More Information

For more information on development, authorship, contributing, etc., please visit the ICU4X home page.

Commit count: 4409
