east-asian-width

Crates.io: east-asian-width
lib.rs: east-asian-width
version: 0.1.0
created_at: 2025-09-07 17:14:59.860933+00
updated_at: 2025-09-07 17:14:59.860933+00
description: Determine the display width of Unicode characters in East Asian contexts
homepage: https://github.com/sabry-awad97/east-asian-width
repository: https://github.com/sabry-awad97/east-asian-width
max_upload_size:
id: 1828340
size: 129,092
Sabry Awad (sabry-awad97)

documentation: https://docs.rs/east-asian-width

East Asian Width

A fast, zero-dependency Rust library for determining the display width of Unicode characters in East Asian contexts. This is essential for terminal applications, text editors, and other software that needs to properly align text containing CJK (Chinese, Japanese, Korean) characters.

Features

  • Fast lookups: Pre-generated lookup tables for O(1) character width determination
  • Unicode compliant: Based on the official Unicode East Asian Width property
  • Flexible API: Support for both simple and configurable width calculations
  • Zero dependencies: No runtime dependencies for the core library
  • Comprehensive: Handles all Unicode East Asian Width categories
  • Safe: Validates Unicode code points and provides fallible APIs

Installation

Add this to your Cargo.toml:

[dependencies]
east-asian-width = "0.1.0"

Quick Start

use east_asian_width::{east_asian_width, DisplayWidth};

// Basic usage - get display width
let width = east_asian_width('字' as u32); // Chinese character
assert_eq!(width, DisplayWidth::Wide);
assert_eq!(width.as_u8(), 2); // Width is 2 columns

// ASCII characters are narrow
let width = east_asian_width('A' as u32);
assert_eq!(width, DisplayWidth::Narrow);
assert_eq!(width.as_u8(), 1); // Width is 1 column

Usage

Basic Width Calculation

The primary function east_asian_width() returns a DisplayWidth enum:

use east_asian_width::{east_asian_width, DisplayWidth};

// Wide characters (CJK ideographs, fullwidth chars, etc.)
assert_eq!(east_asian_width(0x4E00), DisplayWidth::Wide);  // 一 (CJK)
assert_eq!(east_asian_width(0xFF21), DisplayWidth::Wide);  // Ａ (fullwidth A)

// Narrow characters (ASCII, halfwidth, etc.)
assert_eq!(east_asian_width(0x0041), DisplayWidth::Narrow); // A (ASCII)
assert_eq!(east_asian_width(0xFF61), DisplayWidth::Narrow); // ｡ (halfwidth ideographic full stop)

Handling Ambiguous Characters

Some characters are classified as "ambiguous" and can be displayed as either narrow or wide depending on context:

use east_asian_width::east_asian_width;

let ambiguous_char = 0x00A1; // ¡ (inverted exclamation mark)

// Default: treat ambiguous as narrow
assert_eq!(east_asian_width(ambiguous_char).as_u8(), 1);

// Explicitly treat ambiguous as wide
assert_eq!(east_asian_width((ambiguous_char, true)).as_u8(), 2);

Getting Character Categories

You can also get the specific East Asian Width category:

use east_asian_width::east_asian_width_type;

assert_eq!(east_asian_width_type(0x4E00), "wide");      // CJK ideograph
assert_eq!(east_asian_width_type(0xFF21), "fullwidth"); // Fullwidth A
assert_eq!(east_asian_width_type(0x0041), "narrow");    // ASCII A
assert_eq!(east_asian_width_type(0x00A1), "ambiguous"); // ¡

Error Handling

For applications that need to handle invalid input gracefully, use the fallible try_ variants:

use east_asian_width::{try_east_asian_width, try_east_asian_width_type};

// These functions return Result<T, EastAsianWidthError>
match try_east_asian_width(0x110000) { // Invalid code point
    Ok(width) => println!("Width: {}", width.as_u8()),
    Err(e) => println!("Error: {}", e),
}
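
To illustrate what validating a code point can involve, here is a minimal standard-library sketch. It is not the crate's implementation: check_code_point and its error message are hypothetical names, and it relies on char::from_u32, which rejects both values above U+10FFFF and surrogates (U+D800 to U+DFFF). Whether try_east_asian_width treats surrogates the same way is not stated here.

```rust
/// Hypothetical helper (not part of east-asian-width): returns the `char`
/// for a valid Unicode scalar value, or a descriptive error otherwise.
fn check_code_point(cp: u32) -> Result<char, String> {
    // `char::from_u32` returns `None` for surrogates and for values > 0x10FFFF.
    char::from_u32(cp).ok_or_else(|| format!("invalid code point: U+{:04X}", cp))
}
```

A caller could run a check like this before a width lookup and surface the error to the user instead of panicking.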

Working with Strings

Calculate the total display width of a string:

use east_asian_width::east_asian_width;

fn string_width(s: &str) -> usize {
    s.chars()
        .map(|c| east_asian_width(c as u32).as_usize())
        .sum()
}

assert_eq!(string_width("Hello"), 5);      // ASCII: 5 chars × 1 = 5
assert_eq!(string_width("こんにちは"), 10);   // Japanese: 5 chars × 2 = 10
assert_eq!(string_width("Hello世界"), 9);   // Mixed: 5×1 + 2×2 = 9
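
Once a string's display width is known, aligning it in a fixed-width column is a matter of padding by the missing columns. The sketch below is one possible approach, not something the crate provides: pad_to_columns is a hypothetical helper, and the width function is passed in as a closure so that a call such as |c| east_asian_width(c as u32).as_usize() could be plugged in.

```rust
/// Hypothetical helper: right-pads `s` with spaces until it occupies
/// `target` display columns. `width_of` supplies the per-character width.
/// Strings that are already wider than `target` are returned unchanged.
fn pad_to_columns(s: &str, target: usize, width_of: impl Fn(char) -> usize) -> String {
    // Total columns used by the string.
    let used: usize = s.chars().map(|c| width_of(c)).sum();
    let mut out = String::from(s);
    // `saturating_sub` avoids underflow when the string is too wide.
    out.extend(std::iter::repeat(' ').take(target.saturating_sub(used)));
    out
}
```

Note that padding by s.len() or s.chars().count() instead would misalign any row containing wide characters, which is the problem this kind of helper avoids.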

Character Categories

The library recognizes six East Asian Width categories defined by Unicode:

Category      | Description                           | Display Width | Examples
------------- | ------------------------------------- | ------------- | ---------------------------
Narrow (Na)   | Characters that are always narrow     | 1             | Basic Latin letters
Neutral (N)   | Characters without East Asian context | 1             | Most symbols, punctuation
Halfwidth (H) | Narrow in East Asian context          | 1             | Halfwidth Katakana
Ambiguous (A) | Context-dependent width               | 1 or 2*       | Greek letters, some symbols
Wide (W)      | Characters that are always wide       | 2             | CJK ideographs, Hiragana
Fullwidth (F) | Wide in East Asian context            | 2             | Fullwidth ASCII variants

*Ambiguous characters default to width 1 but can be configured to width 2.
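
The mapping in the table can be written down as a small function. This is only an illustrative sketch: the Category enum and the columns function are hypothetical names chosen for this example and are not the crate's types.

```rust
/// Hypothetical enum mirroring the six Unicode East Asian Width categories.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Narrow,
    Neutral,
    Halfwidth,
    Ambiguous,
    Wide,
    Fullwidth,
}

/// Returns the number of terminal columns for a category.
/// `ambiguous_as_wide` selects the policy for Ambiguous characters.
fn columns(cat: Category, ambiguous_as_wide: bool) -> u8 {
    match cat {
        Category::Wide | Category::Fullwidth => 2,
        Category::Ambiguous if ambiguous_as_wide => 2,
        // Narrow, Neutral, Halfwidth, and Ambiguous under the default policy.
        _ => 1,
    }
}
```

Keeping the ambiguous policy as an explicit parameter makes the choice visible at each call site, which matters because the right answer depends on the terminal and font in use.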

Performance

This library is optimized for performance:

  • O(1) lookups: Uses pre-generated range checks, not hash tables
  • Zero allocations: All functions work with stack data only
  • Minimal binary size: Lookup tables are efficiently encoded
  • No dependencies: Zero runtime dependencies for core functionality
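
To give a feel for a range-based lookup, here is a toy sketch. It is an assumption about the general technique, not the crate's generated code: WIDE_RANGES is a hand-picked subset of the real data, is_wide is a hypothetical name, and this version uses a binary search over sorted ranges, which is logarithmic in the (fixed) table size and needs no allocation.

```rust
use std::cmp::Ordering;

/// Toy table of inclusive code point ranges that are Wide or Fullwidth.
/// Sorted and non-overlapping; a real table covers many more ranges.
const WIDE_RANGES: &[(u32, u32)] = &[
    (0x1100, 0x115F), // Hangul Jamo initial consonants
    (0x3041, 0x3096), // Hiragana
    (0x4E00, 0x9FFF), // CJK Unified Ideographs
    (0xAC00, 0xD7A3), // Hangul Syllables
    (0xFF01, 0xFF60), // Fullwidth forms
];

/// Returns true if `cp` falls inside one of the ranges in the toy table.
fn is_wide(cp: u32) -> bool {
    WIDE_RANGES
        .binary_search_by(|&(lo, hi)| {
            if hi < cp {
                Ordering::Less // range lies entirely below the target
            } else if lo > cp {
                Ordering::Greater // range lies entirely above the target
            } else {
                Ordering::Equal // target is inside this range
            }
        })
        .is_ok()
}
```

Because the table is a const slice, it lives in the binary's read-only data and every lookup works on borrowed data only.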

Unicode Compliance

The library is based on the official Unicode East Asian Width property from the Unicode Character Database. The lookup tables are automatically generated from the latest Unicode data to ensure accuracy and completeness.

Development

Building

# Build the library
cargo build --release

# Run tests
cargo test

# Generate documentation
cargo doc --open

Regenerating Lookup Tables

The lookup tables are generated from the official Unicode data:

# This requires the 'build-deps' feature
make generate
# or
cargo run --bin generate --features build-deps
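
For readers curious about what such a generator has to do, the sketch below parses a single data line of the Unicode Character Database file EastAsianWidth.txt. It is not the crate's generator: parse_line is a hypothetical helper, and it assumes the usual UCD layout of a code point or `start..end` range, a semicolon, the property value, and an optional `#` comment.

```rust
/// Hypothetical helper: parses one line of EastAsianWidth.txt into
/// (start, end, property). Returns None for blank or comment-only lines
/// and for lines that do not match the expected layout.
fn parse_line(line: &str) -> Option<(u32, u32, String)> {
    // Drop the trailing comment, if any, and surrounding whitespace.
    let data = line.split('#').next()?.trim();
    if data.is_empty() {
        return None;
    }
    let (range, prop) = data.split_once(';')?;
    let (range, prop) = (range.trim(), prop.trim());
    // A range is written as `start..end`; a single code point has no `..`.
    let (start, end) = match range.split_once("..") {
        Some((a, b)) => (
            u32::from_str_radix(a, 16).ok()?,
            u32::from_str_radix(b, 16).ok()?,
        ),
        None => {
            let v = u32::from_str_radix(range, 16).ok()?;
            (v, v)
        }
    };
    Some((start, end, prop.to_string()))
}
```

A generator could collect these tuples, merge adjacent ranges with the same property, and emit them as a sorted const table.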

Development Workflow

# Format, lint, test, and generate docs
make prepare-release

# Quick development check
make quick

License

Licensed under either of

at your option.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Documentation

Examples

Run the included examples to see the library in action:

# Terminal width calculation example
cargo run --example terminal_width

# Text alignment example
cargo run --example text_alignment

Related Projects

  • unicode-width - Similar functionality with different API design
  • wcswidth - POSIX wcwidth implementation
  • textwrap - Text wrapping with East Asian width support