dec_from_char

Version: 0.2.0
Created: 2025-07-06 15:56:51 UTC
Updated: 2025-07-10 11:34:58 UTC
Description: Small library for converting Unicode decimal digits into numbers
Repository: https://github.com/vloldik/dec_from_char
Documentation: https://docs.rs/dec_from_char
Size: 32,051 bytes
Author: Vladislav (vloldik)


Extended Decimal

License: MIT

A tiny, zero-cost Rust library to correctly parse any Unicode decimal digit.

Ever needed to parse a number from a string that might contain digits from other scripts, such as ९ (Devanagari nine) or ٣ (Arabic-Indic three)? Rust's standard char::to_digit only recognizes ASCII digits (plus ASCII letters for radices above 10), so it returns None for these characters. This crate extends digit conversion to every character in the Unicode "Decimal Number" (Nd) general category.
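For comparison, here is what the standard library offers on its own. This std-only snippet (it does not use this crate) shows that char::to_digit rejects non-ASCII digits, and that char::is_numeric recognizes them but yields only a bool and also accepts non-decimal numerals such as superscripts. The helper name `std_decimal_value` is just an illustrative wrapper.

```rust
/// What std alone offers: `to_digit` only knows ASCII digits (and ASCII letters above radix 10).
fn std_decimal_value(c: char) -> Option<u32> {
    c.to_digit(10)
}

fn main() {
    let arabic_indic_three = '\u{0663}'; // '٣', category Nd
    let devanagari_nine = '\u{096F}'; // '९', category Nd

    assert_eq!(std_decimal_value('7'), Some(7));
    assert_eq!(std_decimal_value(arabic_indic_three), None);
    assert_eq!(std_decimal_value(devanagari_nine), None);

    // `is_numeric` recognizes both characters, but only returns a bool...
    assert!(arabic_indic_three.is_numeric());
    assert!(devanagari_nine.is_numeric());
    // ...and it also accepts non-decimal numerals such as '²' (SUPERSCRIPT TWO, category No).
    assert!('\u{00B2}'.is_numeric());

    // Integer parsing in std is ASCII-only as well.
    assert!("\u{0663}\u{0661}".parse::<u32>().is_err());
    assert_eq!("31".parse::<u32>(), Ok(31));
}
```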

Features

  • Blazing Fast: All Unicode mappings are resolved at compile time into a single match expression, so converting a character at runtime involves no file access, parsing, or lookup-table construction.
  • Simple API: Provides a straightforward extension trait, DecimalExtended, for the char type. If you know how to use Rust, you already know how to use this.
  • Self-Contained: The necessary Unicode data is bundled into the crate, so you don't need to worry about external files or runtime downloads.
  • Comprehensive: Correctly identifies and converts all decimal digits across various scripts as defined by the Unicode Standard.

Quick Start

  1. Add dec_from_char to your Cargo.toml:

    [dependencies]
    dec_from_char = "0.2.0" # Replace with the latest version
    
  2. Use the DecimalExtended trait to convert characters.

    use dec_from_char::DecimalExtended;
    
    fn main() {
        // Works for common ASCII digits
        assert_eq!('7'.to_decimal_utf8(), Some(7));
    
        // And for a wide range of other Unicode digits!
        assert_eq!('९'.to_decimal_utf8(), Some(9)); // Devanagari
        assert_eq!('०'.to_decimal_utf8(), Some(0)); // Devanagari
        assert_eq!('７'.to_decimal_utf8(), Some(7)); // Fullwidth
        assert_eq!('٣'.to_decimal_utf8(), Some(3)); // Arabic-Indic
    
        // It gracefully returns None for non-digit characters
        assert_eq!('a'.to_decimal_utf8(), None);
        assert_eq!('🎉'.to_decimal_utf8(), None);
    
        // Normalization to the equivalent ASCII digit character
        assert_eq!('٣'.normalize_decimal(), Some('3')); // Arabic-Indic
        assert_eq!('７'.normalize_decimal(), Some('7')); // Fullwidth
        assert_eq!('🎉'.normalize_decimal(), None);
    }
    

Example: Parsing Numbers from a Mixed-Script String

This crate makes it easy to extract digits from text, whatever script they are written in.

    use dec_from_char::DecimalExtended;
    // Assumed to be exported from the crate root alongside the trait.
    use dec_from_char::{normalize_decimals, normalize_decimals_filtering};

    fn main() {
        let messy_string = "Phone number: (0)𝟗𝟖-𝟳𝟲𝟱 and pin: ٣-١-٤-١";

        // Map each decimal digit to its ASCII equivalent and drop everything else
        let digits: String = messy_string
            .chars()
            .filter_map(|c| c.normalize_decimal())
            .collect();
        assert_eq!(digits, "0987653141");

        // `normalize_decimals_filtering` does the same in a single call
        assert_eq!(normalize_decimals_filtering(messy_string), "0987653141");

        // `normalize_decimals` converts the digits but keeps all other characters
        assert_eq!(normalize_decimals(messy_string), "Phone number: (0)98-765 and pin: 3-1-4-1");

        println!("Extracted digits: {}", digits); // "0987653141"
    }
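To make the difference between the two string helpers concrete, here is a std-only sketch under stated assumptions: `ascii_digit`, `keep_digits`, and `normalize_all` are hypothetical names, not the crate's functions, and `ascii_digit` covers only a handful of Nd blocks (ASCII, Arabic-Indic, Devanagari, fullwidth, and the mathematical digits) instead of the full table the crate generates. It relies on Unicode encoding decimal digits in contiguous runs of ten that start at zero.

```rust
/// Hypothetical stand-in for `normalize_decimal`, limited to a few Nd blocks.
/// Each block is a contiguous run of ten code points starting at digit zero.
fn ascii_digit(c: char) -> Option<char> {
    let cp = c as u32;
    let value = match cp {
        0x0030..=0x0039 => cp - 0x0030,            // ASCII
        0x0660..=0x0669 => cp - 0x0660,            // Arabic-Indic
        0x0966..=0x096F => cp - 0x0966,            // Devanagari
        0xFF10..=0xFF19 => cp - 0xFF10,            // Fullwidth
        0x1D7CE..=0x1D7FF => (cp - 0x1D7CE) % 10, // five consecutive runs of math digits
        _ => return None,
    };
    char::from_digit(value, 10)
}

/// Behaves like `normalize_decimals_filtering` in the example: keep only digits, as ASCII.
fn keep_digits(s: &str) -> String {
    s.chars().filter_map(ascii_digit).collect()
}

/// Behaves like `normalize_decimals` in the example: convert digits, pass everything else through.
fn normalize_all(s: &str) -> String {
    s.chars().map(|c| ascii_digit(c).unwrap_or(c)).collect()
}

fn main() {
    let messy = "Phone number: (0)𝟗𝟖-𝟳𝟲𝟱 and pin: ٣-١-٤-١";
    assert_eq!(keep_digits(messy), "0987653141");
    assert_eq!(normalize_all(messy), "Phone number: (0)98-765 and pin: 3-1-4-1");

    // Once the digits are ASCII, std's integer parsing accepts them.
    let pin: u32 = keep_digits("٣-١-٤-١").parse().expect("ASCII digits only");
    assert_eq!(pin, 3141);
}
```

Normalizing to ASCII first is what makes the last step work: `str::parse` for integers only accepts ASCII digits, so the converted string can be handed straight to it.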

How It Works

This crate contains two main parts:

  1. A procedural macro that reads the official UnicodeData.txt file at compile time.
  2. An extension trait that uses the code generated by this macro.

When you compile your project, the macro scans the bundled Unicode data file for every character in the decimal digit category (Nd). It then generates one large match expression that maps each of these characters to its value as a u8 (0-9).

This generated code is compiled directly into your binary, so at runtime calling .to_decimal_utf8() evaluates that match and nothing else: no file I/O, parsing, or hash-map lookups are involved.
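The generation step described above can be sketched in plain Rust. This is a hypothetical illustration, not the crate's macro: it parses a few sample lines in the UnicodeData.txt format (semicolon-separated fields, where field 2 is the general category and field 6 is the decimal digit value), keeps only the Nd entries, and renders match arms as source text. A real procedural macro would emit tokens (for example with the `quote` crate) rather than a String.

```rust
/// A few sample lines in UnicodeData.txt format (code point; name; category; ...).
const SAMPLE: &str = "\
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
00B2;SUPERSCRIPT TWO;No;0;EN;<super> 0032;;2;2;N;SUPERSCRIPT DIGIT TWO;;;;
0663;ARABIC-INDIC DIGIT THREE;Nd;0;AN;;3;3;3;N;;;;;
FF17;FULLWIDTH DIGIT SEVEN;Nd;0;EN;<wide> 0037;7;7;7;N;;;;;
";

/// Returns (character, value) for every Nd entry in the data.
fn decimal_digits(data: &str) -> Vec<(char, u8)> {
    data.lines()
        .filter_map(|line| {
            let fields: Vec<&str> = line.split(';').collect();
            // Field 2 is the general category; field 6 is the decimal digit value.
            if fields.len() < 7 || fields[2] != "Nd" {
                return None;
            }
            let cp = u32::from_str_radix(fields[0], 16).ok()?;
            let ch = char::from_u32(cp)?;
            let value = fields[6].parse::<u8>().ok()?;
            Some((ch, value))
        })
        .collect()
}

/// Renders each digit as a match arm, roughly what a code generator might emit.
fn render_match_arms(digits: &[(char, u8)]) -> String {
    digits
        .iter()
        .map(|(ch, value)| format!("'\\u{{{:04X}}}' => Some({}),\n", *ch as u32, value))
        .collect()
}

fn main() {
    let digits = decimal_digits(SAMPLE);
    // Only the three Nd lines survive; 'A' (Lu) and '²' (No) are skipped.
    assert_eq!(digits, vec![('0', 0), ('\u{0663}', 3), ('\u{FF17}', 7)]);
    print!("{}", render_match_arms(&digits));
}
```

Running it prints three arms, `'\u{0030}' => Some(0),`, `'\u{0663}' => Some(3),` and `'\u{FF17}' => Some(7),`. The superscript two is a useful edge case: it is numeric (category No) but has no decimal digit value, which is why filtering on Nd rather than on "is numeric" matters.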

API

The crate is built around a single extension trait, implemented for char:

pub trait DecimalExtended

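  • fn to_decimal_utf8(&self) -> Option<u8>: Converts any Unicode decimal digit (category Nd) to its value as a u8 (0-9). Returns None if the character is not a decimal digit.
  • fn is_decimal_utf8(&self) -> bool: A convenience method that returns true if the character is a decimal digit.
  • fn normalize_decimal(&self) -> Option<char>: Returns the equivalent ASCII digit character ('0' to '9'), or None if the character is not a decimal digit.

The crate also provides two string-level helpers: normalize_decimals, which converts every decimal digit to ASCII and keeps all other characters, and normalize_decimals_filtering, which keeps only the normalized digits.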
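The extension-trait pattern behind DecimalExtended is ordinary Rust: declare a trait and implement it for char so its methods can be called with dot syntax on any character. A minimal sketch, assuming a hypothetical trait `DigitExt` whose methods mirror to_decimal_utf8 and is_decimal_utf8 but are not the crate's code; the lookup is hand-written for ASCII and Arabic-Indic digits only.

```rust
/// Hypothetical extension trait with the same shape as `DecimalExtended`.
trait DigitExt {
    /// Decimal value of the character, if it is a supported decimal digit.
    fn digit_value(&self) -> Option<u8>;

    /// Convenience check written as a default method on top of `digit_value`.
    fn is_digit_char(&self) -> bool {
        self.digit_value().is_some()
    }
}

impl DigitExt for char {
    fn digit_value(&self) -> Option<u8> {
        // The README describes this match as macro-generated; here two blocks are written by hand.
        match *self {
            '0'..='9' => Some(*self as u8 - b'0'),
            '\u{0660}'..='\u{0669}' => Some((*self as u32 - 0x0660) as u8), // Arabic-Indic
            _ => None,
        }
    }
}

fn main() {
    assert_eq!('7'.digit_value(), Some(7));
    assert_eq!('\u{0663}'.digit_value(), Some(3)); // '٣'
    assert!('\u{0663}'.is_digit_char());
    assert!(!'a'.is_digit_char());
}
```

Writing the boolean check as a default method means every implementor gets it for free; this is one design option, and the README does not say how the crate implements is_decimal_utf8.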

License

This project is licensed under either of

at your option.

Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
