| Crates.io | arabic_text_utils |
| lib.rs | arabic_text_utils |
| version | 0.1.0 |
| created_at | 2025-03-15 00:26:09.117079+00 |
| updated_at | 2025-03-15 00:26:09.117079+00 |
| description | A Rust library for Arabic text processing and manipulation |
| homepage | |
| repository | https://bdsmovement.net/ |
| max_upload_size | |
| id | 1592890 |
| size | 30,757 |
A Rust library for Arabic text processing and manipulation. This crate provides utilities for character analysis, conversion between Western and Arabic numerals, removal of diacritics, text normalization, word and sentence segmentation, presentation-form handling, and punctuation conversion.
We stand in solidarity with the Palestinian people. To learn more about supporting Palestine, please visit BDS Movement.
Character Analysis
Number Handling
Text Processing
Presentation Forms
Punctuation
Add this to your Cargo.toml:
[dependencies]
arabic_text_utils = "0.1.0"
use arabic_text_utils::{
remove_tashkeel,
normalize_arabic,
convert_numbers_to_arabic,
is_arabic_char,
};
// Remove diacritical marks
let text = "مَرْحَباً بِكُمْ";
assert_eq!(remove_tashkeel(text), "مرحبا بكم");
// Normalize Arabic text
let text = "ﷺ"; // Arabic ligature
let normalized = normalize_arabic(text);
assert_eq!(normalized, "صلى الله عليه وسلم");
// Convert numbers to Arabic
let text = "Page 123";
assert_eq!(convert_numbers_to_arabic(text), "Page ١٢٣");
// Check if a character is Arabic
assert!(is_arabic_char('ع'));
assert!(!is_arabic_char('x'));
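To give a rough idea of what diacritic removal and digit conversion involve, here is a minimal std-only sketch. The helper names `strip_tashkeel_sketch` and `to_arabic_indic_digits_sketch` are hypothetical and are not part of `arabic_text_utils`; the crate's own `remove_tashkeel` and `convert_numbers_to_arabic` may cover more code points and edge cases than this.

```rust
// Illustrative only: hypothetical std-only helpers, NOT the crate's implementation.

/// Drops the eight core tashkeel marks, U+064B (fathatan) through U+0652 (sukun).
/// Assumption: other combining marks (e.g. U+0670) are left untouched here.
fn strip_tashkeel_sketch(input: &str) -> String {
    input
        .chars()
        .filter(|c| !('\u{064B}'..='\u{0652}').contains(c))
        .collect()
}

/// Maps ASCII digits '0'..='9' to Arabic-Indic digits U+0660..=U+0669.
/// Every other character is passed through unchanged.
fn to_arabic_indic_digits_sketch(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '0'..='9' => {
                let offset = c as u32 - '0' as u32;
                char::from_u32(0x0660 + offset).unwrap_or(c)
            }
            _ => c,
        })
        .collect()
}
```

Both helpers work character by character, so they never split a multi-byte UTF-8 sequence. Text without diacritics or ASCII digits comes back unchanged.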
For detailed documentation and examples, please visit docs.rs/arabic_text_utils.
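As a rough illustration of the kind of check `is_arabic_char` and `contains_arabic` perform, the sketch below tests membership in the basic Arabic Unicode block. The names `in_arabic_block_sketch` and `any_arabic_sketch` are hypothetical, and the single-block range is an assumption; the crate may also recognize the supplement, extended, and presentation-form blocks.

```rust
// Illustrative only: hypothetical std-only helpers, NOT the crate's implementation.

/// True when `c` lies in the basic Arabic block, U+0600..=U+06FF.
/// Assumption: presentation forms such as U+FDFA fall outside this range
/// and are therefore reported as non-Arabic by this sketch.
fn in_arabic_block_sketch(c: char) -> bool {
    ('\u{0600}'..='\u{06FF}').contains(&c)
}

/// True when at least one character of `text` is in the basic Arabic block.
fn any_arabic_sketch(text: &str) -> bool {
    text.chars().any(in_arabic_block_sketch)
}
```

A range check like this is cheap, but it is only as complete as the ranges it lists, which is one reason to prefer the crate's functions over a hand-rolled test.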
is_arabic_char: Detect Arabic characters
get_arabic_char_name: Get Unicode names for Arabic characters
is_haraka: Check for diacritical marks
convert_numbers_to_arabic: Convert Western numerals to Arabic
convert_numbers_from_arabic: Convert Arabic numerals to Western
remove_tashkeel: Strip diacritical marks
normalize_arabic: Standardize Arabic text representation
contains_arabic: Check for Arabic content
count_arabic_words: Count Arabic words in text
extract_arabic_text: Extract Arabic-only content
has_rtl_characters: Detect right-to-left characters
arabic_character_frequency: Analyze character distribution
segment_words: Split text into words
segment_sentences: Split text into sentences
generate_slug: Create URL-friendly text
wrap_text: Wrap text at specified width
sanitize_arabic: Clean and standardize Arabic text
normalize_presentation_forms: Standardize Arabic character forms
replace_ligatures: Handle special character combinations
convert_punctuation: Convert between Arabic and Latin punctuation
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.