| Crates.io | yekdast |
| lib.rs | yekdast |
| version | 0.1.0 |
| created_at | 2025-08-19 22:53:51.836565+00 |
| updated_at | 2025-08-19 22:53:51.836565+00 |
| description | A utility library for normalizing and cleaning up Persian (Farsi) text. |
| repository | https://github.com/Null-Err0r/yekdast |
| id | 1802631 |
| size | 30,050 |
A fast, configurable, and modern Rust library for normalizing and preprocessing Persian (Farsi) text.
Yekdast is a powerful tool for cleaning up messy Persian text data, preparing it for subsequent steps like search, analysis, or display in your applications. The name "Yekdast" (یکدست) means "uniform" or "consistent" in Persian.
To use Yekdast in your project, add the following lines to your Cargo.toml file:

    [dependencies]
    yekdast = "0.1.0" # Please replace with the latest version

Using the library with its default settings is straightforward:

    use yekdast::{normalize_text, NormalizeOptions};

    fn main() {
        let messy_text = "سلام, من يك برنامه نويس هستم و در كتاب خانه كار مي كنم.";

        // Use the default normalization options
        let options = NormalizeOptions::default();
        let clean_text = normalize_text(messy_text, &options);

        println!("Original: {}", messy_text);
        println!("Normalized: {}", clean_text);
        // Output: Normalized: سلام، من یک برنامه نویس هستم و در کتابخانه کار میکنم.
    }

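In the example above, the Arabic letters ي and ك and the ASCII comma come out as their Persian counterparts. The sketch below shows one way such a character-unification step could be written with the standard library alone. It is not Yekdast's implementation: the function names `unify_char` and `unify_chars` are hypothetical, and only the three substitutions visible in the example are covered.

```rust
/// Map a single character to its standard Persian form.
/// Illustrative only: covers just three substitutions and is an
/// assumption about the approach, not Yekdast's actual rule set.
fn unify_char(c: char) -> char {
    match c {
        '\u{064A}' => '\u{06CC}', // Arabic yeh (ي) -> Persian yeh (ی)
        '\u{0643}' => '\u{06A9}', // Arabic kaf (ك) -> Persian kaf (ک)
        ',' => '\u{060C}',        // ASCII comma -> Persian comma (،)
        other => other,           // everything else passes through unchanged
    }
}

/// Apply `unify_char` to every character of `input`.
fn unify_chars(input: &str) -> String {
    input.chars().map(unify_char).collect()
}

fn main() {
    let messy = "سلام, من يك برنامه نويس هستم";
    println!("{}", unify_chars(messy));
}
```

A per-character mapping like this is cheap and order-independent, but it replaces every ASCII comma, including commas inside Latin text or numbers. A real normalizer would likely need context-aware rules for punctuation.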
Yekdast's main strength is its configurability: the fields of NormalizeOptions control digit handling, slang replacement, ZWNJ insertion for compound words, and custom replacement rules.

    use yekdast::{normalize_text, NormalizeOptions, DigitPolicy};
    use std::collections::HashMap;

    fn main() {
        let text = "من توی خونه شماره 123 کار میکنم و علاقه مند به برنامه نویسی هستم. میباشد.";

        // 1. Define a custom slang-to-formal dictionary
        let mut slang_map = HashMap::new();
        slang_map.insert("توی".to_string(), "در".to_string());
        slang_map.insert("خونه".to_string(), "خانه".to_string());

        // 2. Define a list of compound words for ZWNJ insertion
        let zwnj_words = vec![
            "علاقه مند".to_string(),
            "کار میکنم".to_string(),
        ];

        // 3. Define custom, high-priority replacement rules
        let custom_rules = vec![
            ("میباشد.".to_string(), "است.".to_string()),
        ];

        // 4. Construct the final options
        let options = NormalizeOptions {
            digits: DigitPolicy::Fa, // Convert all digits to Persian
            slang_map,
            zwnj_compound_words: zwnj_words,
            custom_rules,
            ..Default::default()
        };

        let clean_text = normalize_text(text, &options);
        println!("{}", clean_text);
        // Output: من در خانه شماره ۱۲۳ کارمیکنم و علاقهمند به برنامه نویسی هستم. است.
    }

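Two of the options used above, Persian digit conversion and ZWNJ insertion for listed compound words, can be illustrated with a small standalone sketch. This is only an assumption about how such steps might work, not Yekdast's code: `to_persian_digits` and `join_compounds` are hypothetical helper names, and the sketch handles ASCII digits and exact substring matches only.

```rust
/// Convert ASCII digits (0-9) to Persian digits (U+06F0..=U+06F9).
/// Hypothetical helper, not part of the yekdast API.
fn to_persian_digits(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '0'..='9' => {
                // Offset of the digit from '0', added to Persian zero (۰).
                let offset = c as u32 - '0' as u32;
                char::from_u32(0x06F0 + offset).unwrap_or(c)
            }
            other => other,
        })
        .collect()
}

/// Replace the space inside each listed compound with a ZWNJ (U+200C).
/// Hypothetical helper; uses plain substring matching.
fn join_compounds(input: &str, compounds: &[&str]) -> String {
    let mut out = input.to_string();
    for compound in compounds {
        // Build the joined form, e.g. "علاقه مند" with ZWNJ instead of the space.
        let joined = compound.replace(' ', "\u{200C}");
        out = out.replace(*compound, &joined);
    }
    out
}

fn main() {
    let text = "شماره 123 و علاقه مند";
    let text = to_persian_digits(text);
    let text = join_compounds(&text, &["علاقه مند"]);
    println!("{}", text);
}
```

Because the compound step uses plain substring replacement, it can also match inside longer words, and the order in which rules run affects the result. A production normalizer would probably need word-boundary checks and a defined rule priority.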
… (می/نمی), suffixes (ها/تر/ترین), and custom compound words

Contributions are welcome! Feel free to open an issue to report a bug or suggest a feature, or submit a pull request.
This project is licensed under the MIT License.