| Crates.io | mlmorph |
| lib.rs | mlmorph |
| version | 1.4.1 |
| created_at | 2025-08-22 06:04:57.644714+00 |
| updated_at | 2025-08-23 09:28:21.818136+00 |
| description | Malayalam morphological analyzer |
| homepage | |
| repository | |
| max_upload_size | |
| id | 1805952 |
| size | 18,664,309 |
A Rust implementation of the Malayalam Morphological Analyzer using Finite State Transducer technology.
mlmorph is a Rust port of the mlmorph Malayalam morphological analyzer and generator. It provides fast and efficient morphological analysis and generation for Malayalam text using Finite State Transducers (FST) built with the Stuttgart Finite State Toolkit (SFST).
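This library can analyse Malayalam words, generate word forms from morphological descriptions, check whether a word is foreign, and normalize Malayalam text.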
Add this to your Cargo.toml:
[dependencies]
mlmorph = "1.4.1"
To install the command-line tool:
cargo install mlmorph
Or build from source:
git clone https://github.com/smc/mlmorph
cd mlmorph/rust
cargo build --release
use mlmorph::Analyser;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let analyser = Analyser::new()?;

    // Analyze a Malayalam word
    let results = analyser.analyse("കേരളത്തിന്റെ", true, true)?;
    for (analysis, weight) in results {
        println!("Analysis: {} (weight: {})", analysis, weight);
    }

    Ok(())
}
use mlmorph::Generator;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let generator = Generator::new()?;

    // Generate word forms from morphological description
    let results = generator.generate("കേരളം<np><genitive>", true)?;
    for (word, weight) in results {
        println!("Generated: {} (weight: {})", word, weight);
    }

    Ok(())
}
use mlmorph::check_foreign_word;

fn main() {
    let word = "computer";
    // check_foreign_word returns 1 for a foreign word, 0 for Malayalam
    let is_foreign = check_foreign_word(word);
    if is_foreign == 1 {
        println!("{} is a foreign word", word);
    } else {
        println!("{} is a Malayalam word", word);
    }
}
use mlmorph::normalize;

fn main() {
    let text = "ണ്";
    let normalized = normalize(text);
    println!("Normalized: {}", normalized); // Output: "ൺ"
}
The CLI tool provides the same functionality as the Python version:
# Analyze words from stdin
echo "കേരളത്തിന്റെ" | mlmorph --analyse
# Analyze words from a file
mlmorph --analyse --input words.txt
# Output format: word<tab>analysis<tab>weight
കേരളത്തിന്റെ കേരളം<np><genitive> 179
# Generate words from morphological descriptions
echo "കേരളം<np><genitive>" | mlmorph --generate
# Output format: input<tab>generated_word<tab>weight
കേരളം<np><genitive> കേരളത്തിന്റെ 179
# Check if words are foreign
echo -e "കേരളം\ncomputer" | mlmorph --foreign
# Output format: word<tab>is_foreign (1=foreign, 0=Malayalam)
കേരളം 0
computer 1
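The tab-separated analysis output described above (word, analysis, weight) is easy to consume from another program. The following is a minimal sketch of reading one such line in Rust. The names `CliAnalysisLine` and `parse_cli_line` are hypothetical and not part of the crate; the sketch assumes exactly three tab-separated fields with an integer weight in the last one.

```rust
/// One line of `mlmorph --analyse` output.
/// Hypothetical type for illustration; not part of the mlmorph crate.
#[derive(Debug, PartialEq)]
pub struct CliAnalysisLine {
    pub word: String,
    pub analysis: String,
    pub weight: i32,
}

/// Parse a `word<tab>analysis<tab>weight` line.
/// Returns `None` if the line does not have exactly three fields
/// or if the weight is not an integer.
pub fn parse_cli_line(line: &str) -> Option<CliAnalysisLine> {
    // Drop a trailing newline (and carriage return) before splitting.
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let mut fields = line.split('\t');

    let word = fields.next()?;
    let analysis = fields.next()?;
    let weight: i32 = fields.next()?.trim().parse().ok()?;

    // Reject lines with extra fields rather than silently ignoring them.
    if fields.next().is_some() {
        return None;
    }

    Some(CliAnalysisLine {
        word: word.to_string(),
        analysis: analysis.to_string(),
        weight,
    })
}
```

Calling `parse_cli_line` on each line of the tool's standard output yields structured records; lines that do not match the assumed format come back as `None` and can be logged or skipped.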
mlmorph --help
A Malayalam morphological analyser and generator

Usage: mlmorph [OPTIONS]

Options:
  -i, --input <INFILE>  Source of analysis data
  -a, --analyse         Analyse the input file strings
  -g, --generate        Generate the input file strings
  -f, --foreign         Check if the word is foreign word or not
  -v, --verbose         Print verbosely while processing
  -h, --help            Print help
  -V, --version         Print version
// Analysis result: (analysis_string, weight)
pub type AnalysisResult = (String, i32);

// Generation result: (generated_word, weight)
pub type GenerationResult = (String, i32);

// Individual morpheme
pub struct Morpheme {
    pub root: String,
    pub pos: Vec<String>,
}

// Parsed analysis structure
pub struct ParsedAnalysis {
    pub morphemes: Vec<Morpheme>,
    pub weight: i32,
}
impl Analyser {
    // Create a new analyser instance
    pub fn new() -> Result<Self, Box<dyn std::error::Error>>;

    // Analyze a word
    pub fn analyse(
        &self,
        word: &str,
        weighted: bool,
        foreign_word_check: bool
    ) -> Result<Vec<AnalysisResult>, Box<dyn std::error::Error>>;

    // Parse analysis string into structured data
    pub fn parse_analysis(analysis: &str) -> Result<ParsedAnalysis, Box<dyn std::error::Error>>;
}
impl Generator {
    // Create a new generator instance
    pub fn new() -> Result<Self, Box<dyn std::error::Error>>;

    // Generate word forms from morphological description
    pub fn generate(
        &self,
        token: &str,
        weighted: bool
    ) -> Result<Vec<GenerationResult>, Box<dyn std::error::Error>>;
}
// Normalize Malayalam text
pub fn normalize(text: &str) -> String;
// Check if a word is foreign (returns 1 for foreign, 0 for Malayalam)
pub fn check_foreign_word(word: &str) -> i32;
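To make the relationship between an analysis string and the `Morpheme` struct above more concrete, here is a rough sketch of how a string such as `കേരളം<np><genitive>` could be split into roots and tags. This is not the crate's `parse_analysis` implementation: `parse_analysis_sketch` is a hypothetical name, and the sketch assumes that each morpheme is a root followed by zero or more `<tag>` items, with any text after a tag starting a new morpheme.

```rust
/// Mirrors the `Morpheme` struct from the API reference above.
#[derive(Debug, PartialEq)]
pub struct Morpheme {
    pub root: String,
    pub pos: Vec<String>,
}

/// Hypothetical helper, not part of the mlmorph crate.
/// Splits an analysis string into morphemes under the assumption that
/// each morpheme is a root followed by `<tag>` items.
pub fn parse_analysis_sketch(analysis: &str) -> Vec<Morpheme> {
    let mut morphemes: Vec<Morpheme> = Vec::new();
    let mut root = String::new();
    let mut tags: Vec<String> = Vec::new();
    let mut rest = analysis;

    while !rest.is_empty() {
        if let Some(after_open) = rest.strip_prefix('<') {
            // A tag: take everything up to the closing '>'.
            match after_open.find('>') {
                Some(end) => {
                    tags.push(after_open[..end].to_string());
                    rest = &after_open[end + 1..];
                }
                // Unterminated tag: stop and keep what was parsed so far.
                None => break,
            }
        } else {
            // Root text runs until the next '<' or the end of the input.
            let end = rest.find('<').unwrap_or(rest.len());
            // Tags already collected belong to the previous root,
            // so that morpheme is complete.
            if !tags.is_empty() {
                morphemes.push(Morpheme {
                    root: std::mem::take(&mut root),
                    pos: std::mem::take(&mut tags),
                });
            }
            root.push_str(&rest[..end]);
            rest = &rest[end..];
        }
    }

    // Flush the final morpheme, if any.
    if !root.is_empty() || !tags.is_empty() {
        morphemes.push(Morpheme { root, pos: tags });
    }
    morphemes
}
```

For `കേരളം<np><genitive>` this yields a single morpheme with root `കേരളം` and tags `np` and `genitive`. The crate's own parser may treat compound or malformed analyses differently.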
data/malayalam.a)
The Rust implementation provides significant performance improvements over the Python version:
This Rust implementation maintains API compatibility with the Python version where possible, making it easy to migrate existing applications.
Contributions are welcome! Please see the main mlmorph project for contribution guidelines.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this library in academic work, please cite:
@inproceedings{thottingal-2019-finite,
    title = "Finite State Transducer based Morphology analysis for {M}alayalam Language",
    author = "Thottingal, Santhosh",
    booktitle = "Proceedings of the 2nd Workshop on Technologies for MT of Low Resource Languages",
    month = "20 " # aug,
    year = "2019",
    address = "Dublin, Ireland",
    publisher = "European Association for Machine Translation",
    url = "https://www.aclweb.org/anthology/W19-6801",
    pages = "1--5",
}