| Crates.io | uroman |
| lib.rs | uroman |
| version | 0.6.3 |
| created_at | 2025-08-03 16:04:59.296923+00 |
| updated_at | 2025-11-07 11:42:15.8553+00 |
| description | A blazingly fast, self-contained Rust reimplementation of the uroman universal romanizer. |
| homepage | |
| repository | https://github.com/stellanomia/uroman-rs |
| max_upload_size | |
| id | 1779849 |
| size | 4,501,594 |
uroman-rs is a complete rewrite of the original uroman (Universal Romanizer) in Rust. It provides high-speed, accurate romanization for a vast number of languages and writing systems, faithfully reproducing the behavior of the original implementation.
As a reimplementation, it is designed to be a drop-in replacement that passes the original's comprehensive test suite. This means its romanization logic, including its strengths and limitations, is identical to the original. For effective use, we recommend reviewing the original authors' documentation on Reversibility and Known Limitations.
In the same spirit of fidelity, this project respects the licensing of the original uroman software. uroman-rs is licensed under the Apache License 2.0, and includes the original's license as required. For full details, please refer to the License section.
It reproduces the behavior of the original uroman and passes its test suite. Output is available as plain strings (str) and as structured JSON data (edges, alts, lattice).
The uroman-rs project is published as a crate named uroman. You can use it both as a command-line tool and as a library in your Rust projects.
To install the uroman-rs command-line tool, run the following:
cargo install uroman
This will install the executable as uroman-rs on your system.
To use uroman-rs as a library, add the uroman crate to your project's Cargo.toml.
For library usage, it's recommended to disable default features to avoid pulling in CLI-specific dependencies.
cargo add uroman --no-default-features
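For reference, the command above should leave an entry along these lines in Cargo.toml. This is a sketch: the version shown is the one listed at the top of this page, and the exact version requirement that cargo writes may differ.

```toml
[dependencies]
# Default features are turned off so the CLI-specific dependencies are skipped.
uroman = { version = "0.6.3", default-features = false }
```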
uroman-rs can be used directly from your terminal.
Show sample conversions:
See examples of how various scripts are romanized.
uroman-rs --sample
View all options:
Display the help message for a full list of commands and flags.
uroman-rs --help
Use in REPL mode:
Run uroman-rs without any arguments to process input line by line. Press Ctrl+D to exit.
$ uroman-rs
>> こんにちは、世界!
konnichiha, shijie!
>> ᚺᚨᛚᛚᛟ ᚹᛟᚱᛚᛞ
hallo world
>> (Ctrl+D)
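// Assumes `Uroman` and `rom_format` are exported from the crate root.
use uroman::{rom_format, Uroman};

fn main() {
    // Uroman::new() is infallible and does not return a `Result`.
    let uroman = Uroman::new();
    // Romanize as a plain string, passing "jpn" as the language code.
    let romanized_string: String = uroman
        .romanize_string::<rom_format::Str>("✨ユーロマン✨", Some("jpn"))
        .to_string();
    assert_eq!(romanized_string, "✨yuuroman✨");
    println!("{romanized_string}");
}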
For more advanced examples, please see the examples/ directory.
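When romanizing a whole file, a small line-by-line helper can be convenient. The sketch below is not part of uroman-rs: `romanize_lines` is a hypothetical name, and the `romanize` closure is a stand-in for the `romanize_string` call shown earlier, which keeps the helper independent of the crate.

```rust
/// Applies `romanize` to every non-empty line of `input`.
///
/// `romanize` is a hypothetical stand-in for a closure that wraps the
/// library call shown earlier, for example:
/// `|line| uroman.romanize_string::<rom_format::Str>(line, Some("jpn")).to_string()`
fn romanize_lines<F>(input: &str, romanize: F) -> Vec<String>
where
    F: Fn(&str) -> String,
{
    input
        .lines()
        .map(str::trim) // drop surrounding whitespace
        .filter(|line| !line.is_empty()) // skip blank lines
        .map(|line| romanize(line))
        .collect()
}
```

Passing the romanizer in as a closure also makes the helper easy to test with a dummy function before wiring in the real library call.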
Performance was measured against the original Python implementation using hyperfine.
The benchmark input was multi-script.txt from the original uroman repository.

| Implementation | Mean Time (± σ) | Relative Performance |
|---|---|---|
| uroman-rs (This project) | 82.9 ms ± 2.4 ms | ~27.7x faster |
| uroman.py (via uv run) | 2295 ms ± 20 ms | Baseline |
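The relative performance figure is simply the ratio of the two mean times. As a quick check, here is a minimal sketch; `speedup` is an illustrative helper and not part of the project.

```rust
/// Ratio of the baseline mean time to the candidate mean time.
/// Both arguments must use the same unit (milliseconds here).
fn speedup(baseline_ms: f64, candidate_ms: f64) -> f64 {
    baseline_ms / candidate_ms
}
```

With the measurements above, `speedup(2295.0, 82.9)` comes out at roughly 27.7.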
uroman-rs aims to be not only a faithful reimplementation but also a more robust and accurate one. It handles several edge cases that can cause the original uroman.py script to crash or produce incorrect output.
For example, the original script crashes on inputs with incomplete fractional patterns like "百分之" ("percent of..."). The script expects a number to follow but does not handle the case where none is found, so it raises an AttributeError on a NoneType object. This issue has been reported to the original author (see isi-nlp/uroman#16).
$ uv run uroman.py "百分之多少"
Traceback (most recent call last):
...
AttributeError: 'NoneType' object has no attribute 'value'
In contrast, uroman-rs handles this input safely and provides a reasonable fallback romanization, demonstrating its enhanced reliability:
$ uroman-rs "百分之多少"
baifenzhiduoshao
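The general pattern behind this kind of fix is to treat "no number follows" as an ordinary case rather than an error. The sketch below illustrates that idea with `Option`. It is not the uroman-rs implementation: `Romanized`, `chinese_digit`, and `romanize_percent` are hypothetical names, and only single digits are handled.

```rust
/// Hypothetical result type; not a uroman-rs type.
#[derive(Debug, PartialEq)]
enum Romanized {
    /// A number was found after the "百分之" prefix.
    Percent(u32),
    /// No number followed; the text is handed on to ordinary romanization.
    Fallback(String),
}

/// Maps a single Chinese digit character to its value, if it is one.
fn chinese_digit(c: char) -> Option<u32> {
    match c {
        '一' => Some(1),
        '二' => Some(2),
        '三' => Some(3),
        '四' => Some(4),
        '五' => Some(5),
        '六' => Some(6),
        '七' => Some(7),
        '八' => Some(8),
        '九' => Some(9),
        _ => None,
    }
}

/// Tries to read "百分之<digit>"; falls back instead of failing
/// when the prefix or the digit is missing.
fn romanize_percent(text: &str) -> Romanized {
    let value = text
        .strip_prefix("百分之")
        .and_then(|rest| rest.chars().next())
        .and_then(chinese_digit);
    match value {
        Some(n) => Romanized::Percent(n),
        None => Romanized::Fallback(text.to_string()),
    }
}
```

Because every step returns an `Option`, an input such as "百分之多少" ends up in the `Fallback` branch instead of dereferencing a missing value.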
In addition to improving stability, uroman-rs also corrects certain romanization errors found in the original implementation. A notable example is the handling of the Tibetan letter འ (U+0F60, TIBETAN LETTER -A).
The original script incorrectly romanizes this character, which represents the vowel a with a preceding glottal stop [ʔ], by omitting the vowel sound entirely.
# Original uroman.py output omits the 'a' sound
$ uv run uroman.py "འ"
'
uroman-rs provides the linguistically correct romanization, faithfully representing both the glottal stop (as an apostrophe) and the vowel sound. This ensures a higher quality and more accurate transliteration for Tibetan script.
# uroman-rs provides the correct output
$ uroman-rs "འ"
'a
uroman-rs provides a more precise romanization for certain Tibetan characters compared to the original script. The uroman.py implementation fails to distinguish between the glottal stop consonant འ ('a-chung) and the vowel carrier ཨ ('a-chen) when followed by the vowel ེ (e).
The original script produces the same output for both འེ and ཨེ.
# Original uroman.py output is identical for both characters
$ uv run uroman.py "ཨེ"
e
$ uv run uroman.py "འེ"
e
In contrast, uroman-rs correctly preserves the leading glottal stop of འ, maintaining the distinction between the two characters as intended by the script.
# uroman-rs distinguishes the two characters
$ uroman-rs "ཨེ"
e
$ uroman-rs "འེ"
'e
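The two inputs look similar but are different code point sequences, which is why a romanizer can tell them apart. A small sketch for inspecting this in plain Rust follows; `code_points` is an illustrative helper, not part of uroman-rs.

```rust
/// Formats each character of `text` as a Unicode code point label,
/// e.g. "U+0F60".
fn code_points(text: &str) -> Vec<String> {
    text.chars()
        .map(|c| format!("U+{:04X}", c as u32))
        .collect()
}
```

Assuming the base letters are U+0F60 and U+0F68 and the vowel sign is U+0F7A, calling `code_points` on འེ and ཨེ should list U+0F60 U+0F7A and U+0F68 U+0F7A respectively.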
This project is licensed under the Apache License, Version 2.0.
uroman-rs is a Rust implementation of the original uroman software by Ulf Hermjakob. As such, it is a derivative work and includes the original license notice in the NOTICE file.
Please be aware that any academic publication of projects using uroman-rs should acknowledge the use of the original uroman software as specified in its license. For details, please see the NOTICE file.