| Crates.io | yosina |
| lib.rs | yosina |
| version | 1.0.0 |
| created_at | 2025-08-19 20:02:43.175463+00 |
| updated_at | 2025-09-24 05:37:42.193538+00 |
| description | Japanese text transliteration library |
| homepage | https://github.com/yosina-lib/yosina |
| repository | https://github.com/yosina-lib/yosina |
| max_upload_size | |
| id | 1802377 |
| size | 916,430 |
A Rust port of the Yosina Japanese text transliteration library.
Yosina is a library for Japanese text transliteration that provides various text normalization and conversion features commonly needed when processing Japanese text.
Add this to your Cargo.toml:
[dependencies]
yosina = "1.0.0"
use yosina::{make_transliterator, TransliterationRecipe};
use yosina::recipes::{ReplaceCircledOrSquaredCharactersOptions, ToFullWidthOptions};
// Create a recipe with desired transformations
let recipe = TransliterationRecipe {
    kanji_old_new: true,
    replace_spaces: true,
    replace_suspicious_hyphens_to_prolonged_sound_marks: true,
    replace_circled_or_squared_characters: ReplaceCircledOrSquaredCharactersOptions::Yes {
        exclude_emojis: false,
    },
    replace_combined_characters: true,
    to_fullwidth: ToFullWidthOptions::Yes {
        u005c_as_yen_sign: false,
    },
    ..Default::default()
};
// Create the transliterator
let transliterator = make_transliterator(&recipe).unwrap();
// Use it with various special characters
let input = "①②③ ⒶⒷⒸ ㍿㍑㌠㋿"; // circled numbers, letters, space, combined characters
let result = transliterator(input).unwrap();
println!("{}", result); // "（１）（２）（３）　（Ａ）（Ｂ）（Ｃ）　株式会社リットルサンチーム令和"
// Convert old kanji to new
let old_kanji = "舊字體";
let result = transliterator(old_kanji).unwrap();
println!("{}", result); // "旧字体"
// Convert half-width katakana to full-width
let half_width = "ﾃｽﾄﾓｼﾞﾚﾂ";
let result = transliterator(half_width).unwrap();
println!("{}", result); // "テストモジレツ"
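To make the half-width to full-width katakana step more concrete, here is a minimal standalone sketch of the idea using only the standard library. It is not Yosina's implementation or API: the names `FULLWIDTH_KATAKANA`, `base_katakana`, `compose`, and `halfwidth_to_fullwidth_katakana` are hypothetical, and the table covers only ｱ through ﾝ plus the two half-width sound marks. Small kana, ｦ, the prolonged sound mark, punctuation, and ｳ + ﾞ (ヴ) are passed through unchanged.

```rust
// Illustrative sketch only: this is NOT Yosina's implementation or API.
// Every name below is hypothetical.

/// Full-width counterparts of U+FF71 (ｱ) through U+FF9D (ﾝ), in code point order.
const FULLWIDTH_KATAKANA: &str =
    "アイウエオカキクケコサシスセソタチツテトナニヌネノハヒフヘホマミムメモヤユヨラリルレロワン";

/// Look up the full-width form of a half-width katakana letter, if it is in the table.
fn base_katakana(c: char) -> Option<char> {
    match c {
        '\u{FF71}'..='\u{FF9D}' => FULLWIDTH_KATAKANA
            .chars()
            .nth((c as u32 - 0xFF71) as usize),
        _ => None,
    }
}

/// Combine a full-width base with a half-width voiced (U+FF9E) or
/// semi-voiced (U+FF9F) sound mark. In the Katakana block the voiced form
/// sits one code point after its base and the semi-voiced form two after.
fn compose(base: char, mark: char) -> Option<char> {
    let offset = match mark {
        '\u{FF9E}' if "カキクケコサシスセソタチツテトハヒフヘホ".contains(base) => 1,
        '\u{FF9F}' if "ハヒフヘホ".contains(base) => 2,
        _ => return None,
    };
    char::from_u32(base as u32 + offset)
}

/// Convert half-width katakana to full-width, leaving other characters as they are.
fn halfwidth_to_fullwidth_katakana(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        match base_katakana(c) {
            Some(base) => {
                // Peek at the next character: if it is a sound mark that
                // applies to this base, consume it and emit the composed kana.
                let composed = chars.peek().and_then(|&mark| compose(base, mark));
                if composed.is_some() {
                    chars.next();
                }
                out.push(composed.unwrap_or(base));
            }
            None => out.push(c),
        }
    }
    out
}

fn main() {
    println!("{}", halfwidth_to_fullwidth_katakana("ﾃｽﾄﾓｼﾞﾚﾂ")); // テストモジレツ
}
```

The lookahead matters because half-width text stores voiced kana as two code points (ｼ followed by ﾞ), while full-width text normally uses the single precomposed character ジ.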
use yosina::{make_transliterator, TransliteratorConfig::*};
// Configure with direct transliterator configs
let configs = vec![
    KanjiOldNew,
    Spaces,
    ProlongedSoundMarks(Default::default()),
    CircledOrSquared(Default::default()),
    Combined,
];
let transliterator = make_transliterator(&configs).unwrap();
let result = transliterator("some japanese text").unwrap();
println!("{}", result); // Processed text with transformations applied
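The config list above reads as an ordered chain in which each transliterator receives the output of the previous one. The following standalone sketch illustrates that chaining idea with plain functions. It is an assumption about the general technique, not a description of Yosina's internals: `Step`, `replace_spaces`, `replace_roman_numerals`, `default_steps`, and `run_pipeline` are hypothetical names, and each step handles only a handful of characters.

```rust
// Illustrative sketch only: not Yosina's internals. All names are hypothetical.

/// One transformation step: takes text, returns transformed text.
type Step = fn(&str) -> String;

/// Replace a few non-ASCII space characters with an ASCII space.
fn replace_spaces(text: &str) -> String {
    text.chars()
        .map(|c| match c {
            '\u{3000}' | '\u{00A0}' => ' ', // ideographic space, no-break space
            other => other,
        })
        .collect()
}

/// Replace the first three Unicode Roman numerals with ASCII letters.
fn replace_roman_numerals(text: &str) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            'Ⅰ' => out.push('I'),
            'Ⅱ' => out.push_str("II"),
            'Ⅲ' => out.push_str("III"),
            other => out.push(other),
        }
    }
    out
}

/// The steps to apply, in order.
fn default_steps() -> Vec<Step> {
    vec![replace_spaces, replace_roman_numerals]
}

/// Feed the input through every step, passing each result to the next step.
fn run_pipeline(steps: &[Step], input: &str) -> String {
    steps
        .iter()
        .fold(input.to_string(), |text, step| step(&text))
}

fn main() {
    let steps = default_steps();
    println!("{}", run_pipeline(&steps, "Ⅰ\u{3000}Ⅱ\u{3000}Ⅲ")); // I II III
}
```

Because each step sees the previous step's output, the order of the list can change the result when two steps touch the same characters.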
circled-or-squared: Converts circled or squared characters to their plain equivalents.
Options: templates (custom rendering), includeEmojis (include emoji characters)
Example: ①②③ → (1)(2)(3), ㊙㊗ → (秘)(祝)

combined: Expands combined characters into their individual character sequences.
Example: ㍻ (Heisei era) → 平成, ㈱ → (株)

hira-kata-composition: Combines decomposed hiragana and katakana into their composed equivalents.
Options: composeNonCombiningMarks (compose non-combining marks)
Example: か + ゙ → が, ヘ + ゜ → ペ

hira-kata: Converts between hiragana and katakana scripts bidirectionally.
Options: mode ("hira-to-kata" or "kata-to-hira")
Example: ひらがな → ヒラガナ (hira-to-kata)

hyphens: Replaces various dash/hyphen symbols with common ones used in Japanese.
Options: precedence (mapping priority order)
Example: 2019—2020 (em dash) → 2019-2020

ideographic-annotations: Replaces ideographic annotations used in traditional Chinese-to-Japanese translation.
Example: ㆖㆘ → 上下

ivs-svs-base: Handles Ideographic and Standardized Variation Selectors.
Options: charset, mode ("ivs-or-svs" or "base"), preferSVS, dropSelectorsAltogether
Example: 葛󠄀 (葛 + IVS) → 葛

japanese-iteration-marks: Expands iteration marks by repeating the preceding character.
Example: 時々 → 時時, いすゞ → いすず

jisx0201-and-alike: Handles half-width/full-width character conversion.
Options: fullwidthToHalfwidth, convertGL (alphanumerics/symbols), convertGR (katakana), u005cAsYenSign
Example: ABC123 → ＡＢＣ１２３, ｶﾀｶﾅ → カタカナ

kanji-old-new: Converts old-style kanji (旧字体) to modern forms (新字体).
Example: 舊字體の變換 → 旧字体の変換

mathematical-alphanumerics: Normalizes mathematical alphanumeric symbols to plain ASCII.
Example: 𝐀𝐁𝐂 (mathematical bold) → ABC

prolonged-sound-marks: Handles contextual conversion between hyphens and prolonged sound marks.
Options: skipAlreadyTransliteratedChars, allowProlongedHatsuon, allowProlongedSokuon, replaceProlongedMarksFollowingAlnums
Example: イ−ハト−ヴォ (with hyphen) → イーハトーヴォ (prolonged mark)

radicals: Converts CJK radical characters to their corresponding ideographs.
Example: ⾔⾨⾷ (Kangxi radicals) → 言門食

spaces: Normalizes various Unicode space characters to standard ASCII space.
Example: A　B (ideographic space) → A B

roman-numerals: Converts Unicode Roman numeral characters to their ASCII letter equivalents.
Example: Ⅰ Ⅱ Ⅲ → I II III, ⅰ ⅱ ⅲ → i ii iii

This project uses standard Rust tooling.
# Build the library
cargo build
# Run tests
cargo test
# Run linting
cargo clippy
# Run formatting
cargo fmt
# Build documentation
cargo doc --open
License: MIT