yosina

Crates.io: yosina
lib.rs: yosina
version: 1.0.0
created_at: 2025-08-19 20:02:43.175463+00
updated_at: 2025-09-24 05:37:42.193538+00
description: Japanese text transliteration library
homepage: https://github.com/yosina-lib/yosina
repository: https://github.com/yosina-lib/yosina
max_upload_size:
id: 1802377
size: 916,430
Moriyoshi Koizumi (moriyoshi)

documentation: https://docs.rs/yosina

README

Yosina Rust

A Rust port of the Yosina Japanese text transliteration library.

Overview

Yosina is a library for Japanese text transliteration that provides various text normalization and conversion features commonly needed when processing Japanese text.

Installation

Add this to your Cargo.toml:

[dependencies]
yosina = "1.0.0"

Usage

Using Recipes (Recommended)

use yosina::{make_transliterator, TransliterationRecipe};
use yosina::recipes::{ReplaceCircledOrSquaredCharactersOptions, ToFullWidthOptions};

// Create a recipe with desired transformations
let recipe = TransliterationRecipe {
    kanji_old_new: true,
    replace_spaces: true,
    replace_suspicious_hyphens_to_prolonged_sound_marks: true,
    replace_circled_or_squared_characters: ReplaceCircledOrSquaredCharactersOptions::Yes {
        exclude_emojis: false,
    },
    replace_combined_characters: true,
    to_fullwidth: ToFullWidthOptions::Yes {
        u005c_as_yen_sign: false,
    },
    ..Default::default()
};

// Create the transliterator
let transliterator = make_transliterator(&recipe).unwrap();

// Use it with various special characters
let input = "①②③ ⒶⒷⒸ ㍿㍑㌠㋿"; // circled numbers, letters, space, combined characters
let result = transliterator(input).unwrap();
println!("{}", result); // "(1)(2)(3) (A)(B)(C) 株式会社リットルサンチーム令和"

// Convert old kanji to new
let old_kanji = "舊字體";
let result = transliterator(old_kanji).unwrap();
println!("{}", result); // "旧字体"

// Convert half-width katakana to full-width
let half_width = "ﾃｽﾄﾓｼﾞﾚﾂ";
let result = transliterator(half_width).unwrap();
println!("{}", result); // "テストモジレツ"

Using Direct Configuration

use yosina::{make_transliterator, TransliteratorConfig::*};

// Configure with direct transliterator configs
let configs = vec![
    KanjiOldNew,
    Spaces,
    ProlongedSoundMarks(Default::default()),
    CircledOrSquared(Default::default()),
    Combined,
];

let transliterator = make_transliterator(&configs).unwrap();
let result = transliterator("some japanese text").unwrap();
println!("{}", result); // Processed text with transformations applied

Available Transliterators

1. Circled or Squared (circled-or-squared)

Converts circled or squared characters to their plain equivalents.

  • Options: templates (custom rendering), includeEmojis (include emoji characters)
  • Example: ①②③ → (1)(2)(3), ㊙㊗ → (秘)(祝)
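The circled digits occupy a contiguous Unicode range, so the first example above can be illustrated with plain code point arithmetic. This is a minimal standalone sketch, not the crate's implementation: the function name `replace_circled_numbers` is hypothetical, and it covers only ① (U+2460) through ⑳ (U+2473), leaving circled letters, kanji, and emoji untouched.

```rust
/// Replace circled numbers ① (U+2460) through ⑳ (U+2473) with "(1)" through "(20)".
/// Illustrative only: every other character is passed through unchanged.
fn replace_circled_numbers(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            '\u{2460}'..='\u{2473}' => {
                // Offset within the block gives the numeric value minus one.
                let n = c as u32 - 0x2460 + 1;
                out.push_str(&format!("({})", n));
            }
            _ => out.push(c),
        }
    }
    out
}
```

Calling `replace_circled_numbers("①②③")` yields `(1)(2)(3)`.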

2. Combined (combined)

Expands combined characters into their individual character sequences.

  • Example: ㍻ (Heisei era) → 平成, ㈱ → (株)

3. Hiragana-Katakana Composition (hira-kata-composition)

Combines decomposed hiragana and katakana into their composed equivalents.

  • Options: composeNonCombiningMarks (compose non-combining marks)
  • Example: か + ゙ → が, ヘ + ゜ → ペ
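To make the composition step concrete, here is a small standalone sketch that merges a base kana with a following combining voiced mark (U+3099) or semi-voiced mark (U+309A). It is not the crate's code: the names `compose_kana`, `lookup`, `VOICED`, and `SEMI_VOICED` are hypothetical, the tables cover only the common rows, and non-combining marks (the `composeNonCombiningMarks` option) are not handled.

```rust
/// Base kana immediately followed by their voiced forms, in pairs.
const VOICED: &str = "かがきぎくぐけげこごさざしじすずせぜそぞただちぢつづてでとどはばひびふぶへべほぼ\
                      カガキギクグケゲコゴサザシジスズセゼソゾタダチヂツヅテデトドハバヒビフブヘベホボ";
/// Base kana immediately followed by their semi-voiced forms, in pairs.
const SEMI_VOICED: &str = "はぱひぴふぷへぺほぽハパヒピフプヘペホポ";

/// Find `base` in the first slot of a pair and return the second slot.
fn lookup(pairs: &str, base: char) -> Option<char> {
    let chars: Vec<char> = pairs.chars().collect();
    chars.chunks(2).find(|p| p[0] == base).map(|p| p[1])
}

/// Merge "base + combining mark" sequences into single composed kana.
fn compose_kana(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        // Pick the table that matches the combining mark, if `c` is one.
        let table = match c {
            '\u{3099}' => Some(VOICED),
            '\u{309A}' => Some(SEMI_VOICED),
            _ => None,
        };
        let composed = table.and_then(|t| {
            out.chars().next_back().and_then(|prev| lookup(t, prev))
        });
        match composed {
            // Replace the previously emitted base with the composed form.
            Some(ch) => {
                out.pop();
                out.push(ch);
            }
            // Not a composable pair: keep the character as it is.
            None => out.push(c),
        }
    }
    out
}
```

A mark that follows a character with no composed form is left in place rather than dropped.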

4. Hiragana-Katakana (hira-kata)

Converts between hiragana and katakana scripts bidirectionally.

  • Options: mode ("hira-to-kata" or "kata-to-hira")
  • Example: ひらがな → ヒラガナ (hira-to-kata)
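The two kana blocks are laid out in parallel in Unicode, with katakana sitting 0x60 code points above the corresponding hiragana. The sketch below uses that offset to convert in either direction. It is illustrative only and says nothing about how the crate does it; `hira_to_kata` and `kata_to_hira` are hypothetical names, and characters that exist in only one script (such as ヷ) are passed through unchanged.

```rust
/// Convert hiragana ぁ (U+3041) through ゖ (U+3096) to katakana by adding 0x60.
fn hira_to_kata(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{3041}'..='\u{3096}' => char::from_u32(c as u32 + 0x60).unwrap_or(c),
            _ => c,
        })
        .collect()
}

/// Convert katakana ァ (U+30A1) through ヶ (U+30F6) to hiragana by subtracting 0x60.
fn kata_to_hira(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{30A1}'..='\u{30F6}' => char::from_u32(c as u32 - 0x60).unwrap_or(c),
            _ => c,
        })
        .collect()
}
```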

5. Hyphens (hyphens)

Replaces various dash/hyphen symbols with common ones used in Japanese.

  • Options: precedence (mapping priority order)
  • Available mappings: "ascii", "jisx0201", "jisx0208_90", "jisx0208_90_windows", "jisx0208_verbatim"
  • Example: 2019—2020 (em dash) → 2019-2020
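As a rough picture of what an ASCII-first mapping does, the following standalone sketch collapses a handful of dash-like characters to the ASCII hyphen-minus. The character set chosen here is an assumption made for illustration, not the crate's mapping table, and `dashes_to_ascii` is a hypothetical name; the precedence mechanism described above is not modelled.

```rust
/// Map common dash-like characters to ASCII '-'.
/// Covered: U+2010..=U+2015 (hyphen, non-breaking hyphen, figure dash,
/// en dash, em dash, horizontal bar), U+2212 (minus sign),
/// and U+FF0D (fullwidth hyphen-minus).
fn dashes_to_ascii(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{2010}'..='\u{2015}' | '\u{2212}' | '\u{FF0D}' => '-',
            _ => c,
        })
        .collect()
}
```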

6. Ideographic Annotations (ideographic-annotations)

Replaces ideographic annotations used in traditional Chinese-to-Japanese translation.

  • Example: ㆖㆘ → 上下

7. IVS-SVS Base (ivs-svs-base)

Handles Ideographic and Standardized Variation Selectors.

  • Options: charset, mode ("ivs-or-svs" or "base"), preferSVS, dropSelectorsAltogether
  • Example: 葛󠄀 (葛 + IVS) → 葛

8. Japanese Iteration Marks (japanese-iteration-marks)

Expands iteration marks by repeating the preceding character.

  • Example: 時々 → 時時, いすゞ → いすず
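The expansion rule can be sketched in a few lines: remember the last character emitted, and when an iteration mark appears, emit that character again (voiced, for ゞ). This is a simplified illustration under stated assumptions, not the crate's algorithm: `expand_iteration_marks`, `voiced`, and `VOICED_PAIRS` are hypothetical names, only 々 (U+3005), ゝ (U+309D), and ゞ (U+309E) are handled, and no check is made that the preceding character is of a suitable script.

```rust
/// Hiragana immediately followed by their voiced forms, in pairs.
const VOICED_PAIRS: &str =
    "かがきぎくぐけげこごさざしじすずせぜそぞただちぢつづてでとどはばひびふぶへべほぼ";

/// Return the voiced form of `c`, if the table has one.
fn voiced(c: char) -> Option<char> {
    let chars: Vec<char> = VOICED_PAIRS.chars().collect();
    chars.chunks(2).find(|p| p[0] == c).map(|p| p[1])
}

/// Expand iteration marks by repeating the previously emitted character.
fn expand_iteration_marks(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    let mut prev: Option<char> = None;
    for c in input.chars() {
        let expanded = match (c, prev) {
            // 々 and ゝ repeat the previous character verbatim.
            ('\u{3005}', Some(p)) | ('\u{309D}', Some(p)) => p,
            // ゞ repeats the previous character in its voiced form when one exists.
            ('\u{309E}', Some(p)) => voiced(p).unwrap_or(p),
            // Anything else, or a mark with nothing before it, is kept as is.
            _ => c,
        };
        out.push(expanded);
        prev = Some(expanded);
    }
    out
}
```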

9. JIS X 0201 and Alike (jisx0201-and-alike)

Handles half-width/full-width character conversion.

  • Options: fullwidthToHalfwidth, convertGL (alphanumerics/symbols), convertGR (katakana), u005cAsYenSign
  • Example: ABC123 → ＡＢＣ１２３, ｶﾀｶﾅ → カタカナ
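For the alphanumeric (GL) half of this conversion, the fullwidth forms sit at a fixed offset of 0xFEE0 from printable ASCII, which the sketch below exploits. It is an illustration rather than the crate's implementation: `ascii_to_fullwidth` is a hypothetical name, half-width katakana (which needs a lookup table and dakuten handling) is left out, and the backslash is mapped to its fullwidth form rather than to a yen sign.

```rust
/// Convert printable ASCII to fullwidth forms.
/// '!' (U+0021) through '~' (U+007E) map to U+FF01 through U+FF5E,
/// and the ASCII space maps to the ideographic space U+3000.
fn ascii_to_fullwidth(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            ' ' => '\u{3000}',
            '!'..='~' => char::from_u32(c as u32 + 0xFEE0).unwrap_or(c),
            _ => c,
        })
        .collect()
}
```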

10. Kanji Old-New (kanji-old-new)

Converts old-style kanji (旧字体) to modern forms (新字体).

  • Example: 舊字體の變換 → 旧字体の変換

11. Mathematical Alphanumerics (mathematical-alphanumerics)

Normalizes mathematical alphanumeric symbols to plain ASCII.

  • Example: 𝐀𝐁𝐂 (mathematical bold) → ABC
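The mathematical bold example can be reproduced with range arithmetic, since bold capitals, bold small letters, and bold digits each form a contiguous run in the Mathematical Alphanumeric Symbols block. The sketch below handles only the bold style and is not taken from the crate; `normalize_math_bold` is a hypothetical name, and the other styles (italic, script, fraktur, and so on) would need their own ranges.

```rust
/// Map mathematical bold letters and digits to plain ASCII.
/// Bold A-Z: U+1D400..=U+1D419, bold a-z: U+1D41A..=U+1D433,
/// bold 0-9: U+1D7CE..=U+1D7D7.
fn normalize_math_bold(input: &str) -> String {
    input
        .chars()
        .map(|c| {
            let cp = c as u32;
            let mapped = match cp {
                0x1D400..=0x1D419 => b'A' as u32 + (cp - 0x1D400),
                0x1D41A..=0x1D433 => b'a' as u32 + (cp - 0x1D41A),
                0x1D7CE..=0x1D7D7 => b'0' as u32 + (cp - 0x1D7CE),
                _ => cp,
            };
            char::from_u32(mapped).unwrap_or(c)
        })
        .collect()
}
```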

12. Prolonged Sound Marks (prolonged-sound-marks)

Handles contextual conversion between hyphens and prolonged sound marks.

  • Options: skipAlreadyTransliteratedChars, allowProlongedHatsuon, allowProlongedSokuon, replaceProlongedMarksFollowingAlnums
  • Example: イ−ハト−ヴォ (with hyphen) → イーハトーヴォ (prolonged mark)

13. Radicals (radicals)

Converts CJK radical characters to their corresponding ideographs.

  • Example: ⾔⾨⾷ (Kangxi radicals) → 言門食

14. Spaces (spaces)

Normalizes various Unicode space characters to standard ASCII space.

  • Example: A　B (ideographic space) → A B
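A space normalizer of this kind reduces to a character match. The sketch below is standalone and illustrative: `normalize_spaces` is a hypothetical name, and the set of characters treated as spaces is an assumption rather than the crate's actual table.

```rust
/// Map several Unicode space characters to the ASCII space.
/// Covered: U+00A0 (no-break space), U+2000..=U+200A (en quad through hair space),
/// U+202F (narrow no-break space), U+205F (medium mathematical space),
/// and U+3000 (ideographic space).
fn normalize_spaces(input: &str) -> String {
    input
        .chars()
        .map(|c| match c {
            '\u{00A0}' | '\u{2000}'..='\u{200A}' | '\u{202F}' | '\u{205F}' | '\u{3000}' => ' ',
            _ => c,
        })
        .collect()
}
```

Line breaks and tabs are deliberately not in the set, so the line structure of the input is preserved.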

15. Roman Numerals (roman-numerals)

Converts Unicode Roman numeral characters to their ASCII letter equivalents.

  • Example: Ⅰ Ⅱ Ⅲ → I II III, ⅰ ⅱ ⅲ → i ii iii
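The Unicode Roman numerals Ⅰ through Ⅿ (U+2160 to U+216F) and their small forms ⅰ through ⅿ (U+2170 to U+217F) are laid out in the same order, so one table indexed by offset covers both. The following is a standalone sketch with hypothetical names (`ROMAN`, `replace_roman_numerals`), not the crate's code; the rarer numerals beyond U+217F are not handled.

```rust
/// ASCII spellings for U+2160..=U+216F, in code point order.
const ROMAN: [&str; 16] = [
    "I", "II", "III", "IV", "V", "VI", "VII", "VIII",
    "IX", "X", "XI", "XII", "L", "C", "D", "M",
];

/// Replace Unicode Roman numeral characters with ASCII letters.
fn replace_roman_numerals(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        let cp = c as u32;
        match cp {
            // Capital forms index the table directly.
            0x2160..=0x216F => out.push_str(ROMAN[(cp - 0x2160) as usize]),
            // Small forms reuse the same table, lowercased.
            0x2170..=0x217F => {
                out.push_str(&ROMAN[(cp - 0x2170) as usize].to_lowercase())
            }
            _ => out.push(c),
        }
    }
    out
}
```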

Development

This project uses standard Rust tooling.

# Build the library
cargo build

# Run tests
cargo test

# Run linting
cargo clippy

# Run formatting
cargo fmt

# Build documentation
cargo doc --open

Requirements

  • Rust 1.70+

License

MIT

Commit count: 50
