glossa

Crates.io: glossa
lib.rs: glossa
version: 0.0.6
created_at: 2023-03-26 00:42:21.6321+00
updated_at: 2025-05-26 07:40:33.899122+00
description: Generates an array based on the similarity between the current locale and all available locales.
repository: https://github.com/2moe/glossa
id: 820708
size: 138,270
Moe (2moe)


Documentation Apache-2 licensed

Locale Fallback Chain

The core functionality of the glossa crate:

  • Generates an array based on the similarity between the current locale and all available locales.
    • In theory, locales with higher similarity are prioritized.

Q: Why is fallback necessary?

A: When localized text for the current locale is missing, falling back to a more familiar language (e.g., another variant of the current language) ensures a better user experience.

A person may be proficient in multiple languages (or in different variants of the same language).

Assume the current locale is pt-PT (Português, Portugal), and the available locales are pt-PT, pt (Português, Brasil), es-419 (Español, Latinoamérica), and en.

In this case, the i18n library should retrieve localized text in the order [pt-PT, pt, en], not [pt-PT, en].

Ignoring language similarity and directly falling back to en not only reduces localization (L10n) coverage but may also increase cognitive load for users.
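
To make the difference concrete, here is a minimal sketch in plain Rust (standard library only, not glossa's API). The `TEXTS` table, its strings, and the `first_available` helper are hypothetical and exist only for illustration; the helper walks a chain in order and returns the first locale that has text for a key.

```rust
/// Made-up sample data as (locale, key, text) triples.
/// Note that "pt-PT" has no entry for the "save" key.
const TEXTS: &[(&str, &str, &str)] = &[
  ("pt-PT", "cancel", "Cancelar"),
  ("pt", "cancel", "Cancelar"),
  ("pt", "save", "Salvar"),
  ("en", "cancel", "Cancel"),
  ("en", "save", "Save"),
];

/// Hypothetical helper: tries each locale of `chain` in order and returns
/// the first (locale, text) pair that exists for `key`.
fn first_available(
  chain: &[&str],
  key: &str,
) -> Option<(&'static str, &'static str)> {
  chain.iter().find_map(|&locale| {
    TEXTS
      .iter()
      .find(|(l, k, _)| *l == locale && *k == key)
      .map(|&(l, _, text)| (l, text))
  })
}
```

With the chain `["pt-PT", "pt", "en"]`, a missing pt-PT text is served from pt; with `["pt-PT", "en"]`, the same lookup skips straight to English.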

Example: zh-Hans-HK

Assume the current locale is zh-Hans-HK, and the available locales are zh-Hant-MO, zh-SG, ru, zh-Hant, fr, zh, ar, zh-HK, en-001, lzh.

After calling try_init_chain(), the generated locale chain is: ["zh", "zh-SG", "zh-HK", "zh-Hant-MO", "zh-Hant"].

When the log level is debug or trace, you can see [... DEBUG glossa::fallback] ...<(id, score)>:

[
  ("zh", 37),       // zh-Hans-CN
  ("zh-SG", 36),    // zh-Hans-SG
  ("zh-HK", 35),    // zh-Hant-HK
  ("zh-Hant-MO", 31),
  ("zh-Hant", 28)   // zh-Hant-TW
]

Higher scores indicate higher priority.

  • Exact match: full score (50 points).
  • Partial matches:
    • Same language: +20 points.
      • Since the current language is zh (Chinese), and no other languages are included in the built-in rules, only zh variants appear in the chain.
      • Theoretically, lzh (Classical Chinese) shares some similarity with modern Chinese, but it is not included in the built-in fallback rules for zh-Hans-HK.
    • Same script: +15 points.
      • The current script is Hans (Simplified). Hans scores higher than Hant.
        • zh-HK is essentially zh-Hant-HK.
          • Since Hans scores higher than Hant, and zh-Hans resources exist, zh-HK does not have the highest score.
    • Matches built-in fallback rules:
      • Full match: +9 points.
      • Partial match (language + script): +6 points.
    • Same region: +4 points.
      • Comparing zh-Hant (zh-Hant-TW), zh-Hant-MO, and zh-HK (zh-Hant-HK):
        • zh-HK shares the same region (HK) as the current locale, earning +4 points.
        • zh-Hant and zh-Hant-MO do not share the HK region, so no bonus.
    • Proximity bonus:
      • Same sub-region (e.g., East Asia): +2 points.
      • Same continent (e.g., Asia): +1 point.
      • Comparing zh (zh-Hans-CN) and zh-SG (zh-Hans-SG):
        • HK (Hong Kong SAR, China) and CN (Mainland China) are both in East Asia (+2).
        • SG (Singapore) is in Southeast Asia, sharing the same continent (Asia) with HK (+1).
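
One way to read the zh-Hans-HK scores is as a plain sum of the bonuses listed above. The sketch below is not glossa's implementation: the `Features` struct, the `score` function, and the per-candidate flags are assumptions, chosen so that the totals reproduce the debug output of this one example. Other examples may not decompose the same way.

```rust
/// Similarity features a candidate shares with the current locale.
/// Weights are taken from the list above; summing them is an assumption.
#[derive(Clone, Copy, Default)]
struct Features {
  same_language: bool,       // +20
  same_script: bool,         // +15
  rule_full_match: bool,     // +9
  rule_partial_match: bool,  // +6
  same_region: bool,         // +4
  same_sub_region: bool,     // +2
  same_continent_only: bool, // +1
}

/// Adds up the weights of every feature that is set.
fn score(f: Features) -> u8 {
  let weights: [(bool, u8); 7] = [
    (f.same_language, 20),
    (f.same_script, 15),
    (f.rule_full_match, 9),
    (f.rule_partial_match, 6),
    (f.same_region, 4),
    (f.same_sub_region, 2),
    (f.same_continent_only, 1),
  ];
  weights.iter().filter(|(on, _)| *on).map(|(_, pts)| *pts).sum()
}

/// Inferred feature flags for the zh-Hans-HK example (hypothetical).
fn zh_hans_hk_example() -> [(&'static str, u8); 5] {
  let base = Features { same_language: true, ..Features::default() };
  [
    ("zh", score(Features { same_script: true, same_sub_region: true, ..base })),
    ("zh-SG", score(Features { same_script: true, same_continent_only: true, ..base })),
    ("zh-HK", score(Features {
      rule_full_match: true, same_region: true, same_sub_region: true, ..base
    })),
    ("zh-Hant-MO", score(Features { rule_full_match: true, same_sub_region: true, ..base })),
    ("zh-Hant", score(Features { rule_partial_match: true, same_sub_region: true, ..base })),
  ]
}
```

Under these assumptions, zh-HK gains the region bonus but loses the script bonus, which is why it lands below zh and zh-SG.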

Example: en-AU

Assume the current locale is en-AU, with extensive localization resources for various regions (including sparsely populated islands).

From a linguistic similarity perspective, en-NZ (New Zealand English) is closer to en-AU (Australian English) than en-GB (British English).

However, the chain generated by glossa is not guaranteed to reflect this: in the scores below, en-GB (44) ranks above en-NZ (43).

// <(id, score)>:
[
  ("en-AU", 50), ("en-GB", 44), ("en-CC", 43), ("en-CX", 43), ("en-NF", 43),
  ("en-NZ", 43), ("en-UM", 42), ("en-CK", 42), ("en-DG", 42), ("en-FJ", 42),
  ("en-FM", 42), ("en-KI", 42), ("en-NR", 42), ("en-NU", 42), ("en-PG", 42),
  ("en-PN", 42), ("en-PW", 42), ("en-SB", 42), ("en-TK", 42), ("en-TO", 42),
  ("en-TV", 42), ("en-VU", 42), ("en-WS", 42), ("en-AS", 42), ("en-GU", 42),
  ("en-MH", 42), ("en-MP", 42), ("en-US", 22), ...
]
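
Since the debug output lists locales from the highest score to the lowest, the chain can be thought of as the ids ordered by score. A minimal sketch using standard-library sorting follows; `chain_from_scores` is a hypothetical helper, not part of glossa, and how glossa itself orders equal scores is not covered here.

```rust
/// Orders (id, score) pairs by descending score and returns only the ids.
/// `sort_by` is stable, so candidates with equal scores keep their input order.
fn chain_from_scores<'a>(mut scored: Vec<(&'a str, u8)>) -> Vec<&'a str> {
  scored.sort_by(|a, b| b.1.cmp(&a.1));
  scored.into_iter().map(|(id, _)| id).collect()
}
```

For instance, feeding it a few of the en-AU pairs above in arbitrary order puts en-AU first, en-GB second, and en-US last.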

Example: gsw-LI

gsw is Swiss German (Schwiizertüütsch), while de is Standard German (Deutsch).

use glossa::{
  error::GlossaError, fallback::conv_to_str_chain,
  try_init_chain_from_slice,
};

let chain = try_init_chain_from_slice(
  // current:
  "gsw-LI",

  // all_locales:
  &[
    "en", "es", "pt", "zh", "gsw", "gsw-FR", "gsw-LI", "de", "de-AT", "de-BE",
    "de-CH", "de-IT", "de-LI", "de-LU",
  ],
)?;
// <(id, score)>:
// [ ("gsw-LI", 50), ("gsw", 37), ("gsw-FR", 37), ("de-LI", 27), ("de", 26),
//   ("de-AT", 23), ("de-BE", 23), ("de-CH", 23), ("de-LU", 23), ("de-IT", 22) ]

let v = conv_to_str_chain(&chain);

assert_eq!(
  v.as_ref(),
  [
    "gsw-LI", "gsw", "gsw-FR", "de-LI", "de", "de-AT", "de-BE", "de-CH",
    "de-LU", "de-IT",
  ]
);

Practical Usage

Implement lookup logic that matches the type of localization resource (L10n map) generated by glossa-codegen.

Code Generation

use glossa_codegen::{Generator, L10nResources, Visibility, generator::MapType};

let generator = Generator::default()
  .with_resources(L10nResources::new("locales").with_include_map_names(["yes-no"]))
  .with_visibility(Visibility::Pub);

The Generator supports outputting various types. If you invoke generator.output_match_fn_all_in_one_without_map_name(MapType::Regular)?, the generated code will resemble:

pub const fn map(language: &[u8], key: &[u8]) -> &'static str {
  match (language, key) {
    (b"cs", b"cancel") => r#####"Zrušit"#####,
    (b"cs", b"no") => r#####"Ne"#####,
    (b"cs", b"yes") => r#####"Ano"#####,
    (b"de", b"cancel") => r#####"Abbrechen"#####,
    (b"de", b"no") => r#####"Nein"#####,
    (b"de", b"yes") => r#####"Ja"#####,
    (b"en", b"cancel") => r#####"Cancel"#####,
    (b"en", b"no") => r#####"No"#####,
    (b"en", b"ok") => r#####"OK"#####,
    (b"en", b"yes") => r#####"Yes"#####,
    (b"es", b"cancel") => r#####"Cancelar"#####,
    (b"es", b"ok") => r#####"Aceptar"#####,
    (b"es", b"yes") => r#####"Sí"#####,
    (b"fr", b"cancel") => r#####"Annuler"#####,
    (b"fr", b"no") => r#####"Non"#####,
    (b"fr", b"yes") => r#####"Oui"#####,
    (b"ja", b"cancel") => r#####"取消"#####,
    (b"ja", b"no") => r#####"いいえ"#####,
    (b"ja", b"ok") => r#####"了解"#####,
    (b"ja", b"yes") => r#####"はい"#####,
    (b"ko", b"cancel") => r#####"취소"#####,
    (b"ko", b"no") => r#####"아니오"#####,
    (b"ko", b"ok") => r#####"확인"#####,
    (b"ko", b"yes") => r#####"예"#####,
    (b"ru", b"no") => r#####"Нет"#####,
    (b"ru", b"yes") => r#####"Да"#####,
    (b"zh-Hant", b"cancel") => r#####"取消"#####,
    (b"zh-Hant", b"no") => r#####"否"#####,
    (b"zh-Hant", b"ok") => r#####"確定"#####,
    (b"zh-Hant", b"yes") => r#####"是"#####,
    (b"zh-Latn-CN", b"cancel") => r#####"QuXiao"#####,
    (b"zh-Latn-CN", b"no") => r#####"Fou"#####,
    (b"zh-Latn-CN", b"ok") => r#####"QueDing"#####,
    (b"zh-Latn-CN", b"yes") => r#####"Shi"#####,
    _ => "",
  }
}

Invoking generator.output_locales_fn(MapType::Regular, true)? generates:

// super: use glossa_shared::lang_id;

pub const fn all_locales() -> [super::lang_id::LangID; 10] {
  #[allow(unused_imports)]
  use super::lang_id::RawID;
  use super::lang_id::consts::*;
  [
    lang_id_cs(),
    lang_id_de(),
    lang_id_en(),
    lang_id_es(),
    lang_id_fr(),
    lang_id_ja(),
    lang_id_ko(),
    lang_id_ru(),
    lang_id_zh_hant(),
    lang_id_zh_pinyin(),
  ]
}

LocaleContext

Next, implement logic to look up localized texts based on the types generated by codegen. As shown above, codegen produces a match_fn.

Given the function definition: const fn map(language: &[u8], key: &[u8]) -> &'static str, the lookup logic is:

let lookup = |(language, key)| match map(language, key) {
  "" => None,
  s => Some(s),
};

If the generated function uses map(language, map_name, key), adjust the lookup accordingly:

let lookup = |(language, map_name, key)| match map(language, map_name, key) {
  "" => None,
  s => Some(s),
};
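
The empty-string convention can be tried out without any generated code. The following self-contained sketch uses a trimmed-down stand-in for the generated `map` function (three entries only) and a hypothetical `lookup_in_chain` helper that is not part of glossa.

```rust
/// Trimmed-down stand-in for the generated match function.
/// An empty string means "no text for this (language, key) pair".
const fn map(language: &[u8], key: &[u8]) -> &'static str {
  match (language, key) {
    (b"de", b"yes") => "Ja",
    (b"en", b"ok") => "OK",
    (b"en", b"yes") => "Yes",
    _ => "",
  }
}

/// Hypothetical helper: tries each locale in `chain` in order and returns
/// the first non-empty text for `key`.
fn lookup_in_chain(chain: &[&str], key: &str) -> Option<&'static str> {
  chain
    .iter()
    .map(|id| map(id.as_bytes(), key.as_bytes()))
    .find(|text| !text.is_empty())
}
```

One consequence of this convention is that an intentionally empty translation cannot be told apart from a missing one, so such entries fall through to the next locale in the chain.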

For binary serialized data (e.g., bincode), deserialize it into a HashMap or BTreeMap, then use .get() to look up entries.

let map = glossa_shared::decode::file::decode_file_to_maps(path)?;
let lookup = |language, tuple_key| {
  map
    .get(language)?
    .get(&tuple_key)
};
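
The nested `.get()` pattern can be sketched with standard-library maps. The `L10nMaps` alias below is an assumed shape (language, then a `(map_name, key)` tuple, then text); the type actually returned by the glossa_shared decoder may differ, and `lookup` and `sample_maps` are hypothetical names.

```rust
use std::collections::HashMap;

// Assumed shape for this sketch: language -> ((map_name, key) -> text).
type L10nMaps = HashMap<String, HashMap<(String, String), String>>;

/// Looks up a text by language and (map_name, key) tuple.
/// The tuple key is built with owned strings because the inner map is
/// keyed by `(String, String)`.
fn lookup<'m>(
  maps: &'m L10nMaps,
  language: &str,
  map_name: &str,
  key: &str,
) -> Option<&'m str> {
  maps
    .get(language)?
    .get(&(map_name.to_owned(), key.to_owned()))
    .map(String::as_str)
}

/// Made-up sample data with a single German entry.
fn sample_maps() -> L10nMaps {
  let de = HashMap::from([(
    ("yes-no".to_owned(), "yes".to_owned()),
    "Ja".to_owned(),
  )]);
  HashMap::from([("de".to_owned(), de)])
}
```

Unlike the match function, a map lookup returns `None` for a missing entry directly, so no empty-string check is needed.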

Trait Example

use glossa::{LocaleContext, traits::ChainProvider};

trait GetL10nText: ChainProvider {
  fn try_get_by_key<'t>(&self, key: &[u8]) -> Option<&'t str> {
    let lookup = |(language, key)| match map(language, key) {
      "" => None,
      s => Some(s),
    };

    self
      .provide_chain()?
      .iter()
      .map(|id| (id.as_bytes(), key))
      .find_map(lookup)
  }
}

impl GetL10nText for LocaleContext {}

#[test]
pub(crate) fn print_l10n_text() {
  let new_ctx = || LocaleContext::default().with_all_locales(all_locales());

  // #[cfg(any(target_os = "macos", target_os = "linux"))]
  let set_env_lang = |value| unsafe { std::env::set_var("LANG", value) };

  let display = |ctx: &LocaleContext, key: &str| {
    let text = ctx
      .try_get_by_key(key.as_bytes())
      .unwrap_or_else(|| panic!("{}", glossa::Error::new_text_not_found(key)));
    println!("{key}: {text}")
  };

  {
    // set_env_lang("gsw_CH.UTF-8");
    //
    let ctx = new_ctx()
      .with_current_locale(Some(glossa_shared::lang_id::consts::lang_id_gsw()));
    // [("de", 26)]

    for key in ["yes", "no", "ok", "cancel"] {
      display(&ctx, key)
    }
  }
  // Output:
  //   yes: Ja
  //   no: Nein
  //   ok: OK
  //   cancel: Abbrechen

  {
    set_env_lang("zh_MO.UTF-8");
    // new_ctx();                           // current_locale =>  get_static_locale()
    let ctx = new_ctx().with_current_locale(None);

    log::debug!("\n---\n--- current locale => zh-MO");

    // [("zh-Hant", 43), ("zh-Latn-CN", 22)]
    for key in ["yes", "no", "ok", "cancel", "confirm"] {
      display(&ctx, key)
    }
  }
  // Output:
  //   yes: 是
  //   no: 否
  //   ok: 確定
  //   cancel: 取消
  //   confirm: Confirm
}

Bilingual Example

Scenario 1:

In resource-constrained environments, Chinese characters may fail to display properly. In such cases, we can switch the localization language to zh-pinyin (Chinese romanization).

However, Mandarin Chinese has many homophones, so text written only in Pinyin (without Chinese characters) can be ambiguous in certain contexts.

This is precisely where the bilingual functionality shines brightly ✨!

The "bilingual functionality" must be manually implemented.


#[ignore]
#[test]
// en-GB, zh-pinyin
fn test_bilingual() {
  use glossa_shared::lang_id::consts::{lang_id_en_gb, lang_id_zh_pinyin};

  let new_ctx = |id| {
    LocaleContext::default()
      .with_current_locale(Some(id))
      .with_all_locales(all_locales())
  };
  let zh_pinyin_ctx = new_ctx(lang_id_zh_pinyin());
  let en_gb_ctx = new_ctx(lang_id_en_gb());

  fn get_text<'a>(ctx: &LocaleContext, key: &str) -> Option<&'a str> {
    let key_bytes = key.as_bytes();
    let lookup = |language| match map(language, key_bytes) {
      "" => None,
      x => Some(x),
    };

    ctx
      .get_or_try_init_chain()?
      .iter()
      .map(|id| id.as_bytes())
      .find_map(lookup)
  }

  let get_cancel_text = |ctx| get_text(ctx, "cancel").unwrap_or_default();

  let zh_pinyin_text = get_cancel_text(&zh_pinyin_ctx);
  let en_gb_text = get_cancel_text(&en_gb_ctx);

  let text = match zh_pinyin_text == en_gb_text {
    true => zh_pinyin_text.into(),
    _ => glossa_shared::fmt_compact!("{en_gb_text}; {zh_pinyin_text}"),
  };

  assert_eq!(text, "Cancel; QuXiao")
}