Crates.io | glossa |
lib.rs | glossa |
version | 0.0.6 |
created_at | 2023-03-26 00:42:21.6321+00 |
updated_at | 2025-05-26 07:40:33.899122+00 |
description | Generates an array based on the similarity between the current locale and all available locales. |
homepage | |
repository | https://github.com/2moe/glossa |
max_upload_size | |
id | 820708 |
size | 138,270 |
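The core functionality of the glossa crate: generating a fallback chain, that is, an array of locales ordered by their similarity to the current locale.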
Q: Why is fallback necessary?
A: When localized text for the current locale is missing, falling back to a more familiar language (e.g., another variant of the current language) ensures a better user experience.
A person may master multiple languages (or different variants of the same language).
Assume the current locale is pt-PT (Português, Portugal), and the available locales are pt-PT, pt (Português, Brasil), es-419 (Español, Latinoamérica), and en.
In this case, the i18n library should retrieve localized text in the order [pt-PT, pt, en], not [pt-PT, en].
Ignoring language similarity and directly falling back to en not only reduces localization (L10n) coverage but may also increase cognitive load for users.
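To make the difference between the two orders concrete, here is a minimal std-only sketch. It does not use glossa itself: the `RESOURCES` table, `text_for`, and `lookup_in_chain` are hypothetical names invented for illustration, and the sample texts are assumptions.

```rust
/// Hypothetical resource table of (locale, key, text). Not part of glossa.
const RESOURCES: &[(&str, &str, &str)] = &[
    ("pt-PT", "yes", "Sim"),
    ("pt", "yes", "Sim"),
    ("pt", "cancel", "Cancelar"),
    ("en", "yes", "Yes"),
    ("en", "cancel", "Cancel"),
    ("en", "ok", "OK"),
];

/// Returns the text stored for exactly this locale and key, if any.
fn text_for(locale: &str, key: &str) -> Option<&'static str> {
    RESOURCES
        .iter()
        .find(|(l, k, _)| *l == locale && *k == key)
        .map(|(_, _, text)| *text)
}

/// Walks the chain in order and returns the first (locale, text) hit.
fn lookup_in_chain(
    chain: &[&'static str],
    key: &str,
) -> Option<(&'static str, &'static str)> {
    chain
        .iter()
        .find_map(|&locale| text_for(locale, key).map(|text| (locale, text)))
}
```

With the chain [pt-PT, pt, en], the key "cancel" resolves to the pt text; with [pt-PT, en], the same key skips straight to en even though a closer variant exists.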
Assume the current locale is zh-Hans-HK, and the available locales are zh-Hant-MO, zh-SG, ru, zh-Hant, fr, zh, ar, zh-HK, en-001, lzh.
After calling try_init_chain(), the generated locale chain is: ["zh", "zh-SG", "zh-HK", "zh-Hant-MO", "zh-Hant"].
When the log level is debug or trace, you can see [... DEBUG glossa::fallback] ...<(id, score)>:
[
("zh", 37), // zh-Hans-CN
("zh-SG", 36), // zh-Hans-SG
("zh-HK", 35), // zh-Hant-HK
("zh-Hant-MO", 31),
("zh-Hant", 28) // zh-Hant-TW
]
Higher scores indicate higher priority.
The current locale's language is zh (Chinese), and no other languages are included in the built-in rules, so only zh variants appear in the chain.
lzh (Classical Chinese) shares some similarity with modern Chinese, but it is not included in the built-in fallback rules for zh-Hans-HK.
The current locale's script is Hans (Simplified), so Hans scores higher than Hant.
zh-HK is essentially zh-Hant-HK. Because Hans scores higher than Hant, and zh-Hans resources exist, zh-HK does not have the highest score.
Among zh-Hant (zh-Hant-TW), zh-Hant-MO, and zh-HK (zh-Hant-HK): zh-HK shares the same region (HK) as the current locale, earning +4 points; zh-Hant and zh-Hant-MO do not share the HK region, so they get no bonus.
Between zh (zh-Hans-CN) and zh-SG (zh-Hans-SG): both use the Hans script and neither shares the HK region; in the scores above, zh (37) ranks one point ahead of zh-SG (36).
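The relation between the logged scores and the resulting chain can be sketched as a sort by descending score. This is only an illustration of "higher scores indicate higher priority": `chain_from_scores` is a hypothetical helper, and how glossa computes and orders scores internally is not shown here.

```rust
/// Orders locale ids by descending score.
/// `sort_by` is stable, so ids with equal scores keep their input order.
fn chain_from_scores(mut scored: Vec<(&'static str, u8)>) -> Vec<&'static str> {
    scored.sort_by(|a, b| b.1.cmp(&a.1));
    scored.into_iter().map(|(id, _)| id).collect()
}
```

Feeding it the five (id, score) pairs from the debug output, in any order, yields the chain zh, zh-SG, zh-HK, zh-Hant-MO, zh-Hant.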
Assume the current locale is en-AU, with extensive localization resources for various regions (including sparsely populated islands).
From a linguistic similarity perspective, en-NZ (New Zealand English) is closer to en-AU (Australian English) than en-GB (British English) is.
However, the chain generated by glossa is not guaranteed to be 100% accurate: in the scores below, en-GB (44) ranks ahead of en-NZ (43).
// <(id, score)>:
[
("en-AU", 50), ("en-GB", 44), ("en-CC", 43), ("en-CX", 43), ("en-NF", 43),
("en-NZ", 43), ("en-UM", 42), ("en-CK", 42), ("en-DG", 42), ("en-FJ", 42),
("en-FM", 42), ("en-KI", 42), ("en-NR", 42), ("en-NU", 42), ("en-PG", 42),
("en-PN", 42), ("en-PW", 42), ("en-SB", 42), ("en-TK", 42), ("en-TO", 42),
("en-TV", 42), ("en-VU", 42), ("en-WS", 42), ("en-AS", 42), ("en-GU", 42),
("en-MH", 42), ("en-MP", 42), ("en-US", 22), ...
]
gsw is Swiss German (Schwiizertüütsch), while de is Standard German (Deutsch).
use glossa::{
error::GlossaError, fallback::conv_to_str_chain,
try_init_chain_from_slice,
};
let chain = try_init_chain_from_slice(
// current:
"gsw-LI",
// all_locales:
&[
"en", "es", "pt", "zh", "gsw", "gsw-FR", "gsw-LI", "de", "de-AT", "de-BE", "de-CH", "de-IT",
"de-LI", "de-LU",
],
)?;
// <(id, score)>:
// [ ("gsw-LI", 50), ("gsw", 37), ("gsw-FR", 37), ("de-LI", 27), ("de", 26),
// ("de-AT", 23), ("de-BE", 23), ("de-CH", 23), ("de-LU", 23), ("de-IT", 22) ]
let v = conv_to_str_chain(&chain);
assert_eq!(
v.as_ref(),
[
"gsw-LI", "gsw", "gsw-FR", "de-LI", "de", "de-AT", "de-BE", "de-CH",
"de-LU", "de-IT",
]
);
Implement the lookup logic that matches the type of localization resource (L10n map) generated by glossa-codegen.
use glossa_codegen::{Generator, L10nResources, Visibility, generator::MapType};
let generator = Generator::default()
.with_resources(L10nResources::new("locales").with_include_map_names(["yes-no"]))
.with_visibility(Visibility::Pub);
The Generator supports several output types.
If you invoke generator.output_match_fn_all_in_one_without_map_name(MapType::Regular)?, the generated code will resemble:
pub const fn map(language: &[u8], key: &[u8]) -> &'static str {
match (language, key) {
(b"cs", b"cancel") => r#####"Zrušit"#####,
(b"cs", b"no") => r#####"Ne"#####,
(b"cs", b"yes") => r#####"Ano"#####,
(b"de", b"cancel") => r#####"Abbrechen"#####,
(b"de", b"no") => r#####"Nein"#####,
(b"de", b"yes") => r#####"Ja"#####,
(b"en", b"cancel") => r#####"Cancel"#####,
(b"en", b"no") => r#####"No"#####,
(b"en", b"ok") => r#####"OK"#####,
(b"en", b"yes") => r#####"Yes"#####,
(b"es", b"cancel") => r#####"Cancelar"#####,
(b"es", b"ok") => r#####"Aceptar"#####,
(b"es", b"yes") => r#####"Sí"#####,
(b"fr", b"cancel") => r#####"Annuler"#####,
(b"fr", b"no") => r#####"Non"#####,
(b"fr", b"yes") => r#####"Oui"#####,
(b"ja", b"cancel") => r#####"取消"#####,
(b"ja", b"no") => r#####"いいえ"#####,
(b"ja", b"ok") => r#####"了解"#####,
(b"ja", b"yes") => r#####"はい"#####,
(b"ko", b"cancel") => r#####"취소"#####,
(b"ko", b"no") => r#####"아니오"#####,
(b"ko", b"ok") => r#####"확인"#####,
(b"ko", b"yes") => r#####"예"#####,
(b"ru", b"no") => r#####"Нет"#####,
(b"ru", b"yes") => r#####"Да"#####,
(b"zh-Hant", b"cancel") => r#####"取消"#####,
(b"zh-Hant", b"no") => r#####"否"#####,
(b"zh-Hant", b"ok") => r#####"確定"#####,
(b"zh-Hant", b"yes") => r#####"是"#####,
(b"zh-Latn-CN", b"cancel") => r#####"QuXiao"#####,
(b"zh-Latn-CN", b"no") => r#####"Fou"#####,
(b"zh-Latn-CN", b"ok") => r#####"QueDing"#####,
(b"zh-Latn-CN", b"yes") => r#####"Shi"#####,
_ => "",
}
}
Invoking generator.output_locales_fn(MapType::Regular, true)? generates:
// super: use glossa_shared::lang_id;
pub const fn all_locales() -> [super::lang_id::LangID; 10] {
#[allow(unused_imports)]
use super::lang_id::RawID;
use super::lang_id::consts::*;
[
lang_id_cs(),
lang_id_de(),
lang_id_en(),
lang_id_es(),
lang_id_fr(),
lang_id_ja(),
lang_id_ko(),
lang_id_ru(),
lang_id_zh_hant(),
lang_id_zh_pinyin(),
]
}
Next, implement logic to look up localized text based on the types generated by codegen.
As shown above, codegen produces a match_fn.
Given the function definition const fn map(language: &[u8], key: &[u8]) -> &'static str, the lookup logic is:
let lookup = |(language, key)| match map(language, key) {
"" => None,
s => Some(s),
};
If the generated function uses map(language, map_name, key), adjust the lookup accordingly:
let lookup = |(language, map_name, key)| match map(language, map_name, key) {
"" => None,
s => Some(s),
};
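The three-argument variant can be tried out without running codegen. In the sketch below, `map` is a hand-written stand-in for a generated match function (its entries are made up), and `find_text` is a hypothetical helper that walks a chain of locale ids; neither comes from glossa.

```rust
/// Hand-written stand-in for a generated match function that takes a map name.
const fn map(language: &[u8], map_name: &[u8], key: &[u8]) -> &'static str {
    match (language, map_name, key) {
        (b"de", b"yes-no", b"yes") => "Ja",
        (b"de", b"yes-no", b"no") => "Nein",
        (b"en", b"yes-no", b"ok") => "OK",
        (b"en", b"yes-no", b"yes") => "Yes",
        _ => "",
    }
}

/// Returns the first non-empty text found while walking the chain in order.
fn find_text(chain: &[&str], map_name: &str, key: &str) -> Option<&'static str> {
    // The empty string is the "not found" sentinel of the match function.
    let lookup = |(language, map_name, key): (&[u8], &[u8], &[u8])| {
        match map(language, map_name, key) {
            "" => None,
            s => Some(s),
        }
    };
    chain
        .iter()
        .map(|id| (id.as_bytes(), map_name.as_bytes(), key.as_bytes()))
        .find_map(lookup)
}
```

Because the sentinel is converted to None, a locale that lacks a key is skipped and the next locale in the chain gets a chance to answer.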
For binary serialized data (e.g., bincode), deserialize it into a HashMap or BTreeMap.
You can then use .get() to look up the text.
let map = glossa_shared::decode::file::decode_file_to_maps(path)?;
let lookup = |language, tuple_key| {
map
.get(language)?
.get(&tuple_key)
};
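The nested `.get()` pattern can be sketched with plain standard-library maps. The shape below (language, then a (map_name, key) tuple) is an assumption made for illustration; the actual types returned by decode_file_to_maps may differ, and `L10nMaps`, `sample_maps`, and `lookup` are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Assumed shape: language -> ((map_name, key) -> text).
type L10nMaps = BTreeMap<String, BTreeMap<(String, String), String>>;

/// Builds a tiny in-memory table in place of deserialized data.
fn sample_maps() -> L10nMaps {
    let mut de = BTreeMap::new();
    de.insert(("yes-no".to_owned(), "yes".to_owned()), "Ja".to_owned());
    de.insert(("yes-no".to_owned(), "no".to_owned()), "Nein".to_owned());

    let mut maps = L10nMaps::new();
    maps.insert("de".to_owned(), de);
    maps
}

/// Two chained lookups: `?` returns None early when the language is missing.
fn lookup<'a>(
    maps: &'a L10nMaps,
    language: &str,
    map_name: &str,
    key: &str,
) -> Option<&'a str> {
    maps.get(language)?
        // An owned tuple is built here because the key type is (String, String).
        .get(&(map_name.to_owned(), key.to_owned()))
        .map(String::as_str)
}
```

Both a missing language and a missing key end up as None, so the result can feed the same find_map-based chain walk used with the match function.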
use glossa::{LocaleContext, traits::ChainProvider};
trait GetL10nText: ChainProvider {
fn try_get_by_key<'t>(&self, key: &[u8]) -> Option<&'t str> {
let lookup = |(language, key)| match map(language, key) {
"" => None,
s => Some(s),
};
self
.provide_chain()?
.iter()
.map(|id| (id.as_bytes(), key))
.find_map(lookup)
}
}
impl GetL10nText for LocaleContext {}
#[test]
pub(crate) fn print_l10n_text() {
let new_ctx = || LocaleContext::default().with_all_locales(all_locales());
// #[cfg(any(target_os = "macos", target_os = "linux"))]
let set_env_lang = |value| unsafe { std::env::set_var("LANG", value) };
let display = |ctx: &LocaleContext, key: &str| {
let text = ctx
.try_get_by_key(key.as_bytes())
.unwrap_or_else(|| panic!("{}", glossa::Error::new_text_not_found(key)));
println!("{key}: {text}")
};
{
// set_env_lang("gsw_CH.UTF-8");
//
let ctx = new_ctx()
.with_current_locale(Some(glossa_shared::lang_id::consts::lang_id_gsw()));
// [("de", 26)]
for key in ["yes", "no", "ok", "cancel"] {
display(&ctx, key)
}
}
// Output:
// yes: Ja
// no: Nein
// ok: OK
// cancel: Abbrechen
{
set_env_lang("zh_MO.UTF-8");
// new_ctx(); // current_locale => get_static_locale()
let ctx = new_ctx().with_current_locale(None);
log::debug!("\n---\n--- current locale => zh-MO");
// [("zh-Hant", 43), ("zh-Latn-CN", 22)]
for key in ["yes", "no", "ok", "cancel", "confirm"] {
display(&ctx, key)
}
}
// Output:
// yes: 是
// no: 否
// ok: 確定
// cancel: 取消
// confirm: Confirm
}
Scenario 1:
In resource-constrained environments, Chinese characters may fail to display properly. In such cases, we can switch the localization language to zh-pinyin (Chinese romanization).
However, because Mandarin Chinese has many homophones, text written only in Pinyin (without Chinese characters) can be ambiguous in certain contexts.
This is precisely where the bilingual functionality shines ✨!
Note that the "bilingual functionality" must be implemented manually.
#[ignore]
#[test]
// en-GB, zh-pinyin
fn test_bilingual() {
use glossa_shared::lang_id::consts::{lang_id_en_gb, lang_id_zh_pinyin};
let new_ctx = |id| {
LocaleContext::default()
.with_current_locale(Some(id))
.with_all_locales(all_locales())
};
let zh_pinyin_ctx = new_ctx(lang_id_zh_pinyin());
let en_gb_ctx = new_ctx(lang_id_en_gb());
fn get_text<'a>(ctx: &LocaleContext, key: &str) -> Option<&'a str> {
let key_bytes = key.as_bytes();
let lookup = |language| match map(language, key_bytes) {
"" => None,
x => Some(x),
};
ctx
.get_or_try_init_chain()?
.iter()
.map(|id| id.as_bytes())
.find_map(lookup)
}
let get_cancel_text = |ctx| get_text(ctx, "cancel").unwrap_or_default();
let zh_pinyin_text = get_cancel_text(&zh_pinyin_ctx);
let en_gb_text = get_cancel_text(&en_gb_ctx);
let text = match zh_pinyin_text == en_gb_text {
true => zh_pinyin_text.into(),
_ => glossa_shared::fmt_compact!("{en_gb_text}; {zh_pinyin_text}"),
};
assert_eq!(text, "Cancel; QuXiao")
}
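The merge step at the end of the test can also be factored into a small std-only helper. This is a sketch: `bilingual_text` is a hypothetical name, and the handling of empty inputs is an assumption added for illustration rather than something the test above does.

```rust
/// Joins two localized texts as "primary; secondary".
/// Identical texts are shown once, and an empty side is dropped.
fn bilingual_text(primary: &str, secondary: &str) -> String {
    if primary == secondary || secondary.is_empty() {
        primary.to_owned()
    } else if primary.is_empty() {
        secondary.to_owned()
    } else {
        format!("{primary}; {secondary}")
    }
}
```

Called with the en-GB and zh-pinyin texts for "cancel", it produces the same "Cancel; QuXiao" string that the test asserts.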