| Crates.io | glossa-codegen |
| lib.rs | glossa-codegen |
| version | 0.0.9 |
| created_at | 2023-04-03 16:37:06.172439+00 |
| updated_at | 2025-05-28 23:08:58.882602+00 |
| description | Generate const language localisation map code at compile time |
| homepage | |
| repository | https://github.com/2moe/glossa |
| max_upload_size | |
| id | 829235 |
| size | 238,711 |
glossa-codegen is used to generate Rust code (with localized texts) and bincode files.
Note: Although glossa-codegen requires std, both glossa and glossa-shared support no-std environments.
- glossa-codegen is used to generate production code.
- glossa is used to build fallback chains.
- glossa-shared provides data types required by production code.
You only need to include glossa-codegen in #[test] tests or build.rs, not in production code (see the build.rs sketch below).
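For illustration, here is a minimal build.rs sketch. It only uses the Generator / L10nResources API described later in this document; the "locales" source directory and the "tmp" output directory are placeholders, and error handling is simplified.

```rust
// build.rs (sketch): generate const-fn localisation code at build time.
// The "locales" and "tmp" paths are examples; adjust them to your project.
use glossa_codegen::{generator::MapType, Generator, L10nResources};

fn main() {
    // Re-run this script whenever the localisation sources change.
    println!("cargo:rerun-if-changed=locales");

    Generator::default()
        .with_resources(L10nResources::new("locales"))
        .with_outdir("tmp")
        .output_match_fn(MapType::Regular)
        .expect("failed to generate localisation code");
}
```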
Assume a locales directory with the following structure:
locales
├── ar
│ └── error.yaml
├── en
│ ├── error.yaml
│ └── yes-no.toml
├── es
│ └── yes-no.toml
├── fr
│ └── yes-no.toml
├── ru
│ └── error.yaml
└── zh
├── error.yaml
└── yes-no.toml
Here, "ar", "en", "es", "fr", "ru", "zh" are Language IDs.
"error" and "yes-no" are Map Names.
Different file stems (e.g., a.toml, b.json) correspond to different map names. What about identical stems (e.g., a.toml and a.json)?
Q: If multiple files with the same stem exist (e.g., error.yaml, error.yml, error.toml, etc.), which one becomes the actual "error" map?
A: If all files contain valid, non-empty K-V pairs, it depends on luck! Otherwise, the first valid file with that stem becomes the map.
Note: a.toml (stem: a) and a.dsl.toml (stem: a.dsl) are not considered the same. However, en/a.toml and en/subdir/a.json are considered to have the same stem.
Q: Why does it depend on luck?
A: Because localization resources are initialized with Rayon: multiple files are read and deserialized in parallel across threads.
The execution order is non-deterministic.
Thread scheduling and completion timing depend on runtime conditions, so which file becomes the map can vary between runs.
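If you want to rule this out up front, a small std-only check can assert that no language directory contains two files sharing a stem. This is just a sketch: it follows the directory layout from the example above and does not recurse into subdirectories.

```rust
use std::{collections::HashMap, fs, io, path::Path};

/// Sketch: fail fast if any language directory contains two files with the
/// same stem (e.g. `error.yaml` and `error.toml`), so the chosen map never
/// depends on thread-scheduling luck. Subdirectories are not recursed here.
fn assert_unique_stems(locales: &Path) -> io::Result<()> {
    for lang in fs::read_dir(locales)? {
        let lang_dir = lang?.path();
        if !lang_dir.is_dir() {
            continue;
        }
        let mut seen: HashMap<String, String> = HashMap::new();
        for entry in fs::read_dir(&lang_dir)? {
            let file = entry?.path();
            let Some(stem) = file.file_stem().and_then(|s| s.to_str()) else {
                continue;
            };
            if let Some(prev) = seen.insert(stem.to_owned(), file.display().to_string()) {
                panic!(
                    "duplicate stem `{stem}` in {}: {prev} vs {}",
                    lang_dir.display(),
                    file.display()
                );
            }
        }
    }
    Ok(())
}
```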
| L10n Type | Description |
|---|---|
| Raw Text Files | Untreated source files (e.g., en/hello.toml) |
| Generated Rust Code | Hardcoded into the program via const fn |
| Bincode | Binary files for efficient deserialization |
Raw files can be seen as source code, while other formats are compiled from them.
Raw text files are the most basic type.
TOML example:
world = "世界"
"🐱" = "喵 ฅ(°ω°ฅ)"
JSON5 example:
{
// JSON5 supports comments
"world": "世界",
"🐱": "喵 ฅ(°ω°ฅ)", // Trailing commas allowed
}
DSL: Domain-Specific Language
Learn glossa-dsl's 5 syntax rules in 5 minutes:
name = "Tom"{"name": "Tom"}TOML:
name = "Tom"
hello = "Hello { name }"
hello references {name} (whitespace inside braces is ignored). Result: "Hello Tom".
JSON5:
{
"hello": "Hello {🐱}",
"🐱": "ฅ(°ω°ฅ)",
}
①. hello references {🐱}.
②. Result: "Hello ฅ(°ω°ฅ)".
TOML:
good-morning = "Good morning, { $🐱 }"
greeting = "{good-morning}, { $name }!"
{ $🐱 } and { $name } require external arguments.
Rust:
let ctx = [("name", "Moe"), ("🐱", "ฅ(°ω°ฅ)")];
let text = res.get_with_context("greeting", &ctx)?;
assert_eq!(text, "Good morning, ฅ(°ω°ฅ), Moe!");
{ 🐱 } vs { $🐱 }:
- { 🐱 }: Internal reference.
- { $🐱 }: Requires an external argument.
Internal reference:
"🐱" = "ฅ(°ω°ฅ)"
meow = "{ 🐱 }"
Requires an external argument:
meow = "{ $🐱 }"
zh/unread.toml:
"阿拉伯数字转汉字" = """
$num ->
[0] 〇
[1] 一
[2] 二
[3] 三
[10] 十
*[其他] {$num}
"""
"未读msg" = "未读消息"
"显示未读消息数量" = """
$num ->
[0] 没有{ 未读msg }
[2] 您有两条{ 未读msg }
*[其他] 您有{ 阿拉伯数字转汉字 }条{ 未读msg }
"""
show-unread-messages-count = "{显示未读消息数量}。"
Rust:
let get_text = |num_str| res.get_with_context("show-unread-messages-count", &[("num", num_str)]);
assert_eq!(get_text("0")?, "没有未读消息。");
assert_eq!(get_text("1")?, "您有一条未读消息。");
assert_eq!(get_text("2")?, "您有两条未读消息。");
assert_eq!(get_text("10")?, "您有十条未读消息。");
assert_eq!(get_text("100")?, "您有100条未读消息。");
en/unread.toml:
num-to-en = """
$num ->
[0] zero
[1] one
[2] two
[3] three
*[other] {$num}
"""
unread_msg = "unread message"
unread-count = """
$num ->
[0] No {unread_msg}s.
[1] You have { num-to-en } {unread_msg}.
*[other] You have { num-to-en } {unread_msg}s.
"""
show-unread-messages-count = "{unread-count}"
Rust:
let get_text = |num_str| res.get_with_context("show-unread-messages-count", &[("num", num_str)]);
assert_eq!(get_text("0")?, "No unread messages.");
assert_eq!(get_text("1")?, "You have one unread message.");
assert_eq!(get_text("2")?, "You have two unread messages.");
assert_eq!(get_text("100")?, "You have 100 unread messages.");
In the above context, we learned that { a } represents an internal reference, while { $a } depends on the externally passed argument a.
Q: How can we preserve the raw {a} format and prevent its automatic parsing?
A: Use escape syntax with nested braces:
• To preserve {a}, wrap it in two layers of braces: {{ {a} }}
• To preserve {{a}, wrap it in three layers of braces: {{{ {{a} }}}
"{{ a }}" => "a""{{{a}}}" => "a""{{{{ a }}}}" => "a""{{ {a} }}" => "{a}""{{a}" => ❌ nom Error, code: take_until"{{{ {{a}} }}}" => "{{a}}""{{{ {{ a }} }}}" => "{{ a }}""{{{ {{a} }}}" => "{{a}"enum MapType {
Regular,
Highlight,
RegularAndHighlight,
DSL,
}
AST: Abstract Syntax Tree
pub struct SmallList<const N: usize>(pub SmallVec<MiniStr, N>);
pub struct L10nResources {
dir: PathBuf,
dsl_suffix: MiniStr,
include_languages: SmallList<3>,
include_map_names: SmallList<2>,
exclude_languages: SmallList<1>,
exclude_map_names: SmallList<1>,
/// get data: [Self::get_or_init_data]
lazy_data: OnceLock<L10nResMap>,
}
"./locales".".dsl").
".dsl":
"a.dsl.toml" is recognized as a glossa-DSL file."b.dsl.json" is also recognized as a glossa-DSL file."a.toml" is treated as a regular file.["de", "en", "es", "pt", "ru", "zh"].
.with_include_languages(["en", "zh"]) ⇒ Only resources for "en" and "zh" are initialized."en/a.toml", "en/b.json", "zh/a.json", "zh/b.ron".
["a", "b"]..with_include_map_names(["a"]) ⇒ Only "en/a.toml" and "zh/a.json" are initialized.["de", "en", "es", "pt", "ru", "zh"].
.with_exclude_languages(["en", "es", "ru"]) ⇒ Initializes ["de", "pt", "zh"]..with_include_languages(["en", "es"]).with_exclude_languages(["en"]) ⇒ Initializes ["es"]."en/a.toml", "en/b.json", "zh/a.json", "zh/b.ron", "zh/c.toml".
.with_exclude_map_names(["a"]) ⇒ Initializes "en/b.json", "zh/b.ron", "zh/c.toml"..with_include_map_names(["b", "c"]).with_exclude_map_names(["b"]) ⇒ Initializes "zh/c.toml"..with_include_languages(["en"]).with_exclude_map_names(["a"]) ⇒ Initializes "en/b.json"..get_or_init_data(), equivalent to a cache.| Method | Description |
|---|---|
| .get_dir() | Retrieves the directory path. |
| .with_dir("/path/to/new_dir".into()) | Sets the L10n directory path. |
| .get_dsl_suffix() | Retrieves the DSL suffix. |
| .with_dsl_suffix(".new_suffix".into()) | Sets the DSL suffix. |
| .with_include_languages([]) | Configures the language whitelist. |
| .with_include_map_names([]) | Configures the map name whitelist. |
| .with_exclude_languages([]) | Configures the language blacklist. |
| .with_exclude_map_names([]) | Configures the map name blacklist. |
| .get_or_init_data() | Retrieves &HashMap<KString, Vec<L10nMapEntry>>, initializing data if needed. |
| .with_lazy_data(OnceLock::new()) | Resets lazy_data by replacing it with a new uninitialized OnceLock. |
Q: How do we construct a new L10nResources struct?
A:
use glossa_codegen::L10nResources;
let _res = L10nResources::new("locales");
// Equivalent to: L10nResources::default().with_dir("locales".into())
The "locales" path can be replaced with other directories, e.g., "../../l10n/".
pub struct Generator<'h> {
resources: Box<L10nResources>,
visibility: Visibility,
outdir: Option<PathBuf>,
bincode_suffix: MiniStr,
mod_prefix: MiniStr,
highlight: Option<Box<HighlightCfgMap<'h>>>,
/// get: `Self::get_or_init_*maps`
lazy_maps: Box<LazyMaps>,
}
The default visibility of the generated functions is PubCrate.
glossa_codegen::Visibility { Private, PubCrate, Pub, PubSuper }
- .with_visibility(Visibility::Pub) ⇒ Generates pub const fn xxx.
- .with_visibility(Visibility::PubCrate) ⇒ Generates pub(crate) const fn xxx.

Defaults: bincode_suffix is ".bincode", and mod_prefix is "l10n_".

lazy_maps getters:
- .get_or_init_maps() // Regular
- .get_or_init_highlight_maps() // Highlight
- .get_or_init_merged_maps() // RegularAndHighlight
- .get_or_init_dsl_maps() // Template

use glossa_codegen::{Generator, L10nResources};
let resources = L10nResources::new("locales");
let generator = Generator::default()
.with_resources(resources)
.with_outdir("tmp");
const fn with internal match expressions
Calling .output_match_fn(MapType::Regular) generates Rust code:
const fn map(map_name: &[u8], key: &[u8]) -> &'static str {
match (map_name, key) { ... }
}
PHF map functions
Calling .output_phf(MapType::Regular) generates Rust code:
const fn map() -> super::PhfL10nOrderedMap { ... }
Bincode
.output_bincode(MapType::Regular) generates binary bincode files.
MapType::DSL can only output to bincode, while other MapTypes support all output formats.
You can treat DSL as a Regular Map (e.g., by modifying L10nResources's dsl_suffix), but this offers no performance benefit: deserializing the DSL's pre-parsed AST is faster than parsing raw DSL text at runtime.
- When DSL is treated as Regular, the generated code contains raw K-V pairs. At runtime, these must first be parsed into AST.
- Directly outputting MapType::DSL as bincode serializes the DSL's AST instead of raw K-V pairs.
match Expressions
Key methods:
.output_match_fn()
{outdir}/{mod_prefix}{snake_case_language}.rs
- en → tmp/l10n_en.rs
- en-GB → tmp/l10n_en_gb.rs
Generated signature: const fn map(map_name: &[u8], key: &[u8]) -> &'static str {...}

.output_match_fn_without_map_name()
Generated signature: const fn map(key: &[u8]) -> &'static str {...}

.output_match_fn_all_in_one()
Aggregates all languages into a single function:
const fn map(lang: &[u8], map_name: &[u8], key: &[u8]) -> &'static str { ... }
.output_match_fn_all_in_one_by_language()
Aggregates all languages into a single function:
const fn map(language: &[u8]) -> &'static str { ... }
Use only if both map_name and key are unique to avoid conflicts.
.output_match_fn_all_in_one_without_map_name()
Aggregates all languages into a single function:
const fn map(language: &[u8], key: &[u8]) -> &'static str { ... }
Use only if map_name is unique to avoid key conflicts.
output_match_fn()
Given:
l10n/en-GB/error.toml:
text-not-found = "No localised text found"
l10n/de/error.yml:
text-not-found: Kein lokalisierter Text gefunden
Code:
use glossa_codegen::{generator::MapType, Generator, L10nResources};
let resources = L10nResources::new("l10n");
Generator::default()
.with_resources(resources)
.with_outdir("tmp")
.output_match_fn(MapType::Regular)?;
Output (tmp/l10n_en_gb.rs):
pub(crate) const fn map(map_name: &[u8], key: &[u8]) -> &'static str {
match (map_name, key) {
(b"error", b"text-not-found") => r#####"No localised text found"#####,
_ => "",
}
}
Output (tmp/l10n_de.rs):
pub(crate) const fn map(map_name: &[u8], key: &[u8]) -> &'static str {
match (map_name, key) {
(b"error", b"text-not-found") => r#####"Kein lokalisierter Text gefunden"#####,
_ => "",
}
}
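Because the generated map functions are const fn, lookups can happen at compile time. Below is a hypothetical consumer; the module is a hand-copied stand-in for tmp/l10n_en_gb.rs (in a real project you would include! or mod the generated file instead).

```rust
// Stand-in for the generated tmp/l10n_en_gb.rs shown above.
mod l10n_en_gb {
    pub(crate) const fn map(map_name: &[u8], key: &[u8]) -> &'static str {
        match (map_name, key) {
            (b"error", b"text-not-found") => "No localised text found",
            _ => "",
        }
    }
}

// Evaluated at compile time because `map` is a `const fn`.
const NOT_FOUND: &str = l10n_en_gb::map(b"error", b"text-not-found");

fn main() {
    assert_eq!(NOT_FOUND, "No localised text found");
}
```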
output_match_fn_all_in_one()
Q: What do we get if we use output_match_fn_all_in_one()?
A: We will receive a String containing the function data.
All localization resources for every language are consolidated into a single function.
let function_data = generator.output_match_fn_all_in_one(MapType::Regular)?;
Output (function_data):
pub(crate) const fn map(lang: &[u8], map_name: &[u8], key: &[u8]) -> &'static str {
match (lang, map_name, key) {
(b"en-GB", b"error", b"text-not-found") => r#####"No localised text found"#####,
(b"de", b"error", b"text-not-found") => r#####"Kein lokalisierter Text gefunden"#####,
_ => "",
}
}
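The all_in_one variants hand back the generated source as a String; if you want it on disk, writing it out is left to the caller. A minimal sketch (the output path is illustrative):

```rust
use std::fs;

// `function_data` is the String returned by output_match_fn_all_in_one().
fs::create_dir_all("tmp")?;
fs::write("tmp/l10n_all_in_one.rs", &function_data)?;
```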
output_match_fn_all_in_one_without_map_name()
TLDR:
- If map_name is unique, output_match_fn_all_in_one_without_map_name() can improve performance.
- Otherwise, use .output_match_fn_all_in_one() instead.
When map_name is unique, we can omit it for performance optimization.
match (lang, key) { ... }
match (lang, map_name, key) { ... }
Comparing these two match expressions:
- The first matches on a 2-tuple (lang and key).
- The second matches on a 3-tuple (lang, map_name, and key).
Theoretically, the first is faster due to fewer match arms.
output_match_fn_all_in_one_without_map_name() generates code similar to the first approach.
If you aren’t concerned with nanosecond-level optimizations, you can safely skip this section.
When map_name is unique (e.g., yes-no):
- en/yes-no: { yes: "Yes", no: "No" }
- de/yes-no: { yes: "Ja", no: "Nein" }
Calling .output_match_fn_all_in_one_without_map_name(Regular)?
Output:
pub(crate) const fn map(language: &[u8], key: &[u8]) -> &'static str {
match (language, key) {
(b"en", b"yes") => r#####"Yes"#####,
(b"en", b"no") => r#####"No"#####,
(b"de", b"yes") => r#####"Ja"#####,
(b"de", b"no") => r#####"Nein"#####,
_ => "",
}
}
When map_name is not unique (for example, after adding a new map like en/yes-no2 { yes: "YES", no: "NO", ok: "OK" }), different map_names may contain identical keys (e.g., "yes" and "no"), causing key conflicts. In such cases, map_name can no longer be omitted.
- output_phf(): Generates Perfect Hash Function (PHF) maps per language.
- .output_phf_all_in_one(): Aggregates all localization resources into a single string containing serialized PHF map data.

output_phf()
use glossa_codegen::{generator::MapType, Generator, L10nResources};
pub(crate) fn es_generator<'h>() -> Generator<'h> {
let data = L10nResources::new("locales").with_include_languages(["es", "es-419"]);
Generator::default().with_resources(data).with_outdir("tmp")
}
es_generator().output_phf(MapType::Regular)?;
Output (tmp/l10n_es.rs):
pub(crate) const fn map() -> super::PhfL10nOrderedMap {
use super::PhfTupleKey as Key;
super::phf::OrderedMap {
key: 12913932095322966823,
disps: &[(0, 0)],
idxs: &[1, 3, 2, 4, 0],
entries: &[
(
Key(r#"error"#, r##"text-not-found"##),
r#####"No se encontró texto localizado"#####,
),
(Key(r#"yes-no"#, r##"cancel"##), r#####"Cancelar"#####),
(Key(r#"yes-no"#, r##"no"##), r#####"No"#####),
(Key(r#"yes-no"#, r##"ok"##), r#####"Aceptar"#####),
(Key(r#"yes-no"#, r##"yes"##), r#####"Sí"#####),
],
}
}
Q: Wait, where do PhfL10nOrderedMap and PhfTupleKey come from?
A: They are referenced via super::, i.e., they are expected to be defined in the parent module that includes the generated code.
output_phf_all_in_one()
let data = L10nResources::new("locales")
.with_include_languages(["de", "en", "fr", "pt", "zh"])
.with_include_map_names(["yes-no"]);
let function_data = Generator::default().with_resources(data).output_phf_all_in_one(MapType::Regular)?;
function_data:
pub(crate) const fn map() -> super::PhfL10nAllInOneMap {
use super::PhfTripleKey as Key;
super::phf::OrderedMap {
key: 6767243246500575252,
disps: &[(0, 0), (0, 2), (4, 12), (15, 9)],
idxs: &[
4, 7, 13, 19, 9, 14, 3, 17, 10, 18, 5, 12, 16, 1, 8, 6, 2, 15, 0, 11,
],
entries: &[
(
Key(r#"de"#, r##"yes-no"##, r###"cancel"###),
r#####"Abbrechen"#####,
),
(Key(r#"de"#, r##"yes-no"##, r###"no"###), r#####"Nein"#####),
(Key(r#"de"#, r##"yes-no"##, r###"ok"###), r#####"OK"#####),
(Key(r#"de"#, r##"yes-no"##, r###"yes"###), r#####"Ja"#####),
(
Key(r#"en"#, r##"yes-no"##, r###"cancel"###),
r#####"Cancel"#####,
),
(Key(r#"en"#, r##"yes-no"##, r###"no"###), r#####"No"#####),
(Key(r#"en"#, r##"yes-no"##, r###"ok"###), r#####"OK"#####),
(Key(r#"en"#, r##"yes-no"##, r###"yes"###), r#####"Yes"#####),
(
Key(r#"fr"#, r##"yes-no"##, r###"cancel"###),
r#####"Annuler"#####,
),
(Key(r#"fr"#, r##"yes-no"##, r###"no"###), r#####"Non"#####),
(Key(r#"fr"#, r##"yes-no"##, r###"ok"###), r#####"OK"#####),
(Key(r#"fr"#, r##"yes-no"##, r###"yes"###), r#####"Oui"#####),
(
Key(r#"pt"#, r##"yes-no"##, r###"cancel"###),
r#####"Cancelar"#####,
),
(Key(r#"pt"#, r##"yes-no"##, r###"no"###), r#####"Não"#####),
(Key(r#"pt"#, r##"yes-no"##, r###"ok"###), r#####"OK"#####),
(Key(r#"pt"#, r##"yes-no"##, r###"yes"###), r#####"Sim"#####),
(
Key(r#"zh"#, r##"yes-no"##, r###"cancel"###),
r#####"取消"#####,
),
(Key(r#"zh"#, r##"yes-no"##, r###"no"###), r#####"否"#####),
(Key(r#"zh"#, r##"yes-no"##, r###"ok"###), r#####"确定"#####),
(Key(r#"zh"#, r##"yes-no"##, r###"yes"###), r#####"是"#####),
],
}
}
output_bincode(): Serializes data into a separate bincode file for each language.
{outdir}/{language}{bincode_suffix}
output_bincode_all_in_one()
{outdir}/all{bincode_suffix}
output_bincode()
../../locales/en/unread.dsl.toml:
num-to-en = """
$num ->
[0] zero
[1] one
[2] two
[3] three
*[other] {$num}
"""
unread = "unread message"
unread-count = """
$num ->
[0] No {unread}s.
[1] You have { num-to-en } {unread}.
*[other] You have { num-to-en } {unread}s.
"""
show-unread-messages-count = "{unread-count}"
Rust:
use glossa_codegen::{L10nResources, Generator, generator::MapType};
use glossa_shared::decode::file::decode_single_file_to_dsl_map;
use std::path::Path;
// -------------------
// Encode
let resources = L10nResources::new("../../locales/");
// Output to tmp/{language}_dsl.bincode
Generator::default()
.with_resources(resources)
.with_outdir("tmp")
.with_bincode_suffix("_dsl.bincode".into())
.output_bincode(MapType::DSL)?;
// ------------------
// Decode
let file = Path::new("tmp").join("en_dsl.bincode");
let dsl_maps = decode_single_file_to_dsl_map(file)?;
let unread_resolver = dsl_maps
.get("unread")
.expect("Failed to get AST (map_name: unread)");
let get_text = |num_str| {
unread_resolver
.get_with_context("show-unread-messages-count", &[("num", num_str)])
};
let one = get_text("1")?;
assert_eq!(one, "You have one unread message.");
let zero = get_text("0")?;
assert_eq!(zero, "No unread messages.");
Ok(())
TLDR: Pre-render syntax-highlighted texts into constants for performance.
glossa-codegen supports rendering localized texts into syntax-highlighted content and converting them into Rust code and bincode.
Q: Why pre-render syntax highlighting?
A: For performance optimization.
Looking up pre-rendered &'static str constants is orders of magnitude faster than rendering syntax highlighting at runtime using regex.
Q: Where are pre-rendered syntax-highlighted strings useful?
A: Ideal for CLI applications.

pub type HighlightCfgMap<'h> = HashMap<DerivedMapKey, SyntaxHighlightConfig<'h>>;
pub struct DerivedMapKey {
/// The base map name (e.g., "help-markdown").
base_name: KString,
/// A suffix to differentiate derived maps (e.g., "_monokai").
suffix: KString,
}
pub struct SyntaxHighlightConfig<'r> {
resource: HighlightResource<'r>,
/// Syntax name (e.g., "md" for Markdown).
syntax_name: MiniStr,
/// Whether to use true color (24-bit RGB).
true_color: bool,
}
pub struct HighlightResource<'theme> {
/// Theme name (e.g., "Monokai Extended").
theme_name: MiniStr,
/// Lazily initialized theme.
theme: OnceLock<&'theme Theme>,
/// Theme set (collection of themes).
theme_set: &'theme ThemeSet,
/// Syntax set (collection of syntax definitions).
syntax_set: &'theme SyntaxSet,
/// Whether to enable background.
background: bool,
}
Basic Usage:
generator
.with_highlight(HighlightCfgMap::default())
.output_bincode(MapType::Highlight);
Note: The above code will not run until HighlightCfgMap is properly configured. Replace HighlightCfgMap::default() with valid data to make it work.
Key Concepts:
HighlightCfgMap applies different syntax highlighting configurations to multiple maps.
Example Path Structure:
en/
├── help-markdown.toml // Base map: help-markdown
└── a-zsh.toml // Base map: a-zsh
<
// help-markdown_monokai
(DerivedMapKey {
base_name: "help-markdown",
suffix: "_monokai",
},
SyntaxHighlightConfig {
resource: HighlightResource {
theme_name: "Monokai Extended",
background: true,
...
},
syntax_name: "md",
true_color: true,
}),
// help-markdown_ayu
(DerivedMapKey {
base_name: "help-markdown",
suffix: "_ayu",
},
SyntaxHighlightConfig {
resource: HighlightResource {
theme_name: "ayu-dark",
background: false,
...
},
syntax_name: "md",
true_color: false,
}),
// a-zsh_custom2
(DerivedMapKey {
base_name: "a-zsh",
suffix: "_custom2",
},
SyntaxHighlightConfig {
resource: HighlightResource {
theme_set: custom_theme_set(),
theme_name: "OneDark-pro vivid",
background: false,
...
},
syntax_name: "sh",
true_color: true,
})
>
DerivedMapKey
- base_name: References an existing regular map (e.g., "help-markdown").
- suffix: Appended to base_name to create a new derived map (e.g., "help-markdown_monokai").
- Ensure that format!("{base_name}{suffix}") does not clash with existing map names.

SyntaxHighlightConfig
- syntax_name: The language syntax (e.g., "md" for Markdown). The syntax must be available in the SyntaxSet provided via HighlightResource.
- true_color: Whether to use true color (24-bit RGB).
HighlightResource
fn new_highlight_map<'a>() -> HighlightCfgMap<'a> {
let mut hmap = HighlightCfgMap::default();
hmap.insert(
DerivedMapKey::default()
.with_base_name("md".into())
.with_suffix("_md".into()),
SyntaxHighlightConfig::default()
.with_syntax_name("md".into())
.with_true_color(false),
);
hmap.insert(
DerivedMapKey::default()
.with_base_name("md".into())
.with_suffix("_md_ayu_dark".into()),
SyntaxHighlightConfig::default()
.with_resource(
HighlightResource::default()
.with_theme_name("ayu-light".into())
.with_background(false),
)
.with_syntax_name("md".into()),
);
hmap.insert(
DerivedMapKey::default()
.with_base_name("t".into())
.with_suffix("_toml".into()),
SyntaxHighlightConfig::default().with_syntax_name("toml".into()),
);
hmap
}
let highlight_generator = Generator::default()
.with_resources(L10nResources::new("locales"))
.with_outdir("tmp")
.with_highlight(new_highlight_map())
.with_bincode_suffix(".highlight.bincode".into());
highlight_generator.output_bincode_all_in_one(MapType::Highlight)