Crates.io | static-lang-word-lists |
lib.rs | static-lang-word-lists |
version | 0.3.1 |
created_at | 2025-08-21 10:20:30.401541+00 |
updated_at | 2025-09-23 10:50:55.887036+00 |
description | Runtime decompressed statically-included word lists |
repository | https://github.com/googlefonts/fontheight |
id | 1804551 |
size | 53,242 |
static-lang-word-lists
A collection of word lists for various scripts, compressed at build time, baked into the binary, and decompressed lazily at run time.
Goals: include word lists in the binary, take up no more space than necessary, and be publishable on crates.io (10 MiB size limit).
On crates.io as static-lang-word-lists
For documentation, please refer to docs.rs
The crate uses a build script that downloads the word lists from GitHub, compresses them with Brotli, and embeds that data in the binary, where it is lazily decompressed at runtime.
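The lazy decompression described above can be illustrated with a minimal, standard-library-only sketch. It is not the crate's implementation: `LazyWordList`, `decode`, and `DEMO` are hypothetical names, and a trivial run-length codec stands in for Brotli so the example needs no dependencies. The point is only the shape: the compressed bytes live in a static, and the text is produced once, on first access.

```rust
use std::sync::OnceLock;

/// Hypothetical stand-in for a word list whose compressed bytes are baked
/// into the binary and only decompressed when first read.
pub struct LazyWordList {
    compressed: &'static [u8],
    text: OnceLock<String>,
}

impl LazyWordList {
    pub const fn new(compressed: &'static [u8]) -> Self {
        Self { compressed, text: OnceLock::new() }
    }

    /// Decompresses on the first call; later calls reuse the cached text.
    pub fn words(&self) -> impl Iterator<Item = &str> {
        self.text.get_or_init(|| decode(self.compressed)).lines()
    }

    /// Reports whether the decompressed text has been materialised yet.
    pub fn is_decompressed(&self) -> bool {
        self.text.get().is_some()
    }
}

/// Stand-in codec (NOT Brotli): the input is a sequence of
/// (repeat count, byte) pairs.
fn decode(data: &[u8]) -> String {
    let mut out = Vec::new();
    for pair in data.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    String::from_utf8(out).expect("word list should be valid UTF-8")
}

/// In a real build this would come from something like `include_bytes!`
/// on a file written by the build script; here it is written by hand.
static DEMO: LazyWordList =
    LazyWordList::new(&[1, b'h', 1, b'i', 1, b'\n', 2, b'o', 1, b'k']);

fn main() {
    println!("decompressed before use: {}", DEMO.is_decompressed());
    for word in DEMO.words() {
        println!("{word}");
    }
    println!("decompressed after use: {}", DEMO.is_decompressed());
}
```

With this shape, a binary that never reads a given list pays only for its compressed bytes, which fits the goal of not taking up more space than necessary.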
Note: adding or removing a word list requires that egg.py be re-run.
egg.py is run first (this step is manual and must be re-run when adding or removing a word list!). Word list names are generated from their path: data/diffenator/latin.txt becomes DIFFENATOR_LATIN in the end crate.
The output of egg.py (called chicken.rs) is used to construct the URLs from which files are downloaded.
The generated word_list_codegen.rs uses the wordlist! macro to make the structs through which crate consumers access the data, and map_codegen.rs uses phf to construct a lookup table for the word lists.
To build using local files, set the STATIC_LANG_WORD_LISTS_LOCAL environment variable.
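The path-to-name rule mentioned above (data/diffenator/latin.txt becoming DIFFENATOR_LATIN) could look roughly like the following sketch. The README only gives that one example, so the exact rule here is an assumption: drop a leading `data/`, drop the file extension, replace every non-alphanumeric character with `_`, and uppercase the rest. `const_name` is a hypothetical helper, not a function from the crate or from egg.py.

```rust
/// Hypothetical helper: derive a constant name from a word list path.
/// Assumed rule (only the diffenator example is confirmed by the README):
/// strip a leading `data/`, strip the extension, turn every character that
/// is not ASCII alphanumeric into `_`, and uppercase everything else.
fn const_name(path: &str) -> String {
    let without_root = path.strip_prefix("data/").unwrap_or(path);

    // Only treat the text after the last '.' as an extension when it is
    // part of the file name, not of a directory name.
    let stem = match without_root.rsplit_once('.') {
        Some((stem, ext)) if !ext.contains('/') => stem,
        _ => without_root,
    };

    stem.chars()
        .map(|c| if c.is_ascii_alphanumeric() { c.to_ascii_uppercase() } else { '_' })
        .collect()
}

fn main() {
    // The example given in the README.
    println!("{}", const_name("data/diffenator/latin.txt"));
    // A made-up path, to show how hyphens are handled under the assumed rule.
    println!("{}", const_name("data/example/some-list.txt"));
}
```

Deriving names mechanically from paths means that adding a file is enough to get a predictable constant name, which is presumably why the manual egg.py step has to be repeated whenever the set of files changes.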
Diffenator wordlists are from diffenator2. Apache-2.0 licensed.
Emoji wordlists are from unicode.org. Unicode licensed.
AOSP word lists are from aosp-test-texts, using the files produced by scripts/extract_words.py. Apache-2.0 licensed.