| Crates.io | static-lang-word-lists |
| lib.rs | static-lang-word-lists |
| version | 0.4.1 |
| created_at | 2025-08-21 10:20:30.401541+00 |
| updated_at | 2025-10-29 11:56:04.064601+00 |
| description | Runtime decompressed statically-included word lists |
| repository | https://github.com/googlefonts/fontheight |
| id | 1804551 |
| size | 306,113 |
static-lang-word-lists

A collection of word lists for various scripts, compressed at build time, baked into the binary, and decompressed lazily at run time.
Goals: include word lists in the binary, take up no more space than necessary, and stay publishable on crates.io (10 MiB size limit).
On crates.io as static-lang-word-lists
For documentation, please refer to docs.rs.
The build script downloads the word lists from GitHub, compresses them with Brotli, and embeds that data in the binary, where it is lazily decompressed at runtime.
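The lazy-decompression idea can be sketched as follows. This is a minimal illustration, not the crate's actual code: `COMPRESSED`, `decompress`, and `WORDS` are hypothetical names, and a toy run-length decoder stands in for Brotli so that the example needs only the standard library.

```rust
use std::sync::LazyLock;

/// Hypothetical stand-in for data compressed at build time.
/// Each pair is (repeat count, byte); a real build would embed Brotli output.
const COMPRESSED: &[u8] = &[2, b'a', 1, b'\n', 3, b'b', 1, b'\n'];

/// Toy run-length decoder, used here only in place of a real Brotli decoder.
fn decompress(data: &[u8]) -> String {
    let mut out = Vec::new();
    for pair in data.chunks_exact(2) {
        out.extend(std::iter::repeat(pair[1]).take(pair[0] as usize));
    }
    String::from_utf8(out).expect("word list should be valid UTF-8")
}

/// Decompressed on first access, then cached for the rest of the program.
static WORDS: LazyLock<Vec<String>> = LazyLock::new(|| {
    decompress(COMPRESSED)
        .lines()
        .map(str::to_owned)
        .collect()
});

fn main() {
    // No decompression work happens until this first dereference.
    for word in WORDS.iter() {
        println!("{word}");
    }
}
```

Keeping the compressed bytes in a constant and the decoded list behind `LazyLock` means a program that never touches a given word list never pays to decompress it.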
Note: adding or removing a word list requires that cargo xtask slwl be re-run.
See the xtasks' README for more details on what it's doing.
This README only concerns the build script's role.
- cargo xtask slwl (this step is manual and must be run when adding or removing a word list!)
- … chicken.rs, which is include!d in the build script)
- … OUT_DIR under their relative path, where static-lang-word-lists/src/declarations.rs is expecting them

To build using local files, set the STATIC_LANG_WORD_LISTS_LOCAL environment variable.
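As a rough sketch of how a build script might honour that variable: the snippet below assumes that merely setting STATIC_LANG_WORD_LISTS_LOCAL (to any value) selects local files, and that they live in a data directory next to the manifest. The `Source` enum and `pick_source` function are hypothetical and are not taken from the crate.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical description of where the build script reads word lists from.
#[derive(Debug, PartialEq)]
enum Source {
    /// Read files from a directory on disk.
    Local(PathBuf),
    /// Download files from the upstream repository.
    Remote,
}

/// Assumption: the variable being set at all is enough to opt in to local files.
fn pick_source(local_var: Option<&OsStr>, manifest_dir: &Path) -> Source {
    match local_var {
        Some(_) => Source::Local(manifest_dir.join("data")),
        None => Source::Remote,
    }
}

fn main() {
    // Ask Cargo to re-run the build script whenever the variable changes.
    println!("cargo:rerun-if-env-changed=STATIC_LANG_WORD_LISTS_LOCAL");

    let manifest_dir = env::var_os("CARGO_MANIFEST_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(|| PathBuf::from("."));
    let source = pick_source(
        env::var_os("STATIC_LANG_WORD_LISTS_LOCAL").as_deref(),
        &manifest_dir,
    );
    println!("cargo:warning=word list source: {source:?}");
}
```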
- … static-lang-word-lists/data, in a subdirectory with a kebab-case name for your source
- cargo xtask slwl. It'll emit crate feature definitions to stdout; copy & paste them over the existing [features] table in static-lang-word-lists/Cargo.toml
- cargo build --package static-lang-word-lists. You will need the STATIC_LANG_WORD_LISTS_LOCAL environment variable set

Metadata files are TOML files. The ones that live in this crate have the same file name as their word list, differing only in extension.
| Field name | Field type | Required? | Description |
|---|---|---|---|
| name | string | ✔️ | A cosmetic name for the word list, usually in snake_case |
| script | string | ❌ | An ISO 15924 four-letter capitalised code* |
| language | string | ❌ | An ISO 639-1 two-letter lowercase code* |
(* this is not enforced, but will at least be true of crate-provided word lists.)
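The field conventions in the table can be expressed as a small check. This is only an illustrative sketch: the `Metadata` struct, its helper functions, and the example values are hypothetical, and the crate itself does not enforce these rules. The checks look only at the shape of a code, not at whether it is actually registered.

```rust
/// Hypothetical in-memory mirror of a word list metadata file.
#[derive(Debug)]
struct Metadata {
    name: String,
    script: Option<String>,
    language: Option<String>,
}

/// Shape of an ISO 15924 code: four ASCII letters, first one uppercase (e.g. "Latn").
fn looks_like_script(code: &str) -> bool {
    let mut chars = code.chars();
    code.len() == 4
        && chars.next().is_some_and(|c| c.is_ascii_uppercase())
        && chars.all(|c| c.is_ascii_lowercase())
}

/// Shape of an ISO 639-1 code: two lowercase ASCII letters (e.g. "en").
fn looks_like_language(code: &str) -> bool {
    code.len() == 2 && code.chars().all(|c| c.is_ascii_lowercase())
}

impl Metadata {
    /// Optional fields pass when absent; present fields must match their shape.
    fn follows_conventions(&self) -> bool {
        self.script.as_deref().map_or(true, looks_like_script)
            && self.language.as_deref().map_or(true, looks_like_language)
    }
}

fn main() {
    // Example values are made up for illustration.
    let meta = Metadata {
        name: "latin_english".into(),
        script: Some("Latn".into()),
        language: Some("en".into()),
    };
    println!("{} follows conventions: {}", meta.name, meta.follows_conventions());
}
```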
Diffenator word lists are from diffenator2. Apache-2.0 licensed.
Emoji word lists are from unicode.org. Unicode licensed.
AOSP word lists are from aosp-test-texts, using the files produced by scripts/extract_words.py in that repo. Apache-2.0 licensed.
LibreOffice word lists are generated from the LibreOffice dictionaries repo, using the files produced by scripts/import_libreoffice.py. MPL v2.0 and LGPL v3+ dual-licensed.