| Crates.io | pinyin-sort |
| lib.rs | pinyin-sort |
| version | 0.1.1 |
| created_at | 2025-08-08 17:49:54.345882+00 |
| updated_at | 2025-08-08 17:49:54.345882+00 |
| description | A tool to sort pinyin |
| homepage | https://github.com/acture/pinyin-sort |
| repository | https://github.com/acture/pinyin-sort |
| id | 1787156 |
| size | 9,908,312 |
A small Rust CLI that sorts Chinese strings by their Hanyu Pinyin (tone3) order, with sensible tie‑breaking by the original character and flexible output formatting. It can read input from files or directly from command‑line text arguments. A simple TOML override file lets you correct or customize pinyin for specific characters or phrases.
Note
This repository generates a large static map from codepoint to pinyin at build time from the vendored pinyin-data source.
Pinyin syllables are normalized to tone3 style (e.g., han4, zhao4).
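As a rough illustration of the tone3 normalization mentioned above, the sketch below converts a tone-marked syllable to its numbered form. It assumes the source readings carry precomposed tone diacritics and that `ü` is kept as-is; the function names `split_tone` and `to_tone3` are hypothetical and are not taken from this crate.

```rust
/// Map a tone-marked vowel to (plain vowel, tone number), if it carries a mark.
fn split_tone(c: char) -> Option<(char, u8)> {
    // Each string lists tones 1..=4 of one vowel, in order.
    const TABLE: [(&str, char); 6] = [
        ("āáǎà", 'a'),
        ("ēéěè", 'e'),
        ("īíǐì", 'i'),
        ("ōóǒò", 'o'),
        ("ūúǔù", 'u'),
        ("ǖǘǚǜ", 'ü'),
    ];
    for &(marked, plain) in TABLE.iter() {
        if let Some(idx) = marked.chars().position(|m| m == c) {
            return Some((plain, idx as u8 + 1));
        }
    }
    None
}

/// Convert one tone-marked syllable to tone3 style, e.g. "hàn" -> "han4".
/// Syllables without a tone mark (neutral tone) are returned unchanged.
fn to_tone3(syllable: &str) -> String {
    let mut out = String::with_capacity(syllable.len() + 1);
    let mut tone = None;
    for c in syllable.chars() {
        match split_tone(c) {
            Some((plain, t)) => {
                out.push(plain);
                tone = Some(t);
            }
            None => out.push(c),
        }
    }
    // Append the tone digit after the whole syllable.
    if let Some(t) = tone {
        out.push(char::from(b'0' + t));
    }
    out
}
```

Rarer marked letters (for example syllabic nasals such as `ń`) are not handled here and would pass through unchanged.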
Sort a list of Chinese strings by pinyin
Deterministic tie‑breaking by original character when pinyin matches
Accept input via files or inline text
Highly configurable output formatting (columns, alignment, padding, separators, blank line cadence)
Optional pinyin override file (TOML) for characters and phrases
Reproducible development environment with Nix and Just tasks
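The first two items in the list above (pinyin ordering with a deterministic tie-break) can be sketched as a sort on a composite key. This is only an illustration under assumptions: the `readings` table, `sort_key`, and `sort_by_pinyin` are hypothetical names, and the crate's actual comparison may differ.

```rust
use std::collections::HashMap;

/// Build a per-character key of (tone3 reading, original character).
/// Characters missing from `readings` get an empty reading and sort first.
fn sort_key(s: &str, readings: &HashMap<char, &str>) -> Vec<(String, char)> {
    s.chars()
        .map(|c| {
            let py = readings.get(&c).copied().unwrap_or("");
            (py.to_string(), c)
        })
        .collect()
}

/// Sort by pinyin; when readings match, the original character
/// (compared by code point) decides the order.
fn sort_by_pinyin(items: &mut [String], readings: &HashMap<char, &str>) {
    items.sort_by_cached_key(|s| sort_key(s, readings));
}
```

Because the character is part of the key, two entries with identical readings always come out in the same order, regardless of input order.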
Prerequisites:
Steps:
    pip install pypinyin
    python3 scripts/convert_pinyin_to_csv.py
    just prep-data
    cargo build --release
    target/release/pinyin-sort
This repo includes a flake and a development shell.
    nix develop
    just prep-data
    just build
    nix build
    target/release/pinyin-sort
Basic help:
    pinyin-sort -h
Inputs are provided either as files or inline text. If neither is provided, the tool prints its help and exits.
Examples:
    pinyin-sort -t 汉字 张三 赵四
    pinyin-sort -f ./data.txt
    pinyin-sort -f a.txt b.txt --columns 5 --entry-width 6 --align center --separator ","
Behavior overview:
Exit codes:
0 on success
Non‑zero on I/O or configuration parsing errors (e.g., reading files, loading override TOML)
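One way to implement the exit-code convention above is to map a result to a status byte. This is a minimal sketch: the `AppError` type and the specific non-zero values are assumptions for illustration, since the README only states that failures are non-zero.

```rust
use std::io;

/// Hypothetical error type covering the two failure classes named above.
#[derive(Debug)]
enum AppError {
    Io(io::Error),
    Config(String),
}

/// Map a run result to a process exit status: 0 on success, non-zero otherwise.
/// The values 1 and 2 are illustrative assumptions, not documented behavior.
fn exit_status(result: &Result<(), AppError>) -> u8 {
    match result {
        Ok(()) => 0,
        Err(AppError::Io(_)) => 1,
        Err(AppError::Config(_)) => 2,
    }
}
```

A `main` that returns `std::process::ExitCode` could then finish with `ExitCode::from(exit_status(&result))`.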
These options are defined in src/args.rs and parsed via clap.
Inputs
Output destination
Pinyin overrides
Formatting
Note: When using shell characters like tab or newline on the command line, ensure they are quoted or escaped appropriately for your shell.
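To make the formatting options more concrete, here is a small sketch of column layout with a fixed entry width, alignment, and separator. The `Align` enum and the `pad` and `format_rows` functions are hypothetical; they mirror the `--columns`, `--entry-width`, `--align`, and `--separator` flags shown in the examples but are not the crate's code.

```rust
#[derive(Clone, Copy)]
enum Align {
    Left,
    Center,
    Right,
}

/// Pad one entry to `width` characters using the requested alignment.
fn pad(entry: &str, width: usize, align: Align) -> String {
    match align {
        Align::Left => format!("{:<width$}", entry, width = width),
        Align::Center => format!("{:^width$}", entry, width = width),
        Align::Right => format!("{:>width$}", entry, width = width),
    }
}

/// Lay entries out in rows of `columns`, joined by `sep`, one row per line.
fn format_rows(entries: &[&str], columns: usize, width: usize, align: Align, sep: &str) -> String {
    entries
        .chunks(columns.max(1)) // guard against a zero column count
        .map(|row| {
            row.iter()
                .map(|e| pad(e, width, align))
                .collect::<Vec<_>>()
                .join(sep)
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

Note that Rust's format width counts characters, not terminal cells, so CJK text (usually two cells wide) may need a display-width calculation for exact visual alignment.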
You can customize pinyin for specific characters or phrases. Provide a TOML file via --config.
Schema (see src/override.rs):
Example override.toml:
[char_override]
'重' = "chong2"
'行' = "xing2"

[phrase_override]
"重庆" = ["chong2", "qing4"]
"银行" = ["yin2", "hang2"]
Usage:
    pinyin-sort -t 重庆 -t 重庆市 --config ./override.toml
Notes:
phrase_override takes precedence when the full input matches a phrase key.
For characters not listed in the overrides, built‑in data is used.
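The precedence described in these notes can be sketched as a three-step lookup: whole-input phrase override, then per-character override, then built-in data. The `Overrides` struct and `resolve_pinyin` function are hypothetical stand-ins, and falling back to the character itself when nothing matches is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory form of the override file's two tables.
struct Overrides {
    chars: HashMap<char, String>,
    phrases: HashMap<String, Vec<String>>,
}

/// Resolve tone3 readings for `input`, one per character.
fn resolve_pinyin(input: &str, ov: &Overrides, builtin: &HashMap<char, &str>) -> Vec<String> {
    // 1. A phrase override wins when the full input matches a phrase key.
    if let Some(readings) = ov.phrases.get(input) {
        return readings.clone();
    }
    input
        .chars()
        .map(|c| {
            ov.chars
                .get(&c)
                .cloned() // 2. per-character override
                .or_else(|| builtin.get(&c).map(|s| s.to_string())) // 3. built-in data
                .unwrap_or_else(|| c.to_string()) // assumed fallback: the character itself
        })
        .collect()
}
```

With the example override.toml, `重庆` resolves through the phrase table, while `重庆市` does not match any phrase key and falls back to per-character lookup.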
Generated file:
Data preparation:
Build steps:
Ensure data/pinyin.csv exists (create it via the script above).
Run cargo build (or cargo build --release). The build script regenerates src/generated/pinyin_map.rs when data/pinyin.csv changes.
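For readers unfamiliar with Cargo build scripts, the regeneration step might look roughly like the sketch below. The `cargo:rerun-if-changed` directive is a standard Cargo feature; everything else is an assumption, including the CSV row format (`<hex codepoint>,<tone3 reading>`), the `PINYIN_MAP` name, and the shape of the generated table.

```rust
/// Line a build script prints to stdout so Cargo reruns it when `path` changes.
fn rerun_directive(path: &str) -> String {
    format!("cargo:rerun-if-changed={}", path)
}

/// Render Rust source for a static lookup table from CSV text.
/// Assumed (hypothetical) row format: `<hex codepoint>,<tone3 reading>`.
/// Blank or malformed rows are skipped.
fn render_map(csv: &str) -> String {
    let mut out = String::from("pub static PINYIN_MAP: &[(u32, &str)] = &[\n");
    for line in csv.lines() {
        let line = line.trim();
        if line.is_empty() {
            continue;
        }
        let mut parts = line.splitn(2, ',');
        if let (Some(cp), Some(py)) = (parts.next(), parts.next()) {
            if let Ok(code) = u32::from_str_radix(cp.trim(), 16) {
                out.push_str(&format!("    (0x{:X}, {:?}),\n", code, py.trim()));
            }
        }
    }
    out.push_str("];\n");
    out
}
```

In a real `build.rs`, `main` would print `rerun_directive("data/pinyin.csv")`, read the CSV, and write the rendered source to the generated file.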
The library code includes simple helpers:
Caveats:
The pinyin_of function relies on generated data and optional overrides. It returns per‑character pinyin (first reading or the override for the position).
Tests: run cargo test
Dev shell (Nix): nix develop
Just recipes: just prep-data, just build
AGPL-3.0-only. See Cargo.toml.