| Field | Value |
|---|---|
| Crate | pinyin-parser (crates.io, lib.rs) |
| Version | 0.1.9 |
| Created at | 2021-06-28 04:44:20.239522 |
| Updated at | 2024-01-06 23:00:51.528577 |
| Description | Parses a string of pinyin syllables. Covers marginal cases such as `ẑ`, `ŋ` and `ê`. |
| Repository | https://github.com/sozysozbot/pinyin-parser-rs |
| ID | 415637 |
| Size | 65,629 |
Parses a string of pinyin syllables. Covers marginal cases such as `ẑ`, `ŋ` and `ê`.
Since pinyin strings in the wild do not necessarily conform to the standard, this parser offers two modes: strict and loose.
Strict mode:
- forbids `ɡ` (U+0261) in place of `g`, and other such lookalike characters
- allows an apostrophe only before an `a`, an `e` or an `o`
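To illustrate the lookalike rule, here is a minimal std-only sketch of the kind of pre-check strict parsing implies. The function `find_lookalikes` is hypothetical and not part of the `pinyin_parser` API; it only knows about U+0261, the one lookalike named above, and reports where each occurrence sits.

```rust
/// Hypothetical pre-check (not the crate's implementation): returns the byte
/// offset of every known lookalike character in `input`. Only U+0261 is named
/// in the documentation; any other lookalikes strict mode rejects are not listed.
fn find_lookalikes(input: &str) -> Vec<(usize, char)> {
    // LATIN SMALL LETTER SCRIPT G, which renders like an ordinary `g`.
    const LOOKALIKES: [char; 1] = ['\u{0261}'];
    input
        .char_indices()
        .filter(|(_, c)| LOOKALIKES.contains(c))
        .collect()
}

fn main() {
    // "jīnɡ" written with the script g: j (1 byte) + ī (2 bytes) + n (1 byte) puts it at offset 4.
    assert_eq!(find_lookalikes("j\u{012B}n\u{0261}"), vec![(4, '\u{0261}')]);
    // The same syllable with a plain ASCII `g` has nothing to report.
    assert!(find_lookalikes("j\u{012B}ng").is_empty());
}
```

A real strict parser would turn such a hit into an error rather than a list of positions; returning offsets simply keeps the sketch easy to inspect.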
```rust
use pinyin_parser::PinyinParser;
assert_eq!(
    PinyinParser::strict("jīntiān")
        .into_iter()
        .collect::<Vec<_>>(),
    vec!["jīn", "tiān"]
);
```
The resulting strings are NFC-normalized; for instance, the `ī` in the sample above comes out as the single precomposed character U+012B.
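NFC matters because a toned vowel such as `ī` can also be written as a plain `i` followed by U+0304 COMBINING MACRON. The sketch below assumes input may arrive in that decomposed form and shows only the composition step for pinyin vowels; `compose_pinyin` and `precomposed` are hypothetical helpers, and a real implementation would more likely rely on a full Unicode normalizer.

```rust
/// Hypothetical helper, for illustration only: composes a pinyin vowel followed
/// by a combining tone mark into the single precomposed character NFC uses.
/// It covers just a, e, i, o, u, ü and the four tone marks; it is not a general
/// Unicode normalizer.
fn compose_pinyin(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    let mut chars = s.chars().peekable();
    while let Some(c) = chars.next() {
        // Look one character ahead for a combining tone mark.
        if let Some(composed) = chars.peek().and_then(|&mark| precomposed(c, mark)) {
            out.push(composed);
            chars.next(); // consume the combining mark
        } else {
            out.push(c);
        }
    }
    out
}

/// Maps (base vowel, combining mark) to the precomposed character, if any.
fn precomposed(base: char, mark: char) -> Option<char> {
    // Tones 1 to 4, in order.
    let tones: [char; 4] = match base {
        'a' => ['\u{0101}', '\u{00E1}', '\u{01CE}', '\u{00E0}'], // ā á ǎ à
        'e' => ['\u{0113}', '\u{00E9}', '\u{011B}', '\u{00E8}'], // ē é ě è
        'i' => ['\u{012B}', '\u{00ED}', '\u{01D0}', '\u{00EC}'], // ī í ǐ ì
        'o' => ['\u{014D}', '\u{00F3}', '\u{01D2}', '\u{00F2}'], // ō ó ǒ ò
        'u' => ['\u{016B}', '\u{00FA}', '\u{01D4}', '\u{00F9}'], // ū ú ǔ ù
        '\u{00FC}' => ['\u{01D6}', '\u{01D8}', '\u{01DA}', '\u{01DC}'], // ǖ ǘ ǚ ǜ
        _ => return None,
    };
    let tone = match mark {
        '\u{0304}' => 0, // combining macron (first tone)
        '\u{0301}' => 1, // combining acute accent (second tone)
        '\u{030C}' => 2, // combining caron (third tone)
        '\u{0300}' => 3, // combining grave accent (fourth tone)
        _ => return None,
    };
    Some(tones[tone])
}

fn main() {
    // "jīn" with a decomposed ī (i + U+0304) has 4 chars; composed, it has 3.
    let composed = compose_pinyin("ji\u{0304}n");
    assert_eq!(composed, "j\u{012B}n");
    assert_eq!(composed.chars().count(), 3);
}
```

Note that a `ü` which is itself decomposed (`u` + U+0308) is left alone by this sketch.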
Erhua is supported.
```rust
use pinyin_parser::PinyinParser;
assert_eq!(
    PinyinParser::strict("yīdiǎnr chànggēr")
        .into_iter()
        .collect::<Vec<_>>(),
    vec!["yī", "diǎnr", "chàng", "gēr"]
);
```
If you want the `r` to be separated from the main syllable, use `.split_erhua()`.
Note that the syllables `er`, `ēr`, `ér`, `ěr` and `èr` are exempt from this splitting.
```rust
use pinyin_parser::PinyinParser;
assert_eq!(
    PinyinParser::strict("yīdiǎnr chànggēr shuāng'ěr língtīng")
        .split_erhua()
        .collect::<Vec<_>>(),
    vec![
        "yī", "diǎn", "r",
        "chàng", "gē", "r",
        "shuāng", "ěr",
        "líng", "tīng"
    ]
);
```
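The splitting rule can be sketched on a list of already-parsed syllables. `split_erhua` here is a hypothetical free function written for illustration, not the crate's method, and it assumes the syllables are lowercase, as in the outputs above.

```rust
/// Hypothetical helper (not the crate's code) applying the documented
/// `.split_erhua()` rule to one already-parsed syllable: a trailing `r` is
/// split off, except for the exempt syllables er, ēr, ér, ěr and èr.
fn split_erhua(syllable: &str) -> Vec<&str> {
    const EXEMPT: [&str; 5] = ["er", "ēr", "ér", "ěr", "èr"];
    match syllable.strip_suffix('r') {
        // Something precedes the `r` and the syllable is not exempt: split it.
        Some(main) if !main.is_empty() && !EXEMPT.contains(&syllable) => vec![main, "r"],
        // No trailing `r`, a bare `r`, or an exempt syllable: keep it whole.
        _ => vec![syllable],
    }
}

fn main() {
    // Syllables as strict parsing would yield them without splitting.
    let parsed = ["yī", "diǎnr", "chàng", "gēr", "shuāng", "ěr"];
    let split: Vec<&str> = parsed.iter().copied().flat_map(split_erhua).collect();
    assert_eq!(split, vec!["yī", "diǎn", "r", "chàng", "gē", "r", "shuāng", "ěr"]);
}
```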
This parser supports the use of `ẑ`, `ĉ`, `ŝ` and `ŋ`, though I have never seen anyone use them.
```rust
use pinyin_parser::PinyinParser;
assert_eq!(
    PinyinParser::strict("Ẑāŋ").into_iter().collect::<Vec<_>>(),
    vec!["zhāng"]
);
```
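Since `ẑ`, `ĉ`, `ŝ` and `ŋ` are shorthand for `zh`, `ch`, `sh` and `ng`, the example above amounts to an expansion plus lowercasing. A minimal sketch, with `expand_shorthand` as a hypothetical helper that handles only the precomposed letters (not, say, `z` followed by a combining circumflex):

```rust
/// Hypothetical helper (an illustration, not the crate's implementation):
/// lowercases the input and expands the shorthand letters ẑ, ĉ, ŝ and ŋ
/// into zh, ch, sh and ng. Toned vowels pass through unchanged.
fn expand_shorthand(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    // Lowercasing first means Ẑ, Ĉ, Ŝ and Ŋ are covered by the same arms.
    for c in s.chars().flat_map(char::to_lowercase) {
        match c {
            '\u{1E91}' => out.push_str("zh"), // ẑ
            '\u{0109}' => out.push_str("ch"), // ĉ
            '\u{015D}' => out.push_str("sh"), // ŝ
            '\u{014B}' => out.push_str("ng"), // ŋ
            other => out.push(other),
        }
    }
    out
}

fn main() {
    // "Ẑāŋ", as in the example above, becomes "zhāng".
    assert_eq!(expand_shorthand("\u{1E90}\u{0101}\u{014B}"), "zh\u{0101}ng");
}
```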
Loose mode accepts input that strict mode would reject:

```rust
use pinyin_parser::PinyinParser;
assert_eq!(
    // In strict mode an apostrophe can come only before an `a`, an `e` or an `o`,
    // but it is allowed here because this is loose mode
    PinyinParser::loose("Yīng'guó")
        .into_iter()
        .collect::<Vec<_>>(),
    vec!["yīng", "guó"]
);
```
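The apostrophe rule that separates the two modes can be sketched as a small predicate. `apostrophe_allowed` is hypothetical; its strict branch follows the rule stated in the comment above, while its loose branch assumes, without confirmation from the crate, that an apostrophe before any letter is accepted.

```rust
/// Hypothetical rule check (illustration only): may an apostrophe appear
/// directly before `next`? Strict mode allows it only before an a, an e or
/// an o, with or without a tone mark. For loose mode this sketch simply
/// assumes any letter is accepted.
fn apostrophe_allowed(next: char, strict: bool) -> bool {
    if !strict {
        return next.is_alphabetic();
    }
    // Compare case-insensitively; each of these vowels lowercases to one char.
    let lower = next.to_lowercase().next().unwrap_or(next);
    matches!(
        lower,
        'a' | '\u{0101}' | '\u{00E1}' | '\u{01CE}' | '\u{00E0}' // a ā á ǎ à
            | 'e' | '\u{0113}' | '\u{00E9}' | '\u{011B}' | '\u{00E8}' // e ē é ě è
            | 'o' | '\u{014D}' | '\u{00F3}' | '\u{01D2}' | '\u{00F2}' // o ō ó ǒ ò
    )
}

fn main() {
    assert!(apostrophe_allowed('\u{011B}', true)); // shuāng'ěr: fine in strict mode
    assert!(!apostrophe_allowed('g', true)); // Yīng'guó: rejected in strict mode
    assert!(apostrophe_allowed('g', false)); // but accepted in loose mode
}
```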