# pinyin-parser-rs

Parses a string of pinyin syllables. Covers marginal cases such as `ẑ`, `ŋ` and `ê`.

Since pinyin strings in the wild do not necessarily conform to the standard, this parser offers two modes: strict and loose. Strict mode:

* forbids using a breve (`˘`) instead of a háček (`ˇ`) to mark the third tone
* forbids using the IPA `ɡ` (U+0261) instead of `g`, along with other such lookalike characters
* allows apostrophes only before an `a`, an `e` or an `o`

## Examples

```rust
use pinyin_parser::PinyinParser;

assert_eq!(
    PinyinParser::strict("jīntiān")
        .into_iter()
        .collect::<Vec<_>>(),
    vec!["jīn", "tiān"]
);
```

The resulting strings are NFC-normalized: the `ī` in the output above is the single precomposed character U+012B, not an `i` followed by a combining macron.

Erhua is supported.

```rust
use pinyin_parser::PinyinParser;

assert_eq!(
    PinyinParser::strict("yīdiǎnr chànggēr")
        .into_iter()
        .collect::<Vec<_>>(),
    vec!["yī", "diǎnr", "chàng", "gēr"]
);
```

If you want the `r` to be separated from the main syllable, use `.split_erhua()`. Note that the syllables "er", "ēr", "ér", "ěr" and "èr" are exempt from this splitting.

```rust
use pinyin_parser::PinyinParser;

assert_eq!(
    PinyinParser::strict("yīdiǎnr chànggēr shuāng'ěr língtīng")
        .split_erhua()
        .into_iter()
        .collect::<Vec<_>>(),
    vec!["yī", "diǎn", "r", "chàng", "gē", "r", "shuāng", "ěr", "líng", "tīng"]
);
```

This parser supports the use of `ẑ`, `ĉ`, `ŝ` and `ŋ`, though I have never seen anyone use them.

```rust
use pinyin_parser::PinyinParser;

assert_eq!(
    PinyinParser::strict("Ẑāŋ").into_iter().collect::<Vec<_>>(),
    vec!["zhāng"]
);
```

Loose mode is more permissive. For example, it accepts an apostrophe before a consonant:

```rust
use pinyin_parser::PinyinParser;

assert_eq!(
    // In strict mode an apostrophe can come only before an `a`, an `e` or an `o`,
    // but it is allowed here because this is loose mode
    PinyinParser::loose("Yīng'guó")
        .into_iter()
        .collect::<Vec<_>>(),
    vec!["yīng", "guó"]
);
```
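The strict-mode apostrophe rule can be made concrete with a standalone check. The function below, `apostrophes_ok_strict`, is hypothetical and not part of `pinyin_parser`: it is a minimal sketch that assumes NFC input, looks only at the ASCII apostrophe `'`, and recognizes only the precomposed tone-marked forms of `a`, `e`, `ê` and `o` listed in it. It says nothing about how the crate actually enforces the rule.

```rust
/// Hypothetical helper, not part of pinyin_parser: returns true if every
/// ASCII apostrophe in `input` is immediately followed by an `a`, `e` or `o`
/// (plain or tone-marked). Assumes NFC input.
fn apostrophes_ok_strict(input: &str) -> bool {
    // Only the precomposed vowel forms listed here are recognized.
    fn is_a_e_or_o(c: char) -> bool {
        matches!(
            c.to_lowercase().next(),
            Some(
                'a' | 'ā' | 'á' | 'ǎ' | 'à'
                    | 'e' | 'ē' | 'é' | 'ě' | 'è' | 'ê' | 'ế' | 'ề'
                    | 'o' | 'ō' | 'ó' | 'ǒ' | 'ò'
            )
        )
    }

    let chars: Vec<char> = input.chars().collect();
    chars.iter().enumerate().all(|(i, &c)| {
        // Non-apostrophes always pass; an apostrophe needs a valid successor,
        // so a trailing apostrophe fails.
        c != '\'' || chars.get(i + 1).map_or(false, |&next| is_a_e_or_o(next))
    })
}
```

With this sketch, `shuāng'ěr` and `Xī'ān` pass while `Yīng'guó` fails, which is why the loose-mode example needs `PinyinParser::loose`.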
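The erhua-splitting rule described for `.split_erhua()` can be sketched for a single syllable. `split_erhua_syllable` is a hypothetical helper, not the crate's method: it assumes the input has already been segmented into one lowercase, NFC-normalized syllable, and it peels off a trailing `r` unless the syllable is one of the exempt `er` forms.

```rust
/// Hypothetical helper, not part of pinyin_parser: split one already-segmented,
/// lowercase, NFC-normalized syllable into its main part and an erhua `r`.
fn split_erhua_syllable(syllable: &str) -> Vec<&str> {
    // The standalone syllable "er" (toneless or tone-marked) is exempt:
    // its `r` belongs to the final, not to an erhua suffix.
    const EXEMPT: [&str; 5] = ["er", "ēr", "ér", "ěr", "èr"];
    if EXEMPT.contains(&syllable) {
        return vec![syllable];
    }
    match syllable.strip_suffix('r') {
        // `r` is one ASCII byte, so the remaining prefix is still valid UTF-8.
        Some(main) if !main.is_empty() => vec![main, "r"],
        _ => vec![syllable],
    }
}
```

For example, `diǎnr` becomes `["diǎn", "r"]` while `ěr` stays whole. Since no standard pinyin final other than `er` ends in `r`, checking the suffix is enough once segmentation is done.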
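Support for `ẑ`, `ĉ`, `ŝ` and `ŋ` amounts to treating each letter as an abbreviation for `zh`, `ch`, `sh` and `ng`. `expand_shorthand` below is a hypothetical pre-pass, not the crate's API. It assumes NFC input, so each shorthand letter is a single code point, and it lowercases everything else to mirror the `Ẑāŋ` example, whose output is `zhāng`. A real parser might instead recognize these letters during segmentation.

```rust
/// Hypothetical helper, not part of pinyin_parser: rewrite the shorthand
/// letters ẑ, ĉ, ŝ and ŋ as zh, ch, sh and ng, lowercasing all other characters.
/// Assumes NFC input.
fn expand_shorthand(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        match c {
            'ẑ' | 'Ẑ' => out.push_str("zh"),
            'ĉ' | 'Ĉ' => out.push_str("ch"),
            'ŝ' | 'Ŝ' => out.push_str("sh"),
            'ŋ' | 'Ŋ' => out.push_str("ng"),
            // char::to_lowercase can yield more than one char, hence extend.
            other => out.extend(other.to_lowercase()),
        }
    }
    out
}
```

The expanded string could then go through ordinary segmentation, so the rest of a parser would never need to know the shorthand existed.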