| Field | Value |
| --- | --- |
| Crates.io | unidecode |
| lib.rs | unidecode |
| version | 0.3.0 |
| source | src |
| created_at | 2015-03-17 07:18:43.102523 |
| updated_at | 2016-12-26 02:15:11.105231 |
| description | Provides pure ASCII transliterations of Unicode strings. |
| homepage | https://github.com/chowdhurya/rust-unidecode/ |
| repository | https://github.com/chowdhurya/rust-unidecode/ |
| id | 1593 |
| size | 1,960,833 |
The rust-unidecode library is a Rust port of Sean M. Burke's famous Text::Unidecode module for Perl. It transliterates Unicode strings such as "Æneid" into pure ASCII ones such as "AEneid". For a detailed explanation of the rationale behind using such a library, you can refer to both the documentation of the original module and an article written by Burke in 2001.
The data set used to transliterate Unicode was ported directly from the Text::Unidecode module using a Perl script, so rust-unidecode should produce identical output.
```rust
extern crate unidecode;
use unidecode::unidecode;

fn main() {
    assert_eq!(unidecode("Æneid"), "AEneid");
    assert_eq!(unidecode("étude"), "etude");
    // Each Han character's transliteration ends with a space.
    assert_eq!(unidecode("北亰"), "Bei Jing ");
    assert_eq!(unidecode("ᔕᓇᓇ"), "shanana");
    assert_eq!(unidecode("げんまい茶"), "genmaiCha ");
}
```
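Because Text::Unidecode maps each character independently, transliteration can be pictured as a lookup-and-concatenate loop; this is consistent with the trailing space in "genmaiCha ", since the entry for 茶 carries its own space. The sketch below illustrates that idea only. Its five-entry `TABLE`, the helper names `transliterate_char` and `transliterate`, and the `"[?]"` fallback for unmapped characters are all hypothetical choices for this sketch; they do not reproduce the crate's actual data layout or API.

```rust
/// Hypothetical toy table; the real crate ships a far larger data set
/// ported from Text::Unidecode.
const TABLE: &[(char, &str)] = &[
    ('Æ', "AE"),
    ('é', "e"),
    ('北', "Bei "),
    ('亰', "Jing "),
    ('茶', "Cha "),
];

/// Transliterate one character: ASCII maps to itself, table entries are
/// looked up, and anything else falls back to "[?]" (a choice made for
/// this sketch only).
fn transliterate_char(c: char) -> String {
    if c.is_ascii() {
        return c.to_string();
    }
    TABLE
        .iter()
        .find(|&&(key, _)| key == c)
        .map(|&(_, ascii)| ascii.to_string())
        .unwrap_or_else(|| "[?]".to_string())
}

/// Transliterate a whole string by concatenating per-character results.
fn transliterate(s: &str) -> String {
    s.chars().map(transliterate_char).collect()
}

fn main() {
    assert_eq!(transliterate("Æneid"), "AEneid");
    // Each Han entry carries a trailing space, so the result does too.
    assert_eq!(transliterate("北亰"), "Bei Jing ");
    // Characters outside the toy table hit the sketch's fallback.
    assert_eq!(transliterate("ᔕ"), "[?]");
}
```

A linear scan keeps the sketch short; a real table covering the Basic Multilingual Plane would more sensibly be indexed directly by code point.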
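Here are some guarantees you have when calling `unidecode()`:

- The `String` returned will be valid ASCII; the decimal representation of every `char` in the string will be between 0 and 127, inclusive.
- Every Unicode character will transliterate to a string containing only newlines (`"\n"`) or ASCII characters in the range 0x0020 - 0x007E. So, for example, no Unicode character will translate to `\u{01}`. The exception is if the ASCII character itself is passed in, in which case it will be mapped to itself. (So `'\u{01}'` will be mapped to `"\u{01}"`.)

There are, however, some things you should keep in mind:

- As stated above, some transliterations do produce `\n` characters.
- Some characters transliterate to an empty string, either on purpose or because `rust-unidecode` does not know about the character.
- Some characters are unknown and transliterate to `"[?]"`.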
This information was paraphrased from the original Text::Unidecode documentation.
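The guarantees and caveats above can be checked or worked around in application code. The sketch below is a hedged illustration: `meets_guarantees`, `tidy`, and `count_unknown` are hypothetical helpers, not part of rust-unidecode's API, and the strings in `main` are literal stand-ins for what `unidecode()` returns (for example "Bei " for 北, as in the usage example), so the block runs without the crate.

```rust
/// Hypothetical helper: checks the documented guarantees for one input
/// character `c` and its transliteration `out`.
fn meets_guarantees(c: char, out: &str) -> bool {
    if c.is_ascii() {
        // ASCII input, control characters included, must map to itself.
        out == c.to_string()
    } else {
        // Other input may only produce '\n' or printable ASCII (0x20..=0x7E).
        out.chars().all(|o| o == '\n' || (' '..='~').contains(&o))
    }
}

/// Hypothetical helper: collapses every run of whitespace (including the
/// '\n' some transliterations produce and the trailing space after Han
/// syllables) into a single space, and trims both ends.
fn tidy(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: counts "[?]" placeholders as a rough signal of
/// characters the data set does not know. A literal "[?]" in the input is
/// counted too, and characters that map to "" leave no trace at all.
fn count_unknown(s: &str) -> usize {
    s.matches("[?]").count()
}

fn main() {
    assert!(meets_guarantees('\u{01}', "\u{01}"));
    assert!(meets_guarantees('北', "Bei "));
    assert!(!meets_guarantees('é', "\u{01}"));

    assert_eq!(tidy("Bei Jing "), "Bei Jing");
    assert_eq!(tidy("genmaiCha "), "genmaiCha");
    assert_eq!(tidy("one\ntwo"), "one two");

    assert_eq!(count_unknown("a[?]b[?]"), 2);
    assert_eq!(count_unknown("AEneid"), 0);
}
```

Whether to tidy output at all is a design choice: collapsing whitespace suits single-line labels, but it discards the line structure that `\n` transliterations carry.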