Crates.io | transcoding_rs |
lib.rs | transcoding_rs |
version | 0.1.1 |
source | src |
created_at | 2021-10-24 12:28:42.874849 |
updated_at | 2021-10-24 12:39:16.08423 |
description | Converts text encoding the easy and efficient way |
repository | https://github.com/kena0ki/aconv/tree/main/transcoding_rs |
id | 470333 |
size | 70,198 |
This is a transcoding library. Transcoding here means converting text from one encoding to another.
There are two excellent crates, chardetng and encoding_rs.
chardetng is designed for encoding detection, and encoding_rs can be used for transcoding.
This library aims to make transcoding easy and efficient by combining these two crates.
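To make the term "transcoding" concrete, here is a minimal sketch that converts UTF-16LE bytes into a UTF-8 `String` using only the Rust standard library. It does not use chardetng, encoding_rs, or this crate's API; the function name `utf16le_to_utf8` is hypothetical and chosen only for illustration.

```rust
/// Transcodes UTF-16LE bytes (without a BOM) into a UTF-8 `String`.
/// Returns `None` when the byte count is odd or the input contains an
/// unpaired surrogate, i.e. when it is not valid UTF-16.
/// NOTE: illustrative helper, not part of transcoding_rs.
pub fn utf16le_to_utf8(bytes: &[u8]) -> Option<String> {
    if bytes.len() % 2 != 0 {
        return None;
    }
    // Reassemble each little-endian byte pair into one UTF-16 code unit.
    let units: Vec<u16> = bytes
        .chunks_exact(2)
        .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
        .collect();
    // Rust strings are UTF-8, so a successful conversion is the transcode.
    String::from_utf16(&units).ok()
}
```

A real transcoder also has to handle legacy encodings and streaming input, which is what the two crates above provide.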
Note: Supported encodings are the ones defined in the Encoding Standard.
Note: UTF-16 files must have a BOM to be detected as UTF-16.
This is because chardetng, on which this library depends, does not support UTF-16, so this library only adds BOM sniffing to detect it.
See the document.
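As a rough illustration of what BOM sniffing for UTF-16 involves, the sketch below checks the first two bytes of the input. The names `Utf16Bom` and `sniff_utf16_bom` are hypothetical and are not this crate's API; the byte values are the standard UTF-16 byte order marks.

```rust
/// Byte order indicated by a UTF-16 byte order mark.
/// NOTE: illustrative type, not part of transcoding_rs.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Utf16Bom {
    LittleEndian,
    BigEndian,
}

/// Returns the byte order when the input starts with a UTF-16 BOM,
/// or `None` when no UTF-16 BOM is present.
pub fn sniff_utf16_bom(bytes: &[u8]) -> Option<Utf16Bom> {
    if bytes.starts_with(&[0xFF, 0xFE]) {
        // U+FEFF stored low byte first.
        Some(Utf16Bom::LittleEndian)
    } else if bytes.starts_with(&[0xFE, 0xFF]) {
        // U+FEFF stored high byte first.
        Some(Utf16Bom::BigEndian)
    } else {
        None
    }
}
```

With a check like this, UTF-16 text that lacks a BOM yields `None` and is never reported as UTF-16, which matches the limitation described in the note.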
Since text is internally just a sequence of bytes, there is no way to detect the right encoding with 100% accuracy.
So we need to guess the right encoding somehow.
Below is the flow we roughly follow.
Guess the encoding using chardetng.
Decode the input using encoding_rs.
Characters that are treated as non-text in this library are the same ones as in the file command, plus the REPLACEMENT CHARACTER.
Namely, U+0000 ~ U+0006, U+000e ~ U+001a, U+001c ~ U+001f, U+007f, and U+FFFD are treated as the non-text characters.
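The ranges above can be written down as a small predicate. This is a minimal sketch, assuming the check runs on already decoded text; the function names `is_non_text` and `looks_like_text` are hypothetical and are not taken from this crate.

```rust
/// Returns true for the code points listed above as non-text:
/// U+0000..=U+0006, U+000E..=U+001A, U+001C..=U+001F, U+007F, and U+FFFD.
/// NOTE: illustrative helper, not part of transcoding_rs.
pub fn is_non_text(c: char) -> bool {
    matches!(
        c,
        '\u{0000}'..='\u{0006}'
            | '\u{000E}'..='\u{001A}'
            | '\u{001C}'..='\u{001F}'
            | '\u{007F}'
            | '\u{FFFD}'
    )
}

/// Returns true when the decoded text contains no non-text character.
pub fn looks_like_text(decoded: &str) -> bool {
    !decoded.chars().any(is_non_text)
}
```

Note that common whitespace such as tab (U+0009), line feed (U+000A), and carriage return (U+000D) falls outside these ranges, as does escape (U+001B), so ordinary text files and terminal output with escape sequences pass the check.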
Licensed under either of
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.