Crates.io | charset |
lib.rs | charset |
version | 0.1.5 |
source | src |
created_at | 2018-11-04 10:51:02.383282 |
updated_at | 2024-07-21 14:04:50.814592 |
description | Character encoding decoding for email |
homepage | https://docs.rs/charset/ |
repository | https://github.com/hsivonen/charset |
id | 94638 |
size | 62,887 |
charset is a wrapper around encoding_rs that provides (non-streaming)
decoding for character encodings that occur in email by providing decoding
for UTF-7 in addition to the encodings defined by the Encoding Standard
(and provided by encoding_rs).
Note: Do not use this crate for consuming Web content. For security
reasons, consumers of Web content are prohibited from supporting
UTF-7. Use encoding_rs directly when consuming Web content.
The set of encodings consisting of UTF-7 and the encodings defined in the
Encoding Standard is believed to be appropriate for consuming email,
because that's the set of encodings supported by Thunderbird.
Furthermore, UTF-7 support is believed to be necessary based on the
experience of the Firefox OS email client. In fact, while the UTF-7
implementation in this crate is independent of Thunderbird's UTF-7
implementation, Thunderbird uses encoding_rs to decode the other
encodings. In addition to the labels defined in the Encoding Standard,
this crate recognizes additional java.io and java.nio names for
compatibility with JavaMail. For UTF-7, IANA and Netscape 4.0 labels
are recognized.
Known compatibility limitations (known from Thunderbird bug reports):
This crate intentionally does not support encoding content into legacy
encodings. When sending email, always use UTF-8. That is, just call
.as_bytes() on &str and label the content as UTF-8.
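As a minimal sketch of the advice above, the following std-only snippet pairs a UTF-8 label with the bytes of a &str. The helper name utf8_text_part and the exact header line are assumptions made for illustration; they are not part of this crate, and a real mail library would also take care of transfer encoding and line-length limits.

```rust
/// Hypothetical helper (not part of the charset crate): returns a header
/// line labeling the content as UTF-8, together with the body bytes.
fn utf8_text_part(body: &str) -> (&'static str, &[u8]) {
    // A Rust &str is always valid UTF-8, so no encoding step is needed:
    // as_bytes() exposes the bytes that are already there.
    ("Content-Type: text/plain; charset=UTF-8", body.as_bytes())
}
```

Calling utf8_text_part("Grüße") yields the label and the seven UTF-8 bytes of the text without any conversion.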
Logically this crate should be at version 1.0, but it's not worth the hassle
to do a version number semver break when there's no actual API break. The
expectation is to do 1.0 when encoding_rs 1.0 comes along.
Apache-2.0 OR MIT; please see the file named COPYRIGHT.
Generated API documentation is available online.
Again, this crate is for email. Please do NOT use it for Web content.
Never try to perform any security analysis on the undecoded data in ASCII-incompatible encodings and in UTF-7 in particular. Always decode first and analyze after. UTF-7 allows even characters that don't have to be represented as base64 to be represented as base64. Also, for consistency with Thunderbird, the UTF-7 decoder in this crate allows e.g. ASCII controls to be represented without base64 encoding even when the spec says they should be base64-encoded.
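To illustrate why analysis must happen after decoding, here is a std-only sketch. The function toy_utf7_decode is a deliberately simplified, hypothetical decoder written for this example; it is not the decoder in this crate, it does not validate its input, and it should not be used on real data. It only shows that a character such as < can be hidden inside a base64 run, where a scan of the raw bytes does not find it.

```rust
/// Naive check on undecoded bytes: looks for a literal '<'.
fn naive_contains_lt(raw: &[u8]) -> bool {
    raw.contains(&b'<')
}

/// Toy decoder for UTF-7 base64 runs (illustration only, no validation).
fn toy_utf7_decode(input: &str) -> String {
    const ALPHABET: &[u8] =
        b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    let mut out = String::new();
    let mut bytes = input.bytes().peekable();
    while let Some(b) = bytes.next() {
        if b != b'+' {
            // Directly represented byte (the toy assumes ASCII input).
            out.push(b as char);
            continue;
        }
        if bytes.peek() == Some(&b'-') {
            // "+-" stands for a literal plus sign.
            bytes.next();
            out.push('+');
            continue;
        }
        // Collect 6-bit groups into UTF-16 code units.
        let mut bits: u32 = 0;
        let mut bit_count: u32 = 0;
        let mut units: Vec<u16> = Vec::new();
        while let Some(&c) = bytes.peek() {
            match ALPHABET.iter().position(|&a| a == c) {
                Some(v) => {
                    bytes.next();
                    bits = (bits << 6) | v as u32;
                    bit_count += 6;
                    if bit_count >= 16 {
                        bit_count -= 16;
                        units.push((bits >> bit_count) as u16);
                        bits &= (1 << bit_count) - 1;
                    }
                }
                None => {
                    // A '-' terminates the run and is consumed.
                    if c == b'-' {
                        bytes.next();
                    }
                    break;
                }
            }
        }
        out.push_str(&String::from_utf16_lossy(&units));
    }
    out
}
```

With the input +ADw-script+AD4-, the naive scan of the raw bytes finds no <, yet the decoded text is <script>. Any filtering has to run on the decoded string.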
This implementation is non-constant-time by design. An attacker who can observe input length and the time it takes to decode it can make guesses about relative proportions of characters from different ranges. Guessing the proportion of ASCII vs. non-ASCII should be particularly feasible.
The cargo feature serde enables Serde support for Charset.
The MSRV depends on the encoding_rs and base64 dependencies, not on this
crate. The current MSRV appears to be 1.47.0. This crate does not undergo
a semver bump when base64 undergoes one.
This is a personal project. It has a Mozilla copyright notice, because I copied and pasted from encoding_rs. You should not try to read anything more into Mozilla's name appearing.
bincode (dev dependency only) to 1.3.3.
base64 to 0.22.1.
encoding_rs to 0.8.34.
no_std + alloc crate.
base64 to 0.13.0.
From<&'static Encoding> for Charset.
decode_ascii().
decode_latin1().
Initial release.