maybe_utf8

Crates.io: maybe_utf8
lib.rs: maybe_utf8
version: 0.2.3
source: src
created_at: 2015-01-14 17:10:08.222804
updated_at: 2015-12-11 23:54:10.497015
description: Byte container optionally encoded as UTF-8
homepage: https://github.com/lifthrasiir/rust-maybe_utf8
repository: https://github.com/lifthrasiir/rust-maybe_utf8
id: 783
size: 18,646
owner: Kang Seonghoon (lifthrasiir)

documentation

https://lifthrasiir.github.io/rust-maybe_utf8/

README

MaybeUtf8 0.2.3


A byte container optionally encoded as UTF-8. It is intended as a byte sequence type whose character encoding is uncertain, for cases where the caller may be able to determine the actual encoding.

For example, the ZIP file format originally did not support UTF-8 file names; it assumed that an archive would only be extracted on a system using the same system encoding as the one that created it. The newer ZIP standard does support explicitly UTF-8-encoded file names. In this case, a ZIP library may want to return either a String or a Vec<u8>, depending on the UTF-8 flag.
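The ZIP scenario above can be sketched with the standard library alone. This is only an illustration of the idea, not this crate's implementation: `EntryName`, `entry_name` and the `utf8_flag` parameter are hypothetical names introduced here.

```rust
/// Hypothetical stand-in for a "maybe UTF-8" file name (not the crate's type).
#[derive(Debug, PartialEq)]
enum EntryName {
    /// The name is known to be valid UTF-8.
    Utf8(String),
    /// The encoding is unknown; the caller has to decide how to decode it.
    Bytes(Vec<u8>),
}

/// `utf8_flag` stands for the archive's "file name is UTF-8" bit (assumed input).
fn entry_name(raw: Vec<u8>, utf8_flag: bool) -> EntryName {
    if utf8_flag {
        match String::from_utf8(raw) {
            Ok(s) => EntryName::Utf8(s),
            // The flag was set but the bytes are invalid: keep the raw bytes
            // instead of silently losing data.
            Err(e) => EntryName::Bytes(e.into_bytes()),
        }
    } else {
        EntryName::Bytes(raw)
    }
}
```

A caller would then match on the result and decode the `Bytes` case with whatever legacy encoding it believes applies.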

This crate provides two types, MaybeUtf8Buf (analogous to String) and MaybeUtf8Slice (analogous to &str). Both types support various conversion methods. For example, if you know that the bytes are encoded in ISO 8859-2, the Encoding library (the encoding crate) can be used to convert them:

use std::borrow::IntoCow;
use encoding::{Encoding, DecoderTrap};
use encoding::all::ISO_8859_2;
use maybe_utf8::{MaybeUtf8Buf, MaybeUtf8Slice};

let namebuf = MaybeUtf8Buf::from_bytes(vec![99,97,102,233]);
assert_eq!(format!("{}", namebuf), "caf\u{fffd}");

// borrowed slice equally works
{
    let nameslice: MaybeUtf8Slice = namebuf.to_slice();
    assert_eq!(format!("{:?}", nameslice), r#"b"caf\xe9""#);
    assert_eq!(nameslice.map_as_cow(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
               "caf\u{e9}");
}

// consuming an optionally-UTF-8-encoded buffer also works
assert_eq!(namebuf.map_into_str(|v| ISO_8859_2.decode(&v, DecoderTrap::Replace).unwrap()),
           "caf\u{e9}");
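The "borrow when already UTF-8, otherwise decode with a caller-supplied function" pattern used by the mapping methods above can be approximated with the standard library. The sketch below is an assumption about the general idea, not the crate's code: `str_or_decode` and `decode_latin1` are hypothetical helpers, and the fallback decodes ISO 8859-1 rather than ISO 8859-2 (the two agree on byte 0xE9, which is all this example relies on).

```rust
use std::borrow::Cow;

/// Borrow `bytes` as `&str` when they are valid UTF-8; otherwise hand them
/// to `fallback`, which decodes them from some legacy encoding.
fn str_or_decode<'a, F>(bytes: &'a [u8], fallback: F) -> Cow<'a, str>
where
    F: FnOnce(&[u8]) -> String,
{
    match std::str::from_utf8(bytes) {
        Ok(s) => Cow::Borrowed(s),
        Err(_) => Cow::Owned(fallback(bytes)),
    }
}

/// Illustrative ISO 8859-1 decoder: every byte maps to the code point
/// with the same numeric value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}
```

Without a fallback decoder, `String::from_utf8_lossy` gives the replacement-character rendering, which corresponds to the `"caf\u{fffd}"` output seen in the example above.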

The IntoMaybeUtf8 trait can be used to uniformly accept either a string or a byte vector when constructing MaybeUtf8* values.

use maybe_utf8::IntoMaybeUtf8;
assert_eq!("caf\u{e9}".into_maybe_utf8(), b"caf\xc3\xa9".into_maybe_utf8());
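To show how a conversion trait of this kind can make a string and its UTF-8 bytes compare equal, here is a minimal std-only sketch. It is not the crate's implementation: `Name` and `IntoName` are hypothetical, and normalizing valid UTF-8 bytes into the string variant is an assumption made for the illustration.

```rust
/// Hypothetical value type: either known UTF-8 text or raw bytes.
#[derive(Debug, PartialEq)]
enum Name {
    Utf8(String),
    Bytes(Vec<u8>),
}

/// Hypothetical conversion trait, loosely modelled on the idea of IntoMaybeUtf8.
trait IntoName {
    fn into_name(self) -> Name;
}

impl<'a> IntoName for &'a str {
    fn into_name(self) -> Name {
        Name::Utf8(self.to_owned())
    }
}

impl IntoName for Vec<u8> {
    fn into_name(self) -> Name {
        // Valid UTF-8 bytes are normalized to the string variant, so that
        // text and its encoded bytes compare equal.
        match String::from_utf8(self) {
            Ok(s) => Name::Utf8(s),
            Err(e) => Name::Bytes(e.into_bytes()),
        }
    }
}

impl<'a> IntoName for &'a [u8] {
    fn into_name(self) -> Name {
        self.to_vec().into_name()
    }
}
```

A function that takes `T: IntoName` can then accept `&str`, `&[u8]` or `Vec<u8>` arguments without separate overloads.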

Complete documentation is available at https://lifthrasiir.github.io/rust-maybe_utf8/.

MaybeUtf8 is written by Kang Seonghoon and licensed under the MIT/X11 license.
