bytepiece_rs

Crates.io: bytepiece_rs
lib.rs: bytepiece_rs
version: 0.2.2
source: src
created_at: 2023-09-18 23:55:16.673846
updated_at: 2023-11-12 08:47:09.081883
description: The Bytepiece Tokenizer Implemented in Rust
repository: https://github.com/hscspring/bytepiece-rs
id: 976379
size: 5,819,268
Yam (hscspring)

bytepiece-rs

Usage

use bytepiece_rs::Tokenizer;

let tokenizer = Tokenizer::new();
// or load a custom model
let tokenizer = Tokenizer::load_from("/path/to/model");
let text = "今天天气不错";
// Rust has no named arguments; the last positional argument is alpha.
let ids = tokenizer.encode(text, false, false, 0.0);
assert_eq!(ids, vec![40496, 45268, 39432]);
let text2 = tokenizer.decode(ids);
assert_eq!(text2, text);
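
The round trip checked above (decode(encode(text)) == text) can be illustrated without the crate. The following is only a minimal sketch: `ByteTokenizer` and `roundtrips` are hypothetical names, not part of bytepiece_rs, and the toy tokenizer simply maps each UTF-8 byte to one id rather than using a trained model. It shows why working on bytes lets any UTF-8 text be recovered exactly.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in tokenizer (NOT the bytepiece_rs API):
/// every UTF-8 byte of the input becomes one id in 0..=255.
pub struct ByteTokenizer;

impl ByteTokenizer {
    /// Encode text as one id per UTF-8 byte.
    pub fn encode(&self, text: &str) -> Vec<usize> {
        text.bytes().map(usize::from).collect()
    }

    /// Decode ids back to text. Returns None if an id does not fit
    /// in a byte or the resulting bytes are not valid UTF-8.
    pub fn decode(&self, ids: &[usize]) -> Option<String> {
        let bytes: Option<Vec<u8>> = ids.iter().map(|&id| u8::try_from(id).ok()).collect();
        String::from_utf8(bytes?).ok()
    }
}

/// True when decoding the encoded text yields the original text.
pub fn roundtrips(tok: &ByteTokenizer, text: &str) -> bool {
    tok.decode(&tok.encode(text)).as_deref() == Some(text)
}
```

A trained model merges byte sequences into larger pieces, so the real ids differ from this sketch; only the round-trip property is the same.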

Benchmark & Test

cargo test
cargo bench -- --plotting-backend gnuplot
