| Field | Value |
| --- | --- |
| Crates.io | unitoken |
| lib.rs | unitoken |
| version | 0.1.1 |
| created_at | 2025-12-17 22:10:20.119318+00 |
| updated_at | 2025-12-18 06:46:21.605586+00 |
| description | Fast BPE tokenizer/trainer with a Rust core and Python bindings |
| homepage | |
| repository | https://github.com/a-gradient/unitoken |
| max_upload_size | |
| id | 1991212 |
| size | 233,127 |
unitoken is a fast BPE tokenizer/trainer with a Rust core and optional Python bindings.
Rust:
cargo add unitoken
Python (wheels via PyPI):
pip install uni-tokenizer
Python usage:
from uni_tokenizer import BpeTrainer, BpeEncoder
trainer = BpeTrainer(["<|endoftext|>"]) # first token is treated as EOT
trainer.add_words({"hello": 10, "world": 7})
trainer.train(vocab_size=256)
trainer.save("demo")
enc = BpeEncoder.load("demo")
ids = enc.encode_word("hello")
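For readers new to BPE, the training step can be pictured with a small pure-Python sketch. This is not unitoken's implementation and does not use its API: the function name `train_bpe`, the byte-level starting symbols, and the tie-breaking rule are all assumptions made for illustration. It only shows the general idea of repeatedly merging the most frequent adjacent pair in a word-frequency table.

```python
from collections import Counter


def train_bpe(word_counts, num_merges):
    """Learn up to `num_merges` merges from a word -> count mapping.

    Hypothetical helper for illustration; not part of unitoken.
    """
    # Start with every word split into single-byte symbols.
    corpus = {
        tuple(bytes([b]) for b in word.encode("utf-8")): count
        for word, count in word_counts.items()
    }
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, count in corpus.items():
            for left, right in zip(symbols, symbols[1:]):
                pairs[(left, right)] += count
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by the pair itself
        # (an arbitrary but deterministic choice made for this sketch).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        # Rewrite each word with the chosen pair fused into one symbol.
        new_corpus = {}
        for symbols, count in corpus.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            key = tuple(merged)
            new_corpus[key] = new_corpus.get(key, 0) + count
        corpus = new_corpus
    return merges


# Same toy word counts as the snippet above.
merges = train_bpe({"hello": 10, "world": 7}, num_merges=3)
print(merges)
```

Each learned merge adds one entry to the vocabulary, so a target vocabulary size roughly corresponds to the number of base symbols plus special tokens plus the number of merges.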
This project uses maturin to build the Python extension module. To build it and install it into the current virtual environment for development:
maturin develop