| Field | Value |
|---|---|
| Crates.io | tiktokenx |
| lib.rs | tiktokenx |
| version | 0.1.0 |
| created_at | 2025-08-22 09:23:40.318139+00 |
| updated_at | 2025-08-22 09:23:40.318139+00 |
| description | A high-performance Rust implementation of OpenAI's tiktoken library |
| homepage | https://github.com/imumesh18/tiktokenx |
| repository | https://github.com/imumesh18/tiktokenx |
| max_upload_size | |
| id | 1806118 |
| size | 131,542 |
Fast Rust implementation of OpenAI's tiktoken tokenizer.
Add the crate to your `Cargo.toml`:

```toml
[dependencies]
tiktokenx = "0.1"
```
```rust
use tiktokenx::{get_encoding, encoding_for_model};

// Get an encoding by name
let enc = get_encoding("cl100k_base").unwrap();
let tokens = enc.encode("hello world", &[], &[]).unwrap();
let text = enc.decode(&tokens).unwrap();

// Get the encoding for a model
let enc = encoding_for_model("gpt-4").unwrap();
let token_count = enc.encode("Hello, world!", &[], &[]).unwrap().len();
```
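Under the hood, `encode` performs byte-pair encoding: it starts from raw bytes and repeatedly merges the adjacent pair with the lowest rank in the vocabulary. As a rough, illustrative sketch of that idea (std-only Rust; the three-entry `ranks` map below is a toy stand-in for a real ~100k-merge vocabulary like cl100k_base, and tiktokenx's actual implementation is more optimized):

```rust
use std::collections::HashMap;

// Toy BPE: repeatedly merge the adjacent pair whose concatenation
// has the lowest rank, until no mergeable pair remains.
fn bpe_encode(text: &str, ranks: &HashMap<Vec<u8>, u32>) -> Vec<u32> {
    // Start with one token per byte.
    let mut parts: Vec<Vec<u8>> = text.bytes().map(|b| vec![b]).collect();
    loop {
        // Find the adjacent pair with the lowest-ranked merge.
        let mut best: Option<(usize, u32)> = None;
        for i in 0..parts.len().saturating_sub(1) {
            let mut merged = parts[i].clone();
            merged.extend_from_slice(&parts[i + 1]);
            if let Some(&rank) = ranks.get(&merged) {
                if best.map_or(true, |(_, r)| rank < r) {
                    best = Some((i, rank));
                }
            }
        }
        match best {
            Some((i, _)) => {
                let right = parts.remove(i + 1);
                parts[i].extend_from_slice(&right);
            }
            None => break, // no more merges apply
        }
    }
    // Map each final piece to its rank; single bytes fall back to the byte value.
    parts
        .iter()
        .map(|p| ranks.get(p).copied().unwrap_or(p[0] as u32))
        .collect()
}

fn main() {
    let mut ranks: HashMap<Vec<u8>, u32> = HashMap::new();
    ranks.insert(b"he".to_vec(), 256);
    ranks.insert(b"ll".to_vec(), 257);
    ranks.insert(b"hell".to_vec(), 258);
    // "hello" merges down to ["hell", "o"] -> [258, 111]
    println!("{:?}", bpe_encode("hello", &ranks));
}
```

This quadratic scan is the naive formulation; fast implementations keep candidate merges in a priority structure instead of rescanning every pair.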
| Model Family | Models | Encoding |
|---|---|---|
| GPT-5 | gpt-5 | o200k_base |
| GPT-4 | gpt-4, gpt-4-turbo, gpt-4o | cl100k_base / o200k_base |
| GPT-3.5 | gpt-3.5-turbo | cl100k_base |
| o1 | o1, o1-mini, o1-preview | o200k_base |
| Legacy | text-davinci-003, code-davinci-002 | p50k_base |
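In practice, resolving an encoding from a model name amounts to a prefix lookup over a table like the one above. A minimal sketch of such a resolver (the `MODEL_PREFIXES` list below simply paraphrases the table; tiktokenx's actual internal mapping may cover more models and differ in detail):

```rust
// Resolve an encoding name from a model name by prefix match.
// Illustrative only; mirrors the model table above.
fn encoding_name_for_model(model: &str) -> Option<&'static str> {
    // Ordered so more specific prefixes are checked before shorter ones
    // (e.g. "gpt-4o" before "gpt-4").
    const MODEL_PREFIXES: &[(&str, &str)] = &[
        ("gpt-5", "o200k_base"),
        ("gpt-4o", "o200k_base"),
        ("gpt-4", "cl100k_base"),
        ("gpt-3.5-turbo", "cl100k_base"),
        ("o1", "o200k_base"),
        ("text-davinci-003", "p50k_base"),
        ("code-davinci-002", "p50k_base"),
    ];
    MODEL_PREFIXES
        .iter()
        .find(|(prefix, _)| model.starts_with(prefix))
        .map(|&(_, enc)| enc)
}

fn main() {
    println!("{:?}", encoding_name_for_model("gpt-4o-mini")); // Some("o200k_base")
    println!("{:?}", encoding_name_for_model("gpt-4-turbo")); // Some("cl100k_base")
    println!("{:?}", encoding_name_for_model("unknown-model")); // None
}
```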
Benchmarks on Apple M1 Pro comparing tiktokenx vs Python tiktoken:
| Implementation | Operation | Time | Throughput | Memory | vs Python |
|---|---|---|---|---|---|
| Python tiktoken | Encode short text | 5.7 μs | 4.8 MiB/s | 0.1 MB | 1.0x |
| tiktokenx | Encode short text | 4.1 μs | 6.7 MiB/s | 0.5 MB | 1.4x |
| Python tiktoken | Encode long text | 482.1 μs | 8.9 MiB/s | 0.1 MB | 1.0x |
| tiktokenx | Encode long text | 175.4 μs | 24.5 MiB/s | 2.0 MB | 2.7x |
On average, tiktokenx is about 2.1x faster than Python tiktoken, at the cost of higher peak memory usage.
```sh
# Run tests
cargo test

# Run benchmarks
cargo bench

# Check formatting
cargo fmt --check

# Run clippy lints
cargo clippy -- -D warnings
```
Contributions are welcome! Please open an issue or submit a pull request.
Licensed under the MIT License.