Crates.io | bpe-openai |
lib.rs | bpe-openai |
version | |
source | src |
created_at | 2024-10-07 11:04:33.412453 |
updated_at | 2024-12-06 10:14:00.175091 |
description | Prebuilt fast byte-pair encoders for OpenAI. |
homepage | |
repository | https://github.com/github/rust-gems |
max_upload_size | |
id | 1399832 |
size | 0 |
Fast tokenizers for OpenAI token sets based on the bpe crate.
Serialized BPE instances are generated during build and lazily loaded at runtime as static values.
The overhead of loading the tokenizers is small because it happens only once per process and only requires deserialization (as opposed to actually building the internal data structures).
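The once-per-process loading described above can be sketched with the standard library's `OnceLock`. This is only an illustration of the pattern, not the crate's actual code; the `Tokenizer` struct and its `vocab_size` field are invented stand-ins for the deserialized BPE tables:

```rust
use std::sync::OnceLock;

// Stand-in for a deserialized tokenizer; the real crate holds
// prebuilt BPE data structures, not a single number.
struct Tokenizer {
    vocab_size: usize,
}

fn tokenizer() -> &'static Tokenizer {
    static INSTANCE: OnceLock<Tokenizer> = OnceLock::new();
    // The closure runs at most once per process; every later call
    // returns a reference to the already-initialized static value.
    INSTANCE.get_or_init(|| Tokenizer { vocab_size: 100_256 })
}

fn main() {
    let t1 = tokenizer();
    let t2 = tokenizer();
    // Both calls observe the same lazily initialized instance.
    assert!(std::ptr::eq(t1, t2));
    println!("vocab size: {}", t1.vocab_size);
}
```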
For convenience it re-exports the bpe
crate, so that depending on this crate is enough to use these tokenizers.
Supported tokenizers include cl100k, which is used in the example below.
Add a dependency by running
cargo add bpe-openai
or by adding the following to Cargo.toml
[dependencies]
bpe-openai = "0.1"
Counting tokens is as simple as:
use bpe_openai::cl100k;
fn main() {
let bpe = cl100k();
let count = bpe.count("Hello, world!");
println!("{count}");
}
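For intuition about what such a tokenizer computes, here is a toy sketch of the byte-pair encoding idea itself: repeatedly merge adjacent token pairs according to a learned merge table. The merge table below is invented for the example; it is not this crate's API or data, and real OpenAI token sets contain tens of thousands of learned merges:

```rust
// Toy BPE: split the input into single-character tokens, then apply
// each merge rule in priority order, replacing adjacent pairs.
fn bpe_encode(input: &str, merges: &[(&str, &str)]) -> Vec<String> {
    let mut tokens: Vec<String> = input.chars().map(|c| c.to_string()).collect();
    for (a, b) in merges {
        let mut i = 0;
        while i + 1 < tokens.len() {
            if tokens[i] == *a && tokens[i + 1] == *b {
                // Replace the pair (a, b) with the merged token "ab".
                tokens[i] = format!("{a}{b}");
                tokens.remove(i + 1);
            } else {
                i += 1;
            }
        }
    }
    tokens
}

fn main() {
    // Made-up merge rules, listed in priority order.
    let merges = [("l", "l"), ("h", "e"), ("he", "ll"), ("hell", "o")];
    let tokens = bpe_encode("hello", &merges);
    println!("{tokens:?}");
}
```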
For more detailed documentation, we refer to the bpe crate.