Crates.io | bpe-openai |
lib.rs | bpe-openai |
version | |
source | src |
created_at | 2024-10-07 11:04:33.412453 |
updated_at | 2024-12-06 10:14:00.175091 |
description | Prebuilt fast byte-pair encoders for OpenAI. |
homepage | |
repository | https://github.com/github/rust-gems |
max_upload_size | |
id | 1399832 |
size | 0 |
Fast tokenizers for OpenAI token sets based on the bpe crate.
Serialized BPE instances are generated during build and lazily loaded at runtime as static values.
The overhead of loading the tokenizers is small because it happens only once per process and only requires deserialization (as opposed to actually building the internal data structures).
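The once-per-process loading described above can be sketched with the standard library's `OnceLock`. This is only an illustration of the pattern, not the crate's actual code; the `Tokenizer` struct and its `vocab_size` field are invented stand-ins for the deserialized BPE tables:

```rust
use std::sync::OnceLock;

// Stand-in for a deserialized tokenizer; the real crate holds
// prebuilt BPE data structures, not a single number.
struct Tokenizer {
    vocab_size: usize,
}

fn tokenizer() -> &'static Tokenizer {
    static INSTANCE: OnceLock<Tokenizer> = OnceLock::new();
    // The closure runs at most once per process; every later call
    // returns a reference to the already-initialized static value.
    INSTANCE.get_or_init(|| Tokenizer { vocab_size: 100_256 })
}

fn main() {
    let t1 = tokenizer();
    let t2 = tokenizer();
    // Both calls observe the same lazily initialized instance.
    assert!(std::ptr::eq(t1, t2));
    println!("vocab size: {}", t1.vocab_size);
}
```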
For convenience it re-exports the bpe
crate, so that depending on this crate is enough to use these tokenizers.
Supported tokenizers include cl100k, which is used in the example below.
Add a dependency by running
cargo add bpe-openai
or by adding the following to Cargo.toml
[dependencies]
bpe-openai = "0.1"
Counting tokens is as simple as:
use bpe_openai::cl100k;
fn main() {
let bpe = cl100k();
let count = bpe.count("Hello, world!");
println!("{count}");
}
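For intuition about what such a tokenizer computes, here is a toy sketch of the byte-pair encoding idea itself: repeatedly merge adjacent token pairs according to a learned merge table. The merge table below is invented for the example; it is not this crate's API or data, and real OpenAI token sets contain tens of thousands of learned merges:

```rust
// Toy BPE: split the input into single-character tokens, then apply
// each merge rule in priority order, replacing adjacent pairs.
fn bpe_encode(input: &str, merges: &[(&str, &str)]) -> Vec<String> {
    let mut tokens: Vec<String> = input.chars().map(|c| c.to_string()).collect();
    for (a, b) in merges {
        let mut i = 0;
        while i + 1 < tokens.len() {
            if tokens[i] == *a && tokens[i + 1] == *b {
                // Replace the pair (a, b) with the merged token "ab".
                tokens[i] = format!("{a}{b}");
                tokens.remove(i + 1);
            } else {
                i += 1;
            }
        }
    }
    tokens
}

fn main() {
    // Made-up merge rules, listed in priority order.
    let merges = [("l", "l"), ("h", "e"), ("he", "ll"), ("hell", "o")];
    let tokens = bpe_encode("hello", &merges);
    println!("{tokens:?}");
}
```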
For more detailed documentation, we refer to the bpe crate.