token_trekker_rs

Crate: token_trekker_rs (crates.io)
Version: 0.1.3
Author: rahul (1rgs)
Created: 2023-03-22
Updated: 2023-03-22
Size: 53,911 bytes
Description: A fun and efficient Rust library to count tokens in text files using different tokenizers.

README

token_trekker_rs

token_trekker_rs is a command-line tool for counting the total number of tokens in all files within a directory or matching a glob pattern, using various tokenizers.

Features

  • Supports multiple tokenizer options
  • Parallel processing for faster token counting
  • Outputs results in a colorized table
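The parallel counting in the feature list can be sketched in plain Rust: spawn one unit of work per file and sum the partial counts. This is a minimal sketch, not the tool's actual implementation; it uses a hypothetical whitespace tokenizer as a stand-in for the real BPE tokenizers (p50k-base, cl100k-base, ...) and in-memory strings in place of files read from disk.

```rust
use std::thread;

// Stand-in tokenizer: splits on whitespace. The real tool uses BPE
// tokenizers, but the parallel structure is the same.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

// Count tokens across all "files" in parallel, one thread per file,
// then sum the per-file results.
fn count_all(files: Vec<String>) -> usize {
    let handles: Vec<_> = files
        .into_iter()
        .map(|contents| thread::spawn(move || count_tokens(&contents)))
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let total = count_all(vec![
        "hello world".to_string(),
        "one two three".to_string(),
    ]);
    println!("total tokens: {total}");
}
```

A production version would cap the thread count (for example with a work-stealing pool such as rayon) rather than spawning one OS thread per file.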

Installation

To install token_trekker_rs from crates.io, run:

cargo install token_trekker_rs

Building from Source

To build token_trekker_rs from the source code, first clone the repository:

git clone https://github.com/1rgs/token_trekker_rs.git
cd token_trekker_rs

Then build the project using cargo:

cargo build --release

The compiled binary will be available at ./target/release/token-trekker.

Usage

To count tokens in a directory or for files matching a glob pattern, run the following command:

token-trekker --path <path_or_glob_pattern> <tokenizer>

Replace <path_or_glob_pattern> with the path to the directory or the glob pattern of the files to process, and <tokenizer> with one of the available tokenizer options:

  • p50k-base
  • p50k-edit
  • r50k-base
  • cl100k-base
  • gpt2

For example:

token-trekker --path "path/to/files/*.txt" p50k-base
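The colorized table the tool prints maps each matched file to its token count. A minimal sketch of that per-file aggregation, again with a hypothetical whitespace tokenizer standing in for the real BPE encoders and (name, contents) pairs standing in for files matched by the glob:

```rust
use std::collections::BTreeMap;

// Stand-in tokenizer: splits on whitespace.
fn count_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

// Build a per-file token-count table, sorted by file name,
// like the summary table token-trekker prints.
fn token_table(files: &[(&str, &str)]) -> BTreeMap<String, usize> {
    files
        .iter()
        .map(|(name, contents)| (name.to_string(), count_tokens(contents)))
        .collect()
}

fn main() {
    let table = token_table(&[
        ("a.txt", "hello world"),
        ("b.txt", "one two three"),
    ]);
    for (name, count) in &table {
        println!("{name}: {count}");
    }
}
```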