Crates.io | token-counter |
lib.rs | token-counter |
version | 0.1.0 |
source | src |
created_at | 2024-07-04 21:35:24.178371 |
updated_at | 2024-07-04 21:35:24.178371 |
description | `wc` for tokens: count tokens in files with HF Tokenizers |
repository | https://github.com/EndlessReform/token-counter |
id | 1292122 |
size | 41,209 |
`tc` is a CLI tool for counting tokens in text files, built as a lightweight wrapper around the HuggingFace Tokenizers crate. It's like the Unix `wc` command, but for tokens instead of words.
cargo install token-counter
Using the default tokenizer (cl100k, the tokenizer for GPT-3.5 and GPT-4):
tc file1.md file2.md
Using globs:
tc *.md
Arguments:
`-m`, `--model`: HuggingFace ID of the model whose tokenizer should be used (e.g. `google-bert/bert-base-uncased`)
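To make the `wc` analogy concrete, here is a minimal sketch of a `wc`-style counting loop with a pluggable token counter. It is not the crate's actual source: `whitespace_tokens` and `report` are hypothetical names, whitespace splitting merely stands in for a real HF tokenizer, and the "count, then name, then total" output format is an assumption borrowed from `wc`, not documented behavior of `tc`.

```rust
/// Stand-in tokenizer (assumption): counts whitespace-separated chunks.
/// A real implementation would ask an HF tokenizer for the token IDs instead.
fn whitespace_tokens(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Builds wc-style report lines: one "<count> <name>" line per input,
/// plus a "<sum> total" line when more than one input is given.
fn report(inputs: &[(&str, &str)], count: impl Fn(&str) -> usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut total = 0;
    for &(name, text) in inputs {
        let n = count(text);
        total += n;
        lines.push(format!("{} {}", n, name));
    }
    if inputs.len() > 1 {
        lines.push(format!("{} total", total));
    }
    lines
}

fn main() {
    // In-memory (name, contents) pairs keep the sketch self-contained;
    // a real CLI would read each path given on the command line.
    let inputs = [
        ("file1.md", "hello token counter"),
        ("file2.md", "two words"),
    ];
    for line in report(&inputs, whitespace_tokens) {
        println!("{}", line);
    }
}
```

Passing the counter as a closure mirrors the idea behind `--model`: the counting loop stays the same while the tokenizer behind it is swapped out.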