| Crates.io | token-counter |
| lib.rs | token-counter |
| version | 0.1.0 |
| created_at | 2024-07-04 21:35:24.178371+00 |
| updated_at | 2024-07-04 21:35:24.178371+00 |
| description | `wc` for tokens: count tokens in files with HF Tokenizers |
| repository | https://github.com/EndlessReform/token-counter |
| id | 1292122 |
| size | 41,209 |
tc is a CLI tool for counting tokens in text files. It is a lightweight wrapper around the Hugging Face Tokenizers crate and works like the Unix wc command, but counts tokens instead of words.
cargo install token-counter
Using the default tokenizer (cl100k, the tokenizer for GPT-3.5 and GPT-4):
tc file1.md file2.md
Using globs:
tc *.md
Arguments:
-m, --model: Hugging Face ID of the model whose tokenizer should be used (e.g. google-bert/bert-base-uncased)
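
To make the wc analogy concrete, here is a rough sketch of the per-file count plus total that a tool like this reports. It is not tc's actual source. The names `count_all` and `placeholder_token_count` are hypothetical, and the whitespace splitter is only a stand-in for a real tokenizer (such as one loaded through the Tokenizers crate), so the numbers it prints are word counts, not real token counts.

```rust
/// Stand-in for a real tokenizer: counts whitespace-separated chunks.
/// ASSUMPTION: a real implementation would encode the text with a
/// tokenizer (e.g. from the Tokenizers crate) and count the resulting IDs.
fn placeholder_token_count(text: &str) -> usize {
    text.split_whitespace().count()
}

/// Counts tokens for each (name, contents) pair and returns the per-file
/// counts together with the grand total, mirroring wc's final "total" line.
/// The counting function is a parameter so a real tokenizer can be swapped in.
fn count_all<F>(inputs: &[(&str, &str)], count: F) -> (Vec<(String, usize)>, usize)
where
    F: Fn(&str) -> usize,
{
    let per_file: Vec<(String, usize)> = inputs
        .iter()
        .map(|&(name, text)| (name.to_string(), count(text)))
        .collect();
    let total: usize = per_file.iter().map(|(_, n)| *n).sum();
    (per_file, total)
}

fn main() {
    // In-memory stand-ins for file contents, so the sketch needs no I/O.
    let inputs = [
        ("file1.md", "# Title\n\nSome text here."),
        ("file2.md", "Another file."),
    ];
    let (per_file, total) = count_all(&inputs, placeholder_token_count);
    for (name, n) in &per_file {
        println!("{n:>8} {name}");
    }
    println!("{total:>8} total");
}
```

Passing the counting function as an argument keeps the wc-style aggregation separate from the choice of tokenizer, which is roughly what the -m option selects at the command line.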