| Crates.io | token_processor |
| lib.rs | token_processor |
| version | 0.1.6 |
| created_at | 2025-04-21 23:54:10.117904+00 |
| updated_at | 2025-04-22 05:08:41.397494+00 |
| description | A fast, streaming‑first Rust library for processing LLM outputs by attaching callbacks to XML‑style tags—supporting both streaming and buffered handlers—and using aho‑corasick for ultra‑efficient, cross‑chunk pattern matching on decoded text tokens. |
| homepage | |
| repository | https://github.com/ljt019/token_processor/ |
| max_upload_size | |
| id | 1643376 |
| size | 53,509 |
A fast, streaming-oriented token processor for Large Language Model output in Rust.
It operates on already-decoded text tokens/chunks and uses aho-corasick for efficient multi-pattern scanning, including matches that span chunk boundaries.
Add this to your Cargo.toml:
[dependencies]
token_processor = "0.1.6"
Or use cargo:
cargo add token_processor
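Basic usage: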
use token_processor::{Tag, TokenProcessorBuilder};
#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
let mut processor = TokenProcessorBuilder::new(1024)
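// Streaming tag: the open, per-chunk, and close callbacks fire as text inside <think>...</think> arrives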
.streaming_tag(
Tag::new("<think>"),
|| print!("[open] "),
|chunk: &str| print!("{}", chunk),
|| print!(" [close]"),
)
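// Buffered tag: the full payload between <tool> and </tool> is collected, then passed to the async handler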
.buffered_tag(Tag::new("<tool>"), |payload: String| async move {
println!("[tool payload] {}", payload);
})
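// Raw tokens: text that falls outside any registered tag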
.raw_tokens(|chunk: &str| print!("{}", chunk))
.build()?;
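// Input can be fed as it streams in; tag matching works across chunk boundaries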
processor.process("Hello <think>world</think> <tool>data</tool>!").await?;
processor.flush().await?;
Ok(())
}
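Because matching is cross-chunk, the same input can be fed to process() in arbitrary pieces as it arrives from the model instead of as one string. A minimal sketch (the chunk boundaries shown are illustrative):

// The "<think>" tag is split across chunks; the processor still
// recognizes it and routes the inner text to the streaming callbacks.
processor.process("Hello <thi").await?;
processor.process("nk>world</th").await?;
processor.process("ink> <tool>data</tool>!").await?;
processor.flush().await?;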
Explore the examples/ folder for more usage scenarios:
simple.rs – raw tokens only
streaming_tags.rs – streaming-mode tag handling
buffered_tags.rs – buffered-mode tag handling
Run the full test suite:
cargo test
Licensed under MIT OR Apache‐2.0. See LICENSE for details.