token_processor

Crates.io: token_processor
lib.rs: token_processor
version: 0.1.6
created_at: 2025-04-21 23:54:10.117904+00
updated_at: 2025-04-22 05:08:41.397494+00
description: A fast, streaming-first Rust library for processing LLM outputs by attaching callbacks to XML-style tags (supporting both streaming and buffered handlers) and using aho-corasick for ultra-efficient, cross-chunk pattern matching on decoded text tokens.
repository: https://github.com/ljt019/token_processor/
id: 1643376
size: 53,509
Lucien Thomas (ljt019)

README

token_processor

A fast, streaming‐oriented token processor for Large Language Model output in Rust.

It is meant to be used with text tokens or chunks that have already been decoded.

Features

  • Streaming Handlers: Callbacks on tag open, data chunks, and close events in real time.
  • Buffered Handlers: Collect full payload between tags and invoke an async callback on close.
  • High Performance: Uses aho-corasick for efficient multi-pattern scanning, including cross‐chunk matches.
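
To illustrate what cross-chunk matching involves, here is a minimal, std-only sketch. It is not this crate's implementation: it swaps aho-corasick for a naive `str::find` on a single tag pair, and `TagScanner`, `Event`, and `held_back_len` are hypothetical names invented for this example. The idea shown is that when a chunk ends with what could be the start of a tag (for example `<th`), that tail is held back until the next chunk arrives, so a tag split across chunks is still recognised.

```rust
/// Events produced while scanning (hypothetical type, for this sketch only).
#[derive(Debug, PartialEq)]
enum Event {
    Raw(String),  // text outside the tag
    Open,         // opening tag seen
    Data(String), // text inside the tag
    Close,        // closing tag seen
}

/// Scans a stream of text chunks for one tag pair (hypothetical type).
struct TagScanner {
    open: String,
    close: String,
    inside: bool,
    pending: String, // tail held back because it may be the start of a tag
}

/// Length of the longest suffix of `buf` that is a proper prefix of `pattern`.
fn held_back_len(buf: &str, pattern: &str) -> usize {
    let max = pattern.len().saturating_sub(1).min(buf.len());
    (1..=max)
        .rev()
        .find(|&k| pattern.is_char_boundary(k) && buf.ends_with(&pattern[..k]))
        .unwrap_or(0)
}

/// Wraps text in `Data` when inside the tag, `Raw` otherwise.
fn text_event(inside: bool, text: String) -> Event {
    if inside { Event::Data(text) } else { Event::Raw(text) }
}

impl TagScanner {
    fn new(name: &str) -> Self {
        TagScanner {
            open: format!("<{}>", name),
            close: format!("</{}>", name),
            inside: false,
            pending: String::new(),
        }
    }

    /// Feeds one chunk and returns the events that can be emitted so far.
    fn feed(&mut self, chunk: &str) -> Vec<Event> {
        self.pending.push_str(chunk);
        let mut events = Vec::new();
        loop {
            // Look for the closing tag while inside, the opening tag otherwise.
            let (pattern, marker) = if self.inside {
                (&self.close, Event::Close)
            } else {
                (&self.open, Event::Open)
            };
            match self.pending.find(pattern.as_str()) {
                Some(pos) => {
                    if pos > 0 {
                        events.push(text_event(self.inside, self.pending[..pos].to_string()));
                    }
                    events.push(marker);
                    self.pending.replace_range(..pos + pattern.len(), "");
                    self.inside = !self.inside;
                }
                None => {
                    // Emit everything except a tail that might begin a tag.
                    let keep = held_back_len(&self.pending, pattern);
                    let emit_len = self.pending.len() - keep;
                    if emit_len > 0 {
                        let text: String = self.pending.drain(..emit_len).collect();
                        events.push(text_event(self.inside, text));
                    }
                    return events;
                }
            }
        }
    }

    /// Emits whatever is still held back once the stream has ended.
    fn flush(&mut self) -> Vec<Event> {
        if self.pending.is_empty() {
            return Vec::new();
        }
        let text = std::mem::take(&mut self.pending);
        vec![text_event(self.inside, text)]
    }
}
```

Creating `TagScanner::new("think")` and feeding the chunks `"Hello <th"` and `"ink>wor"` in turn yields `Raw("Hello ")` from the first call and `Open`, `Data("wor")` from the second, even though the opening tag straddles the chunk boundary. A multi-pattern automaton such as aho-corasick avoids rescanning for each tag separately, which matters once many tags are registered.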

Installation

Add this to your Cargo.toml:

[dependencies]
token_processor = { git = "https://github.com/ljt019/token_processor" }

Or add the published crate from crates.io with cargo:

cargo add token_processor

Quickstart

use token_processor::{Tag, TokenProcessorBuilder};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
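    let mut processor = TokenProcessorBuilder::new(1024)
        // Streaming tag: callbacks fire on open, on each data chunk, and on close.
        .streaming_tag(
            Tag::new("<think>"),
            || print!("[open] "),
            |chunk: &str| print!("{}", chunk),
            || print!(" [close]"),
        )
        // Buffered tag: the full payload is collected and passed to an async callback on close.
        .buffered_tag(Tag::new("<tool>"), |payload: String| async move {
            println!("[tool payload] {}", payload);
        })
        .raw_tokens(|chunk: &str| print!("{}", chunk))
        .build()?;

    // Feed decoded text to the processor, then flush at the end of the stream.
    processor.process("Hello <think>world</think> <tool>data</tool>!").await?;
    processor.flush().await?;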
    Ok(())
}

Examples

Explore the examples/ folder for more usage scenarios:

  • simple.rs – raw tokens only
  • streaming_tags.rs – streaming‐mode tag handling
  • buffered_tags.rs – buffered‐mode tag handling

Testing

Run the full test suite:

cargo test

License

Licensed under MIT OR Apache‐2.0. See LICENSE for details.
