| Crates.io | hibachi |
| lib.rs | hibachi |
| version | 0.1.9 |
| created_at | 2025-04-15 01:42:42.669389+00 |
| updated_at | 2025-04-25 19:26:39.97871+00 |
| description | Asynchronous Batched Inference Platform |
| homepage | |
| repository | https://www.github.com/kasandell/hibachi |
| max_upload_size | |
| id | 1633825 |
| size | 328,156 |
Efficient batched inference for tensor models

Hibachi is a Rust library for efficient batched inference with autoregressive and feedforward models. It dynamically groups multiple generation requests into batches, manages tensor operations, and streams results back to clients as they become available.
Hibachi is built around a Backend trait and ships implementations for the Candle and Burn backends (Burn tensors are supported up to rank 9).

Add this to your Cargo.toml:

```toml
[dependencies]
# "burn" and "feedforward" feature flags are also available
hibachi = { version = "0.1.0", features = ["candle", "autoregressive"] }
tokio = { version = "1", features = ["full"] }
```
This package is still in its early stages. Until a 1.x release, Hibachi reserves the right to break interfaces. We will try our best not to, but the package is in its infancy and may need to change as it grows.

A minimal quick-start example:
```rust
use std::sync::Arc;

use async_trait::async_trait;
use candle_core::{DType, Device, Tensor};
use futures::StreamExt; // provides `.next()` on the result stream
use hibachi::autoregressive::{Autoregressive, AutoregressiveBatcher, AutoregressiveBatchInference};
use hibachi::backend::{Backend, Unsqueezable};

// 1. Implement the Autoregressive trait for your model
struct MyModel { /* ... */ }

#[async_trait]
impl Autoregressive<Tensor> for MyModel {
    async fn forward(&self, tensor: Tensor) -> Tensor {
        // Implement your model's forward pass
    }
}

#[tokio::main]
async fn main() {
    // Initialize the model
    let model = MyModel::new();
    let device = Device::Cpu;

    // 2. Define stop and padding tokens; the batched input passed to the
    //    model will be of rank + 1 (the engine prepends a batch dimension)
    let stop_token = Tensor::ones(&[1], DType::U8, &device).unwrap();
    let padding_token = Tensor::zeros(&[1], DType::U8, &device).unwrap();

    // 3. Create the batched inference engine with a max batch size of 16
    let engine = AutoregressiveBatchInference::<Tensor, 16>::new(
        model,
        &stop_token,
        &padding_token,
    );

    // Submit a request
    let input = Tensor::arange(2., 5., &device).unwrap();
    let mut stream = engine.run(input).await;

    // Stream results as they are generated
    while let Some(token) = stream.next().await {
        println!("Generated token: {:?}", token);
    }
}
```
Hibachi consists of several core components (see the sketch after this list for how they fit together):
- Backend Abstraction
- Autoregressive Models
- Feedforward Models
- Batching Engine
- Communication Layer
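
To see how the batching engine and communication layer fit together, here is a sketch that builds on the quick start above: several tasks submit requests concurrently, the engine groups them into batches internally, and each task consumes its own result stream. It assumes the quick-start imports and types, and that `run` takes `&self` so the engine can be shared through an `Arc`.

```rust
use std::sync::Arc;
use futures::StreamExt;

// Sketch only: `engine` is the AutoregressiveBatchInference from the quick start.
async fn serve(engine: Arc<AutoregressiveBatchInference<Tensor, 16>>, inputs: Vec<Tensor>) {
    let mut handles = Vec::new();
    for input in inputs {
        let engine = Arc::clone(&engine);
        // One task per request; the engine groups pending requests
        // into shared batched forward passes behind the scenes.
        handles.push(tokio::spawn(async move {
            let mut stream = engine.run(input).await;
            while let Some(token) = stream.next().await {
                println!("Generated token: {:?}", token);
            }
        }));
    }
    for handle in handles {
        handle.await.unwrap();
    }
}
```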
To use with a custom tensor library, implement the Backend and Unsqueezable traits:
```rust
use hibachi::backend::{Backend, Unsqueezable};

impl Backend for MyCustomTensor {
    fn shape(&self) -> Vec<usize> { /* ... */ }
    fn clone(&self) -> Self { /* ... */ }
    // ... implement other required methods
}

impl Unsqueezable for MyCustomTensor {
    type Unsqueezed = MyCustomTensorHigherDim;
    fn unsqueeze(&self, dim: usize) -> Self::Unsqueezed { /* ... */ }
}
```
Implement the Autoregressive trait for your model:
```rust
use async_trait::async_trait;
use candle_core::Tensor;
use hibachi::autoregressive::Autoregressive;
use hibachi::backend::Unsqueezable;

#[async_trait]
impl Autoregressive<Tensor> for MyTransformerModel {
    async fn forward(&self, tensor: <Tensor as Unsqueezable>::Unsqueezed) -> Tensor {
        // Your transformer forward logic here
        // Input shape: (batch, seq, ...)
        // Output shape: (batch, ...)
    }
}
```
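
As a concrete, if trivial, illustration of that contract, the sketch below implements the trait for a hypothetical `LastTokenModel` that simply echoes the last position of each sequence as its "next token"; a real model would run its network here instead.

```rust
use async_trait::async_trait;
use candle_core::Tensor;
use hibachi::autoregressive::Autoregressive;
use hibachi::backend::Unsqueezable;

// Hypothetical toy model: "predicts" the next token by copying the last one.
struct LastTokenModel;

#[async_trait]
impl Autoregressive<Tensor> for LastTokenModel {
    async fn forward(&self, tensor: <Tensor as Unsqueezable>::Unsqueezed) -> Tensor {
        // Input shape: (batch, seq, ...); keep only the final position along
        // the sequence dimension and drop it, leaving (batch, ...).
        let seq_len = tensor.dims()[1];
        tensor
            .narrow(1, seq_len - 1, 1)
            .unwrap()
            .squeeze(1)
            .unwrap()
    }
}
```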
Implement the Feedforward trait for your model:
```rust
use async_trait::async_trait;
use candle_core::Tensor;
use hibachi::feedforward::Feedforward;

#[async_trait]
impl Feedforward<Tensor, Tensor> for MyTransformerModel {
    async fn forward(&self, tensor: Tensor) -> Tensor {
        // Your feedforward forward logic here
        // Input shape: (batch, ...)
        // Output shape: (batch, ...)
    }
}
```
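
For a concrete example of that contract, here is a minimal sketch with a hypothetical `ScaleModel` that just scales its batched input; the `hibachi::feedforward` path mirrors the autoregressive module and should be checked against the crate docs.

```rust
use async_trait::async_trait;
use candle_core::Tensor;
use hibachi::feedforward::Feedforward;

// Hypothetical toy model: scales every element of the batched input.
struct ScaleModel {
    scale: f64,
}

#[async_trait]
impl Feedforward<Tensor, Tensor> for ScaleModel {
    async fn forward(&self, tensor: Tensor) -> Tensor {
        // Input shape: (batch, ...); the output keeps the same shape here.
        tensor.affine(self.scale, 0.0).unwrap()
    }
}
```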
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.