| Crates.io | trustformers-serve |
| lib.rs | trustformers-serve |
| version | 0.1.0-alpha.1 |
| created_at | 2025-11-09 10:46:25.912844+00 |
| updated_at | 2025-11-09 10:46:25.912844+00 |
| description | High-performance inference server for TrustformeRS models |
| homepage | https://github.com/cool-japan/trustformers |
| repository | https://github.com/cool-japan/trustformers |
| max_upload_size | |
| id | 1923982 |
| size | 11,447,543 |
High-performance inference server for TrustformeRS models with advanced batching and optimization capabilities.
The dynamic batching system automatically groups inference requests to maximize throughput while maintaining low latency. The key options are shown in the configuration below:
```rust
use trustformers_serve::{BatchingConfig, BatchingMode, OptimizationTarget};
use std::time::Duration;

let config = BatchingConfig {
    max_batch_size: 32,                       // Maximum requests per batch
    min_batch_size: 4,                        // Minimum batch size before timeout
    max_wait_time: Duration::from_millis(50), // Maximum wait time for batch formation
    enable_adaptive_batching: true,           // Enable load-based adaptation
    mode: BatchingMode::Dynamic,              // Batching mode
    optimization_target: OptimizationTarget::Balanced, // Optimization goal
    memory_limit: Some(1024 * 1024 * 100),    // 100 MB memory limit
    enable_priority_scheduling: true,         // Enable priority-based scheduling
    ..Default::default()
};
```
The full request lifecycle looks like this; built-in metrics collection provides real-time insights:
```rust
use trustformers_serve::{
    DynamicBatchingService, BatchingConfig,
    Request, RequestId, RequestInput, Priority,
};
use std::time::Instant;
use anyhow::Result; // error handling for this example

#[tokio::main]
async fn main() -> Result<()> {
    // Configure batching
    let config = BatchingConfig::default();

    // Create and start the service
    let service = DynamicBatchingService::new(config);
    service.start().await?;

    // Submit a request
    let request = Request {
        id: RequestId::new(),
        input: RequestInput::Text {
            text: "Hello, world!".to_string(),
            max_length: Some(100),
        },
        priority: Priority::Normal,
        submitted_at: Instant::now(),
        deadline: None,
        metadata: Default::default(),
    };
    let result = service.submit_request(request).await?;
    println!("Result: {:?}", result);

    // Get statistics
    let stats = service.get_stats().await;
    println!("Throughput: {:.1} req/s", stats.metrics_summary.throughput_rps);
    Ok(())
}
```
Memory-aware batching prevents out-of-memory errors by tracking memory usage:
```rust
use trustformers_serve::{BatchingConfig, DynamicBatchConfig, PaddingStrategy};

let config = BatchingConfig {
    memory_limit: Some(1024 * 1024 * 512), // 512 MB limit
    dynamic_config: DynamicBatchConfig {
        memory_aware: true,
        padding_strategy: PaddingStrategy::Minimal,
        enable_bucketing: true,
        bucket_boundaries: vec![128, 256, 512, 1024],
        ..Default::default()
    },
    ..Default::default()
};
```
Handle critical requests with higher priority:
```rust
use std::time::{Duration, Instant};

// `default_request` is any previously constructed `Request`.
let critical_request = Request {
    priority: Priority::Critical,
    deadline: Some(Instant::now() + Duration::from_millis(100)),
    ..default_request
};
```
Continuous batching is optimized for text generation with incremental decoding:

```rust
let config = BatchingConfig {
    mode: BatchingMode::Continuous,
    optimization_target: OptimizationTarget::Throughput,
    ..Default::default()
};
```
See the examples/ directory for comprehensive demonstrations:

dynamic_batching_demo.rs: Complete demonstration of all batching features

Licensed under either of
at your option.