| Crates.io | llm-latency-lens-providers |
| lib.rs | llm-latency-lens-providers |
| version | 0.1.2 |
| created_at | 2025-11-07 21:00:47.23895+00 |
| updated_at | 2025-11-07 21:15:44.523116+00 |
| description | Provider adapters for LLM Latency Lens |
| homepage | https://github.com/llm-devops/llm-latency-lens |
| repository | https://github.com/llm-devops/llm-latency-lens |
| max_upload_size | |
| id | 1922162 |
| size | 154,991 |
Production-ready provider adapters for LLM Latency Lens, enabling high-precision latency measurements and streaming token analysis across multiple LLM providers.
All providers implement the `Provider` trait, which defines:
```rust
#[async_trait]
pub trait Provider: Send + Sync {
    fn name(&self) -> &'static str;
    async fn health_check(&self) -> Result<()>;
    async fn stream(&self, request: StreamingRequest, timing_engine: &TimingEngine) -> Result<StreamingResponse>;
    async fn complete(&self, request: StreamingRequest, timing_engine: &TimingEngine) -> Result<CompletionResult>;
    fn calculate_cost(&self, model: &str, input_tokens: u64, output_tokens: u64) -> Option<f64>;
    fn supported_models(&self) -> Vec<String>;
    fn validate_model(&self, model: &str) -> Result<()>;
}
```
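Because providers are handed out behind this trait (see the `create_provider()` factory below), measurement code can stay provider-agnostic. The following is a minimal sketch of driving any `Provider` through `complete()`; it only uses the trait methods above plus the `TimingEngine`, request builder, and result fields demonstrated in the usage examples later in this README, so treat the exact field names as assumptions rather than documented API.

```rust
// Minimal sketch: drive any Provider through complete() and report latency/cost.
// Field names on the result (metadata.input_tokens, ttft()) follow the examples
// further down; verify against the crate before relying on them.
use llm_latency_lens_core::TimingEngine;
use llm_latency_lens_providers::traits::{MessageRole, Provider, StreamingRequest};

async fn measure(
    provider: &dyn Provider,
    model: &str,
    prompt: &str,
) -> Result<(), Box<dyn std::error::Error>> {
    // Fail fast on unsupported model names.
    provider.validate_model(model)?;

    let timing = TimingEngine::new();
    let request = StreamingRequest::builder()
        .model(model)
        .message(MessageRole::User, prompt)
        .max_tokens(128)
        .build();

    // complete() consumes the stream internally and returns aggregate analytics.
    let result = provider.complete(request, &timing).await?;
    println!("{}: TTFT {:?}", provider.name(), result.ttft());

    // Token counts are optional metadata; cost is only computed when both exist.
    if let (Some(input), Some(output)) =
        (result.metadata.input_tokens, result.metadata.output_tokens)
    {
        if let Some(cost) = provider.calculate_cost(model, input, output) {
            println!("estimated cost: ${:.6}", cost);
        }
    }
    Ok(())
}
```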
The ProviderError enum provides comprehensive error handling:
- HttpError: Network-level errors from reqwest
- ApiError: API-specific errors with status codes
- AuthenticationError: Invalid API keys
- RateLimitError: Rate limiting with retry-after
- TimeoutError: Request timeouts
- StreamingError: SSE streaming errors
- SseParseError: SSE parsing errors

Each error implements:

- is_retryable(): Whether the error should trigger a retry
- retry_delay(): Suggested delay before retry
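These two methods combine naturally into a retry loop. The sketch below is hedged: it assumes `retry_delay()` returns an `Option<std::time::Duration>` and that `ProviderError` lives in the `error` module; check the actual signatures before copying.

```rust
// Hedged retry sketch: is_retryable()/retry_delay() are the crate's methods, but
// the assumed return type of retry_delay() (Option<Duration>) and the module path
// are guesses.
use std::time::Duration;
use llm_latency_lens_providers::error::ProviderError;

async fn with_retries<T, F, Fut>(max_retries: u32, mut op: F) -> Result<T, ProviderError>
where
    F: FnMut() -> Fut,
    Fut: std::future::Future<Output = Result<T, ProviderError>>,
{
    let mut attempt = 0;
    loop {
        match op().await {
            Ok(value) => return Ok(value),
            Err(err) if err.is_retryable() && attempt < max_retries => {
                // Honor the provider-suggested delay (e.g. Retry-After), falling
                // back to a short fixed backoff when none is suggested.
                let delay = err.retry_delay().unwrap_or(Duration::from_millis(500));
                tokio::time::sleep(delay).await;
                attempt += 1;
            }
            Err(err) => return Err(err),
        }
    }
}
```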
All providers use Server-Sent Events (SSE) for streaming: each response exposes a stream of TokenEvent values with timing data. Each token event includes:

- sequence: Token position in the stream
- content: Token text content
- timestamp_nanos: Absolute timestamp in nanoseconds
- time_since_start: Duration from request start (TTFT for the first token)
- inter_token_latency: Time since the previous token
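For orientation, the shape implied by that field list looks roughly like the struct below; the exact types (integer widths, which fields are Option) are assumptions, not the crate's definition.

```rust
// Approximate shape of a TokenEvent as described above (types are assumed).
use std::time::Duration;

pub struct TokenEvent {
    /// Token position in the stream (0 for the first token).
    pub sequence: u64,
    /// Token text content, when the event carried text.
    pub content: Option<String>,
    /// Absolute timestamp in nanoseconds.
    pub timestamp_nanos: u128,
    /// Duration from request start; equals TTFT for the first token.
    pub time_since_start: Duration,
    /// Time since the previous token (absent for the first token).
    pub inter_token_latency: Option<Duration>,
}
```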
error.rs: Comprehensive error types for all provider operations.

traits.rs: Core trait definitions and types:

- Provider trait
- StreamingRequest and builder
- StreamingResponse with token stream
- CompletionResult with analytics
- Message and MessageRole
- ResponseMetadata

openai.rs: OpenAI Chat Completions API implementation.
Supported Models:
Features:
API Details:
- Endpoint: /v1/chat/completions
- Stream terminator: data: [DONE]
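The crate handles the wire format internally via reqwest-eventsource, but for context: each SSE data payload is either a JSON chunk carrying an incremental delta or the literal [DONE] terminator. A rough sketch of that distinction, using the public Chat Completions delta shape rather than anything exposed by this crate:

```rust
// Illustrative only: how a raw Chat Completions SSE `data:` payload maps to token
// text. This mirrors OpenAI's public streaming format; it is not this crate's API.
fn delta_text(data: &str) -> Option<String> {
    // The stream ends with a literal "[DONE]" sentinel instead of JSON.
    if data.trim() == "[DONE]" {
        return None;
    }
    let chunk: serde_json::Value = serde_json::from_str(data).ok()?;
    chunk["choices"][0]["delta"]["content"]
        .as_str()
        .map(str::to_owned)
}
```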
anthropic.rs: Anthropic Messages API implementation.

Supported Models:
Features:
API Details:
- Endpoint: /v1/messages
- Authentication via x-api-key header
- anthropic-version header

Event Types:
- message_start: Message metadata
- content_block_start: Content block start
- content_block_delta: Token deltas (text_delta)
- content_block_stop: Content block end
- message_delta: Usage statistics
- message_stop: Stream completion
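As an illustration of how those event types fit together (the crate does this internally; the JSON field paths are based on the public Messages API, not on this crate's types):

```rust
// Rough dispatch over the Anthropic SSE event names listed above.
fn handle_event(event: &str, data: &serde_json::Value) {
    match event {
        "message_start" => { /* message metadata: model, id, initial usage */ }
        "content_block_start" => { /* a new content block begins */ }
        "content_block_delta" => {
            // text_delta events carry the incremental token text.
            if let Some(text) = data["delta"]["text"].as_str() {
                print!("{}", text);
            }
        }
        "content_block_stop" => { /* content block finished */ }
        "message_delta" => { /* running usage statistics (output tokens) */ }
        "message_stop" => { /* stream complete */ }
        _ => {}
    }
}
```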
google.rs: Google Gemini API stub implementation.

Supported Models:
Status: Stub implementation (returns a not-implemented error).
Coming Soon: Full streaming implementation.
lib.rs: Main library module with:

- create_provider() factory function
- supported_providers() helper

Streaming example with the OpenAI provider:
```rust
use llm_latency_lens_providers::{
    openai::OpenAIProvider,
    traits::{Provider, StreamingRequest, MessageRole},
};
use llm_latency_lens_core::TimingEngine;
use futures::StreamExt;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create provider
    let provider = OpenAIProvider::builder()
        .api_key(std::env::var("OPENAI_API_KEY")?)
        .max_retries(3)
        .build();

    // Create timing engine
    let timing = TimingEngine::new();

    // Build request
    let request = StreamingRequest::builder()
        .model("gpt-4o")
        .message(MessageRole::System, "You are a helpful assistant.")
        .message(MessageRole::User, "Explain quantum computing in one paragraph.")
        .max_tokens(200)
        .temperature(0.7)
        .build();

    // Stream response
    let mut response = provider.stream(request, &timing).await?;

    // Process tokens as they arrive
    while let Some(token) = response.token_stream.next().await {
        let event = token?;
        if let Some(text) = &event.content {
            print!("{}", text);
        }

        // Log timing information
        if event.sequence == 0 {
            println!("\nTTFT: {:?}", event.time_since_start);
        }
        if let Some(latency) = event.inter_token_latency {
            println!("Token {} latency: {:?}", event.sequence, latency);
        }
    }

    Ok(())
}
```
Non-streaming example with the Anthropic provider, using complete() for a full response with aggregate analytics:

```rust
use llm_latency_lens_providers::{
    anthropic::AnthropicProvider,
    traits::{Provider, StreamingRequest, MessageRole},
};
use llm_latency_lens_core::TimingEngine;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create provider
    let provider = AnthropicProvider::builder()
        .api_key(std::env::var("ANTHROPIC_API_KEY")?)
        .build();

    // Create timing engine
    let timing = TimingEngine::new();

    // Build request with system message
    let request = StreamingRequest::builder()
        .model("claude-3-5-sonnet-20241022")
        .message(MessageRole::System, "You are a concise assistant.")
        .message(MessageRole::User, "What is the speed of light?")
        .max_tokens(100)
        .build();

    // Use complete() for the full response
    let result = provider.complete(request, &timing).await?;

    println!("Response: {}", result.content);
    println!("TTFT: {:?}", result.ttft());
    println!("Avg inter-token latency: {:?}", result.avg_inter_token_latency());
    println!("Tokens/sec: {:.2}", result.tokens_per_second().unwrap_or(0.0));

    // Calculate cost from the reported token counts
    if let (Some(input), Some(output)) = (result.metadata.input_tokens, result.metadata.output_tokens) {
        if let Some(cost) = provider.calculate_cost(&result.metadata.model, input, output) {
            println!("Estimated cost: ${:.6}", cost);
        }
    }

    Ok(())
}
```
Creating providers dynamically with the create_provider() factory:

```rust
use llm_latency_lens_providers::{create_provider, traits::*};
use llm_latency_lens_core::TimingEngine;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let provider_name = std::env::var("PROVIDER")?;
    let api_key = std::env::var("API_KEY")?;

    // Create provider dynamically
    let provider = create_provider(&provider_name, api_key)?;

    // Verify health
    provider.health_check().await?;

    // List supported models
    println!("{} supports: {:?}", provider.name(), provider.supported_models());

    Ok(())
}
```
All providers implement cost calculation based on current pricing (2024). Prices below are USD per one million tokens.
OpenAI:

| Model | Input | Output |
|---|---|---|
| gpt-4o | $2.50 | $10.00 |
| gpt-4o-mini | $0.15 | $0.60 |
| gpt-4-turbo | $10.00 | $30.00 |
| gpt-4 | $30.00 | $60.00 |
| gpt-3.5-turbo | $0.50 | $1.50 |

Anthropic:

| Model | Input | Output |
|---|---|---|
| claude-3-5-sonnet | $3.00 | $15.00 |
| claude-3-5-haiku | $0.80 | $4.00 |
| claude-3-opus | $15.00 | $75.00 |
| claude-3-haiku | $0.25 | $1.25 |

Google Gemini:

| Model | Input | Output |
|---|---|---|
| gemini-1.5-pro | $1.25 | $5.00 |
| gemini-1.5-flash | $0.075 | $0.30 |
| gemini-1.5-flash-8b | $0.0375 | $0.15 |
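The arithmetic behind these tables is straightforward: prices are per one million tokens, so a request's cost is each token count scaled by its rate. A small sketch, with the gpt-4o numbers taken from the table above:

```rust
// cost = input_tokens/1M * input_price + output_tokens/1M * output_price
fn estimate_cost(
    input_tokens: u64,
    output_tokens: u64,
    input_price_per_million: f64,
    output_price_per_million: f64,
) -> f64 {
    (input_tokens as f64 / 1_000_000.0) * input_price_per_million
        + (output_tokens as f64 / 1_000_000.0) * output_price_per_million
}

// gpt-4o example: 1,200 input + 350 output tokens
// 1,200/1M * $2.50 + 350/1M * $10.00 = $0.0030 + $0.0035 = $0.0065
```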
The crate includes comprehensive unit tests for all components:
```sh
cargo test
```
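As an example of the kind of check these tests can perform (the crate's actual assertions may differ; the builder usage follows the examples above):

```rust
// Hedged example test: supported models should validate and report a positive cost.
#[cfg(test)]
mod tests {
    use llm_latency_lens_providers::{openai::OpenAIProvider, traits::Provider};

    #[test]
    fn cost_is_reported_for_supported_models() {
        let provider = OpenAIProvider::builder()
            .api_key("test-key".to_string())
            .build();

        assert!(provider.validate_model("gpt-4o").is_ok());
        let cost = provider.calculate_cost("gpt-4o", 1_000, 1_000);
        assert!(matches!(cost, Some(c) if c > 0.0));
    }
}
```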
Test coverage spans the components listed above.

Dependencies:

- llm-latency-lens-core: Core timing and types
- tokio: Async runtime
- reqwest: HTTP client
- reqwest-eventsource: SSE parsing
- serde / serde_json: Serialization
- async-trait: Async trait support
- futures: Stream utilities
- thiserror: Error handling
- tracing: Logging

License: Apache-2.0