| Crates.io | llm-edge-monitoring |
|---|---|
| lib.rs | llm-edge-monitoring |
| version | 0.1.0 |
| created_at | 2025-11-09 01:50:10.156595+00 |
| updated_at | 2025-11-09 01:50:10.156595+00 |
| description | Observability and monitoring for LLM Edge Agent |
| homepage | |
| repository | https://github.com/globalbusinessadvisors/llm-edge-agent |
| max_upload_size | |
| id | 1923497 |
| size | 63,430 |
Observability and monitoring for LLM Edge Agent. This crate provides comprehensive metrics, tracing, and cost tracking capabilities for production LLM deployments.
Add this to your `Cargo.toml`:

```toml
[dependencies]
llm-edge-monitoring = "0.1.0"
```
| Metric | Type | Labels | Description |
|---|---|---|---|
| `llm_edge_requests_total` | Counter | provider, model, status, error_type | Total number of requests processed |
| `llm_edge_request_duration_ms` | Histogram | provider, model | Request latency in milliseconds |
| `llm_edge_active_requests` | Gauge | - | Number of currently active requests |
| Metric | Type | Labels | Description |
|---|---|---|---|
| `llm_edge_tokens_total` | Counter | provider, model, type | Total tokens used (input/output) |
| Metric | Type | Labels | Description |
|---|---|---|---|
| `llm_edge_cost_usd_total` | Counter | provider, model | Total cost in USD |
| Metric | Type | Labels | Description |
|---|---|---|---|
| `llm_edge_cache_hits_total` | Counter | tier | Cache hits by tier (L1/L2/L3) |
| `llm_edge_cache_misses_total` | Counter | tier | Cache misses by tier |
| Metric | Type | Labels | Description |
|---|---|---|---|
| `llm_edge_provider_available` | Gauge | provider | Provider health status (1=healthy, 0=unhealthy) |
```rust
use llm_edge_monitoring::metrics;

// Record a successful request (latency in milliseconds)
metrics::record_request_success("openai", "gpt-4", 245);

// Record a failed request with an error type
metrics::record_request_failure("anthropic", "claude-3", "rate_limit");

// Record token usage (input tokens, output tokens)
metrics::record_token_usage("openai", "gpt-4", 150, 500);

// Record cost in USD
metrics::record_cost("openai", "gpt-4", 0.0075);
```
```rust
use llm_edge_monitoring::metrics;

// Record cache hits and misses by tier
metrics::record_cache_hit("L1");
metrics::record_cache_miss("L2");
```
```rust
use llm_edge_monitoring::metrics;

// Update provider health status
metrics::record_provider_health("openai", true);
metrics::record_provider_health("anthropic", false);

// Track active requests
metrics::record_active_requests(42);
```
```rust
use llm_edge_monitoring::metrics;
use std::time::Instant;

async fn handle_llm_request(
    provider: &str,
    model: &str,
) -> Result<String, Box<dyn std::error::Error>> {
    let start = Instant::now();

    // Increment active requests
    metrics::record_active_requests(1);

    // Make the LLM request
    let result = make_llm_call(provider, model).await;
    let latency_ms = start.elapsed().as_millis() as u64;

    match result {
        Ok(response) => {
            // Record success metrics
            metrics::record_request_success(provider, model, latency_ms);
            metrics::record_token_usage(provider, model, 150, 500);
            metrics::record_cost(provider, model, 0.0075);
            Ok(response)
        }
        Err(e) => {
            // Record failure metrics
            metrics::record_request_failure(provider, model, "api_error");
            Err(e)
        }
    }
}

async fn make_llm_call(provider: &str, model: &str) -> Result<String, Box<dyn std::error::Error>> {
    // Your LLM call implementation
    Ok("response".to_string())
}
```
OpenTelemetry tracing support is included for distributed tracing:

```rust
use llm_edge_monitoring::tracing;

// Note: tracing setup is currently under development.
// The future API will support OTLP exporter configuration and span creation.
```
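Until that API lands, spans can be produced with the general-purpose `tracing` ecosystem. The sketch below uses `tracing` and `tracing-subscriber` directly (not an `llm-edge-monitoring` API) and can later be pointed at an OTLP exporter:

```rust
// Sketch only: uses the general-purpose `tracing` / `tracing-subscriber` crates,
// since llm-edge-monitoring's tracing module is still under development.
use tracing::{info_span, Instrument};

fn init_tracing() {
    // Emit structured logs and spans to stdout; swap in an OTLP layer when available.
    tracing_subscriber::fmt().with_target(false).init();
}

async fn traced_request(provider: &str, model: &str) {
    async {
        tracing::info!("dispatching LLM request");
        // ... make the provider call here ...
    }
    .instrument(info_span!("llm_request", provider, model))
    .await;
}
```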
Add the metrics endpoint to your `prometheus.yml`:

```yaml
scrape_configs:
  - job_name: 'llm-edge-agent'
    scrape_interval: 15s
    static_configs:
      - targets: ['localhost:9090']
```
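The scrape target above expects a `/metrics` endpoint on port 9090. If the crate's collectors end up in the default `prometheus` registry (an assumption worth verifying against the crate docs), a minimal exporter using `axum` might look like this:

```rust
// Sketch of a /metrics endpoint. Assumes the metrics are registered with the
// default `prometheus` registry; adjust if llm-edge-monitoring exposes its own.
use axum::{routing::get, Router};
use prometheus::{Encoder, TextEncoder};

async fn metrics_handler() -> String {
    let metric_families = prometheus::gather();
    let mut buffer = Vec::new();
    TextEncoder::new()
        .encode(&metric_families, &mut buffer)
        .expect("failed to encode metrics");
    String::from_utf8(buffer).expect("metrics are valid UTF-8")
}

#[tokio::main]
async fn main() {
    let app = Router::new().route("/metrics", get(metrics_handler));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:9090").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```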
Request Rate by Provider:

```promql
sum by (provider) (rate(llm_edge_requests_total[5m]))
```

Average Request Latency:

```promql
rate(llm_edge_request_duration_ms_sum[5m]) /
rate(llm_edge_request_duration_ms_count[5m])
```

Cache Hit Rate:

```promql
sum(rate(llm_edge_cache_hits_total[5m])) /
(sum(rate(llm_edge_cache_hits_total[5m])) + sum(rate(llm_edge_cache_misses_total[5m])))
```

Cost per Hour:

```promql
rate(llm_edge_cost_usd_total[1h]) * 3600
```

Error Rate:

```promql
sum(rate(llm_edge_requests_total{status="error"}[5m])) /
sum(rate(llm_edge_requests_total[5m]))
```
Monitor your LLM costs in real time:

```rust
use llm_edge_monitoring::metrics;

// Calculate and record costs based on token usage
fn calculate_cost(provider: &str, model: &str, input_tokens: usize, output_tokens: usize) -> f64 {
    let cost = match (provider, model) {
        ("openai", "gpt-4") => {
            (input_tokens as f64 * 0.00003) + (output_tokens as f64 * 0.00006)
        }
        ("anthropic", "claude-3-opus") => {
            (input_tokens as f64 * 0.000015) + (output_tokens as f64 * 0.000075)
        }
        _ => 0.0,
    };

    metrics::record_cost(provider, model, cost);
    cost
}
```
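For example, to price and record a single completed request (the token counts here are illustrative):

```rust
// Illustrative usage of the calculate_cost helper defined above.
let cost = calculate_cost("openai", "gpt-4", 1_200, 350);
println!("request cost: ${cost:.4}");
```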
The crate provides custom error types for monitoring operations:

```rust
use llm_edge_monitoring::{MonitoringError, MonitoringResult};

fn example() -> MonitoringResult<()> {
    // Your monitoring code here
    Ok(())
}
```
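Monitoring failures should generally not break the request path. A minimal sketch of that pattern, assuming `MonitoringError` implements `std::error::Error` (and therefore `Display`):

```rust
use llm_edge_monitoring::MonitoringResult;

// Run a monitoring operation, log any error, and carry on with the request.
fn record_best_effort(op: impl FnOnce() -> MonitoringResult<()>) {
    if let Err(e) = op() {
        eprintln!("monitoring error (ignored): {e}");
    }
}

// Usage: record_best_effort(|| example());
```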
Track performance across different cache tiers, as shown in the sketch after this example:

```rust
use llm_edge_monitoring::metrics;

// L1 (in-memory), L2 (Redis), L3 (DynamoDB)
metrics::record_cache_hit("L1");
metrics::record_cache_miss("L1");
metrics::record_cache_hit("L2");
```
Monitor provider health during failover scenarios:

```rust
use llm_edge_monitoring::metrics;

// Primary provider fails
metrics::record_provider_health("primary", false);
metrics::record_request_failure("primary", "gpt-4", "timeout");

// Fail over to the secondary provider
metrics::record_provider_health("secondary", true);
metrics::record_request_success("secondary", "claude-3", 180);
```
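A periodic health-check task can keep the `llm_edge_provider_available` gauge current. The sketch below assumes a Tokio runtime, and `probe_provider` is a placeholder for whatever check you run against each provider:

```rust
use llm_edge_monitoring::metrics;
use std::time::Duration;

// Periodically refresh the provider health gauge for each configured provider.
async fn health_check_loop(providers: &[&str]) {
    loop {
        for &provider in providers {
            let healthy = probe_provider(provider).await;
            metrics::record_provider_health(provider, healthy);
        }
        tokio::time::sleep(Duration::from_secs(30)).await;
    }
}

// Placeholder: replace with a lightweight request against the provider's health endpoint.
async fn probe_provider(_provider: &str) -> bool {
    true
}
```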
Run the tests:

```bash
cargo test
```

Build and open the documentation:

```bash
cargo doc --open
```
Licensed under the Apache License, Version 2.0. See LICENSE for details.
Contributions are welcome! Please see our Contributing Guide for details.
- `llm-edge-core` - Core abstractions and traits
- `llm-edge-providers` - Provider implementations
- `llm-edge-cache` - Multi-tier caching layer
- `llm-edge-orchestrator` - Request orchestration and routing