burn_telemetry

Crate: burn_telemetry on crates.io
Version: 0.1.0
Description: burn training logger
Repository: https://github.com/mosure/burn_telemetry
Published: 2025-11-10
Author: Mitchell Mosure (mosure)

README

burn_telemetry 🔥📊


Training logger plugins for Burn

Export training metrics to various observability and visualization platforms with minimal overhead.

features

OpenTelemetry Logger (otel feature)

Export metrics to OpenTelemetry-compatible backends (Prometheus, Jaeger, etc.)

  • 🚀 Minimal overhead - Efficient async export with configurable batching
  • 🎯 Multi-GPU/node support - Resource attributes for distributed training
  • 🔌 OTLP & stdout - Support for OTLP protocol and debugging via stdout
  • ⚙️ Configurable - Fine-tune export intervals, batch sizes, and timeouts
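The batching knobs above can be pictured as a buffer that flushes when it fills up or when an export interval elapses, whichever comes first. A minimal Rust sketch of that idea (illustrative only, not the crate's actual exporter; the `MetricBatcher` type is made up here):

```rust
use std::time::{Duration, Instant};

/// Minimal metric batcher: flushes when the buffer reaches
/// `max_batch_size` or when `export_interval` has elapsed.
struct MetricBatcher {
    buffer: Vec<(String, f64)>,
    max_batch_size: usize,
    export_interval: Duration,
    last_flush: Instant,
    exported: Vec<Vec<(String, f64)>>, // stands in for a real OTLP exporter
}

impl MetricBatcher {
    fn new(max_batch_size: usize, export_interval: Duration) -> Self {
        Self {
            buffer: Vec::new(),
            max_batch_size,
            export_interval,
            last_flush: Instant::now(),
            exported: Vec::new(),
        }
    }

    fn record(&mut self, name: &str, value: f64) {
        self.buffer.push((name.to_string(), value));
        if self.buffer.len() >= self.max_batch_size
            || self.last_flush.elapsed() >= self.export_interval
        {
            self.flush();
        }
    }

    fn flush(&mut self) {
        if !self.buffer.is_empty() {
            // Hand the full batch off in one go instead of per-metric calls.
            self.exported.push(std::mem::take(&mut self.buffer));
        }
        self.last_flush = Instant::now();
    }
}

fn main() {
    let mut batcher = MetricBatcher::new(3, Duration::from_secs(60));
    for step in 0..7 {
        batcher.record("loss", 1.0 / (step as f64 + 1.0));
    }
    batcher.flush(); // drain the tail
    println!("batches exported: {}", batcher.exported.len());
}
```

Larger batch sizes trade a little staleness for fewer export round-trips, which is where the "minimal overhead" claim comes from.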

Rerun Logger (rerun feature)

Rich visualization with Rerun.io for training metrics, images, and tensors

  • Rich visualizations - Images, videos, tensors, and custom data types
  • Timeline-based - Explore metric evolution across training epochs
  • Interactive - Real-time visualization during training
  • Multi-modal - Scalars, text, images, tensors all in one view
  • Video streaming - Incremental frame logging for live training visualization
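Video logging treats a [T, H, W, C] tensor as T stacked frames. A small standalone sketch of how a row-major flat buffer maps onto that layout (the `video_index` helper is hypothetical, for illustration only):

```rust
/// Row-major index into a flat [T, H, W, C] buffer.
fn video_index(t: usize, h: usize, w: usize, c: usize, shape: [usize; 4]) -> usize {
    let [_, height, width, channels] = shape;
    ((t * height + h) * width + w) * channels + c
}

fn main() {
    // Two frames of 4x4 RGB video: shape [T=2, H=4, W=4, C=3].
    let shape = [2, 4, 4, 3];

    // The second frame starts exactly one full frame (H * W * C) in.
    assert_eq!(video_index(1, 0, 0, 0, shape), 4 * 4 * 3);

    // The blue channel of the last pixel of the last frame is the final element.
    assert_eq!(video_index(1, 3, 3, 2, shape), 2 * 4 * 4 * 3 - 1);

    println!("layout ok");
}
```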

usage

OpenTelemetry

```rust
use burn::train::LearnerBuilder;
use burn::train::metric::{AccuracyMetric, LossMetric};
use burn_telemetry::otel::{OpenTelemetryConfig, OpenTelemetryLogger};

// Configure the OpenTelemetry logger
let config = OpenTelemetryConfig::new("my-training-service")
    .with_endpoint("http://localhost:4317")   // OTLP endpoint
    .with_resource_attribute("gpu.id", "0")   // Multi-GPU tracking
    .with_resource_attribute("node.id", "worker-1");

// Create the logger
let otel_logger = OpenTelemetryLogger::new(
    config,
    "mnist-training",
    Some("gpu-0"),
).expect("Failed to create logger");

// Use it with LearnerBuilder
let learner = LearnerBuilder::new(artifact_dir)
    .metric_train_numeric(AccuracyMetric::new())
    .metric_valid_numeric(LossMetric::new())
    .metric_logger(otel_logger)
    .build(/* ... */);
```

Rerun

```rust
use burn::tensor::Tensor;
use burn::train::LearnerBuilder;
use burn::train::metric::AccuracyMetric;
use burn_telemetry::rerun::{RerunConfig, RerunLogger};
use burn_telemetry::rerun::{BurnToImage, BurnToRerun, BurnToVideo};

// Configure the Rerun logger
let config = RerunConfig::new("training-visualization")
    .with_spawn_viewer(true);  // Opens the Rerun viewer automatically

// Create the logger
let mut rerun_logger = RerunLogger::new(
    config,
    "mnist-training",
).expect("Failed to create logger");

// Use it with LearnerBuilder for standard metrics
let learner = LearnerBuilder::new(artifact_dir)
    .metric_train_numeric(AccuracyMetric::new())
    .metric_logger(rerun_logger)
    .build(/* ... */);

// Log custom visualizations (images, tensors, videos, etc.)
// The conversion helpers are async, so these calls need an async context.
let image_tensor: Tensor<Backend, 3> = /* ... */;
let rerun_image = image_tensor.into_rerun_image().await;
rerun_logger.log_image("sample_output", rerun_image, epoch);

// Log videos (4D tensor with shape [T, H, W, C])
let video_tensor: Tensor<Backend, 4> = /* ... */;
let video_frames = video_tensor.into_rerun_video().await;
rerun_logger.log_video("training_animation", video_frames, epoch);

// Video streaming - log frames incrementally with automatic cleanup
let stream_handle = rerun_logger.init_video_stream("live_training");
for frame_tensor in training_frames {
    let frame = frame_tensor.into_rerun_image().await;
    rerun_logger.log_video_frame(&stream_handle, frame, epoch);
}
// The stream is automatically cleaned up when the handle is dropped
```
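The cleanup-on-drop behaviour is ordinary Rust RAII. A self-contained sketch of the pattern (the `StreamHandle` and registry here are stand-ins, not the crate's actual types):

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

/// Registry of active stream names, shared with every handle.
type Streams = Rc<RefCell<HashSet<String>>>;

/// Handle that unregisters its stream when dropped (RAII cleanup).
struct StreamHandle {
    name: String,
    streams: Streams,
}

impl Drop for StreamHandle {
    fn drop(&mut self) {
        self.streams.borrow_mut().remove(&self.name);
    }
}

fn init_stream(streams: &Streams, name: &str) -> StreamHandle {
    streams.borrow_mut().insert(name.to_string());
    StreamHandle { name: name.to_string(), streams: Rc::clone(streams) }
}

fn main() {
    let streams: Streams = Rc::new(RefCell::new(HashSet::new()));
    {
        let _handle = init_stream(&streams, "live_training");
        assert!(streams.borrow().contains("live_training"));
    } // handle dropped here: the stream is unregistered
    assert!(streams.borrow().is_empty());
    println!("stream cleaned up");
}
```

Because cleanup rides on `Drop`, a stream cannot be leaked by an early return or a panic that unwinds past the logging loop.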

Using Multiple Loggers

Combine multiple loggers to export to multiple backends simultaneously:

```rust
use burn::train::LearnerBuilder;
use burn::train::metric::AccuracyMetric;
use burn_telemetry::MultiLogger;
use burn_telemetry::otel::OpenTelemetryLogger;
use burn_telemetry::rerun::RerunLogger;

// Create the individual loggers
let otel_logger = OpenTelemetryLogger::new(/* ... */)?;
let rerun_logger = RerunLogger::new(/* ... */)?;

// Combine them with MultiLogger
let multi_logger = MultiLogger::new()
    .add_logger(otel_logger)
    .add_logger(rerun_logger);

// Use the combined logger with LearnerBuilder
let learner = LearnerBuilder::new(artifact_dir)
    .metric_train_numeric(AccuracyMetric::new())
    .metric_logger(multi_logger)
    .build(/* ... */);
```
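Conceptually, a multi-logger just fans each metric event out to every registered backend. A generic sketch of that pattern, with made-up `MetricSink` / `FanOut` types rather than the crate's real trait:

```rust
/// Minimal metric-sink trait standing in for a logger backend.
trait MetricSink {
    fn log(&mut self, name: &str, value: f64);
}

/// Toy backend that just records what it was given.
struct VecSink {
    records: Vec<(String, f64)>,
}

impl MetricSink for VecSink {
    fn log(&mut self, name: &str, value: f64) {
        self.records.push((name.to_string(), value));
    }
}

/// Fans every event out to all registered sinks.
struct FanOut {
    sinks: Vec<Box<dyn MetricSink>>,
}

impl FanOut {
    fn new() -> Self {
        Self { sinks: Vec::new() }
    }

    /// Builder-style registration, mirroring the chained add_logger calls above.
    fn add_sink(mut self, sink: Box<dyn MetricSink>) -> Self {
        self.sinks.push(sink);
        self
    }
}

impl MetricSink for FanOut {
    fn log(&mut self, name: &str, value: f64) {
        for sink in &mut self.sinks {
            sink.log(name, value);
        }
    }
}

fn main() {
    let mut multi = FanOut::new()
        .add_sink(Box::new(VecSink { records: Vec::new() }))
        .add_sink(Box::new(VecSink { records: Vec::new() }));
    multi.log("accuracy", 0.93);
    println!("fanned out to {} sinks", multi.sinks.len());
}
```

Because the fan-out type implements the same trait as its children, the training loop can stay oblivious to how many backends are attached.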

For a runnable end-to-end walkthrough see example/multi_logger.rs, which initializes both backends and streams a short training loop through the combined logger.

cargo features

```toml
[dependencies]
burn_telemetry = { version = "0.1", features = ["otel", "rerun"] }
```

Available features:

  • otel (default) - OpenTelemetry logger
  • rerun - Rerun.io logger

Both features can be enabled simultaneously.
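If you only need the Rerun backend, you can presumably opt out of the default otel feature in the usual Cargo way (assuming the crate follows the standard default-features convention, with otel listed as the default):

```toml
[dependencies]
# Rerun only: opt out of the default `otel` feature to trim the dependency tree.
burn_telemetry = { version = "0.1", default-features = false, features = ["rerun"] }
```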

examples

The showcase MNIST example lives at example/mnist.rs and automatically wires in whichever telemetry features you compiled:

  • cargo run --example mnist --features rerun,otel: streams scalar metrics to Rerun and OpenTelemetry simultaneously, and also logs sample images plus predictions after training. Use --record-to, --limit-train, --limit-valid, --full-dataset, --preview-samples, and --quick to match your demo needs.

  • cargo run --example multi_logger --features otel,rerun: drives both loggers simultaneously, exercising max_batch_size / async_export with OTLP while streaming the same metrics into a live Rerun viewer.

The MNIST dataset is downloaded the first time you run an example and cached in ~/.cache/burn-dataset/mnist. For quick sanity checks you can limit the dataset size:

```shell
cargo run --example mnist -- --quick --no-rerun
```

Examples default to the CUDA backend (device 0) and limit training to 10k train / 2k valid samples so they converge quickly during demos. Use --full-dataset to remove those limits or --limit-* to pin precise counts. See cargo run --example mnist -- --help for the full list of CLI flags shared by the harness. Running the CUDA backend requires a machine with a compatible NVIDIA driver and CUDA runtime; fall back to the CPU build if you do not have one.

attribution

The Burn-to-Rerun conversion utilities are adapted from brush by Arthur Brussee.
