otlp2parquet

Crates.io: otlp2parquet
lib.rs: otlp2parquet
version: 0.9.0
created_at: 2025-11-29 01:09:28.484197+00
updated_at: 2026-01-19 03:58:36.63803+00
description: Stream OpenTelemetry logs, metrics, and traces to Parquet files
repository: https://github.com/smithclay/otlp2parquet
id: 1956227
size: 696,742
owner: Clay Smith (smithclay)

README

otlp2parquet

What if your observability data was just a bunch of Parquet files?

Receive OpenTelemetry logs, metrics, and traces and write them as Parquet files to local disk or S3-compatible storage. Query with DuckDB, Spark, pandas, or anything that reads Parquet.

If you want to stream real-time observability data directly to AWS, Azure, or Cloudflare, check out the related otlp2pipeline project.

flowchart TB
    subgraph Sources["OpenTelemetry Sources"]
        Traces
        Metrics
        Logs
    end

    subgraph otlp2parquet["otlp2parquet"]
        Decode["Decode"] --> Arrow["Arrow"] --> Write["Parquet"]
    end

    subgraph Storage["Storage"]
        Local["Local File"]
        S3["S3-Compatible"]
    end

    Query["Query Engines"]

    Sources --> otlp2parquet
    otlp2parquet --> Storage
    Query --> Storage

Quick Start

# requires rust toolchain: `curl https://sh.rustup.rs -sSf | sh`
cargo install otlp2parquet

otlp2parquet

Server starts on http://localhost:4318. Send a simple OTLP HTTP log:

# otlp2parquet flushes batched writes to disk every BATCH_AGE_MAX_SECONDS seconds by default
curl -X POST http://localhost:4318/v1/logs \
  -H "Content-Type: application/json" \
  -d '{"resourceLogs":[{"scopeLogs":[{"logRecords":[{"body":{"stringValue":"hello world"}}]}]}]}'

Query it:

# see https://duckdb.org/install
duckdb -c "SELECT * FROM './data/logs/**/*.parquet'"

Print configuration to receive OTLP from a collector, Claude Code, or Codex:

otlp2parquet connect otel-collector
otlp2parquet connect claude-code
otlp2parquet connect codex

Why?

  • Keep monitoring data around for a long time: Parquet on S3 can be 90% cheaper than large monitoring vendors for long-term analytics.
  • Query with good tools: DuckDB, Spark, Trino, pandas.
  • Deploy anywhere: local binary, containers, or your own servers.

Run with Docker

docker-compose up
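
docker-compose reads its service definitions from the compose file in the repository, so the usual flow is to run it from a clone. A minimal sketch, assuming the compose file sits at the repository root and maps the OTLP listener's port 4318:

# clone the repo and start the bundled compose stack
git clone https://github.com/smithclay/otlp2parquet
cd otlp2parquet
docker-compose up
# once the container is up, send OTLP/HTTP traffic exactly as in the Quick Start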

Supported Signals

Logs, Metrics, Traces via OTLP/HTTP (protobuf or JSON, gzip compression supported). No gRPC support for now.
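
The trace and metric endpoints accept the same encodings as the log example in the Quick Start. As a sketch, here is a minimal JSON trace and its gzip variant; the trace/span IDs and timestamps are placeholder values:

# minimal OTLP/HTTP trace in JSON
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -d '{"resourceSpans":[{"scopeSpans":[{"spans":[{"traceId":"5b8efff798038103d269b633813fc60c","spanId":"eee19b7ec3c1b174","name":"ping","startTimeUnixNano":"1700000000000000000","endTimeUnixNano":"1700000001000000000"}]}]}]}'

# same request with a gzip-compressed body (payload saved to trace.json first)
gzip -k trace.json
curl -X POST http://localhost:4318/v1/traces \
  -H "Content-Type: application/json" \
  -H "Content-Encoding: gzip" \
  --data-binary @trace.json.gz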

APIs, schemas, and partition layout

  • OTLP/HTTP endpoints: /v1/logs, /v1/metrics, /v1/traces (protobuf or JSON; gzip supported)
  • Partition layout: logs/{service}/year=.../hour=.../{ts}-{uuid}.parquet, metrics/{type}/{service}/..., traces/{service}/... (see the query example after this list)
  • Storage: filesystem or S3-compatible object storage
  • Schemas: ClickHouse-compatible, PascalCase columns; five metric schemas (Gauge, Sum, Histogram, ExponentialHistogram, Summary)
  • Error model: HTTP 400 on invalid input/too large; 5xx on conversion/storage
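
Because the layout uses Hive-style key=value directories, DuckDB can expose the partition keys as columns and prune on them. A sketch, assuming default local storage under ./data and a hypothetical service name my-service:

# read one service's logs with the partition keys (year ... hour) exposed as columns
duckdb -c "SELECT * FROM read_parquet('./data/logs/my-service/**/*.parquet', hive_partitioning = true) LIMIT 10"

# count records per partition hour for that service
duckdb -c "SELECT hour, count(*) FROM read_parquet('./data/logs/my-service/**/*.parquet', hive_partitioning = true) GROUP BY hour ORDER BY hour"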

Future work (contributions welcome)

  • OpenTelemetry Arrow alignment
  • Additional platforms: Azure Functions; Kubernetes manifests

Caveats

  • Batching: Use an OTel Collector upstream to batch and reduce request overhead (see the sketch after this list).
  • Schema: Uses ClickHouse-compatible column names. Will converge with OTel Arrow (OTAP) when it stabilizes.
  • Status: Functional but evolving. API may change.
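
A minimal sketch of the upstream-collector idea from the batching caveat: an OpenTelemetry Collector config that receives OTLP, batches, and forwards logs over OTLP/HTTP to otlp2parquet. The endpoint and batch settings below are illustrative; otlp2parquet connect otel-collector prints the project's own suggested configuration.

# write a minimal collector config that batches and forwards logs to otlp2parquet
cat > otel-collector.yaml <<'EOF'
receivers:
  otlp:
    protocols:
      http:
      grpc:
processors:
  batch:
    timeout: 5s
    send_batch_size: 8192
exporters:
  otlphttp:
    endpoint: http://localhost:4318
service:
  pipelines:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
EOF

# metrics and traces pipelines follow the same receivers/processors/exporters pattern
otelcol-contrib --config otel-collector.yaml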