term-core

Crates.ioterm-core
lib.rsterm-core
version0.0.1
created_at2025-07-26 18:40:27.00966+00
updated_at2025-07-26 18:40:27.00966+00
descriptionA Rust data validation library providing Deequ-like capabilities without Spark dependencies
homepagehttps://github.com/term/term
repositoryhttps://github.com/term/term
max_upload_size
id1769359
size920,896
Eric Simon (ericpsimon)

documentation

https://docs.rs/term-core

README

Term - Lightning-Fast Data Validation for Rust

CI codecov Crates.io Documentation License: MIT

Bulletproof data validation without the infrastructure headache.

Get Started β€’ Documentation β€’ Examples β€’ API Reference

Why Term?

Every data pipeline is a ticking time bomb. Null values crash production. Duplicate IDs corrupt databases. Format changes break downstream systems. Yet most teams discover these issues only after the damage is done.

Traditional data validation tools assume you have a data team, a Spark cluster, and weeks to implement. Term takes a different approach:

  • πŸš€ 5-minute setup - From install to first validation. No clusters, no configs, no complexity
  • ⚑ 100MB/s single-core performance - Validate millions of rows in seconds, not hours
  • πŸ›‘οΈ Fail fast, fail safe - Catch data issues before they hit production
  • πŸ“Š See everything - Built-in OpenTelemetry means you're never debugging blind
  • πŸ”§ Zero infrastructure - Single binary runs on your laptop, in CI/CD, or in the cloud

Term is data validation for the 99% of engineering teams who just want their data to work.

🎯 5-Minute Quickstart

# Add to your Cargo.toml
cargo add term-core tokio --features tokio/full
use term_core::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    // Load your data
    let ctx = SessionContext::new();
    ctx.register_csv("users", "users.csv", CsvReadOptions::new()).await?;

    // Define what good data looks like
    let checks = ValidationSuite::builder("User Data Quality")
        .check(
            Check::builder("No broken data")
                .is_complete("user_id")          // No missing IDs
                .is_unique("email")              // No duplicate emails
                .has_pattern("email", r"@", 1.0) // All emails have @
                .build()
        )
        .build();

    // Validate and get instant feedback
    let report = checks.run(&ctx).await?;
    println!("{}", report);  // βœ… All 3 checks passed!

    Ok(())
}

That's it! No clusters to manage, no JVMs to tune, no YAML to write.

πŸ”₯ Real-World Example: Validate 1M Rows in Under 1 Second

// Validate a production dataset with multiple quality checks
let suite = ValidationSuite::builder("Production Pipeline")
    .check(
        Check::builder("Data Freshness")
            .satisfies("created_at > now() - interval '1 day'")
            .has_size(Assertion::GreaterThan(1000))
            .build()
    )
    .check(
        Check::builder("Business Rules")
            .has_min("revenue", Assertion::GreaterThan(0.0))
            .has_mean("conversion_rate", Assertion::Between(0.01, 0.10))
            .has_correlation("ad_spend", "revenue", Assertion::GreaterThan(0.5))
            .build()
    )
    .build();

// Runs all checks in a single optimized pass
let report = suite.run(&ctx).await?;

🎨 What Can You Validate?

πŸ” Data Quality

  • Completeness checks
  • Uniqueness validation
  • Pattern matching
  • Type consistency

πŸ“Š Statistical

  • Min/max boundaries
  • Mean/median checks
  • Standard deviation
  • Correlation analysis

πŸ›‘οΈ Security

  • Email validation
  • Credit card detection
  • URL format checks
  • PII detection

πŸš€ Performance

  • Row count assertions
  • Query optimization
  • Batch processing
  • Memory efficiency

πŸ“ˆ Benchmarks: 15x Faster with Smart Optimization

Dataset: 1M rows, 20 constraints
Without optimizer: 3.2s (20 full scans)
With Term:         0.21s (2 optimized scans)

πŸš€ Getting Started

Installation

[dependencies]
term-core = "0.0.1"
tokio = { version = "1", features = ["full"] }

# Optional features
term-core = { version = "0.0.1", features = ["cloud-storage"] }  # S3, GCS, Azure support

Learn Term in 30 Minutes

  1. Quick Start Tutorial - Your first validation in 5 minutes
  2. From Deequ to Term - Migration guide with examples
  3. Production Best Practices - Scaling and monitoring

Example Projects

Check out the examples/ directory for real-world scenarios:

πŸ“š Documentation

Our documentation is organized using the DiΓ‘taxis framework:

πŸ› οΈ Contributing

We love contributions! Term is built by the community, for the community.

# Get started in 3 steps
git clone https://github.com/withterm/term.git
cd term
cargo test

πŸ—ΊοΈ Roadmap

Now (v0.0.1)

  • βœ… Core validation engine
  • βœ… File format support (CSV, JSON, Parquet)
  • βœ… Cloud storage integration
  • βœ… OpenTelemetry support

Next (v0.0.2)

  • πŸ”œ Database connectivity (PostgreSQL, MySQL, SQLite)
  • πŸ”œ Streaming data support
  • πŸ”œ Python bindings
  • πŸ”œ Web UI for validation reports

Future

  • 🎯 Data profiling and suggestions
  • 🎯 Incremental validation
  • 🎯 Distributed execution
  • 🎯 More language bindings

πŸ“„ License

Term is MIT licensed. See LICENSE for details.

πŸ™ Acknowledgments

Term stands on the shoulders of giants:


Ready to bulletproof your data pipelines?

⚑ Get Started β€’ πŸ“– Read the Docs β€’ πŸ’¬ Join Community

Commit count: 0

cargo fmt