dsq-core

Crate: dsq-core
Version: 0.1.0
Created: 2025-12-15 18:01:23.01218+00
Updated: 2025-12-15 18:01:23.01218+00
Description: Core functionality for dsq - data processing with jq syntax
Repository: https://github.com/durableprogramming/dsq
Size: 607,559
Owner: David J Berube (djberube)

dsq-core

Core library for dsq data processing.

dsq-core provides the fundamental data processing capabilities for dsq, extending jq-compatible syntax to work with structured data formats like Parquet, Avro, CSV, and more. It leverages Polars DataFrames for high-performance data manipulation.

Key Components

Value Type System

The core Value enum bridges JSON-like values and Polars DataFrames:

use dsq_core::value::Value;

// JSON-like values
let json_val = Value::object([
    ("name".to_string(), Value::string("Alice")),
    ("age".to_string(), Value::int(30)),
].into());

// DataFrame values (`dataframe` is a Polars DataFrame created elsewhere)
let df_val = Value::dataframe(dataframe);
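To make the idea concrete, here is a minimal, self-contained sketch of a value type that holds either JSON-like data or a columnar table. It does not use dsq-core or Polars: the `Value` variants, the `Table` struct standing in for a DataFrame, and `records_to_table` are hypothetical names chosen for illustration, not dsq-core's actual definitions.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a DataFrame: named columns of equal length.
#[derive(Debug, Clone, PartialEq)]
pub struct Table {
    pub columns: BTreeMap<String, Vec<Value>>,
}

/// Hypothetical value type mixing JSON-like data with a tabular variant.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Bool(bool),
    Int(i64),
    Float(f64),
    String(String),
    Array(Vec<Value>),
    Object(BTreeMap<String, Value>),
    Table(Table),
}

/// Converts row-oriented records (an array of objects) into a column-oriented Table.
/// Keys missing from a record become Null so every column has the same length.
pub fn records_to_table(records: &[BTreeMap<String, Value>]) -> Table {
    // Collect the sorted, de-duplicated union of keys across all records.
    let mut keys: Vec<String> = records.iter().flat_map(|r| r.keys().cloned()).collect();
    keys.sort();
    keys.dedup();

    let mut columns = BTreeMap::new();
    for key in keys {
        // One entry per record, in record order.
        let column: Vec<Value> = records
            .iter()
            .map(|r| r.get(&key).cloned().unwrap_or(Value::Null))
            .collect();
        columns.insert(key, column);
    }
    Table { columns }
}
```

Pivoting rows into columns like this is the kind of step such a bridge needs before JSON-shaped data can be handed to a columnar engine; how dsq-core performs the conversion internally is not described here.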

Operations Framework

A library of data operations such as column selection, sorting, and row limits:

use dsq_core::ops::{Operation, basic::*};

// Select columns (`data` was loaded earlier, e.g. with io::read_file)
let selected = select_columns(&data, &["name", "age"])?;

// Sort data
let sorted = sort_by_columns(&selected, &[SortOptions::desc("age")])?;

// Take first N rows
let result = head(&sorted, 10)?;
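The three calls above select columns, sort, and keep the first rows. The std-only sketch below walks through the same sequence over simple row records. `Cell`, `Row`, `select_fields`, `sort_desc_by_int`, and `take_first` are hypothetical helpers for illustration; dsq-core's own functions operate on Polars-backed data instead.

```rust
use std::collections::BTreeMap;

/// Hypothetical cell type, limited to integers and text for the sketch.
#[derive(Debug, Clone, PartialEq)]
pub enum Cell {
    Int(i64),
    Text(String),
}

/// Hypothetical row: column name -> cell.
pub type Row = BTreeMap<String, Cell>;

/// Keeps only the named columns in each row; names absent from a row are skipped.
pub fn select_fields(rows: &[Row], names: &[&str]) -> Vec<Row> {
    rows.iter()
        .map(|row| {
            names
                .iter()
                .filter_map(|n| row.get(*n).map(|c| (n.to_string(), c.clone())))
                .collect::<Row>()
        })
        .collect()
}

/// Sorts rows by an integer column, largest first; rows without that column go last.
pub fn sort_desc_by_int(rows: &[Row], name: &str) -> Vec<Row> {
    let key = |row: &Row| match row.get(name) {
        Some(Cell::Int(v)) => Some(*v),
        _ => None,
    };
    let mut sorted = rows.to_vec();
    // None < Some(_) for Option, so comparing b to a yields descending order with None last.
    sorted.sort_by(|a, b| key(b).cmp(&key(a)));
    sorted
}

/// Returns at most the first n rows.
pub fn take_first(rows: &[Row], n: usize) -> Vec<Row> {
    rows.iter().take(n).cloned().collect()
}
```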

I/O Support

Input/output for multiple file formats:

use dsq_core::io;

// Read CSV file
let data = io::read_file("data.csv", &io::ReadOptions::default())?;

// Write to Parquet
io::write_file(&result, "output.parquet", &io::WriteOptions::default())?;
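Both calls pass only a path and default options, which suggests the format is picked from the file extension; that is an assumption, not something this README states. A minimal std-only sketch of extension-based dispatch, using a hypothetical `Format` enum and `detect_format` function that cover the formats listed under Format-Specific Features:

```rust
use std::path::Path;

/// Hypothetical set of formats, mirroring the feature list in this README.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Format {
    Csv,
    Tsv,
    Json,
    JsonLines,
    Parquet,
    Avro,
}

/// Guesses a format from the file extension, ignoring case.
/// Returns None for unknown or missing extensions so the caller can require an explicit format.
pub fn detect_format(path: &str) -> Option<Format> {
    let ext = Path::new(path).extension()?.to_str()?.to_ascii_lowercase();
    match ext.as_str() {
        "csv" => Some(Format::Csv),
        "tsv" => Some(Format::Tsv),
        "json" => Some(Format::Json),
        "jsonl" | "ndjson" => Some(Format::JsonLines),
        "parquet" => Some(Format::Parquet),
        "avro" => Some(Format::Avro),
        _ => None,
    }
}
```

Returning `None` instead of defaulting to a format keeps a typo such as `data.cvs` from being silently parsed as the wrong thing.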

Filter System

jq-compatible filter compilation and execution:

use dsq_core::filter::{FilterExecutor, ExecutorConfig};

// Execute jq-style filter
let mut executor = FilterExecutor::with_config(ExecutorConfig::default());
let result = executor.execute_str("map(select(.age > 30)) | sort_by(.name)", data)?;
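In jq, `map(select(.age > 30)) | sort_by(.name)` keeps the array elements whose `age` is greater than 30 and then orders the survivors by `name`. The sketch below spells out those semantics in plain Rust over a hypothetical `Person` struct; it illustrates what the filter computes, not how dsq-core compiles or executes it.

```rust
/// Hypothetical record standing in for a JSON object with `name` and `age` fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Person {
    pub name: String,
    pub age: u32,
}

/// Same result as the jq filter `map(select(.age > 30)) | sort_by(.name)`.
pub fn older_than_30_sorted_by_name(people: &[Person]) -> Vec<Person> {
    // map(select(.age > 30)): keep only the elements for which the predicate holds.
    let mut kept: Vec<Person> = people.iter().filter(|p| p.age > 30).cloned().collect();
    // sort_by(.name): Vec::sort_by is stable, so equal names keep their input order.
    kept.sort_by(|a, b| a.name.cmp(&b.name));
    kept
}
```

Note that `select(.age > 30)` is a strict comparison, so a record with `age` exactly 30 is dropped.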

Installation

Add this to your Cargo.toml:

[dependencies]
dsq-core = "0.1"

Or for the latest development version:

[dependencies]
dsq-core = { git = "https://github.com/durableprogramming/dsq", branch = "main" }

Requirements

  • Rust 1.69 or later
  • Polars for full DataFrame functionality

API Reference

Full API documentation is available at docs.rs/dsq-core.

Quick Start

use dsq_core::{Value, ops, io};

// Read data from a file
let data = io::read_file("data.csv", &io::ReadOptions::default())?;

// Apply operations
let result = ops::OperationPipeline::new()
    .select(vec!["name".to_string(), "age".to_string()])
    .filter("age > 25")?
    .sort_by(vec![ops::SortOptions::desc("age".to_string())])
    .head(10)
    .execute(data)?;

// Write to Parquet
io::write_file(&result, "output.parquet", &io::WriteOptions::default())?;
# Ok::<(), dsq_core::Error>(())
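Because the input only arrives at `execute(data)`, the builder calls before it have to record steps that are applied later. One common way to structure such an API is to push each step into an enum and run them in order at the end. The std-only sketch below shows that pattern; `Step`, `Pipeline`, `filter_gt`, and the integer-only `Row` type are hypothetical, and this is not a description of how `OperationPipeline` is implemented.

```rust
use std::collections::BTreeMap;

/// Hypothetical row type: column name -> integer value.
pub type Row = BTreeMap<String, i64>;

/// Each builder call records one step; nothing runs until `execute`.
#[derive(Debug, Clone)]
pub enum Step {
    Select(Vec<String>),
    FilterGreaterThan { column: String, threshold: i64 },
    SortDesc(String),
    Head(usize),
}

#[derive(Debug, Default)]
pub struct Pipeline {
    steps: Vec<Step>,
}

impl Pipeline {
    pub fn new() -> Self {
        Self::default()
    }
    pub fn select(mut self, cols: &[&str]) -> Self {
        self.steps.push(Step::Select(cols.iter().map(|c| c.to_string()).collect()));
        self
    }
    pub fn filter_gt(mut self, column: &str, threshold: i64) -> Self {
        self.steps.push(Step::FilterGreaterThan { column: column.to_string(), threshold });
        self
    }
    pub fn sort_desc(mut self, column: &str) -> Self {
        self.steps.push(Step::SortDesc(column.to_string()));
        self
    }
    pub fn head(mut self, n: usize) -> Self {
        self.steps.push(Step::Head(n));
        self
    }

    /// Applies the recorded steps to the input, in the order they were added.
    pub fn execute(&self, mut rows: Vec<Row>) -> Vec<Row> {
        for step in &self.steps {
            match step {
                // Drop every column not in the selection.
                Step::Select(cols) => {
                    for row in rows.iter_mut() {
                        row.retain(|k, _| cols.contains(k));
                    }
                }
                // Keep rows whose column exists and exceeds the threshold.
                Step::FilterGreaterThan { column, threshold } => {
                    rows.retain(|row| row.get(column).map_or(false, |v| v > threshold));
                }
                // Largest first; rows missing the column sort last.
                Step::SortDesc(column) => {
                    rows.sort_by(|a, b| b.get(column).cmp(&a.get(column)));
                }
                Step::Head(n) => rows.truncate(*n),
            }
        }
        rows
    }
}
```

Recording steps also means a pipeline can be built once and run against several inputs, since `execute` borrows the pipeline rather than consuming it.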

High-Level API

For more convenient usage, dsq-core provides a fluent API:

use dsq_core::api::Dsq;

// Chain operations easily
let result = Dsq::from_file("data.csv")?
    .select(&["name", "age", "department", "salary"]) // keep salary for Mean("salary") below
    .filter_expr("age > 25")
    .sort_by(&["department", "age"])
    .group_by(&["department"])
    .aggregate(&["department"], vec![
        dsq_core::ops::aggregate::AggregationFunction::Count,
        dsq_core::ops::aggregate::AggregationFunction::Mean("salary".to_string()),
    ])
    .to_json()?;
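Filtering on age, grouping by department, and aggregating with `Count` and `Mean("salary")` yields one result per department with its row count and average salary. A std-only sketch of that computation, using a hypothetical `Employee` struct and `count_and_mean_salary` function rather than dsq-core's Polars-backed aggregation:

```rust
use std::collections::BTreeMap;

/// Hypothetical input record.
pub struct Employee {
    pub department: String,
    pub age: u32,
    pub salary: f64,
}

/// Per-group result: number of rows and mean salary.
#[derive(Debug, PartialEq)]
pub struct GroupStats {
    pub count: usize,
    pub mean_salary: f64,
}

/// Keeps rows with age > 25, groups them by department, and computes count and mean salary.
/// The BTreeMap keeps the groups ordered by department name.
pub fn count_and_mean_salary(rows: &[Employee]) -> BTreeMap<String, GroupStats> {
    // Accumulate (count, salary sum) per department in a single pass.
    let mut acc: BTreeMap<String, (usize, f64)> = BTreeMap::new();
    for e in rows.iter().filter(|e| e.age > 25) {
        let entry = acc.entry(e.department.clone()).or_insert((0, 0.0));
        entry.0 += 1;
        entry.1 += e.salary;
    }
    // Convert sums to means; every stored group has count >= 1, so there is no division by zero.
    acc.into_iter()
        .map(|(dept, (count, sum))| (dept, GroupStats { count, mean_salary: sum / count as f64 }))
        .collect()
}
```

Carrying a running sum and count per group avoids storing each group's rows, which is also how a mean is typically computed in a single pass.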

Feature Flags

dsq-core supports optional features for different use cases:

  • default - Includes all-formats, io, and filter for full functionality
  • all-formats - Enables all supported data formats
  • io - File I/O operations and format conversion
  • filter - jq-compatible filter compilation and execution
  • repl - Interactive REPL support
  • cli - Command-line interface components

Format-Specific Features

  • csv - CSV/TSV reading and writing
  • json - JSON and JSON Lines support
  • parquet - Apache Parquet format support
  • avro - Apache Avro format support (requires Polars avro feature)
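A project that needs only a few formats might disable the default bundle and opt into specific features to reduce its dependency footprint. The fragment below is a sketch built from the feature names listed above; whether this exact combination compiles, and whether `io` must accompany the format features for file reading, are assumptions to verify against the crate's own Cargo.toml.

```toml
[dependencies]
# Sketch: turn off the default features, then enable file I/O with CSV and Parquet only.
dsq-core = { version = "0.1", default-features = false, features = ["io", "csv", "parquet"] }
```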

Dependencies

dsq-core builds on several key dependencies:

  • Polars - High-performance DataFrame operations
  • Arrow - Columnar memory format
  • Serde - Serialization/deserialization
  • Tokio - Async runtime for streaming operations
  • Nom - Parser combinators for filter syntax
  • dsq-shared - Shared types and utilities

Contributing

Contributions are welcome! Please see the main CONTRIBUTING.md file for guidelines.

License

Licensed under either of:

at your option.
