| Crates.io | azof |
| lib.rs | azof |
| version | 0.2.1 |
| created_at | 2025-05-18 08:29:26.41481+00 |
| updated_at | 2025-05-18 08:29:26.41481+00 |
| description | Lakehouse format with event time travel |
| homepage | |
| repository | https://github.com/MaciekLesiczka/azof |
| max_upload_size | |
| id | 1678469 |
| size | 113,172 |
Query tables in object storage as of event time.
Azof is a lakehouse format with time-travel capabilities that allows you to query data as it existed at any point in time, based on when events actually occurred rather than when they were recorded.
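To make the difference between "when an event occurred" and "when it was recorded" concrete, here is a minimal, self-contained sketch. It does not use Azof's API. The `Record` type, the `value_as_of` helper, and the sample data are hypothetical names chosen for illustration. The sample contains one late-arriving record, so the same as-of query returns different answers depending on which timestamp it uses.

```rust
/// One recorded change to a key. Hypothetical type, not part of Azof.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    key: &'static str,
    value: i64,
    /// When the change happened in the real world.
    event_time: u64,
    /// When the change was written to storage.
    recorded_at: u64,
}

/// Returns the value of `key` from the record with the greatest timestamp
/// that is not later than `as_of`, where `time_of` selects the timestamp.
fn value_as_of(
    records: &[Record],
    key: &str,
    as_of: u64,
    time_of: impl Fn(&Record) -> u64,
) -> Option<i64> {
    records
        .iter()
        .filter(|r| r.key == key && time_of(*r) <= as_of)
        .max_by_key(|r| time_of(*r))
        .map(|r| r.value)
}

/// Sample data with one late-arriving record.
fn sample_records() -> Vec<Record> {
    vec![
        Record { key: "acct-1", value: 100, event_time: 10, recorded_at: 11 },
        // Happened at t=20 but was only written at t=50.
        Record { key: "acct-1", value: 250, event_time: 20, recorded_at: 50 },
    ]
}

fn main() {
    let records = sample_records();

    // Event-time travel: the change at t=20 is visible as of t=30.
    let by_event = value_as_of(&records, "acct-1", 30, |r| r.event_time);
    println!("as of event time 30:     {:?}", by_event); // Some(250)

    // Recording-time travel: at t=30 the late record had not been written yet.
    let by_write = value_as_of(&records, "acct-1", 30, |r| r.recorded_at);
    println!("as of recording time 30: {:?}", by_write); // Some(100)
}
```

The linear scan is only there to keep the sketch short. A real format would avoid scanning every record for each query.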
Traditional data lakehouse formats allow time travel based on when data was written (processing time). Azof instead focuses on event time, the time when events actually occurred in the real world. This distinction is crucial for:
Event time-based time travel: Query data based on when events occurred, not when they were recorded
Efficient storage of updates: Preserves compacted snapshots of state to minimize storage and query overhead
Hierarchical organization: Uses segments and delta files to efficiently organize temporal data
Tunable compaction policy: Adjust based on your data distribution patterns
SQL integration: Query using DataFusion with familiar SQL syntax
Integration with object storage: Works with any object store (local, S3, etc.)
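The idea of combining compacted snapshots with delta files can be sketched as follows. This is an illustration under stated assumptions, not Azof's actual on-disk layout or code. The `Snapshot`, `Delta`, and `Segment` types and the `state_as_of` function are hypothetical. The sketch assumes each segment holds one snapshot plus the deltas that follow it, and that a query picks the latest snapshot at or before the requested event time and replays only the later deltas.

```rust
use std::collections::BTreeMap;

/// Compacted table state as of a given event time (hypothetical type).
struct Snapshot {
    as_of: u64,
    state: BTreeMap<String, i64>,
}

/// A single change stored in a delta file; `None` marks a delete.
struct Delta {
    event_time: u64,
    key: String,
    value: Option<i64>,
}

/// Assumed layout: one snapshot plus the deltas that occurred after it.
struct Segment {
    snapshot: Snapshot,
    deltas: Vec<Delta>,
}

/// Reconstructs the table state as of event time `t`.
/// Returns `None` if no snapshot exists at or before `t`.
fn state_as_of(segments: &[Segment], t: u64) -> Option<BTreeMap<String, i64>> {
    // Start from the latest snapshot that is not newer than `t`.
    let segment = segments
        .iter()
        .filter(|s| s.snapshot.as_of <= t)
        .max_by_key(|s| s.snapshot.as_of)?;
    let mut state = segment.snapshot.state.clone();

    // Replay only the deltas up to `t`, in event-time order.
    let mut applicable: Vec<&Delta> =
        segment.deltas.iter().filter(|d| d.event_time <= t).collect();
    applicable.sort_by_key(|d| d.event_time);
    for delta in applicable {
        match delta.value {
            Some(v) => {
                state.insert(delta.key.clone(), v);
            }
            None => {
                state.remove(&delta.key);
            }
        }
    }
    Some(state)
}

/// Builds a delta for the sample data.
fn delta(event_time: u64, key: &str, value: Option<i64>) -> Delta {
    Delta { event_time, key: key.to_string(), value }
}

/// Two sample segments; the second snapshot compacts the first segment.
fn sample_segments() -> Vec<Segment> {
    let mut compacted = BTreeMap::new();
    compacted.insert("b".to_string(), 2);
    vec![
        Segment {
            snapshot: Snapshot { as_of: 0, state: BTreeMap::new() },
            deltas: vec![delta(5, "a", Some(1)), delta(8, "b", Some(2)), delta(9, "a", None)],
        },
        Segment {
            snapshot: Snapshot { as_of: 10, state: compacted },
            deltas: vec![delta(12, "b", Some(7)), delta(15, "c", Some(3))],
        },
    ]
}

fn main() {
    let segments = sample_segments();
    println!("{:?}", state_as_of(&segments, 6)); // Some({"a": 1})
    println!("{:?}", state_as_of(&segments, 13)); // Some({"b": 7})
}
```

In this sketch a query as of time 13 reads one snapshot and one delta instead of replaying the four changes up to that time, which is the kind of overhead compaction is meant to reduce.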
The Azof project is organized as a Rust workspace with multiple crates:
To build all projects in the workspace:
cargo build --workspace
The azof-cli provides a command-line interface for interacting with azof:
# Scan a table (current version)
cargo run -p azof-cli -- scan --path ./test-data --table table0
# Scan a table as of a specific event time
cargo run -p azof-cli -- scan --path ./test-data --table table0 --as-of "2024-03-15T14:30:00"
# Generate test parquet file from CSV
cargo run -p azof-cli -- gen --path ./test-data --table table2 --file base
The azof-datafusion crate provides integration with Apache DataFusion, allowing you to:
use azof_datafusion::context::ExecutionContext;

async fn query_azof() -> Result<(), Box<dyn std::error::Error>> {
    let ctx = ExecutionContext::new("/path/to/azof");
    let df = ctx
        .sql(
            "
            SELECT key as symbol, revenue, net_income
            FROM financials
            AT ('2019-01-17T00:00:00.000Z')
            WHERE industry IN ('Software')
            ORDER BY revenue DESC
            LIMIT 5;
            ",
        )
        .await?;
    df.show().await?;
    Ok(())
}
Run the example:
cargo run --example query_example -p azof-datafusion
If you install the CLI with cargo install --path crates/azof-cli, you can run it directly with:
azof-cli scan --path ./test-data --table table0
Azof is under development. The goal is to implement a data lakehouse with the following capabilities: