| Crates.io | vortex-scan |
| lib.rs | vortex-scan |
| version | 0.53.0 |
| created_at | 2025-01-21 12:49:38.18722+00 |
| updated_at | 2025-09-24 14:43:44.291193+00 |
| description | Scanning operations for Vortex |
| homepage | https://github.com/spiraldb/vortex |
| repository | https://github.com/spiraldb/vortex |
| id | 1524857 |
| size | 149,114 |
A high-performance scanning and (non-shuffling) query execution engine for the Vortex columnar format, featuring work-stealing parallelism and exhaustively tested concurrent execution.
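The vortex-scan crate provides efficient scanning operations over Vortex arrays, with support for column projection and filtering, multi-threaded execution, conversion to Arrow RecordBatches, and row selection by index or range. The examples that follow cover each in turn.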
use vortex_scan::ScanBuilder;
use vortex_expr::lit;

// `layout_reader`, `select`, and `column` are assumed to be in scope;
// the original snippet does not show where they come from.

// Create a scan that reads specific columns with a filter
let scan = ScanBuilder::new(layout_reader)
    .with_projection(select(["name", "age"]))
    .with_filter(column("age").gt(lit(18)))
    .build()?;

// Execute the scan
for batch in scan.into_array_iter()? {
    let batch = batch?;
    // Process batch...
}

// Execute scan across multiple threads
let scan = ScanBuilder::new(layout_reader)
    .with_projection(projection)
    .with_filter(filter)
    .into_array_iter_multithread()?;

for batch in scan {
    let batch = batch?;
    // Results are automatically collected from worker threads
}

use arrow_array::RecordBatch;

// Convert scan results to Arrow RecordBatches
let reader = ScanBuilder::new(layout_reader)
    .with_filter(filter)
    .into_record_batch_reader(arrow_schema)?;

for batch in reader {
    let record_batch: RecordBatch = batch?;
    // Process Arrow RecordBatch...
}

use vortex_scan::Selection;

// Select specific rows by index
let scan = ScanBuilder::new(layout_reader)
    .with_selection(Selection::IncludeByIndex(indices.into()))
    .build()?;

// Or use row ranges
let scan = ScanBuilder::new(layout_reader)
    .with_row_range(1000..2000)
    .build()?;

The crate implements a work-stealing queue that lets multiple worker threads share work efficiently.
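To make the work-stealing idea concrete, here is a minimal, standard-library-only sketch. It is not the crate's implementation: the names `next_task` and `run_workers`, the use of `Mutex<VecDeque<_>>` per worker, and the choice to steal from the back of a peer's queue are all assumptions made for illustration. Each worker drains its own queue first and only then takes work from others.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: take the next task for worker `me`.
/// Prefers the worker's own queue (front), otherwise steals from the
/// back of another worker's queue. Returns `None` when all queues are empty.
fn next_task(queues: &[Mutex<VecDeque<u64>>], me: usize) -> Option<u64> {
    // The lock guard is a temporary and is released at the end of this statement.
    let own = queues[me].lock().unwrap().pop_front();
    if own.is_some() {
        return own;
    }
    for (i, queue) in queues.iter().enumerate() {
        if i == me {
            continue;
        }
        let stolen = queue.lock().unwrap().pop_back();
        if stolen.is_some() {
            return stolen;
        }
    }
    None
}

/// Hypothetical driver: one thread per queue; "processing" a task just adds
/// it to the worker's running sum. Returns the sum over all workers.
fn run_workers(initial: Vec<Vec<u64>>) -> u64 {
    let queues: Arc<Vec<Mutex<VecDeque<u64>>>> = Arc::new(
        initial
            .into_iter()
            .map(|tasks| Mutex::new(VecDeque::from(tasks)))
            .collect(),
    );
    let handles: Vec<thread::JoinHandle<u64>> = (0..queues.len())
        .map(|me| {
            let queues = Arc::clone(&queues);
            thread::spawn(move || {
                let mut sum = 0;
                // No new tasks are ever added, so empty queues mean we are done.
                while let Some(task) = next_task(&queues, me) {
                    sum += task;
                }
                sum
            })
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}
```

Calling `run_workers(vec![vec![1, 2, 3], vec![], vec![10]])` starts three workers, one of which begins with an empty queue and can only make progress by stealing. Whichever worker ends up running each task, every task is processed exactly once, so the combined sum does not depend on scheduling.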
Filters are automatically optimized using:
All concurrent code has been verified using:
Run the standard test suite:
cargo test -p vortex-scan --all-features
The crate includes Loom tests that exhaustively verify concurrent behavior. These tests run by default but can be disabled when needed:
# Skip Loom tests when using incompatible tools like address sanitizer
RUSTFLAGS="--cfg disable_loom" cargo test -p vortex-scan
Loom tests verify:
The default concurrency level is 2, meaning each worker thread can have 2 tasks in flight. This can be adjusted:
let scan = ScanBuilder::new(layout_reader)
    .with_concurrency(4) // Increase for more I/O parallelism
    .build()?;
The multi-threaded executor uses buffering based on the formula:
buffer_size = num_workers * concurrency
This controls how many splits are processed concurrently.
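
As a worked example of the formula above, the sketch below computes the buffer size for a given configuration. The function `buffer_size` and the constant `DEFAULT_CONCURRENCY` are illustrative names, not part of the vortex-scan API; only the formula and the default value of 2 come from the text above.

```rust
/// Default per-worker concurrency, as stated in the text above.
const DEFAULT_CONCURRENCY: usize = 2;

/// Illustrative helper (not part of vortex-scan): the number of splits
/// processed concurrently, per `buffer_size = num_workers * concurrency`.
fn buffer_size(num_workers: usize, concurrency: usize) -> usize {
    num_workers * concurrency
}
```

With 8 workers and the default concurrency, `buffer_size(8, DEFAULT_CONCURRENCY)` gives 16 splits in flight; raising concurrency to 4 with the same worker count gives 32.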
Core dependencies:
- vortex-array: Core array types and operations
- vortex-layout: Layout reader abstraction
- vortex-expr: Expression evaluation framework
- futures: Async runtime abstractions
- tokio (optional): Multi-threaded async runtime
- arrow-array (optional): Arrow integration

Feature flags:
- default: Standard features for most use cases
- tokio: Enable multi-threaded execution with Tokio runtime
- roaring: Support for Roaring bitmap selections