| Crates.io | vortex-scan |
| lib.rs | vortex-scan |
| version | 0.58.0 |
| created_at | 2025-01-21 12:49:38.18722+00 |
| updated_at | 2026-01-07 17:32:01.622029+00 |
| description | Scanning operations for Vortex |
| homepage | https://github.com/spiraldb/vortex |
| repository | https://github.com/spiraldb/vortex |
| max_upload_size | |
| id | 1524857 |
| size | 155,946 |
A high-performance scanning and (non-shuffling) query execution engine for the Vortex columnar format, featuring work-stealing parallelism and exhaustively tested concurrent execution.
The vortex-scan crate provides efficient scanning operations over Vortex arrays with support for:
use vortex_scan::ScanBuilder;
use vortex_array::expr::lit;
// Create a scan that reads specific columns with a filter
let scan = ScanBuilder::new(layout_reader)
    .with_projection(select(["name", "age"]))
    .with_filter(column("age").gt(lit(18)))
    .build()?;
// Execute the scan
for batch in scan.into_array_iter()? {
    let batch = batch?;
    // Process batch...
}
// Execute scan across multiple threads
let scan = ScanBuilder::new(layout_reader)
    .with_projection(projection)
    .with_filter(filter)
    .into_array_iter_multithread()?;
for batch in scan {
    let batch = batch?;
    // Results are automatically collected from worker threads
}
use arrow_array::RecordBatch;
// Convert scan results to Arrow RecordBatches
let reader = ScanBuilder::new(layout_reader)
    .with_filter(filter)
    .into_record_batch_reader(arrow_schema)?;
for batch in reader {
    let record_batch: RecordBatch = batch?;
    // Process Arrow RecordBatch...
}
use vortex_scan::Selection;
// Select specific rows by index
let scan = ScanBuilder::new(layout_reader)
    .with_selection(Selection::IncludeByIndex(indices.into()))
    .build()?;
// Or use row ranges
let scan = ScanBuilder::new(layout_reader)
    .with_row_range(1000..2000)
    .build()?;
The crate implements a sophisticated work-stealing queue that allows multiple worker threads to efficiently share work:
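As a rough illustration of the general work-stealing idea (not the crate's actual implementation, which is not shown here), the sketch below gives each worker its own deque of task ids. A worker takes from the front of its own deque and, when that is empty, steals from the back of another worker's deque. The names `StealQueues`, `next_task`, and `run_all` are hypothetical, and the mutex-guarded deques are a simplifying assumption chosen to keep the example standard-library only.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical work-stealing pool: one deque of task ids per worker.
pub struct StealQueues {
    queues: Vec<Mutex<VecDeque<usize>>>,
}

impl StealQueues {
    /// Distribute tasks `0..num_tasks` round-robin over `num_workers` deques.
    pub fn new(num_workers: usize, num_tasks: usize) -> Self {
        assert!(num_workers >= 1, "at least one worker is required");
        let mut queues: Vec<VecDeque<usize>> = vec![VecDeque::new(); num_workers];
        for task in 0..num_tasks {
            queues[task % num_workers].push_back(task);
        }
        Self {
            queues: queues.into_iter().map(Mutex::new).collect(),
        }
    }

    /// Next task for `worker`: the front of its own deque first, otherwise
    /// a task stolen from the back of another worker's deque.
    pub fn next_task(&self, worker: usize) -> Option<usize> {
        if let Some(task) = self.queues[worker].lock().unwrap().pop_front() {
            return Some(task);
        }
        let n = self.queues.len();
        for offset in 1..n {
            let victim = (worker + offset) % n;
            if let Some(task) = self.queues[victim].lock().unwrap().pop_back() {
                return Some(task);
            }
        }
        None
    }
}

/// Drain all tasks on `num_workers` threads and return the processed ids, sorted.
pub fn run_all(num_workers: usize, num_tasks: usize) -> Vec<usize> {
    let queues = Arc::new(StealQueues::new(num_workers, num_tasks));
    let handles: Vec<_> = (0..num_workers)
        .map(|worker| {
            let queues = Arc::clone(&queues);
            thread::spawn(move || {
                let mut done = Vec::new();
                while let Some(task) = queues.next_task(worker) {
                    done.push(task);
                }
                done
            })
        })
        .collect();
    let mut all: Vec<usize> = handles
        .into_iter()
        .flat_map(|handle| handle.join().unwrap())
        .collect();
    all.sort_unstable();
    all
}
```

Calling `run_all(4, 100)` processes every task id exactly once, regardless of which worker ends up running it. A worker that finishes early keeps pulling work from the others instead of going idle.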
Filters are automatically optimized using:
All concurrent code has been verified using:
Run the standard test suite:
cargo test -p vortex-scan --all-features
The crate includes comprehensive Loom tests that exhaustively verify concurrent behavior. These tests run by default but can be disabled if need be:
# Skip Loom tests when using incompatible tools like address sanitizer
RUSTFLAGS="--cfg disable_loom" cargo test -p vortex-scan
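For readers unfamiliar with custom `--cfg` flags, the following minimal sketch shows how a flag such as `disable_loom` can switch code in or out at compile time. It assumes the crate gates its Loom tests with `#[cfg(not(disable_loom))]`; the function name `loom_tests_enabled` is hypothetical and exists only for illustration.

```rust
/// Compiled when `--cfg disable_loom` is NOT passed (the default build).
#[cfg(not(disable_loom))]
pub fn loom_tests_enabled() -> bool {
    true
}

/// Compiled only when RUSTFLAGS contains `--cfg disable_loom`.
#[cfg(disable_loom)]
pub fn loom_tests_enabled() -> bool {
    false
}
```

Exactly one of the two definitions is present in any given build. Recent toolchains may emit an `unexpected_cfgs` warning for a custom cfg name that has not been declared to Cargo.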
Loom tests verify:
The default concurrency level is 2, meaning each worker thread can have 2 tasks in flight. This can be adjusted:
let scan = ScanBuilder::new(layout_reader)
    .with_concurrency(4) // Increase for more I/O parallelism
    .build()?;
The multi-threaded executor uses buffering based on the formula:
buffer_size = num_workers * concurrency
This controls how many splits are processed concurrently.
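To make the formula concrete, here is a small worked sketch. The function `buffer_size` and the constant `DEFAULT_CONCURRENCY` are illustrative names, not part of the crate's API; only the formula and the default value of 2 come from the text above.

```rust
/// Default per-worker concurrency, as stated in the text above.
pub const DEFAULT_CONCURRENCY: usize = 2;

/// buffer_size = num_workers * concurrency
pub fn buffer_size(num_workers: usize, concurrency: usize) -> usize {
    num_workers * concurrency
}
```

With 8 workers and the default concurrency, `buffer_size(8, DEFAULT_CONCURRENCY)` is 16. Raising the concurrency to 4 on the same 8 workers gives 32.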
Core dependencies:
- vortex-array: Core array types and operations (includes expression evaluation framework)
- vortex-layout: Layout reader abstraction
- futures: Async runtime abstractions
- arrow-array (optional): Arrow integration

Feature flags:
- default: Standard features for most use cases
- roaring: Support for Roaring bitmap selections