| Crates.io | lance |
| lib.rs | lance |
| version | 1.0.1 |
| created_at | 2022-07-28 07:11:32.95739+00 |
| updated_at | 2025-12-30 21:58:37.337069+00 |
| description | A columnar data format that is 100x faster than Parquet for random access. |
| homepage | |
| repository | https://github.com/lance-format/lance |
| max_upload_size | |
| id | 634289 |
| size | 4,132,093 |
The Open Lakehouse Format for Multimodal AI
Add the crate to your project using cargo:
cargo add lance
Suppose batches is an Arrow Vec<RecordBatch>, schema is an Arrow SchemaRef, and uri is the location to write the dataset to:
use arrow_array::RecordBatchIterator;
use lance::{dataset::WriteParams, Dataset};
let write_params = WriteParams::default();
// Wrap the batches in a reader; each item must be a Result.
let reader = RecordBatchIterator::new(
    batches.into_iter().map(Ok),
    schema,
);
Dataset::write(reader, &uri, Some(write_params)).await.unwrap();
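To read the dataset back, open it and scan it into a stream of record batches:
use futures::StreamExt; // provides map and collect on the stream
let dataset = Dataset::open(&uri).await.unwrap();
let scanner = dataset.scan();
let batches: Vec<RecordBatch> = scanner
    .try_into_stream()
    .await
    .unwrap()
    .map(|b| b.unwrap())
    .collect::<Vec<RecordBatch>>()
    .await;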
To fetch specific rows by index, where projection selects the columns to return:
let values: Result<RecordBatch> = dataset.take(&[200, 199, 39, 40, 100], &projection).await;
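To make the semantics of a take concrete, the sketch below gathers rows from an in-memory slice. It is a plain-Rust illustration and not how the lance crate implements random access: `take_rows` is a hypothetical helper, and it assumes that rows come back in the order the indices were requested and that an out-of-range index is reported as an error.

```rust
/// Hypothetical stand-in for a single column held in memory.
/// Not part of the lance API; it only illustrates gather-by-index.
fn take_rows<T: Clone>(column: &[T], indices: &[usize]) -> Result<Vec<T>, String> {
    indices
        .iter()
        .map(|&i| {
            // Look up each requested row; fail on the first bad index.
            column.get(i).cloned().ok_or_else(|| {
                format!("row index {} out of bounds for {} rows", i, column.len())
            })
        })
        // Collecting into Result keeps request order and short-circuits on Err.
        .collect()
}
```

Calling `take_rows(&column, &[200, 199, 39, 40, 100])` on a column with at least 201 rows yields five values in exactly that order, which mirrors the index list used in the example above.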
Assume "embeddings" is a FixedSizeListArray column:
use ::lance::index::vector::VectorIndexParams;
// params must be mutable because its fields are set below
let mut params = VectorIndexParams::default();
params.num_partitions = 256;
params.num_sub_vectors = 16;
// this will Err if list_size(embeddings) / num_sub_vectors does not meet SIMD alignment
dataset.create_index(&["embeddings"], IndexType::Vector, None, &params, true).await;
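The constraint in the comment above can be checked before building an index. The following is a minimal sketch in plain Rust, not code from the lance crate: `sub_vector_len` and `split_sub_vectors` are hypothetical helpers, and the `align` parameter is an assumption standing in for whatever SIMD width the library actually requires.

```rust
/// Hypothetical helper: returns the length of each sub-vector when a vector
/// of `dim` elements is split into `num_sub_vectors` equal parts.
/// `align` is an assumed SIMD lane count, not a value taken from lance.
fn sub_vector_len(dim: usize, num_sub_vectors: usize, align: usize) -> Result<usize, String> {
    if dim == 0 || num_sub_vectors == 0 || align == 0 {
        return Err("dim, num_sub_vectors and align must be non-zero".to_string());
    }
    if dim % num_sub_vectors != 0 {
        return Err(format!(
            "dimension {} is not divisible by num_sub_vectors {}",
            dim, num_sub_vectors
        ));
    }
    let len = dim / num_sub_vectors;
    if len % align != 0 {
        return Err(format!(
            "sub-vector length {} is not a multiple of {}",
            len, align
        ));
    }
    Ok(len)
}

/// Hypothetical helper: splits one embedding into its equal-sized sub-vectors,
/// borrowing from the input instead of copying.
fn split_sub_vectors(
    v: &[f32],
    num_sub_vectors: usize,
    align: usize,
) -> Result<Vec<&[f32]>, String> {
    let len = sub_vector_len(v.len(), num_sub_vectors, align)?;
    Ok(v.chunks_exact(len).collect())
}
```

With an assumed alignment of 8, a 128-dimensional embedding and 16 sub-vectors passes the check (each sub-vector has 8 elements), while a 64-dimensional embedding with 16 sub-vectors is rejected because each sub-vector would have only 4 elements.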
Lance is an open lakehouse format for multimodal AI. It comprises a file format, a table format, and a catalog spec that together let you build a complete lakehouse on top of object storage to power your AI workflows.
The key features of Lance include:
Expressive hybrid search: Combine vector similarity search, full-text search (BM25), and SQL analytics on the same dataset with accelerated secondary indices.
Lightning-fast random access: 100x faster than Parquet or Iceberg for random access without sacrificing scan performance.
Native multimodal data support: Store images, videos, audio, text, and embeddings in a single unified format with efficient blob encoding and lazy loading.
Data evolution: Efficiently add columns with backfilled values without full table rewrites, perfect for ML feature engineering.
Zero-copy versioning: ACID transactions, time travel, and automatic versioning without needing extra infrastructure.
Rich ecosystem integrations: Apache Arrow, Pandas, Polars, DuckDB, Apache Spark, Ray, Trino, Apache Flink, and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino).
For more details, see the full Lance format specification.