lance

Crate: lance (crates.io)
Version: 1.0.1
Created: 2022-07-28 07:11:32.95739+00
Updated: 2025-12-30 21:58:37.337069+00
Description: A columnar data format that is 100x faster than Parquet for random access.
Repository: https://github.com/lance-format/lance
Owners: Lance Community (lance-community)


Rust Implementation of Lance


The Open Lakehouse Format for Multimodal AI

Installation

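Lance is a library crate, so add it to your project's dependencies with cargo (cargo install only installs binaries):

cargo add lance

The examples below are async and must run inside an async runtime such as Tokio.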

Examples

Create dataset

Suppose batches is a Vec<RecordBatch> of Arrow record batches and schema is their Arrow SchemaRef:

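use arrow_array::RecordBatchIterator;
use lance::{dataset::WriteParams, Dataset};

let write_params = WriteParams::default();
// RecordBatchIterator expects fallible items, so wrap each batch in Ok.
let reader = RecordBatchIterator::new(
    batches.into_iter().map(Ok),
    schema,
);
// Write the batches as a new Lance dataset at `uri`.
Dataset::write(reader, &uri, Some(write_params)).await.unwrap();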
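RecordBatchIterator pairs an iterator of fallible batches with a single schema, which is why the snippet wraps each batch with .map(Ok). The std-only sketch below mimics that shape using hypothetical Batch, Schema, BatchReader, and write_all stand-ins (not the real Arrow or Lance types) and adds the kind of schema check a writer could perform before persisting batches. It is an illustration under those assumptions, not Lance's write path.

```rust
// Hypothetical stand-ins for Arrow types; not the real arrow-rs API.
#[derive(Debug, Clone, PartialEq)]
struct Schema {
    columns: Vec<String>,
}

#[derive(Debug, Clone)]
struct Batch {
    columns: Vec<String>,
    num_rows: usize,
}

#[derive(Debug, PartialEq)]
enum WriteError {
    SchemaMismatch { batch_index: usize },
}

// Mirrors the shape of RecordBatchIterator: fallible batches plus one schema.
struct BatchReader<I: Iterator<Item = Result<Batch, WriteError>>> {
    batches: I,
    schema: Schema,
}

// Hypothetical writer: consumes the reader, rejects any batch whose columns
// differ from the declared schema, and returns the total number of rows.
fn write_all<I>(reader: BatchReader<I>) -> Result<usize, WriteError>
where
    I: Iterator<Item = Result<Batch, WriteError>>,
{
    let mut total_rows = 0;
    for (i, batch) in reader.batches.enumerate() {
        let batch = batch?;
        if batch.columns != reader.schema.columns {
            return Err(WriteError::SchemaMismatch { batch_index: i });
        }
        total_rows += batch.num_rows;
    }
    Ok(total_rows)
}

fn main() {
    let schema = Schema { columns: vec!["id".into(), "text".into()] };
    let batches = vec![
        Batch { columns: schema.columns.clone(), num_rows: 3 },
        Batch { columns: schema.columns.clone(), num_rows: 2 },
    ];
    // Ok lifts each Batch into a Result, as .map(Ok) does in the README snippet.
    let reader = BatchReader {
        batches: batches.into_iter().map(Ok::<Batch, WriteError>),
        schema,
    };
    println!("{:?}", write_all(reader)); // Ok(5)
}
```

Requiring one schema up front lets a writer reject inconsistent input before any data is committed.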

Read

use arrow_array::RecordBatch;
use futures::StreamExt;

let dataset = Dataset::open(path).await.unwrap();
let scanner = dataset.scan();
// Stream every batch in the dataset, panicking on the first read error.
let batches: Vec<RecordBatch> = scanner
    .try_into_stream()
    .await
    .unwrap()
    .map(|b| b.unwrap())
    .collect()
    .await;
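The read snippet unwraps each batch inside map, so the first failed read panics. An alternative is to stop at the first error and return it; for the async stream this would typically be futures' TryStreamExt::try_collect. The synchronous, std-only sketch below shows the same idea on a plain iterator, with a hypothetical read_batches function standing in for the scanner.

```rust
// Hypothetical scan: yields a row count per batch, or an error message.
fn read_batches(fail_at: Option<usize>) -> impl Iterator<Item = Result<usize, String>> {
    (0..4).map(move |i| {
        if Some(i) == fail_at {
            Err(format!("failed to read batch {i}"))
        } else {
            Ok(10 * (i + 1))
        }
    })
}

// Collecting into Result<Vec<_>, _> stops at the first Err instead of panicking.
fn collect_all(fail_at: Option<usize>) -> Result<Vec<usize>, String> {
    read_batches(fail_at).collect()
}

fn main() {
    println!("{:?}", collect_all(None)); // Ok([10, 20, 30, 40])
    println!("{:?}", collect_all(Some(2))); // Err("failed to read batch 2")
}
```

Returning the error leaves the caller free to retry or report it, which suits long-running pipelines better than a panic.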

Take

// Fetch rows by offset, in the requested order; projection selects the columns to return.
let values: Result<RecordBatch> = dataset.take(&[200, 199, 39, 40, 100], &projection).await;
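take fetches rows by offset in whatever order they are requested. One way to picture this, assuming a dataset split into fragments with known row counts (a simplified model, not Lance's actual implementation), is to turn each global offset into a (fragment, local offset) pair with a binary search over cumulative row counts while keeping the caller's order. locate_rows and the fragment sizes are hypothetical.

```rust
/// Map global row offsets to (fragment index, offset within fragment),
/// preserving the order of `offsets`. Returns None if any offset is out of range.
fn locate_rows(fragment_rows: &[u64], offsets: &[u64]) -> Option<Vec<(usize, u64)>> {
    // starts[i] is the global offset of the first row in fragment i.
    let mut starts = Vec::with_capacity(fragment_rows.len());
    let mut total = 0u64;
    for &n in fragment_rows {
        starts.push(total);
        total += n;
    }
    offsets
        .iter()
        .map(|&off| {
            if off >= total {
                return None;
            }
            // Binary search for the last fragment whose start is <= off;
            // this also skips over empty fragments.
            let frag = starts.partition_point(|&s| s <= off) - 1;
            Some((frag, off - starts[frag]))
        })
        .collect()
}

fn main() {
    // Three hypothetical fragments holding 100, 150 and 50 rows.
    let located = locate_rows(&[100, 150, 50], &[200, 199, 39, 40, 100]);
    println!("{:?}", located); // Some([(1, 100), (1, 99), (0, 39), (0, 40), (1, 0)])
}
```

Because each lookup is a binary search over fragment boundaries, the cost of locating a row does not depend on scanning the rows before it.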

Vector index

Assume the "embeddings" column is a FixedSizeListArray:

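use lance::index::vector::VectorIndexParams;

// The fields are assigned below, so the binding must be mutable.
let mut params = VectorIndexParams::default();
params.num_partitions = 256; // number of IVF partitions
params.num_sub_vectors = 16; // number of PQ sub-vectors per embedding

// Returns an Err if the vector dimension (list size of "embeddings") divided by
// num_sub_vectors does not meet SIMD alignment requirements.
dataset.create_index(&["embeddings"], IndexType::Vector, None, &params, true).await;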
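The alignment comment refers to product quantization (PQ): each vector is split into num_sub_vectors equal sub-vectors, so the vector dimension must divide evenly. The sketch below checks that condition and additionally assumes a hypothetical SIMD lane width of 8 f32 values; the exact alignment rule Lance enforces is not stated here and may differ. sub_vector_dim, split_into_sub_vectors, and LANES are illustrative names, not Lance APIs.

```rust
/// Assumed SIMD width in f32 lanes (hypothetical; not taken from Lance).
const LANES: usize = 8;

/// Returns the length of each PQ sub-vector, or an error if `dim` cannot be
/// split evenly into `num_sub_vectors` lane-aligned pieces.
fn sub_vector_dim(dim: usize, num_sub_vectors: usize) -> Result<usize, String> {
    if num_sub_vectors == 0 || dim % num_sub_vectors != 0 {
        return Err(format!("dimension {dim} is not divisible by {num_sub_vectors}"));
    }
    let sub_dim = dim / num_sub_vectors;
    if sub_dim % LANES != 0 {
        return Err(format!("sub-vector length {sub_dim} is not a multiple of {LANES}"));
    }
    Ok(sub_dim)
}

/// Split one embedding into contiguous sub-vectors of equal length.
fn split_into_sub_vectors(v: &[f32], num_sub_vectors: usize) -> Result<Vec<&[f32]>, String> {
    let sub_dim = sub_vector_dim(v.len(), num_sub_vectors)?;
    Ok(v.chunks(sub_dim).collect())
}

fn main() {
    println!("{:?}", sub_vector_dim(128, 16)); // Ok(8)
    println!("{}", sub_vector_dim(100, 16).is_err()); // true
    let v = vec![0.0f32; 128];
    println!("{}", split_into_sub_vectors(&v, 16).unwrap().len()); // 16
}
```

With 16 sub-vectors, for example, a 128-dimensional embedding passes this check while a 100-dimensional one does not.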

What is Lance?

Lance is an open lakehouse format for multimodal AI. It consists of a file format, a table format, and a catalog specification that together allow you to build a complete lakehouse on top of object storage to power your AI workflows.

The key features of Lance include:

  • Expressive hybrid search: Combine vector similarity search, full-text search (BM25), and SQL analytics on the same dataset with accelerated secondary indices.

  • Lightning-fast random access: 100x faster than Parquet or Iceberg for random access without sacrificing scan performance.

  • Native multimodal data support: Store images, videos, audio, text, and embeddings in a single unified format with efficient blob encoding and lazy loading.

  • Data evolution: Efficiently add columns with backfilled values without full table rewrites, perfect for ML feature engineering.

  • Zero-copy versioning: ACID transactions, time travel, and automatic versioning without needing extra infrastructure.

  • Rich ecosystem integrations: Apache Arrow, Pandas, Polars, DuckDB, Apache Spark, Ray, Trino, Apache Flink, and open catalogs (Apache Polaris, Unity Catalog, Apache Gravitino).
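The data evolution and zero-copy versioning features both follow from treating each table version as an immutable list of data files. The toy model below is an assumption-laden sketch, not Lance's manifest format: every commit appends a new version that reuses the previous version's files, time travel is a lookup by version number, and adding a column writes one new file for that column instead of rewriting existing ones. Table, Version, add_column, and checkout are hypothetical names.

```rust
/// A version lists which data files make up the table and which columns each holds.
#[derive(Debug, Clone)]
struct Version {
    number: u64,
    files: Vec<(String, Vec<String>)>, // (file name, columns stored in that file)
}

impl Version {
    fn columns(&self) -> Vec<String> {
        self.files.iter().flat_map(|(_, cols)| cols.iter().cloned()).collect()
    }
}

struct Table {
    versions: Vec<Version>,
    /// Count of data files written so far; existing files are never rewritten.
    files_written: usize,
}

impl Table {
    fn new(columns: &[&str]) -> Self {
        let cols = columns.iter().map(|c| c.to_string()).collect();
        let first = Version { number: 1, files: vec![("data-0".to_string(), cols)] };
        Table { versions: vec![first], files_written: 1 }
    }

    fn latest(&self) -> &Version {
        self.versions.last().expect("table always has at least one version")
    }

    /// Time travel: look up an older version by number.
    fn checkout(&self, number: u64) -> Option<&Version> {
        self.versions.iter().find(|v| v.number == number)
    }

    /// Data evolution: write one new file for the new column and commit a
    /// new version that references every existing file unchanged.
    fn add_column(&mut self, column: &str) {
        let mut next = self.latest().clone();
        next.number += 1;
        next.files.push((format!("data-{}", self.files_written), vec![column.to_string()]));
        self.files_written += 1;
        self.versions.push(next);
    }
}

fn main() {
    let mut table = Table::new(&["id", "text"]);
    table.add_column("embedding");
    println!("{:?}", table.checkout(1).map(Version::columns)); // Some(["id", "text"])
    println!("{:?}", table.latest().columns()); // ["id", "text", "embedding"]
}
```

Since old versions keep pointing at the same files, reading version 1 after the column is added needs no copy of the data.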

For more details, see the full Lance format specification.
