lance-hdfs-provider

Crate: lance-hdfs-provider
Version: 0.1.0
Description: HDFS store provider for Lance
Repository: https://github.com/fMeow/lance-hdfs-provider
Documentation: https://docs.rs/lance-hdfs-provider
Author: fMeow

README

lance-hdfs-provider

An HDFS store provider for Lance, built on top of the OpenDAL hdfs service. It lets Lance and LanceDB read and write datasets stored in Hadoop HDFS.

Installation

Add the crate to your Cargo.toml:

[dependencies]
lance-hdfs-provider = "0.1.0"

Quickstart: Lance dataset

Register the provider, then read or write using HDFS URIs:

use std::sync::Arc;
use lance::{io::ObjectStoreRegistry, session::Session,
    dataset::{DEFAULT_INDEX_CACHE_SIZE, DEFAULT_METADATA_CACHE_SIZE}
};
use lance::dataset::builder::DatasetBuilder;
use lance_hdfs_provider::HdfsStoreProvider;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut registry = ObjectStoreRegistry::default();
    registry.insert("hdfs", Arc::new(HdfsStoreProvider));

    let session = Arc::new(Session::new(
        DEFAULT_INDEX_CACHE_SIZE,
        DEFAULT_METADATA_CACHE_SIZE,
        Arc::new(registry),
    ));

    let uri = "hdfs://127.0.0.1:9000/sample-dataset";

    // Load an existing dataset
    let _dataset = DatasetBuilder::from_uri(uri)
        .with_session(session.clone())
        .load()
        .await?;

    // Or write a new dataset (see the examples and the write sketch below)
    Ok(())
}
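
Writing is symmetric: route the destination URI through the same session so the registered HDFS provider is used. A minimal sketch, assuming a recent Lance release whose WriteParams carries the shared Session (the field name may differ across versions); the schema, data, and path are illustrative:

use std::sync::Arc;

use arrow_array::{ArrayRef, Int32Array, RecordBatch, RecordBatchIterator};
use arrow_schema::{DataType, Field, Schema};
use lance::dataset::{Dataset, WriteMode, WriteParams};
use lance::session::Session;

// Write a tiny single-column dataset to HDFS, reusing the session built above.
async fn write_sample(session: Arc<Session>) -> Result<(), Box<dyn std::error::Error>> {
    let schema = Arc::new(Schema::new(vec![Field::new("id", DataType::Int32, false)]));
    let batch = RecordBatch::try_new(
        schema.clone(),
        vec![Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef],
    )?;
    let reader = RecordBatchIterator::new(vec![Ok(batch)], schema);

    let params = WriteParams {
        mode: WriteMode::Create,
        // Assumed field: recent Lance versions accept the session (and thus the
        // registered HdfsStoreProvider) here; check WriteParams for your version.
        session: Some(session),
        ..Default::default()
    };
    Dataset::write(reader, "hdfs://127.0.0.1:9000/sample-dataset", Some(params)).await?;
    Ok(())
}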

Quickstart: LanceDB

Use the same registry when creating the LanceDB session:

use std::sync::Arc;
use lance::{io::ObjectStoreRegistry, session::Session,
    dataset::{DEFAULT_INDEX_CACHE_SIZE, DEFAULT_METADATA_CACHE_SIZE}
};
use lance_hdfs_provider::HdfsStoreProvider;

#[tokio::main(flavor = "current_thread")]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut registry = ObjectStoreRegistry::default();
    registry.insert("hdfs", Arc::new(HdfsStoreProvider));

    let session = Arc::new(Session::new(
        DEFAULT_INDEX_CACHE_SIZE,
        DEFAULT_METADATA_CACHE_SIZE,
        Arc::new(registry),
    ));

    let db = lancedb::connect("hdfs://127.0.0.1:9000/test-db")
        .session(session.clone())
        .execute()
        .await?;

    let table = db.open_table("table1").execute().await?;
    Ok(())
}
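
Once the table is open, reads go through the usual LanceDB query builder; nothing HDFS-specific is needed beyond the session. A minimal sketch, assuming a recent lancedb release (the QueryBase/ExecutableQuery trait paths may differ in yours):

use arrow_array::RecordBatch;
use futures::TryStreamExt;
use lancedb::query::{ExecutableQuery, QueryBase};
use lancedb::Table;

// Read the first 10 rows of a table opened as in the quickstart above.
async fn peek(table: &Table) -> Result<Vec<RecordBatch>, Box<dyn std::error::Error>> {
    let batches = table
        .query()   // plain scan; add filters or vector search as needed
        .limit(10) // QueryBase::limit
        .execute() // ExecutableQuery::execute -> stream of RecordBatches
        .await?
        .try_collect()
        .await?;
    Ok(batches)
}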

Notes

  • Ensure your HDFS URI includes the NameNode: either a host and port (e.g. hdfs://127.0.0.1:9000/path) or a named cluster.
  • Authentication and additional options can be passed via Lance StorageOptions; any key supported by OpenDAL's HDFS service can be provided (see the sketch below).
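
For example, extra options can be attached when opening a dataset. A minimal sketch, assuming DatasetBuilder::with_storage_options from recent Lance releases; the keys shown (user, kerberos_ticket_cache_path) are examples documented for OpenDAL's hdfs service and should be adapted to your cluster:

use std::collections::HashMap;
use std::sync::Arc;

use lance::dataset::builder::DatasetBuilder;
use lance::dataset::Dataset;
use lance::session::Session;

// Open a dataset with additional HDFS options forwarded to OpenDAL.
// `session` is the one built in the quickstart, with HdfsStoreProvider registered.
async fn open_with_options(session: Arc<Session>) -> Result<Dataset, Box<dyn std::error::Error>> {
    let mut options = HashMap::new();
    // Example keys and values only; adjust or drop as your cluster requires.
    options.insert("user".to_string(), "hadoop".to_string());
    options.insert(
        "kerberos_ticket_cache_path".to_string(),
        "/tmp/krb5cc_1000".to_string(),
    );

    let dataset = DatasetBuilder::from_uri("hdfs://127.0.0.1:9000/sample-dataset")
        .with_storage_options(options)
        .with_session(session)
        .load()
        .await?;
    Ok(dataset)
}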

Licenses

Licensed under either of
