datafusion-bigtable

Crates.io: datafusion-bigtable
lib.rs: datafusion-bigtable
version: 0.1.0
source: src
created_at: 2022-03-12 04:32:00.70872
updated_at: 2022-03-12 04:32:00.70872
description: Bigtable data source for Apache Arrow Datafusion
homepage: https://github.com/datafusion-contrib/datafusion-bigtable
repository: https://github.com/datafusion-contrib/datafusion-bigtable
max_upload_size:
id: 548632
size: 39,275
Rich (jychen7)

README

Datafusion-Bigtable

Bigtable data source for Apache Arrow Datafusion

Run SQL on Bigtable

This crate implements a Bigtable data source and physical executor for Datafusion. It is built on top of tonic, a gRPC client.

Quick Start

// Import paths are assumed from the crate layout and the Datafusion release current at the time of writing.
use std::sync::Arc;

use arrow::datatypes::{DataType, Field};
use datafusion::prelude::ExecutionContext;
use datafusion_bigtable::datasource::BigtableDataSource;

let bigtable_datasource = BigtableDataSource::new(
    "emulator".to_owned(),                               // project
    "dev".to_owned(),                                    // instance
    "weather_balloons".to_owned(),                       // table
    "measurements".to_owned(),                           // column family
    vec!["_row_key".to_owned()],                         // table_partition_cols
    "#".to_owned(),                                      // table_partition_separator
    vec![Field::new("pressure", DataType::Utf8, false)], // qualifiers
    true,                                                // only_read_latest
).await.unwrap();

let mut ctx = ExecutionContext::new();
ctx.register_table("weather_balloons", Arc::new(bigtable_datasource)).unwrap();

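// Query a single row by its row key. The `?` operator assumes an enclosing async function that returns a Result.
let batches = ctx
    .sql("SELECT \"_row_key\", pressure, \"_timestamp\" FROM weather_balloons WHERE \"_row_key\" = 'us-west2#3698#2021-03-05-1200'")
    .await?
    .collect()
    .await?;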

Roadmap

Bigtable

  • ✅ UTF8 string
  • ✅ 64-bit big-endian signed integer
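Bigtable stores every cell value as raw bytes, so the two supported types amount to two ways of decoding the same byte string. The sketch below illustrates both decodings using only the Rust standard library. The helper names `decode_utf8` and `decode_i64_be` are hypothetical and are not part of this crate's API.

```rust
use std::convert::TryInto;

/// Decode a cell value as a UTF-8 string (hypothetical helper).
/// Returns None if the bytes are not valid UTF-8.
fn decode_utf8(cell: &[u8]) -> Option<String> {
    String::from_utf8(cell.to_vec()).ok()
}

/// Decode a cell value as a 64-bit big-endian signed integer (hypothetical helper).
/// Returns None unless the cell is exactly 8 bytes long.
fn decode_i64_be(cell: &[u8]) -> Option<i64> {
    let bytes: [u8; 8] = cell.try_into().ok()?;
    Some(i64::from_be_bytes(bytes))
}

fn main() {
    // The same measurement stored in the two supported encodings.
    let as_integer = 94558_i64.to_be_bytes();
    let as_string = b"94558";

    println!("{:?}", decode_i64_be(&as_integer));
    println!("{:?}", decode_utf8(as_string));
}
```

The two encodings are not interchangeable: a value written as a string must be declared as `DataType::Utf8` in the qualifiers, since its bytes generally do not form a valid 8-byte integer.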

SQL

  • ✅ select by "_row_key" =
  • ✅ select by "_row_key" IN
  • ✅ select by "_row_key" BETWEEN
  • ✅ select by composite row keys =
  • ✅ select by composite row keys IN
  • ✅ select by composite row keys BETWEEN (supported only on the last of the table_partition_cols)
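A composite row key is the values of the table_partition_cols joined by the table_partition_separator. The sketch below shows how equality predicates on every column could map to a single row key, and how BETWEEN on the last column could map to an inclusive row-key range. It is an illustration under assumptions, not this crate's implementation: the helpers `compose_row_key` and `compose_row_range` are hypothetical, as is the three-part key layout (region, balloon id, minute).

```rust
/// Join partition column values into a single row key (hypothetical helper).
fn compose_row_key(values: &[&str], separator: &str) -> String {
    values.join(separator)
}

/// Build the inclusive row-key range for equality on the leading columns
/// and BETWEEN on the last column (hypothetical helper).
fn compose_row_range(
    prefix: &[&str],
    low: &str,
    high: &str,
    separator: &str,
) -> (String, String) {
    let mut start: Vec<&str> = prefix.to_vec();
    start.push(low);
    let mut end: Vec<&str> = prefix.to_vec();
    end.push(high);
    (
        compose_row_key(&start, separator),
        compose_row_key(&end, separator),
    )
}

fn main() {
    // Equality on all three columns yields exactly one row key.
    let key = compose_row_key(&["us-west2", "3698", "2021-03-05-1200"], "#");
    println!("{}", key);

    // Equality on the first two columns plus BETWEEN on the last one
    // yields a contiguous range of row keys.
    let (start, end) = compose_row_range(
        &["us-west2", "3698"],
        "2021-03-05-1200",
        "2021-03-05-1204",
        "#",
    );
    println!("{} ..= {}", start, end);
}
```

One plausible reading of the BETWEEN restriction follows from this layout: Bigtable orders rows lexicographically by key, so a range on the last column with fixed leading columns is one contiguous scan, whereas a range on an earlier column is not.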

General

Note: datafusion-bigtable provides only the physical executor for Datafusion. Aggregations, GROUP BY, and joins are implemented and handled by Datafusion itself.

Commit count: 21
