Crates.io | delta_kernel |
lib.rs | delta_kernel |
version | 0.4.1 |
source | src |
created_at | 2024-04-05 22:04:19.969606 |
updated_at | 2024-10-28 22:21:40.918553 |
description | Core crate providing a Delta/Deltalake implementation focused on interoperability with a wide range of query engines. |
homepage | https://delta.io |
repository | https://github.com/delta-incubator/delta-kernel-rs |
max_upload_size | |
id | 1197857 |
size | 828,837 |
Delta-kernel-rs is an experimental Delta implementation focused on interoperability with a wide range of query engines. It currently only supports reads.
The Delta Kernel project is a Rust and C library for building Delta connectors that can read (and soon, write) Delta tables without needing to understand the Delta protocol details. This is the Rust/C equivalent of Java Delta Kernel.
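Delta-kernel-rs is split into a few different crates, including kernel, acceptance, derive-macros, and ffi. The ffi crate enables delta-kernel-rs to be used from C or C++; see the ffi directory for more information.

By default we build only the kernel and acceptance crates, which will also build derive-macros as a dependency.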
To get started, install Rust via rustup, clone the repository, and then run:
cargo test --all-features
This will build the kernel, run all unit tests, fetch the Delta Acceptance Tests data and run the acceptance tests against it.
In general, you will want to depend on delta-kernel-rs by adding it as a dependency to your Cargo.toml (that is, for Rust projects using cargo); for other projects, please see the FFI module. The core kernel includes facilities for reading Delta tables, but requires the consumer to implement the Engine trait in order to use the table-reading APIs. If you do not need to implement your own Engine, the kernel has a feature flag that enables a default, asynchronous Engine implementation built with Arrow and Tokio.
[dependencies]
# fewer dependencies, requires consumer to implement Engine trait.
# allows consumers to implement their own in-memory format
delta_kernel = "0.4"
# or, instead of the line above, turn on the default engine, based on arrow:
# delta_kernel = { version = "0.4", features = ["default-engine"] }
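The split described above, where kernel logic is written against a trait and the engine supplies the I/O, can be sketched with the standard library alone. Everything in this sketch is hypothetical: `Engine` here is a one-method stand-in invented for illustration (delta_kernel's real `Engine` trait has different methods and types), and `InMemoryEngine` and `count_actions` are made-up names. The only detail taken from Delta itself is that a commit file in the log holds one JSON action per line.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for kernel's `Engine` trait.
/// It only shows the pattern: kernel-side logic depends on a trait,
/// and the consumer decides how files are actually read.
trait Engine {
    /// Return the full contents of the file at `path`.
    fn read_file(&self, path: &str) -> Result<String, String>;
}

/// Kernel-style logic that knows nothing about storage: it counts the
/// actions (non-empty lines) in one newline-delimited JSON commit file.
fn count_actions<E: Engine>(engine: &E, commit_path: &str) -> Result<usize, String> {
    let contents = engine.read_file(commit_path)?;
    Ok(contents.lines().filter(|line| !line.trim().is_empty()).count())
}

/// A hypothetical consumer-provided engine backed by an in-memory map.
struct InMemoryEngine {
    files: HashMap<String, String>,
}

impl Engine for InMemoryEngine {
    fn read_file(&self, path: &str) -> Result<String, String> {
        self.files
            .get(path)
            .cloned()
            .ok_or_else(|| format!("file not found: {path}"))
    }
}
```

Because `count_actions` is generic over `E: Engine`, replacing the in-memory engine with one backed by real storage would leave the kernel-side code unchanged; a ready-made implementation is what the default-engine feature gives consumers who do not want to write their own.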
There are more feature flags in addition to the default-engine flag shown above. Relevant flags include:
Feature flag | Description |
---|---|
default-engine | Turn on the 'default' engine: async, arrow-based Engine implementation |
sync-engine | Turn on the 'sync' engine: synchronous, arrow-based Engine implementation. Only supports local storage! |
arrow-conversion | Conversion utilities for arrow/kernel schema interoperation |
arrow-expression | Expression system implementation for arrow |
We intend to follow Semantic Versioning. However, in the 0.x line, the APIs are still unstable. We therefore may break APIs within minor releases (that is, 0.1 -> 0.2), but we will not break APIs in patch releases (0.1.0 -> 0.1.1).
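The 0.x policy above reduces to a simple rule: within 0.x, a change in the minor component may break APIs, while a change in only the patch component will not. The sketch below encodes that rule, assuming plain `major.minor.patch` strings with no pre-release or build metadata. `Version` and `may_break_api` are hypothetical helpers, not part of delta_kernel, and the 1.0+ branch assumes standard SemVer, reflecting the stated intent rather than a documented guarantee.

```rust
/// A `major.minor.patch` version number.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

impl Version {
    /// Parse a fully specified `major.minor.patch` string; anything else is `None`.
    fn parse(s: &str) -> Option<Version> {
        let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
        // `next()??`: the outer `?` rejects a missing component,
        // the inner `?` rejects a component that is not a number.
        let v = Version {
            major: parts.next()??,
            minor: parts.next()??,
            patch: parts.next()??,
        };
        if parts.next().is_some() {
            return None; // more than three components
        }
        Some(v)
    }
}

/// Could moving from `from` to a newer `to` break APIs under the policy above?
/// In the 0.x line, only patch bumps are promised to be non-breaking.
/// From 1.0 on, this assumes standard SemVer (only major bumps break).
fn may_break_api(from: Version, to: Version) -> bool {
    if from.major == 0 {
        to.major != 0 || to.minor != from.minor
    } else {
        to.major != from.major
    }
}
```

Under this rule, moving from 0.4.0 to 0.4.1 is expected to be safe, while moving from 0.4.x to 0.5.0 may require code changes.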
If you enable the default-engine or sync-engine features, you get an implementation of the Engine trait that uses Arrow as its data format.
The arrow crate tends to release new major versions rather quickly. To let engines that already integrate arrow also integrate kernel, without forcing them to track the specific arrow version that kernel depends on, we take as broad a dependency on arrow versions as we can.
This means you can force kernel to rely on the specific arrow version that your engine already uses, as long as it falls in that range. You can see the range in the Cargo.toml in the same folder as this README.md.
For example, although arrow 53.1.0 has been released, you can force kernel to compile on 53.0 by putting the following in your project's Cargo.toml:
[patch.crates-io]
arrow = "53.0"
arrow-arith = "53.0"
arrow-array = "53.0"
arrow-buffer = "53.0"
arrow-cast = "53.0"
arrow-data = "53.0"
arrow-ord = "53.0"
arrow-json = "53.0"
arrow-select = "53.0"
arrow-schema = "53.0"
parquet = "53.0"
Note that, unfortunately, patching in cargo requires that exactly one version matches your specification. If only arrow "53.0.0" had been released, the above would work, but if "53.0.1" were to be released, the specification would break and you would need to provide a more restrictive specification like "=53.0.0".
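Cargo reads a bare requirement like `53.0` as the caret requirement `^53.0`, which admits any later `53.y.z` release, whereas `=53.0.0` admits exactly one version. The sketch below models only those two forms (and only for major versions of 1 or more) to show why a new 53.0.1 release would leave the loose requirement with two candidates. It illustrates the requirement syntax only; it is not Cargo's resolver or its patch logic, and `Req`, `parse_req`, and `count_matches` are hypothetical names.

```rust
/// A `major.minor.patch` version; derived ordering compares fields left to right.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct Version(u64, u64, u64);

/// The two requirement forms discussed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Req {
    /// A bare requirement such as `53.0`, which Cargo treats as `^53.0`.
    Caret(Version),
    /// An exact requirement such as `=53.0.0`.
    Exact(Version),
}

fn parse_req(s: &str) -> Option<Req> {
    let (exact, body) = match s.strip_prefix('=') {
        Some(rest) => (true, rest),
        None => (false, s),
    };
    let parts = body
        .split('.')
        .map(|p| p.parse::<u64>().ok())
        .collect::<Option<Vec<u64>>>()?;
    match (exact, parts.as_slice()) {
        // Only fully specified exact requirements are modeled here.
        (true, [a, b, c]) => Some(Req::Exact(Version(*a, *b, *c))),
        // Missing caret components default to 0; only major >= 1 is modeled,
        // because Cargo's caret rules for 0.x versions differ.
        (false, [a]) if *a >= 1 => Some(Req::Caret(Version(*a, 0, 0))),
        (false, [a, b]) if *a >= 1 => Some(Req::Caret(Version(*a, *b, 0))),
        (false, [a, b, c]) if *a >= 1 => Some(Req::Caret(Version(*a, *b, *c))),
        _ => None,
    }
}

fn matches(req: Req, v: Version) -> bool {
    match req {
        Req::Exact(e) => v == e,
        // For major >= 1, `^X.Y.Z` means `>= X.Y.Z, < (X+1).0.0`.
        Req::Caret(base) => v >= base && v < Version(base.0 + 1, 0, 0),
    }
}

/// How many published versions satisfy `req`?
fn count_matches(req: Req, published: &[Version]) -> usize {
    published.iter().filter(|v| matches(req, **v)).count()
}
```

With only 53.0.0 published, `53.0` matches a single version; once 53.0.1 also exists, it matches two, and only `=53.0.0` narrows the match back to one.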
You may also need to patch the object_store version if the version of parquet you depend on depends on a different version of object_store. You can do this by adding object_store to the patch list with the required version. To find the required version, check the parquet page on docs.rs, switch to the version you want to use, and see which version of object_store it depends on.
There are some example programs showing how delta-kernel-rs can be used to interact with Delta tables. They live in the kernel/examples directory.
delta-kernel-rs is still under heavy development but follows conventions adopted by most Rust projects.
There are a few key concepts that will help in understanding kernel:
- The Engine trait encapsulates all the functionality an engine or connector needs to provide to the Delta Kernel in order to read the Delta table.
- DefaultEngine is our default implementation of the above trait. It lives in engine/default and provides a reference implementation for all Engine functionality. DefaultEngine uses arrow as its in-memory data format.
- Scan is the entrypoint for reading data from a table.

Some design principles which should be considered:
- Async should live only in the Engine implementation. The core kernel does not use async at all. We do not wish to impose the need for an entire async runtime on an engine or connector. The DefaultEngine does use async quite heavily. It doesn't depend on a particular runtime, however, and implementations could provide an "executor" based on tokio, smol, async-std, or whatever might be needed. Currently only a tokio-based executor is provided.
- Table API: the kernel intentionally exposes the concept of immutable versions of tables through the snapshot API. This encourages users to think about the Delta table state more accurately.

Some tips for the development environment:
- rust-analyzer is your friend: rustup component add rust-analyzer
- If you use emacs, both eglot and lsp-mode provide excellent integration with rust-analyzer. rustic is a nice mode as well.
- If you use VS Code, it can be convenient to configure rust-analyzer in .vscode/settings.json:
{
  "editor.formatOnSave": true,
  "rust-analyzer.cargo.features": ["default-engine", "acceptance"]
}
- You can build and browse the crate's documentation with: cargo doc --open