datafusion-ffi

Crates.iodatafusion-ffi
lib.rsdatafusion-ffi
version43.0.0
sourcesrc
created_at2024-11-09 04:06:03.548485
updated_at2024-11-09 04:06:03.548485
descriptionForeign Function Interface implementation for DataFusion
homepagehttps://datafusion.apache.org
repositoryhttps://github.com/apache/datafusion
max_upload_size
id1441797
size62,924
Andrew Lamb (alamb)

documentation

README

datafusion-ffi: Apache DataFusion Foreign Function Interface

This crate contains code to allow interoperability of Apache DataFusion with functions from other languages using a stable interface.

See API Docs for details and examples.

We expect this crate may be used by both sides of the FFI. This allows users to create modules that can interoperate with the necessity of using the same version of DataFusion. The driving use case has been the datafusion-python repository, but many other use cases may exist. We envision at least two use cases.

  1. datafusion-python which will use the FFI to provide external services such as a TableProvider without needing to re-export the entire datafusion-python code base. With datafusion-ffi these packages do not need datafusion-python as a dependency at all.
  2. Users may want to create a modular interface that allows runtime loading of libraries.

Struct Layout

In this crate we have a variety of structs which closely mimic the behavior of their internal counterparts. In the following example, we will refer to the TableProvider, but the same pattern exists for other structs.

Each of the exposted structs in this crate is provided with a variant prefixed with Foreign. This variant is designed to be used by the consumer of the foreign code. The Foreign structs should never access the private_data fields. Instead they should only access the data returned through the function calls defined on the FFI_ structs. The second purpose of the Foreign structs is to contain additional data that may be needed by the traits that are implemented on them. Some of these traits require borrowing data which can be far more convienent to be locally stored.

For example, we have a struct FFI_TableProvider to give access to the TableProvider functions like table_type() and scan(). If we write a library that wishes to expose it's TableProvider, then we can access the private data that contains the Arc reference to the TableProvider via FFI_TableProvider. This data is local to the library.

If we have a program that accesses a TableProvider via FFI, then it will use ForeignTableProvider. When using ForeignTableProvider we must not attempt to access the private_data field in FFI_TableProvider. If a user is testing locally, you may be able to successfully access this field, but it will only work if you are building against the exact same version of DataFusion for both libraries and the same compiler. It will not work in general.

It is worth noting that which library is the local and which is foreign depends on which interface we are considering. For example, suppose we have a Python library called my_provider that exposes a TableProvider called MyProvider via FFI_TableProvider. Within the library my_provider we can access the private_data via FFI_TableProvider. We connect this to datafusion-python, where we access it as a ForeignTableProvider. Now when we call scan() on this interface, we have to pass it a FFI_SessionConfig. The SessionConfig is local to datafusion-python and not my_provider. It is important to be careful when expanding these functions to be certain which side of the interface each object refers to.

Commit count: 9447

cargo fmt