| Crates.io | spark-connect |
| lib.rs | spark-connect |
| version | 0.2.1 |
| created_at | 2025-10-17 10:47:13.122659+00 |
| updated_at | 2026-01-22 20:44:53.241076+00 |
| description | Rust client for Apache Spark Connect. |
| homepage | https://github.com/franciscoabsampaio/spark-connect |
| repository | https://github.com/franciscoabsampaio/spark-connect |
| max_upload_size | |
| id | 1887495 |
| size | 925,709 |

An idiomatic, SQL-first Rust client for Apache Spark Connect.
This crate provides a fully asynchronous, strongly typed API for interacting with a remote Spark Connect server over gRPC.
It allows you to build and execute SQL queries, bind parameters safely,
and collect Arrow RecordBatch results - just like any other SQL toolkit -
all in native Rust.
- Connects to any Spark Connect endpoint (`sc://host:port` format);
- fully asynchronous, built on tokio and tonic;
- collects results as `Vec<RecordBatch>`.

```rust
use spark_connect::SparkSessionBuilder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1️⃣ Connect to a Spark Connect endpoint
    let session = SparkSessionBuilder::new("sc://localhost:15002")
        .build()
        .await?;

    // 2️⃣ Execute a simple SQL query and receive a Vec<RecordBatch>
    let batches = session
        .query("SELECT ? AS rule, ? AS text")
        .bind(42)
        .bind("world")
        .execute()
        .await?;

    Ok(())
}
```
It's that simple!
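To inspect the results, the collected batches can be pretty-printed with the arrow crate (a minimal sketch; it assumes an arrow version matching this crate, with the `prettyprint` feature enabled):

```rust
use arrow::util::pretty::print_batches;

// `batches` is the Vec<RecordBatch> returned by `.execute().await?` above.
// Renders the batches as an ASCII table on stdout.
print_batches(&batches)?;
```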
Behind the scenes, the [SparkSession::query] method
uses the [ToLiteral] trait to safely bind parameters
before execution:
```rust
use spark_connect::ToLiteral;

// This is
let batches = session
    .query("SELECT ? AS id, ? AS text")
    .bind(42)
    .bind("world")
    .execute()
    .await?;

// the same as this
let lazy_plan = session.sql(
    "SELECT ? AS id, ? AS text",
    vec![42.to_literal(), "world".to_literal()],
).await?;
let batches = session.collect(lazy_plan).await?;
```
The biggest advantage of using the `sql()` method instead of `query()` is lazy execution: queries can be evaluated lazily and collected later. If you're coming from PySpark or Scala, this should be a familiar interface.
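For example (a sketch reusing only the `sql`/`collect` calls shown above; the query itself is just an illustration):

```rust
// Build the logical plan; nothing has executed on the cluster yet.
let plan = session.sql(
    "SELECT ? AS threshold",
    vec![100.to_literal()],
).await?;

// ... other work can happen here ...

// Execution happens only when the plan is collected.
let batches = session.collect(plan).await?;
```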
The main types:

- `SparkSession` — the main entry point for executing SQL queries and managing a session.
- `SparkClient` — low-level gRPC client (used internally).
- `SqlQueryBuilder` — helper for binding parameters and executing queries.

To run the examples you need a reachable Spark Connect `sc://` endpoint and the tokio runtime. Example connection strings:

```text
sc://localhost:15002
sc://spark-cluster:15002/?user_id=francisco
sc://10.0.0.5:15002;session_id=abc123;user_agent=my-app
```
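Any of these strings can be passed straight to the builder (a sketch; the parameter values above are just placeholders):

```rust
use spark_connect::SparkSessionBuilder;

// Connection-string parameters (session_id, user_agent, ...) ride along in the URL.
let session = SparkSessionBuilder::new("sc://10.0.0.5:15002;session_id=abc123;user_agent=my-app")
    .build()
    .await?;
```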
Currently, this crate is built against Spark 3.5.x. If you need to build against a different version of Spark Connect, you can:
1. Download the desired version of the `protobuf` directory from the Apache Spark repository (see the table below) and replace this repository's `protobuf/` directory with it.
2. Run `cargo build` to regenerate the gRPC client code.

| Version | Path to the protobuf directory |
|---|---|
| 4.x | `branch-4.x` / `sql/connect/common/src/main/protobuf` |
| 3.4-3.5 | `branch-3.x` / `connector/connect/common/src/main/protobuf` |
⚠️ Note that compatibility is not guaranteed, and you may encounter issues if there are significant changes between versions.
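For context, regeneration on `cargo build` implies a Cargo build script. A minimal sketch of what such a `build.rs` typically looks like with tonic-build (an assumption: not necessarily this crate's actual script, and the proto path is illustrative):

```rust
// build.rs — regenerate the gRPC client from the vendored protos.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        .build_server(false) // client-only crate
        .compile(&["protobuf/spark/connect/base.proto"], &["protobuf"])?;
    Ok(())
}
```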
This project takes heavy inspiration from the spark-connect-rs project, and would've been much harder without it!
© 2025 Francisco A. B. Sampaio. Licensed under the MIT License.
This project is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation. “Apache”, “Apache Spark”, and “Spark Connect” are trademarks of the Apache Software Foundation.