| Crates.io | spark-connect |
| lib.rs | spark-connect |
| version | 0.2.1 |
| created_at | 2025-10-17 10:47:13.122659+00 |
| updated_at | 2026-01-22 20:44:53.241076+00 |
| description | Rust client for Apache Spark Connect. |
| homepage | https://github.com/franciscoabsampaio/spark-connect |
| repository | https://github.com/franciscoabsampaio/spark-connect |
| max_upload_size | |
| id | 1887495 |
| size | 925,709 |

An idiomatic, SQL-first Rust client for Apache Spark Connect.
This crate provides a fully asynchronous, strongly typed API for interacting with a remote Spark Connect server over gRPC.
It allows you to build and execute SQL queries, bind parameters safely,
and collect Arrow RecordBatch results - just like any other SQL toolkit -
all in native Rust.
- Connects to any Spark Connect endpoint (`sc://host:port` format);
- fully asynchronous, built on tokio and tonic;
- collects results as `Vec<RecordBatch>`.

```rust
use spark_connect::SparkSessionBuilder;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // 1️⃣ Connect to a Spark Connect endpoint
    let session = SparkSessionBuilder::new("sc://localhost:15002")
        .build()
        .await?;

    // 2️⃣ Execute a simple SQL query and receive a Vec<RecordBatch>
    let batches = session
        .query("SELECT ? AS rule, ? AS text")
        .bind(42)
        .bind("world")
        .execute()
        .await?;

    Ok(())
}
```
It's that simple!
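To inspect the results, the collected batches can be pretty-printed with the arrow crate (a minimal sketch; it assumes an arrow version matching this crate, with the `prettyprint` feature enabled):

```rust
use arrow::util::pretty::print_batches;

// `batches` is the Vec<RecordBatch> returned by `.execute().await?` above.
// Renders the batches as an ASCII table on stdout.
print_batches(&batches)?;
```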
Behind the scenes, the [SparkSession::query] method
uses the [ToLiteral] trait to safely bind parameters
before execution:
```rust
use spark_connect::ToLiteral;

// This is
let batches = session
    .query("SELECT ? AS id, ? AS text")
    .bind(42)
    .bind("world")
    .execute()
    .await?;

// the same as this
let lazy_plan = session.sql(
    "SELECT ? AS id, ? AS text",
    vec![42.to_literal(), "world".to_literal()],
).await?;
let batches = session.collect(lazy_plan).await?;
```
The biggest advantage of using the `sql()` method instead of `query()` is lazy execution: queries can be evaluated lazily and collected later. If you're coming from PySpark or Scala, this should be a familiar interface.
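For example (a sketch reusing only the `sql`/`collect` calls shown above; the query itself is just an illustration):

```rust
// Build the logical plan; nothing has executed on the cluster yet.
let plan = session.sql(
    "SELECT ? AS threshold",
    vec![100.to_literal()],
).await?;

// ... other work can happen here ...

// Execution happens only when the plan is collected.
let batches = session.collect(plan).await?;
```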
The main types:

- `SparkSession` — the main entry point for executing SQL queries and managing a session.
- `SparkClient` — low-level gRPC client (used internally).
- `SqlQueryBuilder` — helper for binding parameters and executing queries.

To run the examples you need a reachable Spark Connect `sc://` endpoint and the tokio runtime. Example connection strings:

```text
sc://localhost:15002
sc://spark-cluster:15002/?user_id=francisco
sc://10.0.0.5:15002;session_id=abc123;user_agent=my-app
```
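Any of these strings can be passed straight to the builder (a sketch; the parameter values above are just placeholders):

```rust
use spark_connect::SparkSessionBuilder;

// Connection-string parameters (session_id, user_agent, ...) ride along in the URL.
let session = SparkSessionBuilder::new("sc://10.0.0.5:15002;session_id=abc123;user_agent=my-app")
    .build()
    .await?;
```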
Currently, this crate is built against Spark 3.5.x. If you need to build against a different version of Spark Connect, you can:
1. Download the desired version of the `protobuf` directory from the Apache Spark repository (see the table below) and replace this repository's `protobuf/` directory with it.
2. Run `cargo build` to regenerate the gRPC client code.

| Version | Path to the protobuf directory |
|---|---|
| 4.x | `branch-4.x` / `sql/connect/common/src/main/protobuf` |
| 3.4-3.5 | `branch-3.x` / `connector/connect/common/src/main/protobuf` |
⚠️ Note that compatibility is not guaranteed, and you may encounter issues if there are significant changes between versions.
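For context, regeneration on `cargo build` implies a Cargo build script. A minimal sketch of what such a `build.rs` typically looks like with tonic-build (an assumption: not necessarily this crate's actual script, and the proto path is illustrative):

```rust
// build.rs — regenerate the gRPC client from the vendored protos.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    tonic_build::configure()
        .build_server(false) // client-only crate
        .compile(&["protobuf/spark/connect/base.proto"], &["protobuf"])?;
    Ok(())
}
```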
This project takes heavy inspiration from the spark-connect-rs project, and would've been much harder without it!
© 2025 Francisco A. B. Sampaio. Licensed under the MIT License.
This project is not affiliated with, endorsed by, or sponsored by the Apache Software Foundation. “Apache”, “Apache Spark”, and “Spark Connect” are trademarks of the Apache Software Foundation.