| Crates.io | hdbconnect-arrow |
| lib.rs | hdbconnect-arrow |
| version | 0.2.3 |
| created_at | 2026-01-11 06:03:34.876019+00 |
| updated_at | 2026-01-23 00:15:50.970414+00 |
| description | Apache Arrow integration for hdbconnect SAP HANA driver |
| homepage | |
| repository | https://github.com/bug-ops/pyhdb-rs |
| max_upload_size | |
| id | 2035262 |
| size | 388,600 |
Apache Arrow integration for the hdbconnect SAP HANA driver. Converts HANA result sets to Arrow RecordBatch format, enabling zero-copy interoperability with the entire Arrow ecosystem.
Apache Arrow is the universal columnar data format for analytics. By converting SAP HANA data to Arrow, you unlock seamless integration with:
| Category | Tools |
|---|---|
| DataFrames | Polars, pandas, Vaex, Dask |
| Query engines | DataFusion, DuckDB, ClickHouse, Ballista |
| ML/AI | Ray, Hugging Face Datasets, PyTorch, TensorFlow |
| Data lakes | Delta Lake, Apache Iceberg, Lance |
| Visualization | Perspective, Graphistry, Falcon |
| Languages | Rust, Python, R, Julia, Go, Java, C++ |
[!TIP] Arrow's columnar format enables vectorized processing — operations run 10-100x faster than row-by-row iteration.
```toml
[dependencies]
hdbconnect-arrow = "0.2"
```
Or with `cargo add`:
```sh
cargo add hdbconnect-arrow
```
[!IMPORTANT] Requires Rust 1.88 or later.
```rust
use hdbconnect_arrow::{HanaBatchProcessor, BatchConfig, Result};
use arrow_schema::{Schema, Field, DataType};
use std::sync::Arc;

fn process_results(result_set: hdbconnect::ResultSet) -> Result<()> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]));

    let config = BatchConfig::default();
    let mut processor = HanaBatchProcessor::new(Arc::clone(&schema), config);

    for row in result_set {
        if let Some(batch) = processor.process_row(&row?)? {
            println!("Batch with {} rows", batch.num_rows());
        }
    }

    // Flush remaining rows
    if let Some(batch) = processor.flush()? {
        println!("Final batch with {} rows", batch.num_rows());
    }

    Ok(())
}
```
```rust
use hdbconnect_arrow::{hana_type_to_arrow, hana_field_to_arrow};
use hdbconnect::TypeId;

// Convert individual types
let arrow_type = hana_type_to_arrow(TypeId::DECIMAL, Some(18), Some(2));
// Returns: DataType::Decimal128(18, 2)

// Convert entire field metadata
let arrow_field = hana_field_to_arrow(&hana_field_metadata);
```
```rust
use hdbconnect_arrow::BatchConfig;
use std::num::NonZeroUsize;

// Emit a RecordBatch every 10,000 rows
let config = BatchConfig::new(NonZeroUsize::new(10_000).unwrap());
```
Query HANA data with SQL using Apache DataFusion:
```rust
use datafusion::prelude::*;

// Inside an async context (DataFusion's SQL API is async)
let batches = collect_batches_from_hana(result_set)?;

let ctx = SessionContext::new();
ctx.register_batch("hana_data", batches[0].clone())?;

let df = ctx.sql("SELECT * FROM hana_data WHERE amount > 1000").await?;
df.show().await?;
```
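If the HANA result spans multiple RecordBatches, they can also be registered together through a DataFusion `MemTable` instead of registering only the first batch. A minimal sketch against the standard DataFusion API; `schema`, `batches`, and the `region`/`amount` columns are assumptions carried over from the snippets above:

```rust
use std::sync::Arc;
use datafusion::datasource::MemTable;
use datafusion::prelude::*;

async fn query_all_batches(
    schema: arrow_schema::SchemaRef,
    batches: Vec<arrow_array::RecordBatch>,
) -> datafusion::error::Result<()> {
    // One in-memory table backed by every collected batch, not just the first
    let table = MemTable::try_new(schema, vec![batches])?;
    let ctx = SessionContext::new();
    ctx.register_table("hana_data", Arc::new(table))?;

    ctx.sql("SELECT region, SUM(amount) FROM hana_data GROUP BY region")
        .await?
        .show()
        .await?;
    Ok(())
}
```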
Load Arrow data directly into DuckDB:
```rust
use duckdb::{Connection, arrow::record_batch_to_duckdb};

let conn = Connection::open_in_memory()?;
conn.register_arrow("sales", batches)?;

let mut stmt = conn.prepare("SELECT region, SUM(amount) FROM sales GROUP BY region")?;
let result = stmt.query_arrow([])?;
```
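The query result comes back as Arrow data as well. Assuming duckdb's `query_arrow` yields RecordBatches, it can be consumed directly:

```rust
// Iterate the Arrow-encoded result of the aggregation query
for batch in result {
    println!("aggregated batch with {} rows", batch.num_rows());
}
```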
Convert to Polars DataFrame:
```rust
use polars::prelude::*;

let batch = processor.flush()?.unwrap();
let df = DataFrame::try_from(batch)?;

let result = df.lazy()
    .filter(col("status").eq(lit("active")))
    .group_by([col("region")])
    .agg([col("amount").sum()])
    .collect()?;
```
Serialize Arrow data for storage or network transfer:
```rust
use arrow_ipc::writer::FileWriter;
use parquet::arrow::ArrowWriter;
use std::fs::File;

// Arrow IPC (Feather) format
let file = File::create("data.arrow")?;
let mut writer = FileWriter::try_new(file, &schema)?;
writer.write(&batch)?;
writer.finish()?;

// Parquet format
let file = File::create("data.parquet")?;
let mut writer = ArrowWriter::try_new(file, schema.clone(), None)?;
writer.write(&batch)?;
writer.close()?;
```
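For a quick round-trip check, the Parquet file can be read back into RecordBatches with the parquet crate's Arrow reader. This uses only the standard arrow/parquet APIs, nothing specific to hdbconnect-arrow:

```rust
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use std::fs::File;

// Re-open the file written above and stream it back as Arrow batches
let file = File::open("data.parquet")?;
let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
for batch in reader {
    println!("read back {} rows", batch?.num_rows());
}
```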
Export Arrow data to Python without copying (requires pyo3):
```rust
use pyo3_arrow::PyArrowType;
use pyo3::prelude::*;
use arrow_array::RecordBatch;

#[pyfunction]
fn get_hana_data(py: Python<'_>) -> PyResult<PyArrowType<RecordBatch>> {
    let batch = fetch_from_hana()?;
    Ok(PyArrowType(batch))
}

// Python: df = pl.from_arrow(get_hana_data())
```
Enable optional features in Cargo.toml:
```toml
[dependencies]
hdbconnect-arrow = { version = "0.2", features = ["async", "test-utils"] }
```
| Feature | Description | Default |
|---|---|---|
| `async` | Async support via hdbconnect_async | No |
| `test-utils` | Expose MockRow/MockRowBuilder for testing | No |
[!TIP] Enable `test-utils` in dev-dependencies for unit testing without a HANA connection.
| HANA Type | Arrow Type | Notes |
|---|---|---|
| TINYINT | UInt8 | Unsigned in HANA |
| SMALLINT | Int16 | |
| INT | Int32 | |
| BIGINT | Int64 | |
| REAL | Float32 | |
| DOUBLE | Float64 | |
| DECIMAL(p,s) | Decimal128(p,s) | Full precision preserved |
| CHAR, VARCHAR | Utf8 | |
| NCHAR, NVARCHAR | Utf8 | Unicode strings |
| CLOB, NCLOB | LargeUtf8 | Large text |
| BLOB | LargeBinary | Large binary |
| DATE | Date32 | Days since epoch |
| TIME | Time64(Nanosecond) | |
| TIMESTAMP | Timestamp(Nanosecond) | |
| BOOLEAN | Boolean | |
| GEOMETRY, POINT | Binary | WKB format |
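To make the DECIMAL row concrete: Arrow's Decimal128 stores values as scaled 128-bit integers, so no precision is lost to floating point. A small illustration using only the arrow crate, not this crate's API:

```rust
use arrow_array::Decimal128Array;

// DECIMAL(18,2) becomes Decimal128(18,2): the raw integer 123456 with scale 2
// represents exactly 1234.56
let amounts = Decimal128Array::from(vec![123_456_i128, -99_i128])
    .with_precision_and_scale(18, 2)
    .unwrap();
assert_eq!(amounts.value_as_string(0), "1234.56");
assert_eq!(amounts.value_as_string(1), "-0.99");
```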
- `HanaBatchProcessor` — Converts HANA rows to Arrow RecordBatch with configurable batch sizes
- `BatchConfig` — Configuration for batch processing (uses NonZeroUsize for type-safe batch size)
- `SchemaMapper` — Maps HANA result set metadata to Arrow schemas
- `BuilderFactory` — Creates appropriate Arrow array builders for HANA types
- `TypeCategory` — Centralized HANA type classification enum
- `HanaCompatibleBuilder` — Trait for Arrow builders that accept HANA values
- `FromHanaValue` — Sealed trait for type-safe value conversion
- `BatchProcessor` — Core batch processing interface
- `LendingBatchIterator` — GAT-based streaming iterator for large result sets
- `RowLike` — Row abstraction for testing without a HANA connection

When the `test-utils` feature is enabled:
```rust
use hdbconnect_arrow::{MockRow, MockRowBuilder};

let row = MockRowBuilder::new()
    .push_i64(42)
    .push_string("test")
    .push_null()
    .build();
```
```rust
use hdbconnect_arrow::{ArrowConversionError, Result};

fn convert_data() -> Result<()> {
    // ArrowConversionError covers:
    // - Type mismatches
    // - Decimal overflow
    // - Schema incompatibilities
    // - Invalid batch configuration
    Ok(())
}
```
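As a sketch of per-row error handling, the conversion error can be logged and the offending row skipped instead of aborting the whole transfer. The `process_row_tolerant` helper below is hypothetical and assumes the `process_row` signature shown in the Quick Start example:

```rust
use hdbconnect_arrow::HanaBatchProcessor;
use arrow_array::RecordBatch;

// Log and skip rows that fail conversion instead of propagating the error
// (assumes process_row(&hdbconnect::Row) -> Result<Option<RecordBatch>>)
fn process_row_tolerant(
    processor: &mut HanaBatchProcessor,
    row: &hdbconnect::Row,
) -> Option<RecordBatch> {
    match processor.process_row(row) {
        Ok(maybe_batch) => maybe_batch,
        Err(err) => {
            eprintln!("skipping row: {err}");
            None
        }
    }
}
```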
The crate is optimized for high-throughput data transfer.
[!NOTE] For large result sets, use `LendingBatchIterator` to stream data with constant memory usage.
This crate is part of the pyhdb-rs workspace, providing the Arrow integration layer for the Python SAP HANA driver.
Related crates:
- `hdbconnect-py` — PyO3 bindings exposing Arrow data to Python

[!NOTE] Minimum Supported Rust Version: 1.88. MSRV increases are minor version bumps.
Licensed under either of:
at your option.