| Crates.io | hdbconnect-arrow |
| lib.rs | hdbconnect-arrow |
| version | 0.2.3 |
| created_at | 2026-01-11 06:03:34.876019+00 |
| updated_at | 2026-01-23 00:15:50.970414+00 |
| description | Apache Arrow integration for hdbconnect SAP HANA driver |
| homepage | |
| repository | https://github.com/bug-ops/pyhdb-rs |
| max_upload_size | |
| id | 2035262 |
| size | 388,600 |
Apache Arrow integration for the hdbconnect SAP HANA driver. Converts HANA result sets to Arrow RecordBatch format, enabling zero-copy interoperability with the entire Arrow ecosystem.
Apache Arrow is the universal columnar data format for analytics. By converting SAP HANA data to Arrow, you unlock seamless integration with:
| Category | Tools |
|---|---|
| DataFrames | Polars, pandas, Vaex, Dask |
| Query engines | DataFusion, DuckDB, ClickHouse, Ballista |
| ML/AI | Ray, Hugging Face Datasets, PyTorch, TensorFlow |
| Data lakes | Delta Lake, Apache Iceberg, Lance |
| Visualization | Perspective, Graphistry, Falcon |
| Languages | Rust, Python, R, Julia, Go, Java, C++ |
[!TIP] Arrow's columnar format enables vectorized processing — operations run 10-100x faster than row-by-row iteration.
```toml
[dependencies]
hdbconnect-arrow = "0.2"
```
Or with `cargo add`:
```sh
cargo add hdbconnect-arrow
```
[!IMPORTANT] Requires Rust 1.88 or later.
```rust
use hdbconnect_arrow::{HanaBatchProcessor, BatchConfig, Result};
use arrow_schema::{Schema, Field, DataType};
use std::sync::Arc;

fn process_results(result_set: hdbconnect::ResultSet) -> Result<()> {
    let schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int64, false),
        Field::new("name", DataType::Utf8, true),
    ]));

    let config = BatchConfig::default();
    let mut processor = HanaBatchProcessor::new(Arc::clone(&schema), config);

    for row in result_set {
        if let Some(batch) = processor.process_row(&row?)? {
            println!("Batch with {} rows", batch.num_rows());
        }
    }

    // Flush remaining rows
    if let Some(batch) = processor.flush()? {
        println!("Final batch with {} rows", batch.num_rows());
    }

    Ok(())
}
```
```rust
use hdbconnect_arrow::{hana_type_to_arrow, hana_field_to_arrow};
use hdbconnect::TypeId;

// Convert individual types
let arrow_type = hana_type_to_arrow(TypeId::DECIMAL, Some(18), Some(2));
// Returns: DataType::Decimal128(18, 2)

// Convert entire field metadata
let arrow_field = hana_field_to_arrow(&hana_field_metadata);
```
```rust
use hdbconnect_arrow::BatchConfig;
use std::num::NonZeroUsize;

// Emit a RecordBatch every 10,000 rows
let config = BatchConfig::new(NonZeroUsize::new(10_000).unwrap());
```
Query HANA data with SQL using Apache DataFusion:
```rust
use datafusion::prelude::*;

// Inside an async context (DataFusion's SQL API is async)
let batches = collect_batches_from_hana(result_set)?;

let ctx = SessionContext::new();
ctx.register_batch("hana_data", batches[0].clone())?;

let df = ctx.sql("SELECT * FROM hana_data WHERE amount > 1000").await?;
df.show().await?;
```
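If the HANA result spans multiple RecordBatches, they can also be registered together through a DataFusion `MemTable` instead of registering only the first batch. A minimal sketch against the standard DataFusion API; `schema`, `batches`, and the `region`/`amount` columns are assumptions carried over from the snippets above:

```rust
use std::sync::Arc;
use datafusion::datasource::MemTable;
use datafusion::prelude::*;

async fn query_all_batches(
    schema: arrow_schema::SchemaRef,
    batches: Vec<arrow_array::RecordBatch>,
) -> datafusion::error::Result<()> {
    // One in-memory table backed by every collected batch, not just the first
    let table = MemTable::try_new(schema, vec![batches])?;
    let ctx = SessionContext::new();
    ctx.register_table("hana_data", Arc::new(table))?;

    ctx.sql("SELECT region, SUM(amount) FROM hana_data GROUP BY region")
        .await?
        .show()
        .await?;
    Ok(())
}
```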
Load Arrow data directly into DuckDB:
```rust
use duckdb::{Connection, arrow::record_batch_to_duckdb};

let conn = Connection::open_in_memory()?;
conn.register_arrow("sales", batches)?;

let mut stmt = conn.prepare("SELECT region, SUM(amount) FROM sales GROUP BY region")?;
let result = stmt.query_arrow([])?;
```
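The query result comes back as Arrow data as well. Assuming duckdb's `query_arrow` yields RecordBatches, it can be consumed directly:

```rust
// Iterate the Arrow-encoded result of the aggregation query
for batch in result {
    println!("aggregated batch with {} rows", batch.num_rows());
}
```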
Convert to Polars DataFrame:
```rust
use polars::prelude::*;

let batch = processor.flush()?.unwrap();
let df = DataFrame::try_from(batch)?;

let result = df.lazy()
    .filter(col("status").eq(lit("active")))
    .group_by([col("region")])
    .agg([col("amount").sum()])
    .collect()?;
```
Serialize Arrow data for storage or network transfer:
```rust
use arrow_ipc::writer::FileWriter;
use parquet::arrow::ArrowWriter;
use std::fs::File;

// Arrow IPC (Feather) format
let file = File::create("data.arrow")?;
let mut writer = FileWriter::try_new(file, &schema)?;
writer.write(&batch)?;
writer.finish()?;

// Parquet format
let file = File::create("data.parquet")?;
let mut writer = ArrowWriter::try_new(file, schema.clone(), None)?;
writer.write(&batch)?;
writer.close()?;
```
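For a quick round-trip check, the Parquet file can be read back into RecordBatches with the parquet crate's Arrow reader. This uses only the standard arrow/parquet APIs, nothing specific to hdbconnect-arrow:

```rust
use parquet::arrow::arrow_reader::ParquetRecordBatchReaderBuilder;
use std::fs::File;

// Re-open the file written above and stream it back as Arrow batches
let file = File::open("data.parquet")?;
let reader = ParquetRecordBatchReaderBuilder::try_new(file)?.build()?;
for batch in reader {
    println!("read back {} rows", batch?.num_rows());
}
```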
Export Arrow data to Python without copying (requires pyo3):
```rust
use pyo3_arrow::PyArrowType;
use pyo3::prelude::*;
use arrow_array::RecordBatch;

#[pyfunction]
fn get_hana_data(py: Python<'_>) -> PyResult<PyArrowType<RecordBatch>> {
    let batch = fetch_from_hana()?;
    Ok(PyArrowType(batch))
}

// Python: df = pl.from_arrow(get_hana_data())
```
Enable optional features in Cargo.toml:
```toml
[dependencies]
hdbconnect-arrow = { version = "0.2", features = ["async", "test-utils"] }
```
| Feature | Description | Default |
|---|---|---|
| `async` | Async support via hdbconnect_async | No |
| `test-utils` | Expose MockRow/MockRowBuilder for testing | No |
[!TIP] Enable `test-utils` in dev-dependencies for unit testing without a HANA connection.
| HANA Type | Arrow Type | Notes |
|---|---|---|
| TINYINT | UInt8 | Unsigned in HANA |
| SMALLINT | Int16 | |
| INT | Int32 | |
| BIGINT | Int64 | |
| REAL | Float32 | |
| DOUBLE | Float64 | |
| DECIMAL(p,s) | Decimal128(p,s) | Full precision preserved |
| CHAR, VARCHAR | Utf8 | |
| NCHAR, NVARCHAR | Utf8 | Unicode strings |
| CLOB, NCLOB | LargeUtf8 | Large text |
| BLOB | LargeBinary | Large binary |
| DATE | Date32 | Days since epoch |
| TIME | Time64(Nanosecond) | |
| TIMESTAMP | Timestamp(Nanosecond) | |
| BOOLEAN | Boolean | |
| GEOMETRY, POINT | Binary | WKB format |
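To make the DECIMAL row concrete: Arrow's Decimal128 stores values as scaled 128-bit integers, so no precision is lost to floating point. A small illustration using only the arrow crate, not this crate's API:

```rust
use arrow_array::Decimal128Array;

// DECIMAL(18,2) becomes Decimal128(18,2): the raw integer 123456 with scale 2
// represents exactly 1234.56
let amounts = Decimal128Array::from(vec![123_456_i128, -99_i128])
    .with_precision_and_scale(18, 2)
    .unwrap();
assert_eq!(amounts.value_as_string(0), "1234.56");
assert_eq!(amounts.value_as_string(1), "-0.99");
```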
- `HanaBatchProcessor` — Converts HANA rows to Arrow RecordBatch with configurable batch sizes
- `BatchConfig` — Configuration for batch processing (uses NonZeroUsize for type-safe batch size)
- `SchemaMapper` — Maps HANA result set metadata to Arrow schemas
- `BuilderFactory` — Creates appropriate Arrow array builders for HANA types
- `TypeCategory` — Centralized HANA type classification enum
- `HanaCompatibleBuilder` — Trait for Arrow builders that accept HANA values
- `FromHanaValue` — Sealed trait for type-safe value conversion
- `BatchProcessor` — Core batch processing interface
- `LendingBatchIterator` — GAT-based streaming iterator for large result sets
- `RowLike` — Row abstraction for testing without a HANA connection

When the `test-utils` feature is enabled:
```rust
use hdbconnect_arrow::{MockRow, MockRowBuilder};

let row = MockRowBuilder::new()
    .push_i64(42)
    .push_string("test")
    .push_null()
    .build();
```
```rust
use hdbconnect_arrow::{ArrowConversionError, Result};

fn convert_data() -> Result<()> {
    // ArrowConversionError covers:
    // - Type mismatches
    // - Decimal overflow
    // - Schema incompatibilities
    // - Invalid batch configuration
    Ok(())
}
```
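As a sketch of per-row error handling, the conversion error can be logged and the offending row skipped instead of aborting the whole transfer. The `process_row_tolerant` helper below is hypothetical and assumes the `process_row` signature shown in the Quick Start example:

```rust
use hdbconnect_arrow::HanaBatchProcessor;
use arrow_array::RecordBatch;

// Log and skip rows that fail conversion instead of propagating the error
// (assumes process_row(&hdbconnect::Row) -> Result<Option<RecordBatch>>)
fn process_row_tolerant(
    processor: &mut HanaBatchProcessor,
    row: &hdbconnect::Row,
) -> Option<RecordBatch> {
    match processor.process_row(row) {
        Ok(maybe_batch) => maybe_batch,
        Err(err) => {
            eprintln!("skipping row: {err}");
            None
        }
    }
}
```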
The crate is optimized for high-throughput data transfer.
[!NOTE] For large result sets, use `LendingBatchIterator` to stream data with constant memory usage.
This crate is part of the pyhdb-rs workspace, providing the Arrow integration layer for the Python SAP HANA driver.
Related crates:
- `hdbconnect-py` — PyO3 bindings exposing Arrow data to Python

[!NOTE] Minimum Supported Rust Version: 1.88. MSRV increases are minor version bumps.
Licensed under either of:
at your option.