| Crates.io | sas7bdat |
| lib.rs | sas7bdat |
| version | 0.1.1 |
| created_at | 2026-01-16 09:42:08.572441+00 |
| updated_at | 2026-01-16 09:52:27.569144+00 |
| description | Rust library + CLI for decoding SAS7BDAT datasets and streaming them to modern formats. |
| homepage | https://github.com/tkragholm/sas7bdat-parser-rs |
| repository | https://github.com/tkragholm/sas7bdat-parser-rs |
| max_upload_size | |
| id | 2048331 |
| size | 462,341 |
sas7bdat is a Rust library for decoding SAS7BDAT datasets with a focus on reproducible research workflows. It exposes a safe API for inspecting metadata, streaming rows, and writing Parquet output so that legacy SAS exports can participate in modern data engineering pipelines. The project is Rust-first (library + CLI), with Python (PyO3) and R (extendr) bindings under active development. It was originally built for heavy, secure processing of large national registers on Statistics Denmark’s servers.
This project aims to bring a legacy, closed-source data format into modern, open-source workflows. Today many stacks lean on the venerable C-based ReadStat (e.g., haven, pyreadstat); implementing the reader in Rust should make contributions more approachable and make redistribution (cross-compilation, shipping wheels and binaries) simpler while preserving performance.
The crate powers a test suite that cross-checks parsed output against community fixtures and other statistical packages (pandas, PyReadStat, Haven). It also ships an example that downloads the U.S. Census Bureau's American Housing Survey (AHS) public-use file, converts it to Parquet, and demonstrates end-to-end integration.
Add the library to an existing Cargo project:
cargo add sas7bdat
Or build the repository directly:
git clone https://github.com/tkragholm/sas7bdat-parser-rs.git
cd sas7bdat-parser-rs
git submodule update --init --recursive
cargo build
This repo also ships a small CLI to batch‑convert SAS7BDAT files to Parquet/CSV/TSV using streaming sinks. It supports directory recursion, simple projection, and pagination.
cargo run --bin sas7 -- convert path/to/dir --sink parquet --jobs 4
cargo run --bin sas7 -- convert file.sas7bdat --sink csv --out file.csv --columns COL1,COL2 --skip 100 --max-rows 1000
cargo run --bin sas7 -- inspect file.sas7bdat --json
Options include --out-dir and --out for output paths, --sink {parquet|csv|tsv} for the output format, the CSV/TSV flags --headers/--no-headers and --delimiter, projection via --columns or --column-indices, pagination with --skip and --max-rows, and the Parquet tuning flags --parquet-row-group-size and --parquet-target-bytes.
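As an illustration, a recursive Parquet conversion that keeps only a few columns and tunes the row-group size might look like the line below. The flag names come from the list above, but the specific values (and whether they combine exactly like this) are assumptions, so check the CLI's help output for the authoritative syntax.
cargo run --bin sas7 -- convert path/to/dir --sink parquet --out-dir converted/ --column-indices 0,1,5 --parquet-row-group-size 65536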
The repository includes an example that downloads the 2013 AHS public-use file ZIP archive, extracts the embedded .sas7bdat, and writes ahs2013n.parquet to the working directory:
cargo run --example sas_to_parquet # default output ahs2013n.parquet
cargo run --example sas_to_parquet -- data/ahs.parquet
The example requires network access to https://www2.census.gov/ during the download step.
If the download is slow or blocked, point at a local or alternate ZIP:
curl -L -o /tmp/ahs2013.zip "https://www2.census.gov/programs-surveys/ahs/2013/AHS%202013%20National%20PUF%20v2.0%20Flat%20SAS.zip"
AHS_ZIP_PATH=/tmp/ahs2013.zip cargo run --example sas_to_parquet
# or use a mirror
AHS_ZIP_URL=https://your.mirror/AHS2013.zip cargo run --example sas_to_parquet
use sas7bdat::SasReader;

fn main() -> sas7bdat::Result<()> {
    let mut sas = SasReader::open("dataset.sas7bdat")?;

    // Dataset-level metadata, including the variable (column) list.
    let metadata = sas.metadata().clone();
    println!("Columns: {}", metadata.variables.len());

    // Stream rows instead of loading the whole file into memory.
    let mut rows = sas.rows()?;
    while let Some(row) = rows.try_next()? {
        // Inspect row values here
        println!("first column = {:?}", row[0]);
    }
    Ok(())
}
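Building on the usage example above, the sketch below streams every column of every row to stdout as tab-separated Debug output. It assumes only what the example shows (SasReader::open, metadata().variables, rows(), try_next(), and positional indexing of rows); for real pipelines, prefer the bundled Parquet/CSV sinks.
use std::io::{self, Write};
use sas7bdat::SasReader;

fn main() -> sas7bdat::Result<()> {
    let mut sas = SasReader::open("dataset.sas7bdat")?;
    // Number of columns, taken from the dataset metadata.
    let n_cols = sas.metadata().variables.len();

    let stdout = io::stdout();
    let mut out = stdout.lock();

    // Stream rows without materialising the whole dataset in memory.
    let mut rows = sas.rows()?;
    while let Some(row) = rows.try_next()? {
        for i in 0..n_cols {
            if i > 0 {
                write!(out, "\t").expect("write to stdout failed");
            }
            // Debug formatting is used because the concrete cell type is not shown here.
            write!(out, "{:?}", row[i]).expect("write to stdout failed");
        }
        writeln!(out).expect("write to stdout failed");
    }
    Ok(())
}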
See the examples in examples/ for more complete pipelines, including Parquet export.
Run the unit and integration test suites:
cargo test
Snapshot fixtures rely on datasets under fixtures/raw_data/. Large archives are ignored by .gitignore but are required for the full regression suite.
Licensed under the MIT License.
Issues and pull requests are welcome. Please open an issue before proposing substantial architectural changes so we can coordinate design and testing expectations.