| Crates.io | sas7bdat |
| lib.rs | sas7bdat |
| version | 0.1.1 |
| created_at | 2026-01-16 09:42:08.572441+00 |
| updated_at | 2026-01-16 09:52:27.569144+00 |
| description | Rust library + CLI for decoding SAS7BDAT datasets and streaming them to modern formats. |
| homepage | https://github.com/tkragholm/sas7bdat-parser-rs |
| repository | https://github.com/tkragholm/sas7bdat-parser-rs |
| max_upload_size | |
| id | 2048331 |
| size | 462,341 |
sas7bdat is a Rust library for decoding SAS7BDAT datasets with a focus on reproducible research workflows. It exposes a safe API for inspecting metadata, streaming rows, and writing Parquet output so that legacy SAS exports can participate in modern data engineering pipelines. The project is Rust-first (library + CLI), with Python (PyO3) and R (extendr) bindings under active development. It was originally built for heavy, secure processing of large national registers on Statistics Denmark’s servers.
This project aims to bring a legacy, closed-source data format into modern, open-source workflows. Today many stacks lean on the venerable C-based ReadStat (e.g., haven, pyreadstat); implementing the reader in Rust should make contributions more approachable and make redistribution (cross-compilation, shipping wheels and binaries) simpler while preserving performance.
The crate powers a test suite that cross-checks parsed output against community fixtures and other statistical packages (pandas, PyReadStat, Haven). It also ships an example that downloads the U.S. Census Bureau's American Housing Survey (AHS) public-use file, converts it to Parquet, and demonstrates end-to-end integration.
Add the library to an existing Cargo project:
cargo add sas7bdat
Or build the repository directly:
git clone https://github.com/tkragholm/sas7bdat-parser-rs.git
cd sas7bdat-parser-rs
git submodule update --init --recursive
cargo build
This repo also ships a small CLI to batch‑convert SAS7BDAT files to Parquet/CSV/TSV using streaming sinks. It supports directory recursion, simple projection, and pagination.
cargo run --bin sas7 -- convert path/to/dir --sink parquet --jobs 4
cargo run --bin sas7 -- convert file.sas7bdat --sink csv --out file.csv --columns COL1,COL2 --skip 100 --max-rows 1000
cargo run --bin sas7 -- inspect file.sas7bdat --json
Options include --out-dir and --out for output paths, --sink {parquet|csv|tsv} for the output format, the CSV/TSV flags --headers/--no-headers and --delimiter, projection via --columns or --column-indices, pagination with --skip and --max-rows, and the Parquet tuning flags --parquet-row-group-size and --parquet-target-bytes.
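As an illustration, a recursive Parquet conversion that keeps only a few columns and tunes the row-group size might look like the line below. The flag names come from the list above, but the specific values (and whether they combine exactly like this) are assumptions, so check the CLI's help output for the authoritative syntax.
cargo run --bin sas7 -- convert path/to/dir --sink parquet --out-dir converted/ --column-indices 0,1,5 --parquet-row-group-size 65536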
The repository includes an example that downloads the 2013 AHS public-use file ZIP archive, extracts the embedded .sas7bdat, and writes ahs2013n.parquet to the working directory:
cargo run --example sas_to_parquet # default output ahs2013n.parquet
cargo run --example sas_to_parquet -- data/ahs.parquet
The example requires network access to https://www2.census.gov/ during the download step.
If the download is slow or blocked, point at a local or alternate ZIP:
curl -L -o /tmp/ahs2013.zip "https://www2.census.gov/programs-surveys/ahs/2013/AHS%202013%20National%20PUF%20v2.0%20Flat%20SAS.zip"
AHS_ZIP_PATH=/tmp/ahs2013.zip cargo run --example sas_to_parquet
# or use a mirror
AHS_ZIP_URL=https://your.mirror/AHS2013.zip cargo run --example sas_to_parquet
use sas7bdat::SasReader;

fn main() -> sas7bdat::Result<()> {
    let mut sas = SasReader::open("dataset.sas7bdat")?;

    // Dataset-level metadata, including the variable (column) list.
    let metadata = sas.metadata().clone();
    println!("Columns: {}", metadata.variables.len());

    // Stream rows instead of loading the whole file into memory.
    let mut rows = sas.rows()?;
    while let Some(row) = rows.try_next()? {
        // Inspect row values here
        println!("first column = {:?}", row[0]);
    }
    Ok(())
}
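Building on the usage example above, the sketch below streams every column of every row to stdout as tab-separated Debug output. It assumes only what the example shows (SasReader::open, metadata().variables, rows(), try_next(), and positional indexing of rows); for real pipelines, prefer the bundled Parquet/CSV sinks.
use std::io::{self, Write};
use sas7bdat::SasReader;

fn main() -> sas7bdat::Result<()> {
    let mut sas = SasReader::open("dataset.sas7bdat")?;
    // Number of columns, taken from the dataset metadata.
    let n_cols = sas.metadata().variables.len();

    let stdout = io::stdout();
    let mut out = stdout.lock();

    // Stream rows without materialising the whole dataset in memory.
    let mut rows = sas.rows()?;
    while let Some(row) = rows.try_next()? {
        for i in 0..n_cols {
            if i > 0 {
                write!(out, "\t").expect("write to stdout failed");
            }
            // Debug formatting is used because the concrete cell type is not shown here.
            write!(out, "{:?}", row[i]).expect("write to stdout failed");
        }
        writeln!(out).expect("write to stdout failed");
    }
    Ok(())
}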
See the examples in examples/ for more complete pipelines, including Parquet export.
Run the unit and integration test suites:
cargo test
Snapshot fixtures rely on datasets under fixtures/raw_data/. Large archives are ignored by .gitignore but are required for the full regression suite.
Licensed under the MIT License.
Issues and pull requests are welcome. Please open an issue before proposing substantial architectural changes so we can coordinate design and testing expectations.