dataload

Crates.iodataload
lib.rsdataload
version0.1.1
created_at2025-12-01 00:48:51.55182+00
updated_at2025-12-01 01:01:17.48093+00
descriptionA flexible data loading library for CSV and Excel files with automatic delimiter detection
homepage
repositoryhttps://github.com/OneThing98/dataload
max_upload_size
id1959068
size121,524
(OneThing98)

documentation

https://docs.rs/dataload

README

dataload

A flexible Rust library for loading CSV and Excel files into Polars DataFrames.

Features

  • Automatic file type detection via magic bytes and file extensions
  • Smart delimiter detection for CSV files (comma, tab, semicolon, pipe)
  • Excel support for xlsx, xls, xlsm, xlsb, and ods formats
  • Builder-pattern API for flexible configuration
  • Feature flags to minimize dependencies

Installation

Add to your Cargo.toml:

[dependencies]
dataload = "0.1"

To disable Excel support (reduces compile time and dependencies):

[dependencies]
dataload = { version = "0.1", default-features = false, features = ["csv"] }

Quick Start

use dataload::{DataLoader, load_file, load_bytes};
use std::path::Path;

// Simple one-liner
let df = load_file(Path::new("data.csv"))?;

// From bytes
let csv_data = b"name,age\nAlice,30\nBob,25";
let df = load_bytes(csv_data, "data.csv")?;

// With custom options
let df = DataLoader::new()
    .with_delimiter(dataload::Delimiter::Tab)
    .with_header(false)
    .with_skip_rows(1)
    .with_max_rows(Some(1000))
    .load_file(Path::new("data.tsv"))?;

CSV Loading

The library automatically detects the delimiter by analyzing the file content:

use dataload::load_bytes;

// Auto-detects comma delimiter
let df = load_bytes(b"a,b,c\n1,2,3", "data.csv")?;

// Auto-detects tab delimiter
let df = load_bytes(b"a\tb\tc\n1\t2\t3", "data.tsv")?;

// Auto-detects semicolon delimiter
let df = load_bytes(b"a;b;c\n1;2;3", "data.csv")?;

Or specify a delimiter explicitly:

use dataload::{DataLoader, Delimiter};

let df = DataLoader::new()
    .with_delimiter(Delimiter::Pipe)
    .load_bytes(b"a|b|c\n1|2|3", "data.txt")?;

Excel Loading

use dataload::DataLoader;
use std::path::Path;

// Load first sheet (default)
let df = DataLoader::new()
    .load_file(Path::new("report.xlsx"))?;

// Load specific sheet by name
let df = DataLoader::new()
    .with_sheet_name("Sales Data")
    .load_file(Path::new("report.xlsx"))?;

// Load specific sheet by index (0-based)
let df = DataLoader::new()
    .with_sheet_index(2)
    .load_file(Path::new("report.xlsx"))?;

// List available sheets
let sheets = dataload::list_sheets(&file_bytes)?;

Configuration Options

Option Description Default
delimiter CSV delimiter (Auto, Comma, Tab, Semicolon, Pipe, Custom(u8)) Auto
has_header First row is header true
skip_rows Rows to skip from start 0
max_rows Maximum rows to read None (all)
sheet_index Excel sheet index (0-based) None (first)
sheet_name Excel sheet name None
infer_schema Infer column types true
infer_schema_length Rows for type inference Some(1000)

Error Handling

All operations return Result<T, DataLoadError>:

use dataload::{load_file, DataLoadError};
use std::path::Path;

match load_file(Path::new("data.csv")) {
    Ok(df) => println!("Loaded {} rows", df.height()),
    Err(DataLoadError::Io(e)) => eprintln!("File error: {e}"),
    Err(DataLoadError::UnsupportedFileType(ext)) => eprintln!("Unknown type: {ext}"),
    Err(e) => eprintln!("Error: {e}"),
}

Feature Flags

Feature Description Default
csv CSV/TSV file support
excel Excel file support (xlsx, xls, etc.)

License

Licensed under Apache License, Version 2.0

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Commit count: 0

cargo fmt