| Crates.io | csv_deserializer |
| lib.rs | csv_deserializer |
| version | 0.1.4 |
| created_at | 2025-12-24 17:48:22.033136+00 |
| updated_at | 2025-12-30 18:35:51.489602+00 |
| description | A rust library to load a csv in a rusty typed way, all values are converted to the most appropriate enum |
| homepage | |
| repository | https://github.com/AliothCancer/csv_deserializer |
| max_upload_size | |
| id | 2003625 |
| size | 46,374 |
This repo contains a Rust binary (main.rs) that translates a CSV table into Rust types. Every column is converted into a Vec of an enum representing all the unique types found in it; if a column is of String type, every unique String is deserialized as an enum variant (see the iris dataset example).
git clone https://github.com/AliothCancer/csv_deserializer.git
A folder called csv_deserializer will be created
cd csv_deserializer
cargo build --release
Copy the binary to a directory on your PATH, for example ~/.local/bin:
cp target/release/csv_deserializer ~/.local/bin
❯ csv_deserializer -h
Usage: csv_deserializer [OPTIONS] --input-file <input_file>
Options:
-i, --input-file <input_file>
-n, --null-values <a,b,..>
-h, --help Print help
-V, --version Print version
Note on null values:
--null-values is an optional comma-separated list of strings that are converted to the Null variant, which every generated enum has.
There are 2 structs that represent the CSV file as Rust types:
#[derive(Debug)]
pub struct CsvDataset<'a> {
pub names: Vec<ColName>,
pub values: Vec<Vec<CsvAny>>,
pub null_values: NullValues<'a>,
pub info: Vec<ColumnInfo>,
}
CsvDataset is defined in lib.rs. It can also be used on its own to easily load a CSV.
Every CSV "cell" is stored as a CsvAny:
#[derive(Debug, PartialEq, PartialOrd, Clone)]
pub enum CsvAny {
Str(String),
Int(i64),
Float(f64),
Null, // to represent null values
Empty, // if it is just empty
}
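As a rough illustration of how CsvAny cells might be consumed, the sketch below repeats the enum shown above and adds a helper named column_mean. That helper is hypothetical and not part of the crate. It averages the numeric cells of a slice and ignores Str, Null, and Empty cells.

```rust
// Copy of the CsvAny enum shown above, so the sketch compiles on its own.
#[derive(Debug, PartialEq, PartialOrd, Clone)]
pub enum CsvAny {
    Str(String),
    Int(i64),
    Float(f64),
    Null,
    Empty,
}

// Hypothetical helper (not part of csv_deserializer): mean of the numeric
// cells in `cells`, or None when there is no numeric cell at all.
pub fn column_mean(cells: &[CsvAny]) -> Option<f64> {
    let mut sum = 0.0;
    let mut count = 0usize;
    for cell in cells {
        match cell {
            CsvAny::Int(i) => {
                sum += *i as f64;
                count += 1;
            }
            CsvAny::Float(f) => {
                sum += *f;
                count += 1;
            }
            // Non-numeric cells do not contribute to the mean.
            CsvAny::Str(_) | CsvAny::Null | CsvAny::Empty => {}
        }
    }
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

Because the match lists every variant, the compiler would flag this helper if a new variant were ever added to the enum.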
CsvDataFrame is generated by the binary of this crate, so it is available only after you put the generated Rust code in an .rs file and declare it as a module. Its exact structure depends on the CSV file you pass in, i.e. the names of the columns and the unique values of each column. (See the iris example for a reference of the structure of this type.)
To use this library for generating and utilizing a typed Rust interface for your CSV files, follow these steps:
First, load your CSV file using a csv::Reader. You then create a CsvDataset by providing the reader and specifying which strings should be treated as null values.
let file = File::open("iris.csv")?;
let rdr = csv::ReaderBuilder::new()
.has_headers(true)
.from_reader(file);
let dataset = CsvDataset::new(rdr, NullValues(&["NA"]));
Use the csv_deserializer CLI to generate the Rust code for a specific CSV file. The binary prints all the generated Rust code, so you can redirect this output to a file from your command line to save it.
Once the code is saved into a file (e.g., iris.rs), you can import it into your project. To work with the typed data, initialize a CsvDataFrame type by passing the CsvDataset you created earlier.
mod iris;
use iris::*;
let df = CsvDataFrame::new(&dataset);
// Build a reader for the csv file
let path = "iris.csv";
let file = File::open(path)?;
let rdr = csv::ReaderBuilder::new()
.has_headers(true)
.from_reader(file);
// Build the CsvDataset with the reader and the null values
let dataset = CsvDataset::new(rdr, NullValues(vec!["NA"]));
// The iris.rs file is generated with the csv_deserializer binary.
// Inside iris.rs, CsvDataFrame is the main struct
// that contains all the data
let df = CsvDataFrame::new(&dataset);
// Do ETL work in a type-safe way. This sometimes costs some
// flexibility, so you can always fall back to CsvDataset, which
// uses CsvAny as the type of every cell
// You can destructure the column wrapper, CsvColumn, with if let
if let CsvColumn::target(target) = &df.target
&& let CsvColumn::petal_length_cm(_pet_length) = &df.petal_length_cm
{
target.iter().for_each(|x| match x {
target::Iris_setosa => todo!(),
target::Iris_versicolor => todo!(),
target::Iris_virginica => todo!(),
target::Null => todo!(),
});
}
// You can also iterate over all columns;
// let editor completion fill in
// the match arms
for col in df.get_columns() {
match col {
CsvColumn::sepal_length_cm(sepal_length_cms) => todo!(),
CsvColumn::sepal_width_cm(sepal_width_cms) => todo!(),
CsvColumn::petal_length_cm(petal_length_cms) => todo!(),
CsvColumn::petal_width_cm(petal_width_cms) => todo!(),
CsvColumn::target(targets) => todo!(),
}
}
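To make the typed access pattern above a little more concrete, here is a minimal, self-contained sketch. It uses trimmed copies of the generated iris types (only two columns) and a hypothetical helper, mean_length_for, which is not produced by the generator. The helper computes the mean sepal length for one class and skips Null lengths.

```rust
// Trimmed copies of the generated iris types, so the sketch compiles alone.
#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum sepal_length_cm {
    Float(f64),
    Null,
}

#[allow(non_camel_case_types)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum target {
    Iris_setosa,
    Iris_versicolor,
    Iris_virginica,
    Null,
}

#[allow(non_camel_case_types)]
#[derive(Debug)]
pub enum CsvColumn {
    sepal_length_cm(Vec<sepal_length_cm>),
    target(Vec<target>),
}

// Hypothetical helper: mean sepal length of the rows labelled `class`.
// Returns None if a column has the wrong kind or no row qualifies.
pub fn mean_length_for(lengths: &CsvColumn, targets: &CsvColumn, class: target) -> Option<f64> {
    // Unwrap both column wrappers at once; bail out on a mismatch.
    let (CsvColumn::sepal_length_cm(lengths), CsvColumn::target(targets)) = (lengths, targets)
    else {
        return None;
    };
    let picked: Vec<f64> = lengths
        .iter()
        .zip(targets)
        .filter(|(_, t)| **t == class)
        .filter_map(|(l, _)| match l {
            sepal_length_cm::Float(v) => Some(*v),
            sepal_length_cm::Null => None,
        })
        .collect();
    if picked.is_empty() {
        None
    } else {
        Some(picked.iter().sum::<f64>() / picked.len() as f64)
    }
}
```

With the real generated module, the two arguments would presumably be &df.sepal_length_cm and &df.target.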
Sanitization is achieved by converting any number or special character into strings that can be used in the generated code. The function that does this, sanitize_identifier, is in sanitizer.rs.
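The sketch below shows one way such a sanitizer could work. It is an assumption for illustration only: sanitize_sketch is a hypothetical name, and the real sanitize_identifier may follow different rules, especially for digits. The rules here were chosen so that the iris names map to the identifiers shown in this README.

```rust
// Hypothetical stand-in for sanitize_identifier; the crate's real rules
// may differ. Letters and digits are kept, everything else becomes '_'.
pub fn sanitize_sketch(raw: &str) -> String {
    let mut out = String::new();
    for ch in raw.chars() {
        if ch.is_ascii_alphanumeric() {
            out.push(ch);
        } else if !out.is_empty() && !out.ends_with('_') {
            // Collapse each run of other characters into one underscore.
            out.push('_');
        }
    }
    // Drop a trailing underscore left by closing punctuation such as ')'.
    while out.ends_with('_') {
        out.pop();
    }
    if out.is_empty() {
        // Assumed fallback name for input with no usable characters.
        return "unnamed".to_string();
    }
    // A Rust identifier cannot start with a digit, so prefix an underscore.
    if out.starts_with(|c: char| c.is_ascii_digit()) {
        out.insert(0, '_');
    }
    out
}
```

Under these assumed rules, the header "sepal length (cm)" becomes sepal_length_cm and the value "Iris-setosa" becomes Iris_setosa.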
The library identifies types by attempting to parse each raw CSV value.
If a value parses as an i64, it is treated as an Int; if it parses as an f64, it is treated as a Float. For example, for sepal length (cm) in the iris dataset, the resulting type is:
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum sepal_length_cm {
Float(f64),
Null,
}
// Also implement from string
impl std::str::FromStr for sepal_length_cm {
type Err = String;
fn from_str(s: &str) -> Result<Self, Self::Err> {
let f = s.parse::<f64>().unwrap();
Ok(sepal_length_cm::Float(f))
}
}
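The parse order described above (integer first, then float) can be sketched as a small classifier over CsvAny. This is a guess at the general shape, not the crate's actual code: classify is a hypothetical function, and the order in which empty cells and null strings are checked is an assumption.

```rust
// Copy of the CsvAny enum from earlier in this README.
#[derive(Debug, PartialEq, PartialOrd, Clone)]
pub enum CsvAny {
    Str(String),
    Int(i64),
    Float(f64),
    Null,
    Empty,
}

// Hypothetical classifier (not the crate's implementation).
// `null_values` plays the role of the --null-values list.
pub fn classify(raw: &str, null_values: &[&str]) -> CsvAny {
    if raw.is_empty() {
        CsvAny::Empty
    } else if null_values.contains(&raw) {
        CsvAny::Null
    } else if let Ok(i) = raw.parse::<i64>() {
        // Try the narrower type first: "42" parses as both i64 and f64.
        CsvAny::Int(i)
    } else if let Ok(f) = raw.parse::<f64>() {
        CsvAny::Float(f)
    } else {
        // Anything that is not numeric stays a string.
        CsvAny::Str(raw.to_string())
    }
}
```

Trying i64 before f64 matters because every integer literal is also a valid float literal, so the reverse order would never produce an Int.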
Otherwise, it is treated as a Str. The generated Rust code for a column of string values looks like this (example from the iris dataset):
create_enum!(target;
"Iris-setosa" => Iris_setosa,
"Iris-versicolor" => Iris_versicolor,
"Iris-virginica" => Iris_virginica,
Null,
);
The create_enum macro is syntactic sugar for associating raw strings with the typed enum variants.
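For readers curious how a macro with this call syntax could be written, here is a minimal macro_rules sketch. It is a hypothetical re-creation: the crate's real create_enum may generate different derives and impls, and this version does not map any null string to the Null variant.

```rust
// Hypothetical re-creation of a create_enum-style macro. It emits the enum
// plus a FromStr impl that maps each raw string to its variant.
macro_rules! create_enum {
    ($name:ident; $($raw:literal => $variant:ident,)+ Null $(,)?) => {
        #[allow(non_camel_case_types)]
        #[derive(Debug, Clone, Copy, PartialEq, Eq)]
        pub enum $name {
            $($variant,)+
            Null,
        }

        impl std::str::FromStr for $name {
            type Err = String;
            fn from_str(s: &str) -> Result<Self, Self::Err> {
                match s {
                    $($raw => Ok($name::$variant),)+
                    other => Err(format!("unexpected value: {other}")),
                }
            }
        }
    };
}

// Same invocation as the generated iris code shown above.
create_enum!(target;
    "Iris-setosa" => Iris_setosa,
    "Iris-versicolor" => Iris_versicolor,
    "Iris-virginica" => Iris_virginica,
    Null,
);
```

With this sketch, "Iris-setosa".parse::<target>() yields Ok(target::Iris_setosa), and an unknown string yields an Err instead of panicking.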
ColumnInfo tracks the count of these types and stores unique variants to facilitate categorical Enum generation.
This is the example for the iris dataset:
sepal length (cm),sepal width (cm),petal length (cm),petal width (cm),target
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
Rust generated code:
#[derive(Debug)]
pub enum CsvColumn {
sepal_length_cm(Vec<sepal_length_cm>),
sepal_width_cm(Vec<sepal_width_cm>),
petal_length_cm(Vec<petal_length_cm>),
petal_width_cm(Vec<petal_width_cm>),
target(Vec<target>),
}
pub struct CsvDataFrame {
pub sepal_length_cm: CsvColumn,
pub sepal_width_cm: CsvColumn,
pub petal_length_cm: CsvColumn,
pub petal_width_cm: CsvColumn,
pub target: CsvColumn,
}
Each enum used to represent a CSV value has a Null variant.