csv-schema-validator

Crates.io: csv-schema-validator
lib.rs: csv-schema-validator
Version: 0.2.0
Created: 2025-08-06 19:40:17.458117+00
Updated: 2025-10-16 21:10:10.376154+00
Description: Derive macro to validate CSV
Repository: https://github.com/cleuton/csv-schema-validator
ID: 1784250
Size: 59,591
Cleuton Sampaio (cleuton)


Roadmap: version 0.2.1 will add more cross-validations:

  • if_column: If the value of the conditional column is in a list of values, the value of the annotated field is then checked:

#[validate(if_column("status", ["paid", "cancelled"], ["done","rejected"]))] // This is not the final format! Just an idea.

Version 0.2.0


A Rust library for validating CSV record data based on rules defined directly in your structs using the #[derive(ValidateCsv)] macro.

Installation

Add the following to your Cargo.toml:

[dependencies]
csv-schema-validator = "0.2.0"
serde = { version = "1.0", features = ["derive"] }
csv = "1.3"
regex = "1.11"
once_cell = "1.21"

Quick Start

use serde::Deserialize;
use csv::Reader;
use csv_schema_validator::{ValidateCsv, ValidationError};

#[derive(Deserialize, ValidateCsv, Debug)]
struct TestRecord {
    #[validate(range(min = 0.0, max = 100.0))]
    grade: f64,

    #[validate(regex = r"^[A-Z]{3}\d{4}$")]
    code: String,

    #[validate(required, length(min = 10, max = 50), not_blank)]
    name: Option<String>,

    #[validate(custom = "length_validation")]
    comments: String,

    #[validate(required, one_of("short", "medium", "long"))]
    more_comments: Option<String>,

    #[validate(required, not_in("forbidden", "banned"))]
    tag: Option<String>,

    #[validate(range(min = -5, max = 20))]
    temp1: i32,

    #[validate(range(min = 10))]
    temp2: i32,

    #[validate(range(max = 100))]
    temp3: i32,

}

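// Custom validator: comments must be at most 50 bytes long
// (s.len() counts bytes, not characters)
fn length_validation(s: &str) -> Result<(), String> {
    if s.len() <= 50 {
        Ok(())
    } else {
        Err("Comments too long".into())
    }
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut reader = Reader::from_path("data.csv")?;
    for result in reader.deserialize() {
        let rec: TestRecord = result?;
        // validate_csv() reports every failed rule as a Vec<ValidationError>
        match rec.validate_csv() {
            Ok(()) => println!("Record valid: {:?}", rec),
            Err(errors) => {
                for ValidationError { field, message } in errors {
                    eprintln!("Field `{}`: {}", field, message);
                }
            }
        }
    }
    Ok(())
}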

Usage

Range Validation (since 0.1.0, changed in 0.1.3)

#[validate(range(min = 0.0, max = 100.0))]
grade: f64,

Ensures that grade is between 0.0 and 100.0 (inclusive).

Since version 0.1.3, you can specify only min or only max to check greater-than-or-equal-to or less-than-or-equal-to, respectively. This rule applies only to numeric fields (integer or float), and the literal type must match the field type: use float literals such as 0.0 for an f64 field and integer literals such as 10 for an i32 field.
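The range semantics described above can be sketched by hand as follows. This is a stand-in for illustration, not the code the macro generates: `check_range` and its error strings are hypothetical, and `None` stands for a bound that was omitted from the attribute. Using one type parameter for the value and both bounds mirrors the requirement that literals match the field type.

```rust
/// Hand-written stand-in for `#[validate(range(min = .., max = ..))]`.
/// `None` means the bound was not given; both bounds are inclusive.
/// The function name and messages are hypothetical, not the crate's API.
fn check_range<T: PartialOrd>(value: T, min: Option<T>, max: Option<T>) -> Result<(), String> {
    if let Some(lo) = min {
        if value < lo {
            return Err("value below min".to_string());
        }
    }
    if let Some(hi) = max {
        if value > hi {
            return Err("value above max".to_string());
        }
    }
    Ok(())
}
```

With this sketch, `check_range(100.0, Some(0.0), Some(100.0))` passes because the bounds are inclusive, while `check_range(0, Some(1), None)` fails.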

Regex Validation (since 0.1.0)

#[validate(regex = r"^[A-Z]{3}\d{4}$")]
code: String,

Validates the field against a regular expression. Only for String fields.
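To make it concrete which values the example pattern ^[A-Z]{3}\d{4}$ accepts, here is a standard-library-only approximation. It is a sketch, not how the crate evaluates the rule: `matches_code_pattern` is a hypothetical helper, and it only accepts ASCII digits, whereas `\d` in the regex crate also matches non-ASCII digits by default.

```rust
/// ASCII-only approximation of the pattern ^[A-Z]{3}\d{4}$:
/// exactly three uppercase letters followed by exactly four digits.
/// Hypothetical helper for illustration; the crate uses a real regex.
fn matches_code_pattern(s: &str) -> bool {
    let bytes = s.as_bytes();
    // The length check runs first, so the slices below cannot go out of bounds.
    bytes.len() == 7
        && bytes[..3].iter().all(|b| b.is_ascii_uppercase())
        && bytes[3..].iter().all(|b| b.is_ascii_digit())
}
```

Under this approximation "XYZ1234" is accepted, and "xWF9101" is rejected because of the lowercase first letter.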

Required Validation (since 0.1.0)

#[validate(required)]
name: Option<String>,

Ensures that the field is not None. A field marked required must be of type Option<T>.
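A hand-written equivalent of this rule is a single match on the Option. The sketch below is illustrative only: `check_required` is a hypothetical name, and the message is borrowed from the sample output later in this README.

```rust
/// Hand-written stand-in for `#[validate(required)]` on an `Option<T>` field.
/// Hypothetical helper; it only mirrors the rule's described behaviour.
fn check_required<T>(value: &Option<T>) -> Result<(), String> {
    match value {
        Some(_) => Ok(()),
        None => Err("mandatory field".to_string()),
    }
}
```

In the complete example in this README, an empty CSV cell in an Option field is deserialized as None, which is the case this rule reports.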

Custom Validation (since 0.1.0)

#[validate(custom = "path::to::func")]
comments: String,

Calls your custom function, fn(&T) -> Result<(), String>, for additional checks. Only for String fields; the validators in this README take the value as &str and return Err with the message to report.
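As a further illustration of a custom validator, the sketch below limits a value by character count rather than by byte count (`str::len` returns bytes, so non-ASCII text is counted differently). The function name, the constant, and the limit of 20 are hypothetical; only the `fn(&str) -> Result<(), String>` shape is taken from the examples in this README.

```rust
/// Hypothetical limit used only for this sketch.
const MAX_COMMENT_CHARS: usize = 20;

/// Custom validator sketch: counts Unicode scalar values, not bytes.
/// It would be referenced as `#[validate(custom = "comment_chars_validation")]`.
fn comment_chars_validation(s: &str) -> Result<(), String> {
    let n = s.chars().count();
    if n <= MAX_COMMENT_CHARS {
        Ok(())
    } else {
        Err(format!(
            "Comments too long: {} characters (max {})",
            n, MAX_COMMENT_CHARS
        ))
    }
}
```

A string of fifteen "é" characters occupies 30 bytes but passes this validator, whereas a byte-based check with the same limit would reject it.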

Length (since 0.1.1)

#[validate(required, length(min = 10, max = 50))]
name: Option<String>,

Checks that the length of the string is within the given min and max. Only for String fields.
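A rough hand-written version of the length rule is shown below. Two points are assumptions rather than documented behaviour: that both bounds are inclusive, and that length is counted in characters (this README does not say whether the crate counts bytes or characters). `check_length` is a hypothetical name.

```rust
/// Hand-written stand-in for `#[validate(length(min = .., max = ..))]`.
/// Assumptions: inclusive bounds, length measured in characters.
fn check_length(s: &str, min: usize, max: usize) -> Result<(), String> {
    let n = s.chars().count();
    if n < min || n > max {
        Err(format!("length out of expected range: {} to {}", min, max))
    } else {
        Ok(())
    }
}
```

With min = 10 and max = 50, "Bob Marley" (10 characters) passes and "Charlie" (7 characters) fails, which matches the sample output of the complete example.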

Not Blank (since 0.1.2)

Rejects a String field that is blank or contains only whitespace:

#[validate(required, length(min = 10, max = 50), not_blank)]
name: Option<String>,

Only for String fields.
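The not_blank check can be approximated with `str::trim`. This is a sketch of the described behaviour, not the macro's generated code; `check_not_blank` is a hypothetical name and the message is taken from the sample output in this README.

```rust
/// Hand-written stand-in for `#[validate(not_blank)]`.
/// A value fails when nothing remains after trimming whitespace.
fn check_not_blank(s: &str) -> Result<(), String> {
    if s.trim().is_empty() {
        Err("must not be blank or contain only whitespace".to_string())
    } else {
        Ok(())
    }
}
```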

One of (since 0.1.2)

Checks that the string equals one of the allowed values:

#[validate(required, one_of("short", "medium", "long"))]
more_comments: Option<String>,

Only for String fields.

Not in (since 0.1.2)

Rejects the string if it equals one of the disallowed values:

#[validate(required, not_in("forbidden", "banned"))]
tag: Option<String>,

Only for String fields.
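The one_of and not_in rules are both membership tests against a fixed list, with opposite outcomes. The sketch below assumes exact, case-sensitive comparison, which this README does not state explicitly; the function names are hypothetical and the messages are taken from the sample output.

```rust
/// Stand-in for `one_of(...)`: the value must appear in `allowed`.
fn check_one_of(value: &str, allowed: &[&str]) -> Result<(), String> {
    if allowed.contains(&value) {
        Ok(())
    } else {
        Err("invalid value".to_string())
    }
}

/// Stand-in for `not_in(...)`: the value must not appear in `forbidden`.
fn check_not_in(value: &str, forbidden: &[&str]) -> Result<(), String> {
    if forbidden.contains(&value) {
        Err("value not allowed".to_string())
    } else {
        Ok(())
    }
}
```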

if_then (since 0.2.0)

Defines an implication rule between two columns: if the conditional column matches a given value, the current column must equal a specific target value.

#[validate(if_then("<conditional_column>", "<conditional_value>", "<expected_value>"))]

  • All arguments must be string literals. Their values are converted according to the field types.
  • If the conditional column (<conditional_column>) is Some(<conditional_value>), then the current field must be equal to <expected_value>.
  • If the condition is not met, the current field is not validated (it can be None or any other value).
  • Both columns must be optional (Option<T> and Option<R>), but their inner types may differ, for example Option<String> and Option<i32>.
  • Comparison uses equality (==).

#[derive(Deserialize, ValidateCsv, Debug)]
struct Order {
    status: Option<String>,

    // If status == "paid" → payment_state must be "done"
    #[validate(if_then("status", "paid", "done"))]
    payment_state: Option<String>,

    plan: Option<String>,

    // If plan == "P" → seats must be 100
    #[validate(if_then("plan", "P", "100"))]
    seats: Option<u32>,
}
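The implication logic can be written by hand as below. This is a sketch under stated assumptions, not the generated code: `check_if_then` and its message are hypothetical, and it assumes that when the condition holds, a current field of None fails because it does not equal the expected value.

```rust
/// Hand-written stand-in for `if_then(cond_column, cond_value, expected)`.
/// `cond` is the conditional column, `current` the annotated field.
/// Assumption: if the condition holds, `None` in `current` is a failure.
fn check_if_then<C, T>(
    cond: &Option<C>,
    cond_value: &C,
    current: &Option<T>,
    expected: &T,
) -> Result<(), String>
where
    C: PartialEq,
    T: PartialEq,
{
    let condition_met = cond.as_ref() == Some(cond_value);
    if condition_met && current.as_ref() != Some(expected) {
        return Err("value does not match the expected value".to_string());
    }
    // Condition not met (or expectation satisfied): nothing to report.
    Ok(())
}
```

For the Order struct above, this corresponds to checking seats against 100 only when plan is Some("P"); any other plan leaves seats unchecked.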

Struct check

The macro checks the type it annotates: only structs with named fields are allowed. Tuple structs and enums, as in the following example, are rejected:

use serde::Deserialize;
use csv_schema_validator::ValidateCsv;

#[derive(Deserialize, ValidateCsv)]
struct TupleStruct(f64, String);

#[derive(Deserialize, ValidateCsv)]
enum Status {
    Success { code: f64, message: String },
    Error(f64, String),
    Unknown,
}

fn main() {
    let record = TupleStruct(42.0, "ABC1234".to_string());
    let s = Status::Success { code: 200.0, message: "OK".into() };
    let _ = record.validate_csv();
    let _ = s.validate_csv();
}

Trying to compile this code will result in errors:

cargo run
error: only structs with named fields (e.g., `struct S { a: T }`) are supported
 --> src/main.rs:5:19
  |
5 | struct TupleStruct(f64, String);
  |                   ^^^^^^^^^^^^^

error: only structs are supported
  --> src/main.rs:8:1
   |
8  | / enum Status {
9  | |     Success { code: f64, message: String },
10 | |     Error(f64, String),
11 | |     Unknown,
12 | | }
   | |_^

Complete example

This example reads a CSV file and validates each record:

Cargo.toml:

[package]
name = "use-csv-validator"
version = "0.1.1"
edition = "2021"

[dependencies]
csv = "1.1"
serde = { version = "1.0", features = ["derive"] }
csv-schema-validator = "0.1.3"

src/main.rs:

use std::error::Error;
use csv::ReaderBuilder;
use serde::Deserialize;
use csv_schema_validator::{ValidateCsv, ValidationError};

/// Custom validator: ensure comments string isn't too long
fn length_validation(s: &str) -> Result<(), String> {
    if s.len() <= 20 {
        Ok(())
    } else {
        Err("Comments too long".into())
    }
}

#[derive(Deserialize, ValidateCsv, Debug)]
struct TestRecord {
    #[validate(range(min = 0.0, max = 100.0))]
    grade: f64,

    #[validate(regex = r"^[A-Z]{3}\d{4}$")]
    code: String,

    #[validate(required, length(min = 10, max = 50), not_blank)]
    name: Option<String>,

    #[validate(custom = "length_validation")]
    comments: String,

    #[serde(rename = "more")]
    #[validate(required, one_of("short", "medium", "long"))]
    more_comments: Option<String>,

    #[validate(required, not_in("forbidden", "banned"))]
    tag: Option<String>,

    #[validate(range(min = 1))]
    level: i32,

    #[validate(range(max = 100))]
    top: Option<i32>,
}

fn main() -> Result<(), Box<dyn Error>> {
    // open the CSV file placed alongside Cargo.toml
    let mut reader = ReaderBuilder::new()
        .has_headers(true)
        .from_path("data.csv")?;

    // for each record, deserialize and validate
    for (i, result) in reader.deserialize::<TestRecord>().enumerate() {
        let record = result?;
        match record.validate_csv() {
            Ok(()) => println!("Line {}: Record is valid: {:?}", i + 1, record),
            Err(errors) => {
                eprintln!("Line {}: Validation errors:", i + 1);
                for ValidationError { field, message } in errors {
                    eprintln!("  Field `{}`: {}", field, message);
                }
            }
        }
    }

    Ok(())
}

data.csv:

grade,code,name,comments,more,tag,level,top
85.5,XYZ1234,Alice Smith,All good,short,allowed,2,
90.0,XYZ5678,Bob Marley,Too long comment indeed,medium,allowed,0,
110.0,XYZ4567,      ,ok,short,allowed,5,
95.0,xWF9101,Charlie,code,long,allowed,6,
110.0,XYZ2345,Dave Copperfield,range,short,allowed,-1,80
34.0,XYZ6789,,name,medium,allowed,5,
78.0,XYZ7890,Frank,more,invalid comment,allowed,10,
88.0,XYZ4567,Grace,All good,short,,3,
90.0,XYZ3567,Grace of All Times,All good,medium,forbidden,5,150
3.0,XYZ3456,Eve Max Smith,,short,invalid grade,2,
f34s,XYZ3456,Eve,comments,short,invalid grade,,,

Running this example will generate these messages:

Line 1: Record is valid: TestRecord { grade: 85.5, code: "XYZ1234", name: Some("Alice Smith"), comments: "All good", more_comments: Some("short"), tag: Some("allowed"), level: 2, top: None }
Line 2: Validation errors:
  Field `comments`: Comments too long
  Field `level`: value below min: 1
Line 3: Validation errors:
  Field `grade`: value out of expected range: 0 to 100
  Field `name`: length out of expected range: 10 to 50
  Field `name`: must not be blank or contain only whitespace
Line 4: Validation errors:
  Field `code`: does not match the expected pattern
  Field `name`: length out of expected range: 10 to 50
Line 5: Validation errors:
  Field `grade`: value out of expected range: 0 to 100
  Field `level`: value below min: 1
Line 6: Validation errors:
  Field `name`: mandatory field
Line 7: Validation errors:
  Field `name`: length out of expected range: 10 to 50
  Field `more_comments`: invalid value
Line 8: Validation errors:
  Field `name`: length out of expected range: 10 to 50
  Field `tag`: mandatory field
Line 9: Validation errors:
  Field `tag`: value not allowed
  Field `top`: value above max: 100
Line 10: Record is valid: TestRecord { grade: 3.0, code: "XYZ3456", name: Some("Eve Max Smith"), comments: "", more_comments: Some("short"), tag: Some("invalid grade"), level: 2, top: None }
Error: Error(UnequalLengths { pos: Some(Position { byte: 542, line: 12, record: 11 }), expected_len: 8, len: 9 })

Why Use This Crate?

  • Declarative API: Define validation rules directly in your struct.
  • Compile-Time Code Generation: The derive macro generates the validation code at compile time; the checks themselves run when validate_csv() is called.
  • Seamless Serde & CSV Integration: Works directly with serde and csv crates.
  • Clear Error Messages: Each failure reports the field and reason.

Comparison with csv Crate Validations

While the csv crate provides low‑level parsing and some helper methods, this derive‑based approach offers:

  • Field‑Level Declarative Rules: Annotate each struct field with its own validation, rather than writing imperative checks after parsing.
  • Type‑Safety & Integration: Leverages your existing serde::Deserialize types, so you get compile‑time guarantees on types and validations in one place.
  • Custom Validators: Easily plug in custom functions per field without manual looping or error‑handling boilerplate.
  • Centralized Error Collection: Automatically collects all errors into a single Vec<ValidationError>, instead of ad‑hoc early exits.
  • Reusable Across Projects: Define your struct once, reuse validations in different contexts (CLI, web server, batch jobs) with the same guarantees.

By contrast, using the csv crate directly may require manual loops over records and explicit match/if chains for each validation, leading to more boilerplate and potential for missing checks.
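For comparison, the sketch below shows the kind of hand-written validation the derive macro is meant to replace, covering just two rules for two fields. `Row`, `FieldError`, and `validate_row` are hypothetical names for this illustration; `FieldError` only imitates the field/message shape of the crate's ValidationError.

```rust
/// Hypothetical error type imitating the field/message pair.
#[derive(Debug, PartialEq)]
struct FieldError {
    field: String,
    message: String,
}

/// Hypothetical record with two of the fields from the complete example.
struct Row {
    grade: f64,
    name: Option<String>,
}

/// Imperative validation: every rule is an explicit branch, and all
/// failures are collected instead of returning on the first one.
fn validate_row(row: &Row) -> Result<(), Vec<FieldError>> {
    let mut errors = Vec::new();
    if !(0.0..=100.0).contains(&row.grade) {
        errors.push(FieldError {
            field: "grade".into(),
            message: "value out of expected range: 0 to 100".into(),
        });
    }
    match &row.name {
        None => errors.push(FieldError {
            field: "name".into(),
            message: "mandatory field".into(),
        }),
        Some(n) if n.trim().is_empty() => errors.push(FieldError {
            field: "name".into(),
            message: "must not be blank or contain only whitespace".into(),
        }),
        Some(_) => {}
    }
    if errors.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

Each additional field or rule adds another branch to this function, which is the boilerplate the attribute-based approach avoids.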

Compatibility

  • This crate requires the Rust standard library (it is not compatible with #![no_std] environments).
  • Rust 1.56+
  • serde 1.0
  • csv 1.3
  • regex 1.11

Contributing

Feel free to open issues and submit pull requests. See CONTRIBUTING.md for details.

License

This project is licensed under the MIT License. See the LICENSE file for details.
