| Crates.io | sanitise |
| lib.rs | sanitise |
| version | 0.4.0 |
| created_at | 2023-05-16 15:50:19.426051+00 |
| updated_at | 2023-05-21 19:08:27.451402+00 |
| description | Headache-free data clean-up |
| homepage | |
| repository | https://github.com/Spartan2909/sanitise |
| max_upload_size | |
| id | 866183 |
| size | 94,666 |
A library for headache-free data clean-up and validation.
sanitise is a CSV processing and validation library that generates code at compile time based on a YAML configuration file. The generated code is robust and will not panic.
no_std environments are supported, but the alloc crate is required.
Add sanitise to your dependencies in your Cargo.toml:
[dependencies]
sanitise = "0.4"
Import the macro:
use sanitise::sanitise_string;
And call:
// main.rs
use std::{fs, iter::zip};

use sanitise::sanitise_string;

fn main() {
    let csv = fs::read_to_string("data.csv").unwrap();
    let ((time_millis, pulse, movement), (time_secs,)) =
        sanitise_string!(include_str!("sanitise_config.yaml"), &csv).unwrap();

    println!("time_millis,time_secs,pulse,movement");
    for (((time_millis, pulse), movement), time_secs) in
        zip(zip(zip(time_millis, pulse), movement), time_secs)
    {
        println!("{time_millis},{time_secs},{pulse},{movement}")
    }
}
# sanitise_config.yaml
processes:
  - name: validate
    columns:
      - title: time
        type: integer
      - title: pulse
        type: integer
        max: 100
        min: 40
        on-invalid: average
        valid-streak: 3
      - title: movement
        type: integer
        valid-values: [0, 1]
        output-type: boolean
        output: "value == 1"
  - name: process
    columns:
      - title: time
        type: integer
        output: "value / 1000"
      - title: pulse
        type: integer
        ignore: true
      - title: movement
        type: integer
        ignore: true
# data.csv
time,pulse,movement
0,67,0
15,45,1
126,132,1
The first argument to sanitise_string! must be either a string literal or a macro call that expands to a string literal. The second argument must be an expression that resolves to a &str in CSV format. In the above example, sanitise_config.yaml must be next to main.rs, and data.csv must be in the working directory at runtime.
The other macro, sanitise!, is used when your data has already been parsed into the correct shape. See the documentation for more details.
For details on the configuration file, see the specification.
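As a rough illustration of what the two output expressions in the example configuration compute, the sketch below applies them by hand to the time and movement columns of data.csv. It is not the code the macro generates: the function names are hypothetical, and using i64 for integer columns is an assumption made for this example.

```rust
/// Hypothetical stand-in for `output: "value == 1"` with `output-type: boolean`.
fn movement_output(value: i64) -> bool {
    value == 1
}

/// Hypothetical stand-in for `output: "value / 1000"` on an integer column.
/// Rust integer division truncates, so anything below 1000 ms becomes 0 s.
fn time_secs_output(value: i64) -> i64 {
    value / 1000
}

fn main() {
    // The `time` and `movement` columns from data.csv, written out by hand.
    let time_millis = [0_i64, 15, 126];
    let movement_raw = [0_i64, 1, 1];

    let movement: Vec<bool> = movement_raw.iter().map(|&v| movement_output(v)).collect();
    let time_secs: Vec<i64> = time_millis.iter().map(|&v| time_secs_output(v)).collect();

    println!("{movement:?}");
    println!("{time_secs:?}");
}
```

Because the division is on integers, every timestamp in the sample file maps to 0 seconds. A longer recording would be needed to see non-zero values in time_secs.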
benchmark: Print the time taken to complete various stages of the process. Disables no_std support. You probably don't want this.
The macro creates linear finite automata to process each column. If on-invalid is set to average for a given column, that column's automaton will use a state machine to keep track of valid and invalid values. If a column is ignored, no automaton will be generated for it. All data is stored in native Rust types.
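To give a feel for the kind of state machine described above, here is a minimal hand-written sketch, not the code the macro actually generates. It assumes that averaging means replacing a run of invalid values with the mean of the valid values on either side, and it ignores valid-streak entirely. The State enum, the average_invalid function, and the handling of leading and trailing invalid runs are all hypothetical choices made for this example.

```rust
/// Hypothetical states for one column's automaton.
#[derive(Debug, Clone, Copy)]
enum State {
    /// No valid value seen yet; `pending` invalid values await a replacement.
    Start { pending: usize },
    /// The most recent value was valid.
    Valid { last: i64 },
    /// Inside a run of invalid values that follows the valid value `before`.
    Invalid { before: i64, pending: usize },
}

/// Replaces values outside `min..=max` with the average of their valid neighbours.
/// Assumption: runs at either end reuse the single neighbour they have, and a
/// column with no valid values at all yields an empty result.
fn average_invalid(values: &[i64], min: i64, max: i64) -> Vec<i64> {
    let mut out = Vec::with_capacity(values.len());
    let mut state = State::Start { pending: 0 };

    for &value in values {
        let valid = (min..=max).contains(&value);
        state = match (state, valid) {
            (State::Start { pending }, true) => {
                // Leading invalid values have no left neighbour.
                out.extend(std::iter::repeat(value).take(pending));
                out.push(value);
                State::Valid { last: value }
            }
            (State::Start { pending }, false) => State::Start { pending: pending + 1 },
            (State::Valid { .. }, true) => {
                out.push(value);
                State::Valid { last: value }
            }
            (State::Valid { last }, false) => State::Invalid { before: last, pending: 1 },
            (State::Invalid { before, pending }, true) => {
                // Widen to i128 so the sum cannot overflow.
                let fill = ((before as i128 + value as i128) / 2) as i64;
                out.extend(std::iter::repeat(fill).take(pending));
                out.push(value);
                State::Valid { last: value }
            }
            (State::Invalid { before, pending }, false) => {
                State::Invalid { before, pending: pending + 1 }
            }
        };
    }

    // Trailing invalid values have no right neighbour.
    if let State::Invalid { before, pending } = state {
        out.extend(std::iter::repeat(before).take(pending));
    }
    out
}

fn main() {
    // 200 is outside 40..=100, so it is replaced by the mean of 60 and 80.
    println!("{:?}", average_invalid(&[60, 200, 80], 40, 100));
}
```

Each value is visited once and the only buffered information is a count of pending invalid values, which matches the single-pass, per-column processing described above. How the real crate treats edge cases such as a trailing invalid run may differ.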
Licensed under either of
Apache License, Version 2.0
MIT license
at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.