cleanse

Crates.iocleanse
lib.rscleanse
version0.6.0
sourcesrc
created_at2021-08-18 21:35:59.135918
updated_at2021-08-19 02:55:12.600111
descriptionSmall utility to clean up delimited (TSV/CSV) data.
homepage
repositoryhttps://github.com/sstadick/cleanse
max_upload_size
id439244
size27,443
Seth (sstadick)

documentation

https://docs.rs/cleanse

README

🔥 cleanse

A small utility to clean up delimited data to make it consumable by standard unix tools.

Search words

Clean tsv data. Clean csv data.

Overview

Under the hood this uses the csv crate to parse data as a CSV, respecting quoting and escaping rules. For each field cleanse will then try to do the following three things:

  1. Inside a field, replace any instances of the delimiter character with .
  2. Inside a field, replace any instances of the terminator \n character with .
  3. Inside a field, replace any malformed UTF8 with the utf8 replacment character.

If any changes were made to a field a log entry is made with the record number, field number and changes.

Example

$ cat data.tsv | cleanse -o cleansed.tsv -
Aug 18 15:28:02.556  INFO cleanse: Record number 23485, field number 35: [TerminatorReplacement]
Aug 18 15:28:02.724  INFO cleanse: Record number 31036, field number 24: [DelimiterReplacement]
Aug 18 15:28:02.984  INFO cleanse: Record number 44053, field number 35: [TerminatorReplacement]
Aug 18 15:28:03.456  INFO cleanse: Record number 66273, field number 35: [TerminatorReplacement]
Aug 18 15:28:05.149  INFO cleanse: Record number 150669, field number 14: [FixedEncoding]

Commit count: 16

cargo fmt