Crates.io | cleanse |
lib.rs | cleanse |
version | 0.6.0 |
source | src |
created_at | 2021-08-18 21:35:59.135918 |
updated_at | 2021-08-19 02:55:12.600111 |
description | Small utility to clean up delimited (TSV/CSV) data. |
homepage | |
repository | https://github.com/sstadick/cleanse |
max_upload_size | |
id | 439244 |
size | 27,443 |
A small utility to clean up delimited data so that it can be consumed by standard Unix tools.
It can clean both TSV and CSV data.
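Under the hood this uses the csv crate to parse data as CSV, respecting quoting and escaping rules. For each field, cleanse will then try to do the following three things:
- Fix the field's encoding (logged as FixedEncoding).
- Replace any delimiter character found inside the field (logged as DelimiterReplacement).
- Replace any \n character found inside the field (logged as TerminatorReplacement).
If any changes were made to a field, a log entry is made with the record number, the field number, and the changes.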
$ cat data.tsv | cleanse -o cleansed.tsv -
Aug 18 15:28:02.556 INFO cleanse: Record number 23485, field number 35: [TerminatorReplacement]
Aug 18 15:28:02.724 INFO cleanse: Record number 31036, field number 24: [DelimiterReplacement]
Aug 18 15:28:02.984 INFO cleanse: Record number 44053, field number 35: [TerminatorReplacement]
Aug 18 15:28:03.456 INFO cleanse: Record number 66273, field number 35: [TerminatorReplacement]
Aug 18 15:28:05.149 INFO cleanse: Record number 150669, field number 14: [FixedEncoding]
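
The per-field steps described above can be sketched roughly as follows. This is a minimal illustration using only the Rust standard library, not the actual cleanse source: the `Change` enum and the `cleanse_field` function are hypothetical names, with variants borrowed from the tags in the log output. Using lossy UTF-8 decoding for the encoding fix and a caller-supplied replacement character are both assumptions.

```rust
use std::borrow::Cow;

/// Kinds of change, named after the tags seen in the log output.
/// (Hypothetical type, introduced for illustration only.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Change {
    FixedEncoding,
    DelimiterReplacement,
    TerminatorReplacement,
}

/// Clean a single raw field and report which changes were made.
/// `replacement` is an assumed parameter: the character written in place
/// of an embedded delimiter or newline.
pub fn cleanse_field(raw: &[u8], delimiter: char, replacement: char) -> (String, Vec<Change>) {
    let mut changes = Vec::new();

    // Step 1 (assumption): repair the encoding by lossy UTF-8 decoding.
    // `from_utf8_lossy` only allocates (Cow::Owned) when it had to
    // substitute U+FFFD for invalid bytes.
    let decoded = String::from_utf8_lossy(raw);
    if matches!(decoded, Cow::Owned(_)) {
        changes.push(Change::FixedEncoding);
    }

    // Steps 2 and 3: replace embedded delimiters and newlines.
    let mut cleaned = String::with_capacity(decoded.len());
    let mut saw_delimiter = false;
    let mut saw_terminator = false;
    for ch in decoded.chars() {
        if ch == delimiter {
            saw_delimiter = true;
            cleaned.push(replacement);
        } else if ch == '\n' {
            saw_terminator = true;
            cleaned.push(replacement);
        } else {
            cleaned.push(ch);
        }
    }
    if saw_delimiter {
        changes.push(Change::DelimiterReplacement);
    }
    if saw_terminator {
        changes.push(Change::TerminatorReplacement);
    }

    (cleaned, changes)
}
```

With these assumptions, `cleanse_field(b"a\tb\nc", '\t', ' ')` yields the field `a b c` together with a `DelimiterReplacement` and a `TerminatorReplacement` change, and a caller would emit a log entry only when the returned list is non-empty.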