Crates.io | detchar |
lib.rs | detchar |
version | 0.1.0 |
source | src |
created_at | 2022-05-13 14:34:10.666344 |
updated_at | 2022-05-13 14:34:10.666344 |
description | Command line tool for detecting file encodings |
repository | https://github.com/clbarnes/detchar |
id | 585911 |
size | 993,221 |
A simple CLI to detect character encodings in files, similar to chardet.
Implemented as a very, very thin wrapper over chardetng.
The example text files in ./data are from this Kaggle dataset.
chardetng has a feature which parallelises the elimination of possible encodings for each text file.
This can be enabled by compiling detchar with the multithreading feature.
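As a rough sketch, assuming the crate is installed from crates.io with cargo and that the feature is named multithreading as described above, enabling it could look like this:

```shell
# Build and install detchar with the optional multithreading feature enabled.
# Assumes a working Rust toolchain; --features is the standard cargo flag.
cargo install detchar --features multithreading
```

When building from a checkout of the repository instead, the same --features flag can be passed to cargo build.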
However, this is disabled by default because, for large numbers of files, it is generally more effective to parallelise over files instead, e.g. with GNU parallel:
cat my_file_list.txt | parallel detchar
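Where GNU parallel is not available, a similar per-file fan-out can be sketched with xargs. This is a swapped-in alternative, not something the project itself documents. The function name detect_all and the variables DETECT_CMD and JOBS are hypothetical, and the -P option is an extension found in GNU and BSD xargs rather than part of POSIX.

```shell
# Sketch only: detect_all, DETECT_CMD and JOBS are hypothetical names.
detect_all() {
    # $1 is a text file listing one path per line.
    # -n 1 passes a single path to each invocation of the command;
    # -P caps how many invocations run at the same time.
    xargs -n 1 -P "${JOBS:-4}" "${DETECT_CMD:-detchar}" < "$1"
}
```

For example, JOBS=8 detect_all my_file_list.txt would run up to eight detchar processes at once. Output order is not guaranteed, and plain xargs splits its input on whitespace, so paths containing spaces or quotes need extra care.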