detchar

Crates.io: detchar
lib.rs: detchar
version: 0.1.0
source: src
created_at: 2022-05-13 14:34:10.666344
updated_at: 2022-05-13 14:34:10.666344
description: Command line tool for detecting file encodings
repository: https://github.com/clbarnes/detchar
id: 585911
size: 993,221
author: Chris Barnes (clbarnes)


A simple CLI to detect character encodings in files; similar to chardet.

Implemented as a very, very thin wrapper over chardetng.

The example text files in ./data are from this Kaggle dataset.

Multithreading

chardetng has a feature which parallelises elimination of possible encodings for each text file. This can be enabled by compiling detchar with the multithreading feature.
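As a rough sketch of what enabling this looks like, Cargo's standard `--features` flag can be passed at install or build time. The feature name `multithreading` is taken from the paragraph above; installing from the registry under the crate name `detchar` is an assumption based on the page metadata.

```shell
# Install from the registry with the optional feature enabled
# (assumes the crate is published as "detchar").
cargo install detchar --features multithreading

# Or, from a checkout of the repository:
cargo build --release --features multithreading
```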

However, this is disabled by default, because for large numbers of files it is generally more effective to just parallelise over files, using e.g. GNU parallel:

cat my_file_list.txt | parallel detchar
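
Where GNU parallel is not installed, a similar effect can be sketched with `xargs`. The helper below is hypothetical and not part of detchar: the function name `detect_all` and the `JOBS` and `DETECTOR` variables are illustrative. It assumes only what the command above implies, namely that detchar accepts a file path as its argument. The `-0` and `-P` options are extensions found in GNU and BSD xargs rather than POSIX.

```shell
# Hypothetical helper: run a detector once per path listed in a file
# (one path per line), using up to JOBS processes at a time.
# DETECTOR defaults to detchar; JOBS defaults to 4.
detect_all() {
    list_file=$1
    jobs=${JOBS:-4}
    detector=${DETECTOR:-detchar}
    # Convert newlines to NUL bytes so paths containing spaces stay intact,
    # then pass exactly one path to each invocation (-n 1), as
    # `parallel detchar` does, with -P setting the number of concurrent jobs.
    tr '\n' '\0' < "$list_file" | xargs -0 -n 1 -P "$jobs" "$detector"
}
```

For example, `JOBS=8 detect_all my_file_list.txt` would run up to eight detchar processes at once. Output order is not guaranteed to match the order of the input list.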