# normalized-hasher

[![badge github]][url github]
[![badge crates.io]][url crates.io]
[![badge license]][url license]

[badge github]: https://img.shields.io/badge/github-FloGa%2Fnormalized--hasher-green
[badge crates.io]: https://img.shields.io/crates/v/normalized-hasher
[badge license]: https://img.shields.io/crates/l/normalized-hasher

[url github]: https://github.com/FloGa/normalized-hasher
[url crates.io]: https://crates.io/crates/normalized-hasher
[url license]: https://github.com/FloGa/normalized-hasher/blob/develop/LICENSE

Create cross-platform hashes of text files.

*This is the binary crate. If you're looking for the library crate instead, go
to [`normalized-hash`].*

[`normalized-hash`]: https://github.com/FloGa/normalized-hasher/crates/normalized-hash

## Motivation

Hashes or checksums are a great means of validating the contents of files. You
record the hash of a file, distribute the file together with the hash, and
everyone can run the hasher again to verify that the file has not changed since
you first created the hash.

Even a small change to the file changes the hash, including changes you cannot
see. In my job, we unfortunately ran into this situation a couple of times. The
workflow is as follows:

We write code and generate a hash from it. Both are inserted into a
specification document. Then we copy and paste the code to a customer's system
and run the hasher again to verify that the code is still the same as in the
specification.

But from time to time, we got different hashes. After some searching, we
stumbled across one coworker who did not save their files with UNIX line
endings (a single LF) like the rest of us, but with Windows line endings (CR
followed by LF). Just by looking at the files, they seemed identical, but after
making control characters visible, we could clearly see the difference at the
end of every line.
When the code is copied to the customer's system, the line endings are
automatically converted to UNIX style, so the hash differs from the one we
generate on our systems. This is an embarrassing situation, because it involves
a lot of paperwork to request a change to the already finalized specification
document.

To overcome this problem, I created this program: a file hasher that converts
line endings to UNIX style on the fly when generating the hash. So, no matter
how the file was created, the hash is the same.

## Installation

`normalized-hasher` can be installed easily through Cargo via `crates.io`:

```shell
cargo install --locked normalized-hasher
```

Please note that the `--locked` flag is necessary here to get the exact same
dependencies as when the application was tagged and tested. Without it, you
might get more up-to-date versions of dependencies, but you risk unexpected
behavior if those dependencies changed their functionality. The application
might even fail to build if the public API of a dependency changed too much.

Alternatively, pre-built binaries can be downloaded from the [GitHub
releases][gh-releases] page.

[gh-releases]: https://github.com/FloGa/normalized-hasher/releases

## Usage

```text
Usage: normalized-hasher [OPTIONS] <FILE_IN> [FILE_OUT]

Arguments:
  <FILE_IN>
          File to be hashed

  [FILE_OUT]
          Optional file path to write normalized input into

Options:
      --eol <EOL>
          End-of-line sequence, will be appended to each normalized line for hashing

          [default: "\n"]

      --ignore-whitespaces
          Ignore all whitespaces

          This will remove all whitespaces from the input file when generating the hash.

      --no-eof
          Skip last end-of-line on end-of-file

          With this flag, no trailing EOL will be appended at the end of the file.

  -h, --help
          Print help (see a summary with '-h')

  -V, --version
          Print version
```

### Flags

- `--eol`

  With the `--eol` flag you can change the end-of-line sequence that is
  appended to each normalized line when generating the hash.
  This can be useful if you explicitly want CRLF endings, for example.

  Please note that you need to escape control characters properly in your
  shell. For Bash, you can type:

  ```shell
  normalized-hasher --eol $'\r\n' input.txt output.txt
  ```

- `--ignore-whitespaces`

  In some extreme cases, you might want to ignore all whitespace in a file.
  With the `--ignore-whitespaces` flag, all whitespace is removed before the
  hash is generated.

- `--no-eof`

  With the `--no-eof` flag you can avoid appending the EOL sequence at the end
  of the file. This is for use cases where such a trailing EOL is not
  desirable, as in Windows files. In contrast to UNIX files, which usually end
  with a final LF, Windows files do not usually end with an additional CRLF.

## Examples

Simple example with default options, without writing an output file:

```shell
normalized-hasher input.txt
```

More complex example that also writes the normalized output to a file:

```shell
normalized-hasher --eol $'\r\n' --no-eof input.txt output.txt
```
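
The effect described in the Motivation section can also be reproduced with standard tools, without installing anything. The following is only a rough sketch: it assumes `sha256sum` from GNU coreutils as a stand-in hasher (this README does not state which algorithm `normalized-hasher` uses), and it emulates the normalization by deleting every CR with `tr`, which is an approximation and not the tool's actual implementation. The variable names are made up for illustration.

```shell
# Same two lines of text, once with LF and once with CRLF line endings.
lf_hash=$(printf 'line one\nline two\n' | sha256sum | cut -d ' ' -f 1)
crlf_hash=$(printf 'line one\r\nline two\r\n' | sha256sum | cut -d ' ' -f 1)

# The raw hashes differ, although both inputs look identical in most editors.
[ "$lf_hash" != "$crlf_hash" ] && echo "raw hashes differ"

# Emulate the normalization (assumption: dropping every CR is close enough
# for this demo), so that only LF line endings remain before hashing.
normalized_hash=$(printf 'line one\r\nline two\r\n' | tr -d '\r' | sha256sum | cut -d ' ' -f 1)

# After normalization, the CRLF input hashes like the LF input.
[ "$lf_hash" = "$normalized_hash" ] && echo "normalized hashes match"
```

The hashes computed here are not expected to match what `normalized-hasher` prints. The sketch only shows why hashing a normalized form makes the result independent of how the file was saved.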