unbom

created_at: 2025-02-06 00:39:58.320578+00
updated_at: 2025-03-03 20:03:14.143187+00
description: Remove UTF-8 BOM from files
homepage: https://github.com/ssg/unbom-rs
Sedat Kapanoğlu (ssg)

about

This is the Rust port of my C# tool unbom, which safely removes UTF-8 BOM markers from files. UTF-8 BOM markers aren't useful, and can even cause problems with tools that aren't designed to handle them. Of course, only use it if your UTF-8 files are actually causing problems.
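
For reference, the UTF-8 BOM is the three-byte sequence EF BB BF at the very start of a file. A minimal sketch of the check-and-strip step on an in-memory buffer follows; the names `UTF8_BOM` and `strip_bom` are made up for illustration and are not necessarily what unbom uses internally.

```rust
/// The three bytes that encode U+FEFF (the byte order mark) in UTF-8.
const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Returns the input without a leading UTF-8 BOM.
/// If there is no BOM, the input is returned unchanged.
/// (Hypothetical helper, for illustration only.)
fn strip_bom(bytes: &[u8]) -> &[u8] {
    bytes.strip_prefix(&UTF8_BOM[..]).unwrap_or(bytes)
}
```

Only a BOM at offset zero is removed; the same three bytes appearing later in the data are left alone.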

to do

  • implement --recurse in a cross-platform compatible way

usage

Remove UTF-8 BOM markers from all "txt" files in the current directory, and save the originals as ".txt.bak" files:

unbom *.txt

Perform the same, but do not create a backup:

unbom -n *.txt
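
The backup behavior described above could be sketched roughly like this, using only the standard library. This is an assumption about one reasonable approach, not the tool's actual code: `unbom_file`, `with_suffix`, and the ".unbom-tmp" temp name are hypothetical, and `keep_backup` stands in for the absence of the -n flag.

```rust
use std::ffi::OsString;
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::{Path, PathBuf};

const UTF8_BOM: [u8; 3] = [0xEF, 0xBB, 0xBF];

/// Appends `suffix` to the whole file name, so "a.txt" + ".bak" gives "a.txt.bak".
fn with_suffix(path: &Path, suffix: &str) -> PathBuf {
    let mut name = OsString::from(path.as_os_str());
    name.push(suffix);
    PathBuf::from(name)
}

/// Removes a leading BOM from the file at `path`.
/// Returns Ok(false) if the file had no BOM and was left untouched.
fn unbom_file(path: &Path, keep_backup: bool) -> io::Result<bool> {
    let data = fs::read(path)?;
    let Some(body) = data.strip_prefix(&UTF8_BOM[..]) else {
        return Ok(false);
    };

    // `create_new` fails if the temp file already exists, so creation is
    // atomic and never overwrites a file someone else put there.
    let tmp = with_suffix(path, ".unbom-tmp"); // hypothetical temp name
    let mut out = OpenOptions::new().write(true).create_new(true).open(&tmp)?;
    out.write_all(body)?;
    out.sync_all()?;
    drop(out);

    if keep_backup {
        // Keep the original under "<name>.bak" before replacing it.
        fs::rename(path, with_suffix(path, ".bak"))?;
    }
    fs::rename(&tmp, path)?;
    Ok(true)
}
```

The sketch reads the whole file into memory, does not remove the temp file if a later step fails, and does not copy the original file's permissions; a real tool would need to handle all three.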

Remove UTF-8

challenges

I'm porting my tools to Rust as exercises, which brings interesting challenges. Porting this one made me tackle these issues:

  • Parsing command-line arguments in a cross-platform tool requires you to be aware of Unix wildcard expansion. I knew the distinction existed, but I never had to think about it before, so my design process for CLI tools on Windows was pretty straightforward. On Unix, the shell expands wildcards before the program runs, so there's no way to receive a wildcard in a command-line argument unless you use a specialized syntax or the user quotes the argument. Now I understand why the Unix find tool is designed the way it is.

  • Temporary file handling is subject to a few security issues. I hadn't thought about it much on Windows, as I regarded these tools as mostly for personal use. But Rust's API made me consider ways of making it more secure, especially regarding atomic temporary file creation and permission handling.

  • Writing CLI code in Rust can be very verbose if you want to be explicit about error handling (and I think you should). Almost every line of code needs to handle the error case, report it to the user, and bail out with a failure exit code. I had experimented with that before, but I think I found a better balance in this codebase by using a Result<> return type on main() and .inspect_err() to catch and report errors to the user. Combined with the ? operator, it becomes both expressive and lean.
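
The error-handling pattern from the last point can be illustrated with a short sketch. It assumes Rust 1.76 or later, where `Result::inspect_err` is stable. The functions `has_bom` and `run` are hypothetical and only illustrate the shape; `run` stands in for a `main()` that returns a Result.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads a file and reports whether it starts with a UTF-8 BOM.
/// (Hypothetical helper, for illustration only.)
fn has_bom(path: &Path) -> io::Result<bool> {
    let data = fs::read(path)
        // Report the failure where it happens, then let `?` propagate it.
        .inspect_err(|e| eprintln!("unbom: cannot read {}: {e}", path.display()))?;
    Ok(data.starts_with(&[0xEF, 0xBB, 0xBF]))
}

/// Stand-in for `main`: stops at the first error and hands it to the caller.
fn run<I: IntoIterator<Item = String>>(args: I) -> io::Result<()> {
    for arg in args {
        let found = has_bom(Path::new(&arg))?;
        println!("{arg}: {}", if found { "BOM" } else { "no BOM" });
    }
    Ok(())
}
```

In a real binary, `main` itself could have the signature `fn main() -> io::Result<()>` and call `run(std::env::args().skip(1))`. When `main` returns an `Err`, the Rust runtime prints the error's Debug representation to stderr and exits with a failure code, so the `inspect_err` message is what gives the user the friendlier, file-specific context.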

license

MIT License
