| Crates.io | rehuman |
| lib.rs | rehuman |
| version | 0.1.0 |
| created_at | 2025-10-27 12:26:49.802694+00 |
| updated_at | 2025-10-27 12:26:49.802694+00 |
| description | Unicode-safe text cleaning & typographic normalization for Rust |
| homepage | https://github.com/pszemraj/rehuman |
| repository | https://github.com/pszemraj/rehuman |
| id | 1902832 |
| size | 142,550 |
Unicode-safe text cleaning & normalization for Rust.
Strip invisible characters, normalize typography, and enforce consistent formatting. Ideal for text sourced from web scraping, user input, or LLMs.
This crate is a Rust rewrite and expansion of humanize-ai-lib by Nordth.
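Untrusted text often contains invisible characters, typographic punctuation (curly quotes, long dashes, ellipsis characters), and inconsistent formatting.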
rehuman fixes this in a single pass with predictable, measurable output.
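To make "single pass" concrete, here is a minimal std-only sketch of the idea. It is not rehuman's implementation: the function name `sketch_clean` and the small character table are assumptions chosen for illustration, and the real crate covers far more cases and may treat individual characters differently.

```rust
/// Illustrative single-pass cleaner (hypothetical; NOT rehuman's code).
/// Walks the input once and either drops, replaces, or keeps each char.
fn sketch_clean(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for ch in input.chars() {
        match ch {
            // Invisible characters: zero-width space, word joiner, BOM.
            // This sketch simply drops them.
            '\u{200B}' | '\u{2060}' | '\u{FEFF}' => {}
            // Curly double quotes -> straight double quote.
            '\u{201C}' | '\u{201D}' => out.push('"'),
            // Curly single quotes -> ASCII apostrophe.
            '\u{2018}' | '\u{2019}' => out.push('\''),
            // En dash and em dash -> hyphen-minus.
            '\u{2013}' | '\u{2014}' => out.push('-'),
            // Horizontal ellipsis -> three periods.
            '\u{2026}' => out.push_str("..."),
            // No-break space -> ordinary space.
            '\u{00A0}' => out.push(' '),
            // Everything else passes through unchanged.
            other => out.push(other),
        }
    }
    out
}
```

Because every character is inspected exactly once and the mapping is a fixed table, the output is deterministic for a given input, which is the property the "predictable" claim rests on.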
Library crate: add rehuman to your project with cargo add rehuman or edit Cargo.toml:
[dependencies]
rehuman = "0.1.0" # replace with the latest published version
CLI binaries: install the published release (installs both rehuman and ishuman):
cargo install rehuman
For the latest unreleased version, clone this repo and run cargo install --path . from its root:
git clone https://github.com/pszemraj/rehuman.git
cd rehuman
cargo install --path .
Binaries will be installed to ~/.cargo/bin by default.
[!WARNING] This is an early release focused on correctness. Performance optimizations are in progress. Use --stream or StreamCleaner to stream large files.
use rehuman::{clean, humanize};
let cleaned = clean("Hello\u{200B}there"); // -> "Hello there"
let humanized = humanize("“Quote”—and…more"); // -> "\"Quote\"-and...more"
[!IMPORTANT] By default rehuman::clean removes emoji to guarantee ASCII-only output.
use rehuman::clean;
// Default behavior removes emoji
let cleaned = clean("Thanks 👍"); // -> "Thanks "
To keep emoji, construct a cleaner with CleaningOptions::builder().keyboard_only(false) (or pass --keep-emoji on the CLI).
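The effect of a "keyboard only" switch can be sketched with the standard library alone. The function below is a hypothetical stand-in, not rehuman's API: the name `filter_keyboard_only` and the exact set of characters it keeps (printable ASCII plus space, newline, and tab) are assumptions made for illustration.

```rust
/// Hypothetical stand-in for a "keyboard only" policy (NOT rehuman's API).
/// When `keyboard_only` is true, only printable ASCII and basic whitespace
/// survive; when false, the input is returned unchanged.
fn filter_keyboard_only(input: &str, keyboard_only: bool) -> String {
    if !keyboard_only {
        return input.to_string();
    }
    input
        .chars()
        // Keep visible ASCII plus space, newline, and tab; drop the rest,
        // which includes emoji and every other non-ASCII character.
        .filter(|&c| c.is_ascii_graphic() || matches!(c, ' ' | '\n' | '\t'))
        .collect()
}
```

With the switch on, "Thanks 👍" becomes "Thanks " in this sketch, the same shape as the default behavior shown above; with it off, the emoji is preserved.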
rehuman reads the input and emits cleaned text to STDOUT; your source file stays untouched unless you pass --inplace:
# Stream-clean to STDOUT and capture stats
rehuman notes.txt --stream --stats > notes.cleaned.txt
# Overwrite the original file in place
rehuman notes.txt --inplace
[!TIP] Both CLI tools act as filters, so you can drop them into pipelines:
cat notes.txt | rehuman --stream | tee notes.cleaned.txt
curl https://example.com/raw.txt | rehuman --stream --stats-json >/tmp/clean.txt
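The filter pattern itself is easy to sketch in Rust with only the standard library. This is an illustration of line-by-line streaming in general, not a description of how rehuman or StreamCleaner work internally: `stream_filter` is a hypothetical name, and the cleaning step is a placeholder that only removes zero-width spaces.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical streaming filter (NOT rehuman's StreamCleaner).
/// Reads `reader` one line at a time, writes the cleaned line to `writer`,
/// and returns the number of lines that were changed.
fn stream_filter<R: BufRead, W: Write>(mut reader: R, mut writer: W) -> io::Result<usize> {
    let mut changed = 0;
    let mut line = String::new();
    loop {
        line.clear();
        // read_line keeps the trailing newline, so line endings pass through.
        if reader.read_line(&mut line)? == 0 {
            break; // end of input
        }
        // Placeholder cleaning step: drop zero-width spaces only.
        let cleaned: String = line.chars().filter(|&c| c != '\u{200B}').collect();
        if cleaned != line {
            changed += 1;
        }
        writer.write_all(cleaned.as_bytes())?;
    }
    writer.flush()?;
    Ok(changed)
}
```

Calling `stream_filter(io::stdin().lock(), io::stdout().lock())` from `main` turns this into a pipeline stage. Only one line is held in memory at a time, which is why streaming suits large files.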
Use ishuman when you only need detection:
# Exit status 0 when clean, 1 when changes would be made (no stdout by default)
ishuman notes.txt
# Add --stats or --json to explain what would change
ishuman notes.txt --stats
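If you call ishuman from a Rust program rather than a shell, the exit status can be mapped to a small enum. The sketch below assumes the ishuman binary is on PATH and relies only on the two exit codes stated above (0 and 1); treating every other outcome as a failure is an assumption, and `Verdict`, `interpret_exit`, and `check_file` are hypothetical names.

```rust
use std::io;
use std::process::Command;

/// Outcome of a detection run (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
enum Verdict {
    Clean,         // exit status 0: no changes would be made
    NeedsCleaning, // exit status 1: changes would be made
    Failed,        // assumption: any other code, or termination by signal
}

/// Map a raw exit code to a verdict.
fn interpret_exit(code: Option<i32>) -> Verdict {
    match code {
        Some(0) => Verdict::Clean,
        Some(1) => Verdict::NeedsCleaning,
        _ => Verdict::Failed,
    }
}

/// Run `ishuman <path>` and interpret its exit status.
/// Returns an io::Error if the binary cannot be started.
fn check_file(path: &str) -> io::Result<Verdict> {
    let status = Command::new("ishuman").arg(path).status()?;
    Ok(interpret_exit(status.code()))
}
```

Keeping the mapping in `interpret_exit` separate from the process call makes the convention testable without the binary installed.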
Run rehuman --help or ishuman --help for the full list of flags (emoji policy, line endings, configs, streaming, etc.).
More details are available in the docs/ folder:
rehuman and ishuman
(unorm feature, enabled by default)
rehuman (cleaner) and ishuman (detector) with streaming & in-place modes
MIT