| Crates.io | clip-sanitize |
| lib.rs | clip-sanitize |
| version | 0.2.1 |
| created_at | 2026-01-12 14:06:41.077672+00 |
| updated_at | 2026-01-12 14:06:41.077672+00 |
| description | Meta-library for robust text sanitization, repair, and normalization. |
| homepage | |
| repository | https://github.com/5ocworkshop |
| max_upload_size | |
| id | 2037799 |
| size | 53,074 |
The "Universal Adapter" for Text.
clip-sanitize is a robust Rust library designed to clean, repair, and normalize text when moving between disparate systems (e.g., Windows CP1252 to Linux UTF-8). It acts as a hygiene pipeline to prevent "paste-jacking", fix character encoding errors (Mojibake), and standardize line endings.
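The Mojibake repair mentioned above can be illustrated without the crate. The following is a minimal, std-only sketch of the general technique, not clip-sanitize's actual implementation. The helper names `cp1252_byte` and `repair_mojibake` are hypothetical, and only part of the Windows-1252 table is covered. It maps each character back to the byte it would have had in Windows-1252, then re-decodes those bytes as UTF-8.

```rust
/// Map a char back to the single Windows-1252 byte it would have come from.
/// Hypothetical helper: only some of the 0x80..=0x9F specials are listed.
fn cp1252_byte(c: char) -> Option<u8> {
    match c {
        '\u{20AC}' => Some(0x80), // euro sign
        '\u{201A}' => Some(0x82), // single low-9 quote
        '\u{2018}' => Some(0x91), // left single quote
        '\u{2019}' => Some(0x92), // right single quote
        '\u{201C}' => Some(0x93), // left double quote
        '\u{201D}' => Some(0x94), // right double quote
        '\u{2022}' => Some(0x95), // bullet
        '\u{2013}' => Some(0x96), // en dash
        '\u{2014}' => Some(0x97), // em dash
        '\u{2122}' => Some(0x99), // trade mark sign
        '\u{0153}' => Some(0x9C), // latin small ligature oe
        // ASCII and the 0xA0..=0xFF range are identical to Latin-1.
        c if (c as u32) < 0x80 || (0xA0..=0xFF).contains(&(c as u32)) => Some(c as u8),
        _ => None,
    }
}

/// Try to undo one round of double-encoding.
/// Returns None when the text cannot be mapped back to Windows-1252 bytes,
/// or when those bytes are not valid UTF-8 (i.e. the text was not Mojibake).
fn repair_mojibake(garbled: &str) -> Option<String> {
    let bytes: Option<Vec<u8>> = garbled.chars().map(cp1252_byte).collect();
    String::from_utf8(bytes?).ok()
}
```

Under these assumptions, `repair_mojibake("cafÃ©")` returns `Some("café")`, while already-correct text such as `"café"` returns `None` because its bytes are not valid UTF-8 after the round trip, so a caller can keep the original string in that case.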
- Repairs Mojibake (e.g., Ã© -> é) caused by double-encoding (UTF-8 bytes misinterpreted as Windows-1252).
- Converts smart quotes (“, ”) to straight quotes (") for code compatibility.

Add this to your Cargo.toml:
[dependencies]
clip-sanitize = "0.2.1"
use clip_sanitize::{Sanitizer, FlowDirection};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configured for moving text from Linux to Windows
    let sanitizer = Sanitizer::new(FlowDirection::LinuxToWindows);
    let input = b"Hello\nWorld";
    let (cleaned, report) = sanitizer.process(input)?;

    // Output is now CRLF: "Hello\r\nWorld"
    assert_eq!(cleaned, &b"Hello\r\nWorld"[..]);
    println!("Original Encoding: {}", report.original_encoding);
    Ok(())
}
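To make the line-ending step concrete, here is a small std-only sketch of what enforcing CRLF or LF on a byte buffer can look like. It is an illustration under stated assumptions, not the crate's internals: `to_crlf` and `to_lf` are hypothetical names, and treating a lone CR as a line break is a choice made for this sketch.

```rust
/// Convert every line ending (LF, CRLF, or lone CR) to CRLF.
fn to_crlf(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len() + input.len() / 8);
    let mut i = 0;
    while i < input.len() {
        match input[i] {
            b'\r' => {
                out.extend_from_slice(b"\r\n");
                // Skip the LF of an existing CRLF pair so it is not doubled.
                if input.get(i + 1) == Some(&b'\n') {
                    i += 1;
                }
            }
            b'\n' => out.extend_from_slice(b"\r\n"),
            byte => out.push(byte),
        }
        i += 1;
    }
    out
}

/// Convert every line ending (LF, CRLF, or lone CR) to LF.
fn to_lf(input: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(input.len());
    let mut i = 0;
    while i < input.len() {
        match input[i] {
            b'\r' => {
                out.push(b'\n');
                // Skip the LF of a CRLF pair; the pair collapses to one LF.
                if input.get(i + 1) == Some(&b'\n') {
                    i += 1;
                }
            }
            byte => out.push(byte),
        }
        i += 1;
    }
    out
}
```

Handling CR first and peeking at the next byte keeps mixed input idempotent: running `to_crlf` on text that already uses CRLF leaves it unchanged instead of producing `\r\r\n`.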
use clip_sanitize::{Sanitizer, FlowDirection, HygieneOptions, LineEnding};

let options = HygieneOptions {
    replace_nbsps: true,
    fix_smart_quotes: false, // Keep curly quotes
    strip_invisibles: true,
};

let sanitizer = Sanitizer::new(FlowDirection::Custom)
    .repair(true)                 // Fix Mojibake
    .hygiene(options)             // Custom hygiene
    .line_ending(LineEnding::Lf); // Force Linux line endings
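For readers wondering what the three hygiene flags might do to a string, the sketch below mimics them with the standard library only. It is a rough illustration, not the crate's code: `HygieneSketch` and `apply_hygiene` are hypothetical names, and the exact set of code points treated as "invisible" is an assumption.

```rust
/// Hypothetical stand-in for the three flags shown above.
#[derive(Clone, Copy)]
struct HygieneSketch {
    replace_nbsps: bool,
    fix_smart_quotes: bool,
    strip_invisibles: bool,
}

/// Apply the enabled hygiene steps character by character.
fn apply_hygiene(text: &str, opts: HygieneSketch) -> String {
    let mut out = String::with_capacity(text.len());
    for c in text.chars() {
        match c {
            // Non-breaking space becomes a regular space.
            '\u{00A0}' if opts.replace_nbsps => out.push(' '),
            // Curly double and single quotes become their ASCII forms.
            '\u{201C}' | '\u{201D}' if opts.fix_smart_quotes => out.push('"'),
            '\u{2018}' | '\u{2019}' if opts.fix_smart_quotes => out.push('\''),
            // Zero-width characters, directional marks, bidi embeddings and
            // overrides, word joiner, and BOM are dropped (assumed set).
            '\u{200B}'..='\u{200F}' | '\u{202A}'..='\u{202E}' | '\u{2060}' | '\u{FEFF}'
                if opts.strip_invisibles => {}
            other => out.push(other),
        }
    }
    out
}
```

Note that this assumed set includes the zero-width joiner (U+200D), so stripping it would also split emoji sequences that rely on it. A real pipeline may want a narrower list.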
- FlowDirection::LinuxToWindows: Enforces CRLF, enables full hygiene.
- FlowDirection::WindowsToLinux: Enforces LF, enables full hygiene.
- FlowDirection::Custom: Uses default settings (repair + hygiene + LF) unless overridden.

License: MIT