| Crates.io | superfold |
| lib.rs | superfold |
| version | 0.1.1 |
| created_at | 2025-05-16 05:54:36.557626+00 |
| updated_at | 2025-05-16 06:02:38.039664+00 |
| description | A multilingual Rust library and CLI to process UTF-8 strings to exclude diacritics and fold non-phonetic graphemes into their phonetic ASCII representation. |
| homepage | |
| repository | https://github.com/0xCarbon/superfold |
| max_upload_size | |
| id | 1676124 |
| size | 70,995 |
A multilingual Rust library and CLI tool that processes UTF-8 strings, removing diacritics and folding non-phonetic graphemes into their phonetic ASCII representation (romanization by transliteration). The library preserves the original whitespace (spaces, tabs, newlines, etc.) and transforms only the word content and emoji. In practice, this means that text in Sino-Tibetan and Japonic languages, such as Chinese and Japanese, is represented as ASCII, and that each emoji is replaced by its name enclosed in colons, so 🍆 becomes ":eggplant:".
Examples:
use superfold::fold;
assert_eq!(fold("北亰"), "BeiJing");
assert_eq!(fold("🦄"), ":unicorn:");
// Whitespace and structure are preserved:
assert_eq!(
fold(" 你好 世界\nNext line with piejlüsse কথাটা 🦄!"),
" NiHao ShiJie\nNext line with piejlusse kotha :unicorn:!"
);
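The whitespace-preserving behaviour shown in the example can be pictured with a small standalone sketch. This is not superfold's implementation: the helper names `map_words_preserving_whitespace` and `strip_some_diacritics` are hypothetical, and the diacritic table is a toy stand-in for a real folding step. It only illustrates the idea of transforming word content while copying whitespace through untouched.

```rust
/// Applies `transform` to every run of non-whitespace characters in `input`,
/// copying whitespace characters through unchanged.
/// (Hypothetical helper, not part of the superfold API.)
fn map_words_preserving_whitespace<F>(input: &str, mut transform: F) -> String
where
    F: FnMut(&str) -> String,
{
    let mut out = String::with_capacity(input.len());
    let mut word_start: Option<usize> = None;

    for (i, ch) in input.char_indices() {
        if ch.is_whitespace() {
            // A whitespace character ends the current word, if any.
            if let Some(start) = word_start.take() {
                out.push_str(&transform(&input[start..i]));
            }
            out.push(ch);
        } else if word_start.is_none() {
            word_start = Some(i);
        }
    }
    // Flush a trailing word that is not followed by whitespace.
    if let Some(start) = word_start {
        out.push_str(&transform(&input[start..]));
    }
    out
}

/// Toy stand-in for a folding step: maps a handful of Latin letters with
/// diacritics to their base letter and leaves everything else alone.
fn strip_some_diacritics(word: &str) -> String {
    word.chars()
        .map(|c| match c {
            'ã' | 'á' | 'à' | 'â' => 'a',
            'é' | 'ê' => 'e',
            'ü' | 'ú' => 'u',
            'ç' => 'c',
            other => other,
        })
        .collect()
}
```

Calling `map_words_preserving_whitespace(" precisão\tpiejlüsse\n", strip_some_diacritics)` keeps the leading space, the tab, and the trailing newline exactly where they were, and only the two words change.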
This library is inspired by the great work of others.
superfold can also be used as a command-line tool to process files and directories.
Installation:
If you have Rust installed, you can build and install the CLI:
cargo install --path . # Run from the root of the superfold project directory
Or, after building with cargo build --release, find the binary at target/release/superfold.
Usage:
superfold [OPTIONS] [INPUTS]...
Options:
-o, --output-dir <OUTPUT_DIR>: Output directory for processed files when multiple inputs or a directory are provided. Defaults to "superfold_output".
-f, --overwrite: Overwrite output files or directory if they already exist.
-h, --help: Print help information.
-V, --version: Print version information.
Examples:
Fold a string from stdin:
echo "precisão" | superfold
Output:
precisao
Fold a single file (outputs to filename_folded.ext):
superfold myfile.txt
This will create myfile_folded.txt in the same directory.
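The filename_folded.ext naming convention can be sketched with the standard library's path API. The function name `folded_sibling` is hypothetical and this is only one way such a name could be derived; how the CLI handles edge cases such as files without an extension is an assumption here.

```rust
use std::path::{Path, PathBuf};

/// Builds a sibling path with `_folded` appended to the file stem,
/// keeping the original extension if there is one.
/// (Hypothetical helper, not part of the superfold API.)
fn folded_sibling(input: &Path) -> PathBuf {
    let stem = input
        .file_stem()
        .map(|s| s.to_string_lossy().into_owned())
        .unwrap_or_default();
    let new_name = match input.extension() {
        Some(ext) => format!("{}_folded.{}", stem, ext.to_string_lossy()),
        // Assumption: a file without an extension simply gets the suffix.
        None => format!("{}_folded", stem),
    };
    // `with_file_name` replaces only the last component, so the result
    // stays in the same directory as the input.
    input.with_file_name(new_name)
}
```

With this sketch, `myfile.txt` maps to `myfile_folded.txt` and `docs/notes.log` maps to `docs/notes_folded.log`.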
Fold specific files into an output directory:
superfold file1.txt path/to/file2.log -o my_folded_texts
This will create my_folded_texts/file1.txt and my_folded_texts/file2.log.
Fold all text files in a directory (recursively) into an output directory:
superfold ./input_documents --output-dir ./folded_documents
This will process text files in ./input_documents and its subdirectories, replicating the structure in ./folded_documents.
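Replicating a directory structure amounts to re-rooting each input path under the output directory. The sketch below shows that mapping with `Path::strip_prefix` and `Path::join`. The function name `mirrored_output_path` is hypothetical, and the code does not claim to match how superfold walks directories or decides which files count as text.

```rust
use std::path::{Path, PathBuf};

/// Maps `file`, located somewhere under `input_root`, to the equivalent
/// location under `output_root`. Returns `None` when `file` is not inside
/// `input_root`.
/// (Hypothetical helper, not part of the superfold API.)
fn mirrored_output_path(input_root: &Path, output_root: &Path, file: &Path) -> Option<PathBuf> {
    // `strip_prefix` fails when `file` is not located under `input_root`.
    let relative = file.strip_prefix(input_root).ok()?;
    Some(output_root.join(relative))
}
```

Under these assumptions, `input_documents/reports/q1.txt` with roots `input_documents` and `folded_documents` maps to `folded_documents/reports/q1.txt`.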
Overwrite existing output:
superfold myfile.txt -f
Piping:
superfold supports piping from stdin and to stdout, fitting into standard Unix pipelines:
cat long_text_file.txt | superfold > output.txt
echo "你好 🦄" | superfold | sed 's/:unicorn:/U/' # Example of further processing
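For readers who want the same stdin-to-stdout pattern inside their own Rust program, a generic filter might look like the sketch below. The function `run_filter` is hypothetical, and the folding step is passed in as a parameter so the block stays self-contained. In a real program one would presumably pass `superfold::fold`, assuming it accepts a `&str` and returns a `String` as the library examples suggest.

```rust
use std::io::{self, Read, Write};

/// Reads all of `input` as UTF-8, applies `fold` to the text, and writes
/// the result to `output`.
/// (Hypothetical helper, not part of the superfold API.)
fn run_filter<R: Read, W: Write>(
    mut input: R,
    mut output: W,
    fold: impl Fn(&str) -> String,
) -> io::Result<()> {
    let mut text = String::new();
    // `read_to_string` returns an error if the input is not valid UTF-8.
    input.read_to_string(&mut text)?;
    output.write_all(fold(&text).as_bytes())?;
    output.flush()
}
```

A binary could call `run_filter(io::stdin().lock(), io::stdout().lock(), ...)` with the folding function of its choice. Taking the reader and writer as parameters also makes the filter easy to exercise with in-memory buffers.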