| Field | Value |
|---|---|
| Crates.io | tktax-io |
| lib.rs | tktax-io |
| version | 0.2.2 |
| created_at | 2025-02-01 02:28:34.233907+00 |
| updated_at | 2025-02-01 02:28:34.233907+00 |
| description | A library providing text preprocessing, tokenization, and formatted header printing utilities for the TKTAX project. |
| homepage | |
| repository | https://github.com/klebs6/tktax |
| max_upload_size | |
| id | 1538019 |
| size | 78,256 |
tktax-io is a Rust library that supplies text preprocessing utilities, tokenization and stemming routines, as well as configurable formatted header-printing functions. It is designed for integration within the TKTAX project but can also be adopted for general lexical cleansing or linguistic normalization workflows.
Preprocessing uses a regex to remove extraneous punctuation and special symbols.

Below is a minimal example showing how to use the main functions in this crate:
```rust
use tktax_io::{preprocess, tokenize_and_stem, print_header, print_thick_header};

fn main() {
    // Input text to preprocess
    let transaction_description = "7-ELEVEN!!!";

    // Remove punctuation and transform to lowercase
    let clean_text = preprocess(transaction_description);
    println!("Preprocessed: {}", clean_text);

    // Tokenize and stem the cleaned text
    let tokens = tokenize_and_stem(&clean_text);
    println!("Tokens: {:?}", tokens);

    // Print a couple of headers
    print_header("Light Header");
    print_thick_header("Heavy Header");
}
```
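The crate's internal algorithms are not described here. As a rough, std-only sketch of the kind of normalization the `7-ELEVEN!!!` example performs, the hypothetical functions `sketch_preprocess`, `sketch_stem`, and `sketch_tokenize_and_stem` below assume that symbols are replaced with spaces, text is lowercased, and tokens are split on whitespace. They are illustrations only: the real `preprocess` uses a regex, and the real stemmer and exact output of `tokenize_and_stem` may differ.

```rust
/// Hypothetical stand-in for `preprocess` (not the crate's implementation):
/// replaces every character that is not alphanumeric or whitespace with a
/// space, then lowercases the result.
fn sketch_preprocess(input: &str) -> String {
    input
        .chars()
        .map(|c| if c.is_alphanumeric() || c.is_whitespace() { c } else { ' ' })
        .collect::<String>()
        .to_lowercase()
}

/// Toy suffix stripper, NOT a real stemming algorithm such as Porter.
/// Removes the first matching suffix only if at least three characters remain.
fn sketch_stem(word: &str) -> String {
    for suffix in ["ing", "ed", "es", "s"] {
        if let Some(stem) = word.strip_suffix(suffix) {
            if stem.chars().count() >= 3 {
                return stem.to_string();
            }
        }
    }
    word.to_string()
}

/// Hypothetical stand-in for `tokenize_and_stem`: splits on runs of
/// whitespace and applies the toy stemmer to each token.
fn sketch_tokenize_and_stem(text: &str) -> Vec<String> {
    text.split_whitespace().map(sketch_stem).collect()
}

fn main() {
    let clean = sketch_preprocess("7-ELEVEN!!!");
    let tokens = sketch_tokenize_and_stem(&clean);
    println!("Preprocessed: {:?}", clean);
    println!("Tokens: {:?}", tokens);
}
```

Replacing symbols with spaces, rather than deleting them, keeps `7-ELEVEN` as two tokens (`7` and `eleven`); deleting them would produce the single token `7eleven`. Which of these behaviors the crate's `preprocess` actually has is not stated here.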
Run the tests with:

```bash
cargo test
```
This project is licensed under either of:
at your option.