tktax-io

version: 0.2.2
created_at: 2025-02-01 02:28:34.233907+00
updated_at: 2025-02-01 02:28:34.233907+00
description: A library providing text preprocessing, tokenization, and formatted header printing utilities for the TKTAX project.
repository: https://github.com/klebs6/tktax
id: 1538019
size: 78,256
(klebs6)


tktax-io

tktax-io is a Rust library that supplies text preprocessing utilities, tokenization and stemming routines, as well as configurable formatted header-printing functions. It is designed for integration within the TKTAX project but can also be adopted for general lexical cleansing or linguistic normalization workflows.

Features

  • Punctuation Filtering: Uses regex to remove extraneous punctuation and special symbols.
  • Case Normalization: Converts strings to lowercase for uniform comparisons.
  • Tokenization & Stemming: Splits text using Unicode word boundaries and applies Snowball-based stemming to reduce words to canonical roots.
  • Formatted Header Printing: Generates structured output lines with user-configurable width and character styles.
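The first two features can be approximated with the standard library alone. The sketch below is illustrative only: `preprocess_sketch` and `tokenize_sketch` are hypothetical names, not functions exported by tktax-io, and the crate's actual regex rules may treat punctuation differently (for example, replacing it with a space rather than dropping it). Stemming and Unicode word-boundary segmentation are left out because they need external crates.

```rust
/// Hypothetical helper, NOT the crate's API: drops every character that is
/// neither alphanumeric nor whitespace, then lowercases what remains.
fn preprocess_sketch(input: &str) -> String {
    input
        .chars()
        .filter(|c| c.is_alphanumeric() || c.is_whitespace())
        .collect::<String>()
        .to_lowercase()
}

/// Hypothetical helper, NOT the crate's API: splits on whitespace only.
/// The real crate is described as using Unicode word boundaries plus
/// Snowball stemming, both of which this simplification omits.
fn tokenize_sketch(input: &str) -> Vec<String> {
    input.split_whitespace().map(str::to_owned).collect()
}
```

Under these assumptions, `preprocess_sketch("7-ELEVEN!!!")` yields `"7eleven"`, and `tokenize_sketch` applied to `"whole foods market"` yields three tokens.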

Example Usage

Below is a minimal example showing how to use the main functions in this crate:

use tktax_io::{preprocess, tokenize_and_stem, print_header, print_thick_header};

fn main() {
    // Input text to preprocess
    let transaction_description = "7-ELEVEN!!!";

    // Remove punctuation and transform to lowercase
    let clean_text = preprocess(transaction_description);
    println!("Preprocessed: {}", clean_text);

    // Tokenize and stem the cleaned text
    let tokens = tokenize_and_stem(&clean_text);
    println!("Tokens: {:?}", tokens);

    // Print a couple of headers
    print_header("Light Header");
    print_thick_header("Heavy Header");
}
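For readers curious how a header line with configurable width and fill character might be assembled, here is a rough sketch. The function name `format_header_sketch`, its parameters, and the centering rule are all assumptions made for illustration; the crate's `print_header` and `print_thick_header` may use a different layout and different defaults.

```rust
/// Hypothetical helper, NOT the crate's API: centers `title` in a line of
/// `width` characters, padding both sides with `fill`. If the title does
/// not fit, it is returned without padding.
fn format_header_sketch(title: &str, width: usize, fill: char) -> String {
    // Surround the title with single spaces so it stands apart from the fill.
    let label = format!(" {} ", title);
    let label_len = label.chars().count();
    if label_len >= width {
        return label.trim().to_string();
    }

    // Split the remaining space; any odd character goes on the right.
    let total_fill = width - label_len;
    let left = total_fill / 2;
    let right = total_fill - left;

    let mut line = String::with_capacity(width);
    line.extend(std::iter::repeat(fill).take(left));
    line.push_str(&label);
    line.extend(std::iter::repeat(fill).take(right));
    line
}
```

With this sketch, `format_header_sketch("Hi", 10, '-')` produces `--- Hi ---`, and a "thick" variant could simply pass `'='` as the fill character.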

Run the tests with:

cargo test

Contributing

  1. Fork the repository and create a feature branch.
  2. Make changes, then open a pull request to the main repository.
  3. Provide a clear and detailed description of all modifications.

License

This project is licensed under either of:

  • Apache License, Version 2.0
  • MIT License

at your option.

