# Tagalyzer

Tagalyzer is a CLI tool that counts words in files, then prints the counts in a human-readable format. I made it to analyze my own writing and help me pick tags for blog posts.

This tool will eventually be a relative word frequency analyzer. The goal is to point it at a directory or list of files and have it compute statistics for the combined total of all words across all files, as well as break out how word frequency varies by file.

## Install

### CLI

If you want to analyze writing samples yourself, install the command line tool:

```bash
cargo install tagalyzer
```

After that, try running `tagalyzer --help` to see the usage, and check out the examples below.

### Library

If you want to use the library to do text analysis in your own project, use Cargo to add Tagalyzer as a dependency:

```bash
cargo add tagalyzer
```

## Examples

```bash
$ tagalyzer LICENSE-* # Glob matching, case-insensitive text processing
Sorted wordcount for LICENSE-MIT
software : 10
without : 4
including : 4
--- [snip] ---
Sorted wordcount for LICENSE-APACHE
work : 33
any : 30
license : 26
--- [snip] ---
```

```bash
$ tagalyzer LICENSE-MIT -c # Case sensitive when counting, not when filtering
Sorted wordcount for LICENSE-MIT
Software : 6
SOFTWARE : 4
ANY : 3
this : 3
including : 3
OTHER : 3
--- [snip] ---
```

```bash
$ tagalyzer LICENSE-MIT -ci # Case sensitive, filters "or" but not "OR"
Sorted wordcount for LICENSE-MIT
OR : 8
THE : 7
Software : 6
OF : 5
IN : 5
SOFTWARE : 4
--- [snip] ---
```

## Long-Term Plans

I plan on developing this tool into both a CLI binary and a parallel library, providing an out-of-the-box solution and high customization respectively. It will fit into my workflow by reporting the frequency of words and phrases (e.g. strings of up to n words or characters) across the directory where I keep all my blog posts, which I can use to help me decide on a set of applicable tags.
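To make the counting and relative-frequency ideas concrete, here is a minimal sketch using only the Rust standard library. It is not Tagalyzer's actual implementation or library API: the function names `count_words` and `relative_frequencies` are hypothetical, the tokenizer (split on anything that is not alphanumeric) is an assumption, and the word filtering shown in the examples is left out.

```rust
use std::collections::HashMap;

/// Count how often each word occurs in `text`.
/// Hypothetical helper: when `case_sensitive` is false, words are
/// lowercased before counting.
fn count_words(text: &str, case_sensitive: bool) -> Vec<(String, usize)> {
    let mut counts: HashMap<String, usize> = HashMap::new();

    // Assumption: a "word" is any run of alphanumeric characters.
    for raw in text.split(|c: char| !c.is_alphanumeric()) {
        if raw.is_empty() {
            continue;
        }
        let word = if case_sensitive {
            raw.to_string()
        } else {
            raw.to_lowercase()
        };
        *counts.entry(word).or_insert(0) += 1;
    }

    let mut sorted: Vec<(String, usize)> = counts.into_iter().collect();
    // Highest count first; ties are broken alphabetically so the
    // output order is deterministic.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

/// Convert absolute counts into each word's share of the total word count.
fn relative_frequencies(counts: &[(String, usize)]) -> Vec<(String, f64)> {
    let total: usize = counts.iter().map(|(_, n)| *n).sum();
    counts
        .iter()
        .map(|(word, n)| (word.clone(), *n as f64 / total as f64))
        .collect()
}

fn main() {
    let text = "The Software is provided as is. THE SOFTWARE has no warranty.";
    let counts = count_words(text, false);

    for (word, n) in &counts {
        println!("{word:<20}: {n}");
    }
    for (word, share) in relative_frequencies(&counts) {
        println!("{word:<20}: {share:.3}");
    }
}
```

Sorting the collected pairs rather than relying on `HashMap` iteration order keeps the output stable between runs, which matters when comparing results across files.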
## License

This work is licensed under either the MIT or Apache 2.0 license at the choice of the user. Contributions are assumed to be licensed under MIT unless otherwise stated.

The Rust language and various libraries are used in this project under the MIT license.

## Contributing

Contributions are always welcome! The project is hosted on [GitLab](https://gitlab.com/garver-the-system/tagalyzer). Bug reports, commits, or even just suggestions are appreciated.

If you do want to contribute code, I'm more familiar with merging branches than forks. I have gating tests and lints in CI, which should be equivalent to the code block below. If the code or results ever differ between this block running locally and what happens in CI, please open an issue.

```bash
cargo fmt && cargo test && cargo clippy --no-deps -- \
	-Dclippy::pedantic \
	-Dclippy::nursery \
	-Dclippy::style \
	-Dclippy::unwrap_used \
	-Dclippy::expect_used \
	-Dclippy::missing_docs_in_private_items \
	-Dclippy::single_char_lifetime_names \
	-Dclippy::use_self \
	-Dclippy::str_to_string \
	-Ddead_code \
	-Aclippy::needless_return \
	-Aclippy::tabs_in_doc_comments \
	-Aclippy::needless_raw_string_hashes \
	-Dwarnings
```