angr

Crates.io	angr
lib.rs	angr
version	0.1.0
source	src
created_at	2024-06-29 20:32:29.083789
updated_at	2024-06-29 20:32:29.083789
description	A tool to analyse ngrams in text files.
homepage	https://github.com/ash-entwisle/angr
repository	https://github.com/ash-entwisle/angr
max_upload_size
id	1287620
size	10,461,694

Ash (ash-entwisle)

documentation

README

NGram Analysis

This tool analyses the n-grams in a raw text corpus: given a text file, it generates a report of the n-grams found. I use it to collect bigram data for optimising keyboard layouts. When working with raw text, I first strip all punctuation and special characters, lowercase the text, and collapse runs of whitespace with the following command (note that `>>` appends to text-clean.txt rather than overwriting it):

sed 's/[^a-zA-Z ]//g' "text.txt" | tr 'A-Z' 'a-z' | sed -E 's/[[:space:]]+/ /g' >> text-clean.txt
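
For readers who would rather do the cleaning step in Rust, or who want to see what counting n-grams over the cleaned text involves, here is a minimal sketch. It is not angr's actual implementation or API: the function names `clean` and `count_ngrams` are hypothetical, and the choice to count character n-grams that include spaces is an assumption. The cleaning only roughly mirrors the shell pipeline above, since it treats tabs and newlines as separators and joins everything into a single line.

```rust
use std::collections::HashMap;

/// Hypothetical helper: keep only ASCII letters and whitespace, lowercase
/// the letters, and collapse every run of whitespace into a single space.
fn clean(text: &str) -> String {
    let filtered: String = text
        .chars()
        .filter(|c| c.is_ascii_alphabetic() || c.is_whitespace())
        .map(|c| c.to_ascii_lowercase())
        .collect();
    // split_whitespace drops leading/trailing whitespace and splits on runs.
    filtered.split_whitespace().collect::<Vec<_>>().join(" ")
}

/// Hypothetical helper: count every character n-gram of length `n`,
/// spaces included (an assumption; angr may treat word boundaries differently).
fn count_ngrams(text: &str, n: usize) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    if n == 0 {
        // slice::windows panics on a window size of 0, so bail out early.
        return counts;
    }
    let chars: Vec<char> = text.chars().collect();
    for window in chars.windows(n) {
        let ngram: String = window.iter().collect();
        *counts.entry(ngram).or_insert(0) += 1;
    }
    counts
}
```

Calling `count_ngrams(&clean(raw_text), 2)` would then give a map from each bigram to the number of times it occurs, which is the kind of data a keyboard-layout optimiser needs.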

Example data

There is some example data in the ./data directory that you can use to test the tool. It is a small excerpt from the Wikipedia Corpus from corpusdata.org.

Commit count: 13

cargo fmt