textalyzer

Crates.io crate: textalyzer, version 0.5.0
Created: 2019-02-18, updated: 2025-07-07
Description: Analyze key metrics like number of words, readability, and complexity of any kind of text
Repository: https://github.com/ad-si/textalyzer
Author: Adrian Sieber (ad-si)

README

Textalyzer

Analyze key metrics of any kind of text, such as word count, readability, and complexity.
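To illustrate the kind of analysis the `histogram` subcommand performs, here is a minimal word-frequency sketch in Rust. This is illustrative only, not the crate's actual implementation; `word_histogram` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Count word frequencies in a text, case-insensitively,
/// ignoring surrounding punctuation.
fn word_histogram(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Keep only alphanumeric characters and normalize case.
        let cleaned: String = word
            .chars()
            .filter(|c| c.is_alphanumeric())
            .collect::<String>()
            .to_lowercase();
        if !cleaned.is_empty() {
            *counts.entry(cleaned).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = word_histogram("The cat saw the dog. The dog ran.");
    // "the" occurs three times, "dog" twice.
    println!("{:?}", counts);
}
```

A real implementation would additionally sort the entries by count to render the histogram.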

Usage

# Word frequency histogram
textalyzer histogram <filepath>

# Find duplicated code blocks (default: minimum 3 non-empty lines)
textalyzer duplication <path> [<additional paths...>]

# Find duplications with at least 5 non-empty lines
textalyzer duplication --min-lines=5 <path> [<additional paths...>]

# Include single-line duplications
textalyzer duplication --min-lines=1 <path> [<additional paths...>]

# Output duplications as JSON
textalyzer duplication --json <path> [<additional paths...>]

Example JSON output:

[{
  "content": "<duplicated text block>",
  "locations": [
    { "path": "file1.txt", "line": 12 },
    { "path": "file2.txt", "line": 34 }
  ]
}, {
  "content": "<another duplicated block>",
  "locations": [
    { "path": "file1.txt", "line": 56 },
    { "path": "file3.txt", "line": 78 }
  ]
}]
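The output shape above can be modeled with plain Rust types. The following is a minimal hand-rolled serializer for illustration only; the field names (`content`, `path`, `line`) come from the JSON above, while the type and function names are assumptions (a real tool would likely use serde):

```rust
/// One occurrence of a duplicated block.
struct Location {
    path: String,
    line: usize,
}

/// A duplicated block and everywhere it appears.
struct Duplication {
    content: String,
    locations: Vec<Location>,
}

/// Render duplications in the JSON shape shown above.
/// Hand-rolled for illustration; only escapes double quotes.
fn to_json(dups: &[Duplication]) -> String {
    let items: Vec<String> = dups
        .iter()
        .map(|d| {
            let locs: Vec<String> = d
                .locations
                .iter()
                .map(|l| format!(r#"{{ "path": "{}", "line": {} }}"#, l.path, l.line))
                .collect();
            format!(
                r#"{{ "content": "{}", "locations": [{}] }}"#,
                d.content.replace('"', "\\\""),
                locs.join(", ")
            )
        })
        .collect();
    format!("[{}]", items.join(", "))
}

fn main() {
    let dups = vec![Duplication {
        content: "let x = 1;".into(),
        locations: vec![
            Location { path: "file1.txt".into(), line: 12 },
            Location { path: "file2.txt".into(), line: 34 },
        ],
    }];
    println!("{}", to_json(&dups));
}
```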

The duplication command analyzes files for duplicated text blocks. It can:

  • Analyze multiple files or recursively scan directories
  • Filter duplications by minimum number of non-empty lines with --min-lines=N (default: 3)
  • Detect single-line duplications when using --min-lines=1
  • Rank duplications by number of consecutive lines
  • Show all occurrences with file and line references
  • Process files in parallel across all available CPU cores
  • Use memory mapping for efficient processing of large files with minimal memory overhead
  • Output duplication data as JSON with --json
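The core sliding-window idea behind this kind of duplication detection can be sketched as follows. This is a simplified, single-threaded illustration under assumed names, not the crate's actual implementation (no memory mapping, parallelism, or ranking):

```rust
use std::collections::HashMap;

/// Find blocks of `min_lines` consecutive non-empty lines that occur
/// more than once. Returns each duplicated block together with the
/// 0-based starting line numbers of its occurrences.
fn find_duplications(text: &str, min_lines: usize) -> Vec<(String, Vec<usize>)> {
    let lines: Vec<&str> = text.lines().collect();
    let mut seen: HashMap<String, Vec<usize>> = HashMap::new();

    for start in 0..lines.len().saturating_sub(min_lines - 1) {
        let window = &lines[start..start + min_lines];
        // Skip windows containing empty lines, mirroring the
        // "non-empty lines" rule described above.
        if window.iter().any(|l| l.trim().is_empty()) {
            continue;
        }
        seen.entry(window.join("\n")).or_default().push(start);
    }

    // Keep only blocks that appear in more than one place.
    seen.into_iter()
        .filter(|(_, locs)| locs.len() > 1)
        .collect()
}

fn main() {
    // Lines 0-1 ("a", "b") repeat at lines 3-4.
    let dups = find_duplications("a\nb\nx\na\nb", 2);
    for (block, locs) in &dups {
        println!("{:?} at lines {:?}", block, locs);
    }
}
```

A production version would also merge overlapping windows into maximal blocks and track per-file paths rather than a single text.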