textalyzer

Crates.io crate: textalyzer, version 0.5.0
Created: 2019-02-18, updated: 2025-07-07
Description: Analyze key metrics like number of words, readability, and complexity of any kind of text
Repository: https://github.com/ad-si/textalyzer
Author: Adrian Sieber (ad-si)

README

Textalyzer

Analyze key metrics of any kind of text, such as word count, readability, and complexity.
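To illustrate the kind of analysis the `histogram` subcommand performs, here is a minimal word-frequency sketch in Rust. This is illustrative only, not the crate's actual implementation; `word_histogram` is a hypothetical helper name.

```rust
use std::collections::HashMap;

/// Count word frequencies in a text, case-insensitively,
/// ignoring surrounding punctuation.
fn word_histogram(text: &str) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        // Keep only alphanumeric characters and normalize case.
        let cleaned: String = word
            .chars()
            .filter(|c| c.is_alphanumeric())
            .collect::<String>()
            .to_lowercase();
        if !cleaned.is_empty() {
            *counts.entry(cleaned).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = word_histogram("The cat saw the dog. The dog ran.");
    // "the" occurs three times, "dog" twice.
    println!("{:?}", counts);
}
```

A real implementation would additionally sort the entries by count to render the histogram.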

Usage

# Word frequency histogram
textalyzer histogram <filepath>

# Find duplicated code blocks (default: minimum 3 non-empty lines)
textalyzer duplication <path> [<additional paths...>]

# Find duplications with at least 5 non-empty lines
textalyzer duplication --min-lines=5 <path> [<additional paths...>]

# Include single-line duplications
textalyzer duplication --min-lines=1 <path> [<additional paths...>]

# Output duplications as JSON
textalyzer duplication --json <path> [<additional paths...>]

Example JSON output:

[{
  "content": "<duplicated text block>",
  "locations": [
    { "path": "file1.txt", "line": 12 },
    { "path": "file2.txt", "line": 34 }
  ]
}, {
  "content": "<another duplicated block>",
  "locations": [
    { "path": "file1.txt", "line": 56 },
    { "path": "file3.txt", "line": 78 }
  ]
}]
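The output shape above can be modeled with plain Rust types. The following is a minimal hand-rolled serializer for illustration only; the field names (`content`, `path`, `line`) come from the JSON above, while the type and function names are assumptions (a real tool would likely use serde):

```rust
/// One occurrence of a duplicated block.
struct Location {
    path: String,
    line: usize,
}

/// A duplicated block and everywhere it appears.
struct Duplication {
    content: String,
    locations: Vec<Location>,
}

/// Render duplications in the JSON shape shown above.
/// Hand-rolled for illustration; only escapes double quotes.
fn to_json(dups: &[Duplication]) -> String {
    let items: Vec<String> = dups
        .iter()
        .map(|d| {
            let locs: Vec<String> = d
                .locations
                .iter()
                .map(|l| format!(r#"{{ "path": "{}", "line": {} }}"#, l.path, l.line))
                .collect();
            format!(
                r#"{{ "content": "{}", "locations": [{}] }}"#,
                d.content.replace('"', "\\\""),
                locs.join(", ")
            )
        })
        .collect();
    format!("[{}]", items.join(", "))
}

fn main() {
    let dups = vec![Duplication {
        content: "let x = 1;".into(),
        locations: vec![
            Location { path: "file1.txt".into(), line: 12 },
            Location { path: "file2.txt".into(), line: 34 },
        ],
    }];
    println!("{}", to_json(&dups));
}
```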

The duplication command analyzes files for duplicated text blocks. It can:

  • Analyze multiple files or recursively scan directories
  • Filter duplications by minimum number of non-empty lines with --min-lines=N (default: 3)
  • Detect single-line duplications when using --min-lines=1
  • Rank duplications by number of consecutive lines
  • Show all occurrences with file and line references
  • Process files in parallel across all available CPU cores
  • Use memory mapping for efficient processing of large files with minimal memory overhead
  • Output duplication data as JSON with --json
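The core sliding-window idea behind this kind of duplication detection can be sketched as follows. This is a simplified, single-threaded illustration under assumed names, not the crate's actual implementation (no memory mapping, parallelism, or ranking):

```rust
use std::collections::HashMap;

/// Find blocks of `min_lines` consecutive non-empty lines that occur
/// more than once. Returns each duplicated block together with the
/// 0-based starting line numbers of its occurrences.
fn find_duplications(text: &str, min_lines: usize) -> Vec<(String, Vec<usize>)> {
    let lines: Vec<&str> = text.lines().collect();
    let mut seen: HashMap<String, Vec<usize>> = HashMap::new();

    for start in 0..lines.len().saturating_sub(min_lines - 1) {
        let window = &lines[start..start + min_lines];
        // Skip windows containing empty lines, mirroring the
        // "non-empty lines" rule described above.
        if window.iter().any(|l| l.trim().is_empty()) {
            continue;
        }
        seen.entry(window.join("\n")).or_default().push(start);
    }

    // Keep only blocks that appear in more than one place.
    seen.into_iter()
        .filter(|(_, locs)| locs.len() > 1)
        .collect()
}

fn main() {
    // Lines 0-1 ("a", "b") repeat at lines 3-4.
    let dups = find_duplications("a\nb\nx\na\nb", 2);
    for (block, locs) in &dups {
        println!("{:?} at lines {:?}", block, locs);
    }
}
```

A production version would also merge overlapping windows into maximal blocks and track per-file paths rather than a single text.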