| Crates.io | rclean |
| lib.rs | rclean |
| version | 0.1.1 |
| created_at | 2025-07-16 21:21:36.828371+00 |
| updated_at | 2025-07-16 21:21:36.828371+00 |
| description | A high-performance Rust-based disk cleanup tool that finds duplicates and storage outliers |
| homepage | https://github.com/paiml/rclean |
| repository | https://github.com/paiml/rclean |
| max_upload_size | |
| id | 1756639 |
| size | 257,245 |
A high-performance Rust-based disk cleanup tool that finds duplicate files and storage outliers.

# From source
git clone https://github.com/paiml/rclean.git
cd rclean
cargo install --path .
# Or directly from GitHub
cargo install --git https://github.com/paiml/rclean.git
# Scan current directory for duplicates
rclean
# Scan specific directory
rclean /path/to/directory
# Filter by pattern
rclean ~/Documents --pattern "*.pdf" --pattern-type glob
# Generate CSV report
rclean . --csv duplicate_report.csv
# Find similar files (fuzzy matching) with 70% similarity threshold
rclean ~/Documents --similarity 70
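This README does not describe how rclean identifies duplicates internally. As a minimal sketch of one common approach, the snippet below buckets in-memory files by length and content hash, then confirms each match byte for byte. The names `content_hash` and `find_duplicates` are hypothetical, and the standard library's `DefaultHasher` is only a stand-in for whatever checksum a real tool would use.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Hash contents with the standard library's non-cryptographic hasher.
/// (Assumption: a stand-in for whatever checksum a real tool uses.)
fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Group (name, contents) pairs whose contents are byte-for-byte identical.
/// Only groups with at least two members are returned, sorted by name.
fn find_duplicates(files: &[(&str, &[u8])]) -> Vec<Vec<String>> {
    // Bucket on (length, hash) so files of different sizes never share a bucket.
    let mut buckets: HashMap<(usize, u64), Vec<usize>> = HashMap::new();
    for (idx, (_, bytes)) in files.iter().enumerate() {
        buckets
            .entry((bytes.len(), content_hash(bytes)))
            .or_default()
            .push(idx);
    }

    let mut groups: Vec<Vec<String>> = Vec::new();
    for indices in buckets.into_values() {
        // Inside a bucket, compare bytes directly to rule out hash collisions.
        let mut confirmed: Vec<Vec<usize>> = Vec::new();
        for idx in indices {
            let pos = confirmed
                .iter()
                .position(|group| files[group[0]].1 == files[idx].1);
            match pos {
                Some(p) => confirmed[p].push(idx),
                None => confirmed.push(vec![idx]),
            }
        }
        for group in confirmed.into_iter().filter(|g| g.len() > 1) {
            let mut names: Vec<String> =
                group.iter().map(|&i| files[i].0.to_string()).collect();
            names.sort();
            groups.push(names);
        }
    }
    groups.sort();
    groups
}
```

Bucketing by length first means contents are only compared within groups of equally sized files, so most non-duplicates are ruled out without a byte-level comparison.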
Find files that are consuming disproportionate disk space:
# Find large file outliers
rclean outliers /path --min-size 100MB
# Find hidden space consumers (node_modules, .git, etc.)
rclean outliers ~ --check-hidden --format json
# Find file patterns (backups, logs, etc.)
rclean outliers . --check-patterns
# Export outliers report
rclean outliers . --csv outliers_report.csv
# Combine all features
rclean outliers ~ --min-size 50MB --check-hidden --check-patterns --top 50
# Enable clustering to find groups of similar large files
rclean outliers /path --cluster --cluster-similarity 80 --min-cluster-size 3
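How rclean decides that a file is "disproportionate" is not specified here. One plausible reading, sketched below under that assumption, keeps files that are at least a minimum size and lie more than `k` standard deviations above the mean size, then returns the largest few. The function name `size_outliers` and the parameter `k` are hypothetical. `min_size` and `top` only loosely mirror the `--min-size` and `--top` flags shown above, and sizes are plain byte counts rather than strings such as `100MB`.

```rust
/// Return up to `top` (name, size) pairs that are at least `min_size` bytes
/// and more than `k` standard deviations above the mean size, largest first.
/// (Hypothetical sketch; not rclean's documented algorithm.)
fn size_outliers(
    files: &[(&str, u64)],
    min_size: u64,
    k: f64,
    top: usize,
) -> Vec<(String, u64)> {
    if files.is_empty() {
        return Vec::new();
    }

    // Population mean and standard deviation of the file sizes.
    let n = files.len() as f64;
    let mean = files.iter().map(|(_, s)| *s as f64).sum::<f64>() / n;
    let variance = files
        .iter()
        .map(|(_, s)| (*s as f64 - mean).powi(2))
        .sum::<f64>()
        / n;
    let cutoff = mean + k * variance.sqrt();

    let mut hits: Vec<(String, u64)> = files
        .iter()
        .filter(|(_, s)| *s >= min_size && (*s as f64) > cutoff)
        .map(|(name, s)| (name.to_string(), *s))
        .collect();

    // Largest files first, then keep only the requested number.
    hits.sort_by(|a, b| b.1.cmp(&a.1));
    hits.truncate(top);
    hits
}
```

With four files of about 10 bytes and one of 1000 bytes, only the 1000-byte file clears both the minimum size and the statistical cutoff.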
Outliers Detection Features:
Find files that are similar but not identical:
# Find files with 70% or higher similarity
rclean ~/Documents --similarity 70
# Find similar Python files
rclean ~/code --pattern "*.py" --pattern-type glob --similarity 80
# Generate CSV report including similar files
rclean . --similarity 60 --csv similarity_report.csv
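The fuzzy-matching algorithm behind `--similarity` is not documented in this README. To illustrate what a percentage threshold can mean, the sketch below computes Jaccard similarity over overlapping 4-byte windows. This is a swapped-in technique chosen for brevity, not necessarily what rclean does. The names `shingles`, `similarity_percent`, and `is_similar` are hypothetical.

```rust
use std::collections::HashSet;

/// Collect the set of overlapping `n`-byte windows ("shingles") in `data`.
/// Inputs shorter than `n` produce an empty set.
fn shingles(data: &[u8], n: usize) -> HashSet<&[u8]> {
    data.windows(n).collect()
}

/// Jaccard similarity of two byte strings, as a percentage from 0 to 100.
/// (Hypothetical sketch; the window size of 4 is an arbitrary choice.)
fn similarity_percent(a: &[u8], b: &[u8]) -> u8 {
    const N: usize = 4;
    let (sa, sb) = (shingles(a, N), shingles(b, N));
    if sa.is_empty() && sb.is_empty() {
        // Both inputs are too short to shingle: fall back to exact equality.
        return if a == b { 100 } else { 0 };
    }
    let shared = sa.intersection(&sb).count() as f64;
    let total = sa.union(&sb).count() as f64;
    (100.0 * shared / total).round() as u8
}

/// True when the two inputs meet or exceed `threshold` percent similarity.
fn is_similar(a: &[u8], b: &[u8], threshold: u8) -> bool {
    similarity_percent(a, b) >= threshold
}
```

For example, `abcdefgh` and `abcdefgx` share 4 of 6 distinct windows, which rounds to 67%. That pair would pass a threshold of 60 but not 70.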
Use Cases:
RClean supports ripgrep-style pattern matching:
Literal (default): Simple string contains matching
rclean search --path . --pattern ".txt"
Glob: Shell-style patterns
rclean search --path . --pattern "*.txt" --pattern-type glob
rclean search --path . --pattern "**/*.rs" --pattern-type glob
Regex: Full regular expression support
rclean search --path . --pattern "test_.*\.rs$" --pattern-type regex
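The difference between the literal and glob pattern types can be sketched with the standard library alone. This is an illustration, not rclean's matcher: `PatternType`, `glob_match`, and `matches` are hypothetical names. The simplified glob treats `*` as matching any run of characters (including `/`) and `?` as exactly one character, so `**/*.rs` here requires at least one `/`, which real glob libraries may not. Regex is left out because it needs an external crate.

```rust
/// Which matching rule to apply (regex omitted: it needs an external crate).
#[derive(Clone, Copy, Debug, PartialEq)]
enum PatternType {
    Literal,
    Glob,
}

/// Simplified glob: `*` matches any run of characters, `?` matches one.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Position of the last `*` seen and the text index it has absorbed up to.
    let mut star: Option<(usize, usize)> = None;

    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            // Start by letting the star match nothing.
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Mismatch: let the last star absorb one more character and retry.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Text is exhausted; only trailing stars may remain in the pattern.
    p[pi..].iter().all(|&c| c == '*')
}

/// Literal means "name contains pattern"; Glob must match the whole name.
fn matches(pattern: &str, kind: PatternType, name: &str) -> bool {
    match kind {
        PatternType::Literal => name.contains(pattern),
        PatternType::Glob => glob_match(pattern, name),
    }
}
```

The practical difference: the literal pattern `.txt` matches `notes.txt.bak` because the substring appears anywhere, while the glob `*.txt` does not because the whole name must match.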
--hidden: Include hidden files
--no-ignore: Ignore .gitignore rules
--max-depth <N>: Maximum directory depth to traverse
RClean can run as an MCP server for integration with AI assistants:
# Run as MCP server
rclean # Will auto-detect MCP mode when piped
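The README states only that MCP mode is detected "when piped"; the mechanism is not described. A common way for a CLI to make that decision, sketched here as an assumption, is to check whether standard input is attached to a terminal using `std::io::IsTerminal` (stable since Rust 1.70). The `Mode` enum and both function names are hypothetical.

```rust
use std::io::IsTerminal;

/// The two run modes this sketch distinguishes (hypothetical names).
#[derive(Debug, PartialEq)]
enum Mode {
    Cli,
    McpServer,
}

/// Pure decision: an interactive terminal means CLI, a pipe means MCP server.
fn choose_mode(stdin_is_terminal: bool) -> Mode {
    if stdin_is_terminal {
        Mode::Cli
    } else {
        Mode::McpServer
    }
}

/// Inspect the real stdin of the current process and pick a mode.
fn detect_mode() -> Mode {
    choose_mode(std::io::stdin().is_terminal())
}
```

Keeping the decision in `choose_mode` separate from the I/O check in `detect_mode` makes the rule testable without a real terminal.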
All lint checks now pass! The project follows PMAT (Production Manufacturing and Assembly Technology) quality standards with zero tolerance for warnings.
# Build and test
make all
# Development commands
make format # Format code
make lint # Run clippy linting (FIXED - passes cleanly!)
make lint-extreme # Run extreme linting with PMAT standards
make test # Run all tests
make test-examples # Run example tests (NEW!)
# Build variants
make build-release # Release build for production
# Quality assurance
make quality-gate # Run all quality checks
make format-check # Verify formatting
make lint now passes without errors
New make test-examples target
~/.cargo/config:
[target.x86_64-apple-darwin]
rustflags = [
"-C", "link-arg=-undefined",
"-C", "link-arg=dynamic_lookup",
]
[target.aarch64-apple-darwin]
rustflags = [
"-C", "link-arg=-undefined",
"-C", "link-arg=dynamic_lookup",
]
make all in rclean directory
MIT