| Crates.io | polydup-core |
| lib.rs | polydup-core |
| version | 0.9.3 |
| created_at | 2025-12-22 15:02:59.189389+00 |
| updated_at | 2026-01-25 07:25:34.414514+00 |
| description | Cross-language duplicate code detection library using Tree-sitter and Rabin-Karp |
| homepage | https://github.com/wiesnerbernard/polydup |
| repository | https://github.com/wiesnerbernard/polydup |
| max_upload_size | |
| id | 1999848 |
| size | 275,083 |
Cross-language duplicate code detector powered by Tree-sitter and Rust.
PolyDup includes a hash cache system that dramatically accelerates duplicate detection in CI/CD workflows:
| Mode | Small (1K LOC) | Medium (10K LOC) | Large (100K LOC) |
|---|---|---|---|
| Full scan | ~50ms | ~500ms | ~5s |
| Git-diff (no cache) | ~30ms | ~300ms | ~3s |
| Git-diff (with cache) | ~15ms | ~30ms | ~50ms |
Key benefits:
Quick start:
# Build cache (one-time, ~0.5s for typical codebases)
polydup cache build
# Fast incremental scans using cache
polydup scan . --git-diff origin/main..HEAD
See docs/caching.md for detailed performance characteristics and CI integration patterns.
Shared Core Architecture: All duplicate detection logic lives in Rust, exposed via FFI bindings.
┌─────────────────────────────────────────────┐
│           polydup-core (Rust)               │
│  • Tree-sitter parsing                      │
│  • Rabin-Karp hashing                       │
│  • Parallel file scanning                   │
│  • Duplicate detection                      │
└─────────────────────────────────────────────┘
       ▲             ▲             ▲
       │             │             │
  ┌────┴────┐   ┌────┴────┐   ┌────┴────┐
  │   CLI   │   │ Node.js │   │ Python  │
  │ (Rust)  │   │(napi-rs)│   │ (PyO3)  │
  └─────────┘   └─────────┘   └─────────┘
Crates:
- CLI (cargo install polydup)
- Node.js bindings (npm install polydup)
- Python bindings (pip install polydup)
Important: PolyDup is available in multiple forms for different use cases:
- CLI Tool: cargo install polydup - Command-line scanning
- Python Library: pip install polydup - Python API bindings (NOT a CLI)
- Node.js Library: npm install polydup - Node.js API bindings (NOT a CLI)
If you want to run polydup from the command line, use cargo install polydup.
The fastest way to add duplicate detection to your workflow:
name: Code Quality
on:
  pull_request:
    branches: [ main ]
permissions:
  contents: read
  pull-requests: write # Required for PR comments
jobs:
  duplicate-detection:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # Required for git-diff mode
      - uses: wiesnerbernard/polydup-action@v0.3.0
        with:
          threshold: 50
          similarity: '0.85'
          fail-on-duplicates: true
Benefits:
| Input | Default | Description |
|---|---|---|
| threshold | 50 | Minimum code block size in tokens |
| similarity | 0.85 | Similarity threshold (0.0-1.0) |
| fail-on-duplicates | true | Fail the check if duplicates found |
| format | text | Output format: text or json |
| base-ref | auto | Base git reference (auto-detects from PR) |
| github-token | - | Token for PR comments |
| comment-on-pr | true | Post results as PR comment |
| Output | Description |
|---|---|
| duplicates-found | Number of duplicate code blocks found |
| files-scanned | Number of files scanned |
| exit-code | Exit code (0 = no duplicates, 1 = duplicates) |
- uses: wiesnerbernard/polydup-action@v0.2.1
  id: polydup
  with:
    fail-on-duplicates: false
    github-token: ${{ secrets.GITHUB_TOKEN }}
- name: Check results
  run: |
    echo "Files scanned: ${{ steps.polydup.outputs.files-scanned }}"
    echo "Duplicates found: ${{ steps.polydup.outputs.duplicates-found }}"
    if [ "${{ steps.polydup.outputs.duplicates-found }}" -gt 10 ]; then
      echo "Too many duplicates!"
      exit 1
    fi
When duplicates are found, the action posts a comment like:
## PolyDup Duplicate Code Report
**Found 3 duplicate code block(s)**
- Files scanned: 12
- Threshold: 50 tokens
- Similarity: 0.85
<details>
<summary>View Details</summary>
[Detailed scan output...]
</details>
**Tip**: Consider refactoring duplicated code to improve maintainability.
See polydup-action for full documentation.
For production CI/CD, install the CLI directly in your workflow:
name: Code Quality
on:
  pull_request:
    branches: [ main ]
jobs:
  duplicate-detection:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0 # For git-diff mode
      - name: Install Rust
        uses: dtolnay/rust-toolchain@stable
      - name: Cache polydup binary
        id: cache-polydup
        uses: actions/cache@v4
        with:
          path: ~/.cargo/bin/polydup
          key: ${{ runner.os }}-polydup-v0.8.1
      - name: Install polydup
        # Skip the build when the binary was restored from cache
        if: steps.cache-polydup.outputs.cache-hit != 'true'
        run: cargo install polydup --locked
      - name: Scan for duplicates
        run: |
          polydup scan . \
            --git-diff origin/${{ github.base_ref }}..HEAD \
            --threshold 50 \
            --similarity 0.85 \
            --format text
Benefits:
The fastest way to use PolyDup locally is via the CLI tool:
# Install from crates.io
cargo install polydup
# Verify installation
polydup --version
# Scan for duplicates
polydup scan ./src
System Requirements:
Note: Homebrew tap coming soon! (brew install polydup)
Pre-built Binaries:
Download pre-compiled binaries from GitHub Releases:
# macOS (Apple Silicon)
curl -L https://github.com/wiesnerbernard/polydup/releases/latest/download/polydup-macos-aarch64 -o polydup
chmod +x polydup
sudo mv polydup /usr/local/bin/
# macOS (Intel)
curl -L https://github.com/wiesnerbernard/polydup/releases/latest/download/polydup-macos-x86_64 -o polydup
chmod +x polydup
sudo mv polydup /usr/local/bin/
# Linux (x86_64)
curl -L https://github.com/wiesnerbernard/polydup/releases/latest/download/polydup-linux-x86_64 -o polydup
chmod +x polydup
sudo mv polydup /usr/local/bin/
# Linux (x86_64 static - musl)
curl -L https://github.com/wiesnerbernard/polydup/releases/latest/download/polydup-linux-x86_64-musl -o polydup
chmod +x polydup
sudo mv polydup /usr/local/bin/
# Windows (x86_64)
# Download polydup-windows-x86_64.exe from releases page and add to PATH
Note: This is a library package for integrating duplicate detection into Node.js applications. It does NOT provide a CLI. For command-line usage, use cargo install polydup.
Install as a project dependency:
npm install polydup
Requirements: Node.js 16+ on macOS (Intel/ARM), Windows (x64), or Linux (x64)
Usage:
const { findDuplicates } = require('polydup');
const duplicates = findDuplicates(
  ['src/', 'tests/'], // Paths to scan
  10,                 // Minimum block size (lines)
  0.85                // Similarity threshold (0.0-1.0)
);
console.log(`Found ${duplicates.length} duplicates`);
duplicates.forEach(dup => {
  console.log(`${dup.file1}:${dup.start_line1} ↔ ${dup.file2}:${dup.start_line2}`);
  console.log(`Similarity: ${(dup.similarity * 100).toFixed(1)}%`);
});
Note: This is a library package for integrating duplicate detection into Python applications. It does NOT provide a CLI. For command-line usage, use cargo install polydup. Running python -m polydup will display installation guidance.
Install from PyPI:
# Using pip
pip install polydup
# Using uv (recommended for faster installs)
uv pip install polydup
Requirements: Python 3.8-3.12 on macOS (Intel/ARM), Windows (x64), or Linux (x64)
Usage:
import polydup
# Scan for duplicates
duplicates = polydup.find_duplicates(
    paths=['src/', 'tests/'],
    min_block_size=10,
    similarity_threshold=0.85
)
print(f"Found {len(duplicates)} duplicates")
for dup in duplicates:
    print(f"{dup['file1']}:{dup['start_line1']} ↔ {dup['file2']}:{dup['start_line2']}")
    print(f"Similarity: {dup['similarity']*100:.1f}%")
Use the core library in your Rust project:
[dependencies]
polydup-core = "0.9"
anyhow = "1"
use polydup_core::Scanner;
use std::path::PathBuf;
fn main() -> anyhow::Result<()> {
    let scanner = Scanner::with_config(10, 0.85)?;
    let report = scanner.scan(vec![PathBuf::from("src")])?;
    println!("Found {} duplicates", report.duplicates.len());
    Ok(())
}
cargo build --release -p polydup
./target/release/polydup scan ./src
cd crates/polydup-node
npm install
npm run build
cd crates/polydup-py
maturin develop
python -c "import polydup; print(polydup.version())"
polydup init
The fastest way to get started is with the interactive initialization wizard:
# Run the initialization wizard
polydup init
# Non-interactive mode (use defaults)
polydup init --yes
# Force overwrite existing configuration
polydup init --force
# Only generate CI/CD configuration (skip .polyduprc.toml)
polydup init --ci-only
The wizard will:
- .polyduprc.toml with environment-specific defaults
Example workflow:
$ polydup init
PolyDup Initialization Wizard
=============================
Detected environments:
- Node.js
- Python
✔ Select similarity threshold: Standard (0.85)
✔ Select minimum block size: Medium (50 lines)
✔ Add custom exclude patterns? · No
✔ Would you like to create a GitHub Actions workflow? · Yes
Configuration saved to: .polyduprc.toml
GitHub Actions workflow created: .github/workflows/polydup.yml
Next Steps:
1. Install: npm install -g polydup
2. Scan: polydup scan ./src
After running polydup init, you'll have a .polyduprc.toml file:
[scan]
min_block_size = 50
similarity_threshold = 0.85
[scan.exclude]
patterns = [
"**/node_modules/**",
"**/__pycache__/**",
"**/*.test.js",
"**/*.test.py",
]
[output]
format = "text"
verbose = false
[ci]
enabled = false
fail_on_duplicates = true
Configuration Discovery:
- .polyduprc.toml in current directory and parent directories
# Scan a directory
polydup scan ./src
# Scan multiple directories
polydup scan ./src ./tests ./lib
# Custom threshold (0.0-1.0, higher = stricter)
polydup scan ./src --threshold 0.85
# Adjust minimum block size (lines)
polydup scan ./src --min-block-size 50
# JSON output for scripting
polydup scan ./src --format json > duplicates.json
Quick scan for severe duplicates:
polydup scan ./src --threshold 0.95 --min-block-size 20
Deep scan for similar code:
polydup scan ./src --threshold 0.70 --min-block-size 5
Scan specific file types:
# PolyDup auto-detects: .rs, .js, .ts, .jsx, .tsx, .py, .vue, .svelte
polydup scan ./src # Scans all supported languages
CI/CD integration:
# Exit with error if duplicates found
polydup scan ./src --threshold 0.90 || exit 1
Text (default): Human-readable colored output with file paths, line numbers, and similarity scores
JSON: Machine-readable format for scripting and tooling integration
polydup scan ./src --format json | jq '.duplicates | length'
PolyDup supports the following subcommands:
| Command | Description | Example |
|---|---|---|
| scan | Scan for duplicate code (default command) | polydup scan ./src |
| init | Interactive setup wizard | polydup init |
| config | Manage configuration file | polydup config validate |
| cache | Manage hash cache for fast git-diff scans | polydup cache build |
| ignore | Manage ignored duplicates | polydup ignore list |
See Ignore System Guide for comprehensive documentation on managing false positives.
Scan Command Options:
The scan command accepts all options listed below. When no subcommand is specified, scan is assumed for backward compatibility.
# These are equivalent:
polydup scan ./src --threshold 0.95
polydup ./src --threshold 0.95
Init Command Options:
| Option | Description |
|---|---|
| --yes, -y | Skip interactive prompts, use defaults |
| --force | Overwrite existing .polyduprc.toml |
| --ci-only | Only generate CI/CD configuration (skip .polyduprc.toml) |
CI-Only Mode:
Use --ci-only to add or update CI/CD workflows without modifying your existing configuration:
# Interactive: Choose your CI platform
polydup init --ci-only
# Non-interactive: Generate GitHub Actions workflow
polydup init --ci-only --yes
Supported CI platforms:
- GitHub Actions (.github/workflows/polydup.yml)
- GitLab CI (.gitlab-ci.yml)
- Azure Pipelines (azure-pipelines.yml)
- Jenkins (Jenkinsfile)
Config Command:
Manage and validate your .polyduprc.toml configuration:
# Validate configuration
polydup config validate
# Show configuration summary
polydup config show
# Show configuration file path
polydup config path
Cache Command:
Manage the hash cache for fast git-diff duplicate detection:
# Build cache for entire codebase (run once, takes ~0.5-2s)
polydup cache build
# Build with custom threshold and verbose output
polydup cache build --min-tokens 100 -v
# View cache statistics
polydup cache info
# Clear the cache
polydup cache clear
How Caching Works:
- polydup cache build scans all files and creates .polydup-cache.json with a hash index
- polydup scan --git-diff <range> automatically uses the cache if it exists
When to Use:
The cache is automatically invalidated when files are modified (based on mtime/size). See Caching Guide for details.
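As an illustration of mtime/size invalidation, the sketch below keeps a fingerprint per file and reports entries that no longer match. It is a minimal model only: the in-memory dict, `fingerprint`, and `stale_files` are hypothetical names, and the real layout of .polydup-cache.json is not described here.

```python
import os
import tempfile

def fingerprint(path):
    """Return the (mtime_ns, size) pair used as a cheap change detector."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)

def stale_files(cache, paths):
    """List paths whose cached fingerprint is missing or no longer matches."""
    return [p for p in paths if cache.get(p) != fingerprint(p)]

# Usage: record a fingerprint, modify the file, and see it reported as stale.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.py")
    with open(path, "w") as f:
        f.write("x = 1\n")
    cache = {path: fingerprint(path)}        # hypothetical in-memory cache
    unchanged = stale_files(cache, [path])   # nothing to rescan yet
    with open(path, "w") as f:
        f.write("x = 1\ny = 2\n")            # size changes, so the entry is stale
    changed = stale_files(cache, [path])
```

Checking metadata instead of re-hashing file contents is what keeps the check cheap; the trade-off is that an edit preserving both size and mtime would go unnoticed.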
| Option | Type | Default | Description |
|---|---|---|---|
| --threshold | float | 0.9 | Similarity threshold (0.0-1.0) |
| --min-block-size | int | 10 | Minimum lines per code block |
| --format | text or json | text | Output format |
| --output | path | - | Write report to file |
| --only-type | types | - | Filter by clone type (type-1, type-2, type-3) |
| --exclude-type | types | - | Exclude clone types |
| --group-by | criterion | - | Group results (file, similarity, type, size) |
| --verbose | flag | false | Show performance statistics |
| --no-color | flag | false | Disable colored output |
| --debug | flag | false | Enable debug mode with detailed traces |
| --enable-type3 | flag | false | Enable Type-3 gap-tolerant detection |
| --save-baseline | path | - | Save scan results as baseline for future comparisons |
| --compare-to | path | - | Compare against baseline (show only new duplicates) |
| --git-diff | range | - | Only scan files changed in git diff range (e.g., origin/main..HEAD) ⚡ Recommended for CI |
Performance Tip: For large codebases (>50K LOC), increase --min-block-size to 20-50 for faster scans with less noise.
The most powerful feature for CI/CD: Block new duplicates without failing on legacy code.
Many codebases have legacy duplication that's not worth fixing immediately. Baseline mode lets you:
Step 1: Create baseline from your main branch
# On main/master branch: capture current state
polydup scan ./src --save-baseline .polydup-baseline.json
git add .polydup-baseline.json
git commit -m "chore: add duplication baseline"
Step 2: Use in CI/CD to block new duplicates
# .github/workflows/polydup.yml
- name: Check for new duplicates
  run: |
    polydup scan ./src --compare-to .polydup-baseline.json
    # Exits with code 1 if NEW duplicates found
    # Exits with code 0 if no new duplicates (CI passes)
Step 3: See it in action on a PR
# Developer adds duplicate code in feature branch
polydup scan ./src --compare-to .polydup-baseline.json
Output:
ℹ Comparing against baseline: .polydup-baseline.json
11 total duplicates, 3 new since baseline
Duplicates
═══════════════════════════════════════════
1. Type-2 (renamed) | Similarity: 100.0% | Length: 59 tokens
├─ src/new-feature.ts:12
└─ src/utils.ts:45
❌ 3 new duplicates found since baseline
Exit code: 1 (CI fails, PR blocked)
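Conceptually, a baseline comparison is a set difference between the duplicates found now and those recorded earlier. The sketch below models that idea under stated assumptions: the `file1`, `file2`, and `hash` fields and both function names are hypothetical, and the actual baseline file format may differ.

```python
def duplicate_key(dup):
    """Order-independent identity for a duplicate pair (hypothetical fields)."""
    return (frozenset([dup["file1"], dup["file2"]]), dup["hash"])

def new_since_baseline(current, baseline):
    """Return the duplicates in `current` that the baseline does not contain."""
    known = {duplicate_key(d) for d in baseline}
    return [d for d in current if duplicate_key(d) not in known]

# Usage: one legacy duplicate is tolerated, one new duplicate fails the check.
baseline = [{"file1": "src/a.ts", "file2": "src/b.ts", "hash": "h1"}]
current = baseline + [
    {"file1": "src/new-feature.ts", "file2": "src/utils.ts", "hash": "h2"},
]
fresh = new_since_baseline(current, baseline)
exit_code = 1 if fresh else 0  # mirrors the documented exit codes
```

Keying on a content hash rather than line numbers is a design choice in this sketch: code that merely moves within a file would not be reported as new.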
Incremental improvement: Update baseline after cleanup
# Team cleans up 10 duplicates
polydup scan ./src --save-baseline .polydup-baseline.json
git add .polydup-baseline.json
git commit -m "chore: update baseline after duplication cleanup"
Combining with filters
# Save baseline excluding Type-3 (noisy matches)
polydup scan ./src --exclude-type type-3 --save-baseline baseline.json
# Only block new Type-1 and Type-2 duplicates
polydup scan ./src --only-type type-1,type-2 --compare-to baseline.json
Manual review mode
# See what duplicates are NEW (no CI failure, just info)
polydup scan ./src --compare-to baseline.json --format json \
| jq '.duplicates | length'
Use with GitHub Actions to comment on PRs:
- name: Check duplicates
  id: polydup
  run: |
    OUTPUT=$(polydup scan ./src --compare-to .polydup-baseline.json --format json || true)
    NEW_COUNT=$(echo "$OUTPUT" | jq '.duplicates | length')
    echo "new_duplicates=$NEW_COUNT" >> $GITHUB_OUTPUT
- name: Comment on PR
  if: steps.polydup.outputs.new_duplicates > 0
  uses: actions/github-script@v7
  with:
    script: |
      github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: '⚠️ This PR introduces ${{ steps.polydup.outputs.new_duplicates }} new code duplicates. Please refactor before merging.'
      })
The fastest, simplest way to check for duplicates in Pull Requests.
Advantages over Baseline Mode:
Single command to check duplicates in a PR:
# Scan only files changed between main and current branch
polydup scan . --git-diff origin/main..HEAD
CI/CD Integration:
GitHub Actions (Recommended):
Use the official PolyDup GitHub Action for the best experience:
# .github/workflows/polydup.yml
name: PolyDup Duplicate Detection
on:
  pull_request:
    branches: [ main, master ]
jobs:
  duplicate-check:
    runs-on: ubuntu-latest
    name: Detect Duplicate Code
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: wiesnerbernard/polydup-action@v1
        with:
          fail-on-duplicates: true
          comment-on-pr: true
Features:
Manual Installation:
jobs:
  duplicate-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - name: Install PolyDup
        run: cargo install polydup
      - name: Check for duplicates in PR
        run: |
          polydup scan . --git-diff origin/main..HEAD
          # Exits with code 1 if duplicates found
1. Check uncommitted changes:
polydup scan . --git-diff HEAD
2. Compare branches:
polydup scan . --git-diff main..feature-branch
3. Check last N commits:
polydup scan . --git-diff HEAD~3..HEAD
4. JSON output for tooling:
polydup scan . --git-diff origin/main..HEAD --format json
git diff --name-only --diff-filter=ACMR <range>
# Before: Scanning entire codebase (50K LOC, 500 files)
polydup scan ./src # 🐢 Takes 15-20 seconds
# After: Git-diff mode (PR with 5 changed files)
polydup scan . --git-diff origin/main..HEAD # ⚡ Takes 0.5-1 second
10-100x speedup on large codebases with focused PRs!
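The file-selection step can be approximated outside PolyDup with the same git invocation. This is a rough sketch, not PolyDup's implementation: `changed_files` and `filter_supported` are hypothetical helpers, and the extension set is copied from the auto-detected list earlier in this README.

```python
import subprocess
from pathlib import PurePosixPath

# Extensions PolyDup is documented to auto-detect.
SUPPORTED = {".rs", ".js", ".ts", ".jsx", ".tsx", ".py", ".vue", ".svelte"}

def filter_supported(paths):
    """Keep only paths whose extension is in the supported set."""
    return [p for p in paths if PurePosixPath(p).suffix in SUPPORTED]

def changed_files(diff_range, cwd="."):
    """Return supported files that were added, copied, modified, or renamed in the range."""
    out = subprocess.run(
        ["git", "diff", "--name-only", "--diff-filter=ACMR", diff_range],
        cwd=cwd, capture_output=True, text=True, check=True,
    ).stdout
    return filter_supported(out.splitlines())

# Example (requires a git checkout with full history):
#   changed_files("origin/main..HEAD")
```

Because only the changed files are parsed, the work scales with the size of the PR rather than the size of the repository.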
- --diff-filter=R, scanned correctly
- fetch-depth: 0
Use Git-Diff Mode (recommended):
Use Baseline Mode when:
Focus on specific types of duplicates for targeted refactoring:
# Show only exact duplicates (highest priority)
polydup scan ./src --only-type type-1
# Show only renamed duplicates
polydup scan ./src --only-type type-2
# Show both Type-1 and Type-2
polydup scan ./src --only-type type-1,type-2
# Exclude noisy Type-3 matches
polydup scan ./src --exclude-type type-3
Use cases:
- --only-type type-1: Quick wins for immediate refactoring
- --only-type type-2: Identify abstraction opportunities
- --exclude-type type-3: Reduce false positives in large codebases
Organize duplicates for different workflows:
# Group by file (refactoring prioritization)
polydup scan ./src --group-by file
# Group by similarity (quality triage)
polydup scan ./src --group-by similarity
# Group by clone type (targeted cleanup)
polydup scan ./src --group-by type
# Group by size (impact analysis)
polydup scan ./src --group-by size
Grouping strategies:
# Save report to file
polydup scan ./src --output duplicates.txt
# JSON for CI/CD pipelines
polydup scan ./src --format json --output report.json
# Disable colors for logs
polydup scan ./src --no-color
# Or use NO_COLOR environment variable
NO_COLOR=1 polydup scan ./src
# Verbose mode with performance stats
polydup scan ./src --verbose
Enhanced error messages with actionable suggestions:
# Enable debug mode for troubleshooting
polydup scan ./src --debug
# Debug mode shows:
# - Current working directory
# - File access permissions
# - Parser errors with context
# - Configuration validation details
Example error output:
Error: Path does not exist: /nonexistent/path
Suggestion: Check the path spelling and ensure it exists
Example: polydup scan ./src
polydup scan /absolute/path/to/project
Debug Info: Current directory: /Users/you/project
Mix and match for powerful workflows:
# High-priority refactoring targets
polydup scan ./src \
--only-type type-1 \
--group-by file \
--min-block-size 50 \
--output refactor-priorities.txt
# CI/CD duplicate gate
polydup scan ./src \
--threshold 0.95 \
--exclude-type type-3 \
--format json \
--output duplicates.json
# Deep analysis with verbose stats
polydup scan ./src \
--enable-type3 \
--group-by similarity \
--verbose
# Quick triage without noise
polydup scan ./src \
--only-type type-1,type-2 \
--group-by type \
--no-color
PolyDup provides a professional dashboard with actionable insights:
╔═══════════════════════════════════════════════════════════╗
║ Scan Results ║
╠═══════════════════════════════════════════════════════════╣
║ Files scanned: 142 ║
║ Functions analyzed: 287 ║
║ Duplicates found: 15 ║
║ Estimated savings: ~450 lines ║
╠═══════════════════════════════════════════════════════════╣
║ Clone Type Breakdown: ║
║ Type-1 (exact): 5 groups │ Critical priority ║
║ Type-2 (renamed): 8 groups │ High priority ║
║ Type-3 (modified): 2 groups │ Medium priority ║
╠═══════════════════════════════════════════════════════════╣
║ Top Offenders: ║
║ 1. src/handlers.ts 8 duplicates ║
║ 2. lib/utils.ts 5 duplicates ║
║ 3. components/Form.tsx 3 duplicates ║
╚═══════════════════════════════════════════════════════════╝
Duplicate #1 (Type-2: Renamed identifiers)
Location: src/auth.ts:45-68 ↔ src/admin.ts:120-143
Similarity: 94.2% | Length: 24 lines
...
Dashboard features:
PolyDup uses semantic exit codes for CI/CD integration:
| Exit Code | Meaning | Use Case |
|---|---|---|
| 0 | No duplicates found | Clean codebase ✓ |
| 1 | Duplicates detected | Quality gate (expected) |
| 2 | Error occurred | Configuration/runtime issue |
CI/CD examples:
# Fail build if duplicates found
polydup scan ./src || exit 1
# Warning only (report but don't fail)
polydup scan ./src || true
# Strict quality gate (fail on any duplicates)
if polydup scan ./src --threshold 0.95; then
  echo "No duplicates found"
else
  echo "⚠️ Duplicates detected - please refactor"
  exit 1
fi
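A shell `if` treats exit codes 1 and 2 alike, so a script that needs to tell "duplicates found" from "scan failed" has to inspect the code itself. The wrapper below is one way to do that; `classify_exit` and `run_scan` are hypothetical helper names, and only the three documented exit codes are assumed.

```python
import subprocess

# Meanings taken from the exit code table above.
EXIT_MEANINGS = {0: "clean", 1: "duplicates", 2: "error"}

def classify_exit(code):
    """Map a polydup exit code to a short status label."""
    return EXIT_MEANINGS.get(code, "unknown")

def run_scan(args):
    """Run `polydup scan` with extra arguments and return (status, stdout)."""
    proc = subprocess.run(
        ["polydup", "scan", *args], capture_output=True, text=True
    )
    return classify_exit(proc.returncode), proc.stdout

# Example (requires the polydup CLI on PATH):
#   status, report = run_scan(["./src", "--format", "json"])
#   if status == "error":
#       raise SystemExit("scan failed; check configuration")
```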
- JavaScript/TypeScript: .js, .jsx, .ts, .tsx
- Python: .py
- Rust: .rs
- Vue: .vue
- Svelte: .svelte
More languages coming soon (Java, Go, C/C++, Ruby, PHP)
PolyDup classifies duplicates into different types based on the International Workshop on Software Clones (IWSC) taxonomy:
Type-1 (exact): Identical code fragments except for whitespace, comments, and formatting.
Example:
// File 1
function calculateTotal(items) {
  let sum = 0;
  for (let i = 0; i < items.length; i++) {
    sum += items[i].price;
  }
  return sum;
}
// File 2 (Type-1 clone - only formatting differs)
function calculateTotal(items) {
  let sum = 0;
  for (let i = 0; i < items.length; i++) { sum += items[i].price; }
  return sum;
}
Why they exist: Direct copy-paste without any modifications.
Type-2 (renamed): Structurally identical code with renamed identifiers, changed literals, or different types.
Example:
// File 1
function calculateTotal(items) {
  let sum = 0;
  for (let i = 0; i < items.length; i++) {
    sum += items[i].price;
  }
  return sum;
}
// File 2 (Type-2 clone - renamed variables, same logic)
function computeSum(products) {
  let total = 0;
  for (let j = 0; j < products.length; j++) {
    total += products[j].cost;
  }
  return total;
}
Why they exist: Copy-paste-modify pattern where developers adapt code for different contexts.
Detection: PolyDup normalizes identifiers and literals (e.g., sum → @@ID, 0 → @@NUM) to detect structural similarity.
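To make the normalization and hashing steps concrete, here is a small sketch that uses a regex tokenizer in place of Tree-sitter. Everything in it is an assumption for illustration: `normalize`, `window_hashes`, the keyword list, and the hash parameters are hypothetical and not taken from polydup-core.

```python
import re
import zlib

# Hypothetical keyword list for the JavaScript snippets above; a real parser
# would classify tokens from the syntax tree instead.
KEYWORDS = {"function", "let", "const", "var", "for", "while", "if", "else", "return"}
TOKEN_RE = re.compile(r"[A-Za-z_$][A-Za-z0-9_$]*|\d+(?:\.\d+)?|\S")

def normalize(source):
    """Replace identifiers with @@ID and numeric literals with @@NUM."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isdigit():
            tokens.append("@@NUM")
        elif tok[0].isalpha() or tok[0] in "_$":
            tokens.append("@@ID")
        else:
            tokens.append(tok)
    return tokens

def window_hashes(tokens, window, base=257, mod=(1 << 61) - 1):
    """Rabin-Karp rolling hash of every `window`-token slice."""
    codes = [zlib.crc32(t.encode("utf-8")) for t in tokens]
    if len(codes) < window:
        return []
    high = pow(base, window - 1, mod)  # weight of the token leaving the window
    h = 0
    for c in codes[:window]:
        h = (h * base + c) % mod
    hashes = [h]
    for i in range(window, len(codes)):
        # Drop the oldest token, shift, and append the newest in O(1).
        h = ((h - codes[i - window] * high) * base + codes[i]) % mod
        hashes.append(h)
    return hashes

file1 = "function calculateTotal(items) { let sum = 0; return sum; }"
file2 = "function computeSum(products) { let total = 0; return total; }"
same_structure = normalize(file1) == normalize(file2)
shared = set(window_hashes(normalize(file1), 8)) & set(window_hashes(normalize(file2), 8))
```

Matching window hashes only mark candidate regions; a detector would still compare the underlying tokens to rule out hash collisions.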
Type-3 (modified): Similar code with minor modifications like inserted/deleted statements or changed expressions. Type-3 detection finds code that has evolved differently but still shares significant structure.
Enable Type-3 detection:
polydup scan ./src --enable-type3 --type3-tolerance 0.85
Example:
// File 1
function processOrder(order) {
  validateOrder(order);
  let total = calculateTotal(order.items);
  applyDiscount(total, order.coupon);
  return total;
}
// File 2 (Type-3 clone - added logging, changed discount logic)
function processOrder(order) {
  validateOrder(order);
  console.log("Processing order:", order.id); // ADDED
  let total = calculateTotal(order.items);
  let discount = order.coupon ? 0.1 : 0; // MODIFIED
  total *= (1 - discount); // MODIFIED
  return total;
}
Why they exist: Code evolution, bug fixes, or feature additions that slightly modify duplicated logic.
When to use Type-3:
Tolerance setting: The --type3-tolerance flag (0.0-1.0) controls how similar code must be. Higher values = stricter matching.
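One way to picture a tolerance threshold is as a cut-off on a token-level similarity ratio. The sketch below uses Python's `difflib.SequenceMatcher` as a stand-in metric; PolyDup's actual gap-tolerant algorithm is not described here and may differ, and `token_similarity` and `within_tolerance` are hypothetical names.

```python
from difflib import SequenceMatcher

def token_similarity(a, b):
    """Similarity ratio in [0.0, 1.0] between two token sequences."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()

def within_tolerance(a, b, tolerance=0.85):
    """True when the two sequences are at least `tolerance` similar."""
    return token_similarity(a, b) >= tolerance

# Usage: the second fragment inserts one extra statement into the first.
original = "validate ( order ) ; total = calc ( order ) ; return total ;".split()
modified = (
    "validate ( order ) ; log ( order ) ; "
    "total = calc ( order ) ; return total ;"
).split()
loose_match = within_tolerance(original, modified, tolerance=0.85)
strict_match = within_tolerance(original, modified, tolerance=0.90)
```

With these inputs the pair passes the looser setting and fails the stricter one, which is the behaviour "higher values = stricter matching" describes.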
Type-4 (semantic): Functionally equivalent code with different implementations.
Example:
// File 1 - Imperative loop
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i];
  }
  return total;
}
// File 2 - Functional approach
function sum(arr) {
  return arr.reduce((acc, val) => acc + val, 0);
}
// File 3 - Recursive
function sum(arr, i = 0) {
  if (i >= arr.length) return 0;
  return arr[i] + sum(arr, i + 1);
}
Why they exist: Different programming paradigms or styles achieving the same result.
Detection Challenge: Requires semantic analysis, control-flow graphs, or ML-based approaches.
When PolyDup reports duplicates, the clone type indicates:
Typical Real-World Distribution:
Performance Note: PolyDup efficiently handles codebases up to 100K LOC. Tested on real-world projects with detection times under 5 seconds for most repos.
Possible causes:
- Lower --threshold to 0.70-0.80
- Lower --min-block-size to 5-10 lines
- Use --enable-type3 for gap-tolerant matching
# More sensitive scan
polydup scan ./src --threshold 0.70 --min-block-size 5 --enable-type3
Solutions:
- Use --threshold 0.95 for high-confidence matches
- Use --exclude-type type-3 to remove noisy matches
- Use --min-block-size 50 for substantial duplicates only
# Strict, high-quality scan
polydup scan ./src --threshold 0.95 --exclude-type type-3 --min-block-size 50
Fix:
# Check file permissions
ls -la /path/to/scan
# Run with proper permissions
chmod +r /path/to/files
# Use --debug to see detailed error info
polydup scan ./src --debug
Explanation: PolyDup currently supports JavaScript, TypeScript, Python, Rust, Vue, and Svelte. Other file types are skipped automatically.
Workaround:
Solution:
# Disable colors explicitly
polydup scan ./src --no-color
# Or use environment variable
NO_COLOR=1 polydup scan ./src
Solutions:
# Increase minimum block size to reduce memory usage
polydup scan ./src --min-block-size 100
# Scan directories separately
polydup scan ./src
polydup scan ./tests
polydup scan ./lib
# Exclude generated/vendor code
# Create .polyduprc.toml with exclude patterns
For large codebases (>50K LOC):
- Use --min-block-size 50-100 to focus on substantial duplicates
- Use --exclude-type type-3 to skip gap-tolerant matching
- Raise --threshold to 0.95 to reduce candidate matches
For monorepos:
- Create .polyduprc.toml at root with shared configuration
- Use --group-by file to organize results by module
- Exclude node_modules, dist, target, etc. in config
For CI/CD:
- Cache the polydup binary to speed up pipeline
- Use --format json for machine-readable output
Debug Mode:
# Enable detailed error traces
polydup scan ./src --debug
Verbose Output:
# Show performance statistics
polydup scan ./src --verbose
Report an Issue:
- Include your PolyDup version (polydup --version)
- Include output from running with the --debug flag
Community:
Prerequisites:
- Rust (curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh)
CLI:
git clone https://github.com/wiesnerbernard/polydup.git
cd polydup
cargo build --release -p polydup
./target/release/polydup scan ./src
Node.js bindings:
cd crates/polydup-node
npm install
npm run build
npm test
Python bindings:
cd crates/polydup-py
pip install maturin
maturin develop
python -c "import polydup; print(polydup.version())"
Run tests:
# All tests
cargo test --workspace
# Specific crate
cargo test -p polydup-core
# With coverage
cargo install cargo-tarpaulin
cargo tarpaulin --workspace
Recommended: Create releases directly from the GitHub UI - fully automated, no local tools required!
v0.2.7)
Alternative: Use the release script locally:
./scripts/release.sh 0.2.5
See docs/RELEASE.md for detailed instructions.
Install pre-commit hooks to automatically run linting and tests:
# Install pre-commit (if not already installed)
pip install pre-commit
# Install the git hooks
pre-commit install
pre-commit install -t pre-push
# Run manually on all files
pre-commit run --all-files
The hooks will automatically run:
- cargo fmt, cargo clippy, file checks (trailing whitespace, YAML/TOML validation)
- cargo test
To skip hooks temporarily:
git commit --no-verify
Contributions are welcome! Please:
- Create a feature branch (git checkout -b feature/amazing-feature)
- Install the pre-commit hooks (pre-commit install)
- Run the tests (cargo test --workspace)
- Run clippy (cargo clippy --workspace --all-targets -- -D warnings)
- Commit your changes (git commit -m 'Add amazing feature')
- Push the branch (git push origin feature/amazing-feature)
See CONTRIBUTING.md for detailed guidelines.
MIT OR Apache-2.0