csvdiff

version: 0.3.1
created_at: 2025-06-06 19:21:23.605855+00
updated_at: 2025-09-25 13:25:45.989357+00
description: Compare two CSV files by key and show differences.
repository: https://github.com/TahaHachana/csvdiff
id: 1703360
size: 45,781
author: Taha Hachana (TahaHachana)

CSV Diff Tool

A command-line utility to compare two CSV files based on specified key columns and report the differences.

Features

  • Flexible Comparison: Compares two CSV files using single or composite key columns
  • Smart Truncation: Automatically truncates large outputs (similar to Polars DataFrames) for better readability
  • Excel Report Generation: Creates comprehensive Excel reports with summary, headers comparison, and data differences
  • Header Mismatch Handling: Intelligently handles files with different column structures using name-based comparison
  • Column Filtering: Allows ignoring specific columns during comparison
  • Missing Row Detection: Reports rows present in one file but not the other
  • Cell-Level Differences: Reports cells with differing values for the same key
  • Configurable Output: Control table size with customizable row and cell width limits
  • Summary Statistics: Provides clear summaries with total difference counts

Usage

csvdiff --file1 <path_to_file1.csv> --file2 <path_to_file2.csv> --key <key_column_name> [OPTIONS]

Options

  • --file1 <PATH>: Path to the first CSV file
  • --file2 <PATH>: Path to the second CSV file
  • -k, --key <KEY_COLUMN>: Specifies a key column. Can be repeated for composite keys (e.g., --key id --key name)
  • -i, --ignore <IGNORE_COLUMN>: Specifies a column to ignore during comparison. Can be repeated
  • --max-rows <NUMBER>: Maximum number of rows to display (default: 20)
  • --max-cell-width <NUMBER>: Maximum width for cell content (default: 30)
  • --no-truncate: Show all differences without truncation
  • --excel-output <PATH>: Generate Excel report with summary, headers comparison, and data differences
  • --help: Prints help information
  • --version: Prints version information

Examples

Basic Comparison

# Compare two files using a single key column
csvdiff --file1 products_old.csv --file2 products_new.csv --key product_id

Composite Key Comparison

# Use multiple columns as a composite key
csvdiff --file1 inventory.csv --file2 updated_inventory.csv --key sku --key size --key color
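
As a rough illustration of what a composite key means here, the sketch below joins the values of the key columns into a single lookup string. It is not taken from the csvdiff source: the function name `composite_key` is hypothetical, and the "|" separator is an assumption based on the sample output (e.g. PROD001|M|Blue).

```rust
/// Joins the values of the key columns into one lookup key.
/// Returns None if any key column is missing from the header row.
/// Assumption: "|" is the separator, as suggested by the sample output.
fn composite_key(headers: &[&str], row: &[&str], key_columns: &[&str]) -> Option<String> {
    let mut parts = Vec::with_capacity(key_columns.len());
    for key in key_columns {
        // Look the column up by name, not by position.
        let idx = headers.iter().position(|h| h == key)?;
        // A short row yields an empty value rather than a panic.
        parts.push(row.get(idx).copied().unwrap_or(""));
    }
    Some(parts.join("|"))
}
```

Rows from both files that produce the same key string would then be treated as the same record and compared cell by cell.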

Ignoring Columns

# Ignore timestamp and description columns during comparison
csvdiff --file1 data1.csv --file2 data2.csv --key id --ignore timestamp --ignore description
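
The effect of --ignore can be pictured as a filter applied before cell values are compared. The following is a minimal sketch under that assumption, not the crate's actual code; `cell_differences` and its map-based row representation are hypothetical.

```rust
use std::collections::{BTreeMap, HashSet};

/// Compares two rows (column name -> value) that share the same key.
/// Returns (column, file1 value, file2 value) for every column present in
/// both rows whose values differ, skipping any ignored column.
fn cell_differences(
    row1: &BTreeMap<String, String>,
    row2: &BTreeMap<String, String>,
    ignored: &HashSet<String>,
) -> Vec<(String, String, String)> {
    let mut diffs = Vec::new();
    for (column, value1) in row1 {
        if ignored.contains(column) {
            continue; // column was excluded by the user
        }
        // Columns missing from row2 are not handled here; only shared columns are compared.
        if let Some(value2) = row2.get(column) {
            if value1 != value2 {
                diffs.push((column.clone(), value1.clone(), value2.clone()));
            }
        }
    }
    diffs
}
```

With timestamp ignored, two rows that differ only in their timestamp would produce no differences.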

Controlling Output Size

# Show only 10 rows with cell content limited to 20 characters
csvdiff --file1 large_file1.csv --file2 large_file2.csv --key id --max-rows 10 --max-cell-width 20

# Show all differences without any truncation
csvdiff --file1 file1.csv --file2 file2.csv --key id --no-truncate
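
To make the two limits concrete, here is a small sketch of cell and row truncation. It is an illustration only: `truncate_cell` and `truncate_rows` are hypothetical names, the "..." suffix is inferred from the sample output, and the real tool may choose which rows to show differently (this sketch simply keeps the first rows).

```rust
/// Shortens a cell to at most `max_width` characters, ending in "..." when cut.
/// For widths below 3 the result is just "...", which exceeds the limit.
fn truncate_cell(value: &str, max_width: usize) -> String {
    if value.chars().count() <= max_width {
        return value.to_string();
    }
    // Reserve room for the ellipsis; saturating_sub guards against tiny widths.
    let keep = max_width.saturating_sub(3);
    let mut out: String = value.chars().take(keep).collect();
    out.push_str("...");
    out
}

/// Returns the first `max_rows` rows and the number of rows left out.
fn truncate_rows<T: Clone>(rows: &[T], max_rows: usize) -> (Vec<T>, usize) {
    let shown = rows.len().min(max_rows);
    (rows[..shown].to_vec(), rows.len() - shown)
}
```

The second return value of `truncate_rows` is the figure a "more rows" line would report.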

Large Dataset Example

# Compare large CSV files with smart truncation (recommended for files with thousands of rows)
csvdiff --file1 dataset_v1.csv --file2 dataset_v2.csv --key sku --key size --key colour --max-rows 15

Excel Report Generation

# Generate a comprehensive Excel report with three sheets
csvdiff --file1 data1.csv --file2 data2.csv --key id --excel-output comparison_report.xlsx

# Combine with other options for customized analysis
csvdiff --file1 large_file1.csv --file2 large_file2.csv --key sku --key size --ignore timestamp --excel-output detailed_report.xlsx

Header Mismatch Handling

# Compare files with different column structures
csvdiff --file1 old_format.csv --file2 new_format.csv --key id

# The tool will automatically:
# - Compare columns by name (not position)
# - Show [column not in file1] for columns unique to file2
# - Show [column not in file2] for columns unique to file1
# - Only compare columns that exist in both files
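
Name-based comparison can be sketched as splitting the two header rows into three groups. This is an assumption about the general approach rather than the crate's implementation, and `compare_headers` is a hypothetical name.

```rust
/// Splits two header rows into (shared, only in file1, only in file2),
/// matching columns by name and preserving each file's column order.
fn compare_headers<'a>(
    headers1: &[&'a str],
    headers2: &[&'a str],
) -> (Vec<&'a str>, Vec<&'a str>, Vec<&'a str>) {
    let shared: Vec<&str> = headers1
        .iter()
        .copied()
        .filter(|h| headers2.contains(h))
        .collect();
    let only1: Vec<&str> = headers1
        .iter()
        .copied()
        .filter(|h| !headers2.contains(h))
        .collect();
    let only2: Vec<&str> = headers2
        .iter()
        .copied()
        .filter(|h| !headers1.contains(h))
        .collect();
    (shared, only1, only2)
}
```

Only the first group would take part in value comparison; the other two correspond to the [column not in file2] and [column not in file1] markers.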

Output Format

The tool displays differences in a clear tabular format:

+--------------------------------+----------------------------+--------------------------------+------------------------------+
| key                            | column                     | file1                          | file2                        |
+--------------------------------+----------------------------+--------------------------------+------------------------------+
| PROD001|M|Blue                 | price                      | 19.99                          | 24.99                        |
+--------------------------------+----------------------------+--------------------------------+------------------------------+
| PROD002|L|Red                  | availability               | in_stock                       | out_of_stock                 |
+--------------------------------+----------------------------+--------------------------------+------------------------------+
| PROD003|S|Green                | [missing in file2]         | Complete product data...       |                              |
+--------------------------------+----------------------------+--------------------------------+------------------------------+
| PROD004|L|Black                | description                | [column not in file1]          | New product description      |
+--------------------------------+----------------------------+--------------------------------+------------------------------+
| PROD005|M|Red                  | old_category               | Legacy category                | [column not in file2]        |
+--------------------------------+----------------------------+--------------------------------+------------------------------+
| ...                            | ... (1,247 more rows) ...  | ...                            | ...                          |
+--------------------------------+----------------------------+--------------------------------+------------------------------+

📊 Summary: 1,250 total differences found
   Showing 20 rows (use --max-rows to adjust or --no-truncate to show all)

Understanding the Output

  • [missing in file2]: Entire row exists only in file1
  • [missing in file1]: Entire row exists only in file2
  • [column not in file1]: Column exists only in file2
  • [column not in file2]: Column exists only in file1
  • Different values: When both files have the column but values differ
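
The row-level markers above follow from checking which keys appear in which file. The sketch below shows one way to do that; it is illustrative only, and `RowStatus` and `classify_keys` are hypothetical names that do not come from the csvdiff source.

```rust
use std::collections::BTreeMap;

/// Where a key was found, mirroring the row-level markers in the output.
#[derive(Debug, PartialEq)]
enum RowStatus {
    MissingInFile1, // reported as [missing in file1]
    MissingInFile2, // reported as [missing in file2]
    InBoth,         // candidate for cell-by-cell comparison
}

/// Classifies every key seen in either file. Each map goes from key to row values.
fn classify_keys(
    file1: &BTreeMap<String, Vec<String>>,
    file2: &BTreeMap<String, Vec<String>>,
) -> BTreeMap<String, RowStatus> {
    let mut result = BTreeMap::new();
    for key in file1.keys() {
        let status = if file2.contains_key(key) {
            RowStatus::InBoth
        } else {
            RowStatus::MissingInFile2
        };
        result.insert(key.clone(), status);
    }
    for key in file2.keys() {
        if !file1.contains_key(key) {
            result.insert(key.clone(), RowStatus::MissingInFile1);
        }
    }
    result
}
```

Keys classified as `InBoth` are the only ones for which differing values or column markers can be reported.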

Excel Reports

When using --excel-output, the tool generates a comprehensive Excel workbook with three sheets:

📋 Sheet 1: Summary

  • File paths and comparison metadata
  • Total difference counts and statistics
  • Header compatibility analysis
  • Breakdown by difference type (data changes vs missing rows)

📊 Sheet 2: Headers Comparison

  • Side-by-side comparison of all column headers
  • Identification of columns unique to each file
  • Clear status indicators (Match, Only in File 1, Only in File 2)

📈 Sheet 3: Data Differences

  • Complete list of all differences (no truncation)
  • Organized by key, column, and values from both files
  • Proper Excel formatting with headers and auto-sized columns
  • Suitable for further analysis, filtering, and sharing

Example Excel Output:

csvdiff --file1 products.csv --file2 updated_products.csv --key sku --excel-output product_changes.xlsx
# Generates product_changes.xlsx with the Summary, Headers Comparison, and Data Differences sheets

Performance

The tool is optimized for large datasets:

  • ✅ Handles CSV files with tens of thousands of rows
  • ✅ Smart memory usage with streaming CSV processing
  • ✅ Polars-style truncation prevents terminal overflow
  • ✅ Configurable output limits for different use cases
  • ✅ Efficient Excel generation for comprehensive reporting
  • ✅ Tested with 45,000+ differences in production datasets
  • ✅ Robust header mismatch handling with name-based column comparison

Installation

From crates.io

cargo install csvdiff

From Source

git clone https://github.com/TahaHachana/csvdiff.git
cd csvdiff
cargo build --release

The binary will be available at target/release/csvdiff.

Prerequisites

  • Rust 1.70 or later

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is dual-licensed under MIT OR Apache-2.0.
