data-doctor-cli

Crates.io: data-doctor-cli
lib.rs: data-doctor-cli
version: 1.0.4
created_at: 2025-12-08 15:27:01.508321+00
updated_at: 2025-12-21 07:34:23.132085+00
description: A powerful data validation and cleaning tool for JSON and CSV files
repository: https://github.com/jeevanms003/data-doctor
id: 1973777
size: 31,590
Jeevan M Swamy (jeevanms003)


DataDoctor CLI 🩺


DataDoctor CLI is a command-line tool for keeping data files healthy. It exposes the DataDoctor engine in your terminal so you can validate, analyze, and repair JSON and CSV files.


🚀 Installation

Option 1: Install from Crates.io (Recommended)

If you have Rust installed, this is the easiest way:

cargo install data-doctor-cli

This installs the data-doctor binary into Cargo's bin directory (typically ~/.cargo/bin), which needs to be on your PATH.

Option 2: Build from Source

git clone https://github.com/jeevanms003/data-doctor.git
cd data-doctor
cargo install --path cli

🎮 How It Works

DataDoctor provides three primary modes of operation, designed for different workflows:

1. validate (The Checkup)

Best for: CI/CD pipelines, pre-commit hooks, or just checking file integrity.

This command scans your file and reports issues without modifying anything. It returns a non-zero exit code if errors are found, so automated scripts can branch on the result.

data-doctor validate users.csv
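Because validate signals problems through its exit code, a script only needs to check that status. The following is a minimal sketch in Python, not part of DataDoctor itself: the helper name failed_files and its base_command parameter are illustrative assumptions, and only the data-doctor validate <INPUT> invocation comes from this README.

```python
import subprocess


def failed_files(paths, base_command=("data-doctor", "validate")):
    """Return the paths for which the command exits with a non-zero status."""
    failed = []
    for path in paths:
        # Append the file to the base command,
        # e.g. data-doctor validate users.csv (no shell is involved).
        result = subprocess.run([*base_command, path])
        if result.returncode != 0:
            failed.append(path)
    return failed
```

A CI step could call failed_files(["users.csv"]) and stop the pipeline when the returned list is non-empty.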

2. fix (The Surgery)

Best for: Cleaning messy data dumps, fixing "broken" JSON from APIs.

This command actively repairs the file and saves the clean version to a new output path. It applies all available auto-fix strategies (e.g., adding missing quotes, padding columns).

data-doctor fix broken_data.json --out clean_data.json

3. doctor (The Full Treatment)

Best for: Interactive analysis and reporting.

This runs a validation pass, then an auto-fix pass, and generates a comprehensive report comparing the "before" and "after" states.

data-doctor doctor input.csv --out fixed.csv

📋 Command Reference

validate

data-doctor validate <INPUT> [OPTIONS]

Options:

  • --format <json|csv>: Force a specific file format (overrides extension detection).
  • --report-json: Print a machine-readable JSON object instead of the human-readable report.
  • --schema <FILE>: Validate against a custom schema definition.

fix

data-doctor fix <INPUT> --out <OUTPUT> [OPTIONS]

Options:

  • --out <FILE>: (Required) Where to save the fixed file.
  • --format <json|csv>: Force a specific file format (overrides extension detection).

doctor

data-doctor doctor <INPUT> --out <OUTPUT> [OPTIONS]

Combines validate and fix functionalities with detailed logging.


🔍 What Can It Fix?

JSON Fixes (Advanced)

Issue             | Example (Before)             | Example (After)
Broken Structure  | [ { "a": 1 } }               | [ { "a": 1 } ]  (mismatched bracket fixed)
Embedded Keys     | "desc": "val,"key": "v"      | "desc": "val", "key": "v"
Numeric Formats   | {"val": 0xFF, "oct": 0o77}   | {"val": 255, "oct": 63}
Invalid Booleans  | {"active": yes}              | {"active": true}
Leading Zeros     | {"id": 030}                  | {"id": 30}
Trailing Commas   | {"a": 1,}                    | {"a": 1}
Missing Commas    | {"a": 1 "b": 2}              | {"a": 1, "b": 2}
Unquoted Keys     | {name: "John"}               | {"name": "John"}
Single Quotes     | {'name': 'John'}             | {"name": "John"}
Unclosed Brackets | [1, 2, 3                     | [1, 2, 3]
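To give a feel for what one of these repairs involves, here is a rough Python sketch of the "Unclosed Brackets" case. It is an illustration only and makes no claim about how DataDoctor implements the fix; the function name close_brackets is hypothetical. It tracks open brackets on a stack, ignores anything inside string literals, and appends the missing closers.

```python
import json


def close_brackets(text):
    """Append the closing brackets that a truncated JSON text is missing."""
    closers = {"[": "]", "{": "}"}
    stack = []          # closers still owed, innermost last
    in_string = False   # True while scanning inside a "..." literal
    escaped = False     # True right after a backslash inside a string
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif ch in "]}" and stack and stack[-1] == ch:
            stack.pop()
    return text + "".join(reversed(stack))


# The repaired text parses with the standard library.
repaired = close_brackets("[1, 2, 3")
print(json.loads(repaired))
```

This sketch only handles missing closers at the end of the input; it does not attempt the mismatched-bracket or missing-comma repairs listed in the table.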

CSV Fixes

Issue           | Before     | After
Padding Columns | A,B,C      | A,B,C
                | 1,2        | 1,2,   (empty cell added)
Trimming Cols   | A,B        | A,B
                | 1,2,3,4    | 1,2    (extras removed)
Booleans        | Yes, No    | true, false
Whitespace      | " Value "  | "Value"
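The padding and trimming rows can be pictured as "make every row as wide as the header". The Python sketch below illustrates that idea under stated assumptions: the function name normalize_width is hypothetical, the first row is assumed to be the header, and this is not DataDoctor's actual implementation.

```python
import csv
import io


def normalize_width(csv_text):
    """Pad short rows with empty cells and drop cells beyond the header width."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ""
    width = len(rows[0])  # assumption: the first row is the header
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        # Pad with empty cells, then cut back to the header width.
        writer.writerow((row + [""] * width)[:width])
    return out.getvalue()
```

Silently dropping extra cells loses data, so a real tool would likely report each trimmed row as well as fixing it.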

📊 JSON Reports

For integration with other tools (like dashboards), use --report-json.

Command:

data-doctor validate data.csv --report-json

Output:

{
  "success": false,
  "total_records": 100,
  "invalid_records": 5,
  "issues": [
    {
      "severity": "Error",
      "code": "CSV_TYPE_MISMATCH",
      "message": "Invalid Integer value",
      "row": 42,
      "column": 2
    }
  ]
}
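As a sketch of how a dashboard or script might consume this report, the Python snippet below condenses it into a few numbers. It assumes the report has exactly the fields shown in the sample above (success, total_records, invalid_records, and issues with a code per entry); the function name summarize is illustrative, and real reports may carry fields not shown here.

```python
import json
from collections import Counter


def summarize(report_text):
    """Condense a --report-json document into a few summary numbers."""
    report = json.loads(report_text)
    total = report["total_records"]
    issues = report.get("issues", [])
    return {
        "ok": report["success"],
        # Guard against empty files, where total_records would be 0.
        "invalid_ratio": report["invalid_records"] / total if total else 0.0,
        # Count how often each issue code appears.
        "by_code": dict(Counter(issue["code"] for issue in issues)),
    }
```

For the sample output above, summarize would report an invalid ratio of 0.05 and a single CSV_TYPE_MISMATCH issue.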

📄 License

This project is licensed under the MIT License.
