| Crates.io | sql-splitter |
| lib.rs | sql-splitter |
| version | 1.12.6 |
| created_at | 2025-12-20 13:36:17.931339+00 |
| updated_at | 2025-12-27 06:22:03.201542+00 |
| description | High-performance CLI tool for splitting large SQL dump files into individual table files |
| homepage | |
| repository | https://github.com/helgesverre/sql-splitter |
| max_upload_size | |
| id | 1996517 |
| size | 2,638,587 bytes |
Split large SQL dump files into individual table files. Fast, memory-efficient, multi-dialect.
Supports MySQL, PostgreSQL, SQLite, and MSSQL dumps, including PostgreSQL COPY FROM stdin and T-SQL GO batches.

Install from crates.io:

cargo install sql-splitter

Or build from source:
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install # Installs binary + shell completions + man pages
Or download pre-built binaries from GitHub Releases.
After installation, view documentation with man sql-splitter or man sql-splitter-diff.
For cargo install users, install man pages manually:
git clone https://github.com/helgesverre/sql-splitter
cd sql-splitter
make install-man
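After installing by any method, a quick sanity check (this assumes the conventional --version flag that clap-based CLIs generate):

# print the installed version
sql-splitter --version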
# MySQL/MariaDB dump (default)
sql-splitter split dump.sql -o tables/
# PostgreSQL pg_dump
sql-splitter split pg_dump.sql -o tables/ --dialect=postgres
# SQLite dump
sql-splitter split sqlite.sql -o tables/ --dialect=sqlite
# MSSQL/T-SQL dump (SSMS "Generate Scripts", sqlcmd)
sql-splitter split mssql_dump.sql -o tables/ --dialect=mssql
# Compressed files (auto-detected)
sql-splitter split backup.sql.gz -o tables/
sql-splitter split backup.sql.zst -o tables/
# Split specific tables only
sql-splitter split dump.sql --tables users,posts,orders
# Schema only (CREATE TABLE, indexes, etc.)
sql-splitter split dump.sql -o schema/ --schema-only
# Data only (INSERT/COPY statements)
sql-splitter split dump.sql -o data/ --data-only
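Split flags can be combined; for example, to preview which per-table files would be written without creating any:

# dry run with a progress bar
sql-splitter split dump.sql -o tables/ --dry-run --progress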
# Merge split files back into single dump
sql-splitter merge tables/ -o restored.sql
# Merge specific tables only
sql-splitter merge tables/ -o partial.sql --tables users,orders
# Merge with transaction wrapper
sql-splitter merge tables/ -o restored.sql --transaction
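Merge also supports table exclusion and header suppression (see the merge flag table below); the table names here are illustrative:

# merge everything except log tables, without header comments
sql-splitter merge tables/ -o restored.sql --exclude logs,audit_log --no-header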
# Analyze without splitting
sql-splitter analyze dump.sql
# Convert between SQL dialects
sql-splitter convert mysql_dump.sql -o postgres_dump.sql --to postgres
sql-splitter convert pg_dump.sql -o mysql_dump.sql --to mysql
sql-splitter convert dump.sql -o sqlite_dump.sql --to sqlite
sql-splitter convert mssql_dump.sql -o mysql_dump.sql --to mysql
# Convert with explicit source dialect
sql-splitter convert dump.sql --from postgres --to mysql -o output.sql
sql-splitter convert dump.sql --from mssql --to postgres -o output.sql
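By default an unsupported feature does not abort the conversion; --strict turns it into a hard failure:

# fail on any feature that cannot be converted
sql-splitter convert dump.sql --from mysql --to sqlite -o out.sql --strict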
# Validate SQL dump integrity
sql-splitter validate dump.sql
# Validate with strict mode (warnings = errors)
sql-splitter validate dump.sql --strict
# Validate with JSON output for CI
sql-splitter validate dump.sql --json
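Since --strict exits with code 1 on warnings, validation drops straight into CI:

# fail the pipeline on a dirty dump, keeping the JSON report as an artifact
sql-splitter validate dump.sql --strict --json > validate-report.json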
# Batch operations with glob patterns
sql-splitter validate "dumps/*.sql" --fail-fast
sql-splitter analyze "**/*.sql"
sql-splitter split "*.sql" -o output/
sql-splitter convert "*.sql" --to postgres -o converted/
# Compare two SQL dumps for changes
sql-splitter diff old.sql new.sql
# Diff with schema-only or data-only
sql-splitter diff old.sql new.sql --schema-only
sql-splitter diff old.sql new.sql --data-only
# Diff with JSON or SQL migration output
sql-splitter diff old.sql new.sql --format json -o diff.json
sql-splitter diff old.sql new.sql --format sql -o migration.sql
# Diff with verbose PK samples and ignore timestamp columns
sql-splitter diff old.sql new.sql --verbose --ignore-columns "*.updated_at,*.created_at"
# Override primary key for tables without PK
sql-splitter diff old.sql new.sql --primary-key logs:timestamp+message
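Structural noise can be filtered with the diff flags listed further below, e.g.:

# ignore column-order differences; key PK-less tables on all columns
sql-splitter diff old.sql new.sql --ignore-order --allow-no-pk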
# Redact sensitive data using inline patterns
sql-splitter redact dump.sql -o safe.sql --null "*.ssn" --hash "*.email" --fake "*.name"
# Redact using YAML config file
sql-splitter redact dump.sql -o safe.sql --config redact.yaml
# Generate redaction config by analyzing input file
sql-splitter redact dump.sql --generate-config -o redact.yaml
# Reproducible redaction with seed
sql-splitter redact dump.sql -o safe.sql --null "*.password" --seed 42
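A practical workflow, using only the flags documented below, is to let the tool draft the config, then edit and apply it with a fixed seed:

# draft an annotated config from the dump
sql-splitter redact dump.sql --generate-config -o redact.yaml
# after editing redact.yaml, apply it reproducibly
sql-splitter redact dump.sql -o safe.sql --config redact.yaml --seed 42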
# Generate ERD (Entity-Relationship Diagram)
sql-splitter graph dump.sql -o schema.html # Interactive HTML (default)
sql-splitter graph dump.sql -o schema.dot # Graphviz DOT format
sql-splitter graph dump.sql -o schema.mmd # Mermaid erDiagram
sql-splitter graph dump.sql -o schema.json # JSON with full schema details
# Graph with filtering
sql-splitter graph dump.sql --tables "user*,order*" -o filtered.html
sql-splitter graph dump.sql --exclude "log*,audit*" -o clean.html
sql-splitter graph dump.sql --cycles-only # Only tables in circular dependencies
# Focus on specific table and its relationships
sql-splitter graph dump.sql --table orders --transitive # Show all dependencies
sql-splitter graph dump.sql --table users --reverse # Show all dependents
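With Graphviz installed, --render produces an image directly; this sketch assumes the output extension selects the image format:

# render the ERD to SVG (requires Graphviz's dot on PATH)
sql-splitter graph dump.sql -o schema.svg --render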
# Reorder SQL dump in topological FK order
sql-splitter order dump.sql -o ordered.sql # Safe import order
sql-splitter order dump.sql --check # Check for cycles
sql-splitter order dump.sql --reverse # Reverse (for DROP operations)
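The order command also supports a preview mode (see its flag table below):

# print the topological order without writing a file
sql-splitter order dump.sql --dry-run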
# Query SQL dumps with DuckDB analytics engine
sql-splitter query dump.sql "SELECT COUNT(*) FROM users"
sql-splitter query dump.sql "SELECT * FROM orders WHERE total > 100" -f json
sql-splitter query dump.sql "SELECT * FROM users LIMIT 10" -o results.csv -f csv
sql-splitter query dump.sql --interactive # Start REPL session
sql-splitter query huge.sql "SELECT ..." --disk # Use disk mode for large files
# Query with caching for repeated queries
sql-splitter query dump.sql "SELECT ..." --cache # Cache imported database
sql-splitter query --list-cache # Show cached databases
sql-splitter query --clear-cache # Clear all cached databases
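Resource controls from the query flag table below can be combined with any of these; for example:

# cap DuckDB's memory and show query execution time
sql-splitter query huge.sql "SELECT COUNT(*) FROM orders" --memory-limit 4GB --timing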
# Generate shell completions (auto-installed with make install)
sql-splitter completions bash >> ~/.bashrc
sql-splitter completions zsh >> ~/.zshrc
sql-splitter completions fish >> ~/.config/fish/completions/sql-splitter.fish
Shell completions are automatically installed when using make install. For manual installation:
# Install for current shell only
make install-completions
# Install for all shells (bash, zsh, fish)
make install-completions-all
sql-splitter is a dump-first, CLI-first tool designed for automation and CI/CD pipelines.
What it's optimized for
| Strength | Description |
|---|---|
| One tool for the workflow | Split → sample → shard → convert → merge in a single binary |
| Works on dump files | No running database or JDBC connection needed (unlike mydumper, Jailer, Condenser) |
| Streaming architecture | 10GB+ dumps with constant memory, 600+ MB/s throughput |
| Multi-dialect + conversion | MySQL, PostgreSQL, SQLite, and MSSQL, including COPY FROM stdin → INSERT conversion |
| FK-aware operations | Sampling and tenant sharding preserve referential integrity |
For cases where another tool might be a better fit, see docs/COMPETITIVE_ANALYSIS.md for detailed comparisons.
Flags for the split command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output directory | output |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| -t, --tables | Only split these tables (comma-separated) | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --schema-only | Only DDL statements (CREATE, ALTER, DROP) | — |
| --data-only | Only DML statements (INSERT, COPY) | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
| --json | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Flags for the merge command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file | stdout |
| -d, --dialect | SQL dialect for headers/footers | mysql |
| -t, --tables | Only merge these tables (comma-separated) | all |
| -e, --exclude | Exclude these tables (comma-separated) | — |
| --transaction | Wrap in BEGIN/COMMIT transaction | — |
| --no-header | Skip header comments | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --json | Output results as JSON | — |
Flags for the analyze command:

| Flag | Description | Default |
|---|---|---|
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| -p, --progress | Show progress bar | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
| --json | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Flags for the convert command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file or directory (required for glob) | stdout |
| --from | Source dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --to | Target dialect: mysql, postgres, sqlite, mssql | required |
| --strict | Fail on any unsupported feature | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
| --json | Output results as JSON | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Supported conversions (12 pairs): each of mysql, postgres, sqlite, and mssql to any of the other three dialects.
Flags for the validate command:

| Flag | Description | Default |
|---|---|---|
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --strict | Treat warnings as errors (exit 1) | — |
| --json | Output results as JSON | — |
| --max-rows-per-table | Max rows per table for PK/FK checks (0 = no limit) | 1,000,000 |
| --no-limit | Disable row limit for PK/FK checks | — |
| --no-fk-checks | Skip PK/FK data integrity checks | — |
| -p, --progress | Show progress bar | — |
| --fail-fast | Stop on first error (for glob patterns) | — |
Input can be a file path or glob pattern (e.g., *.sql, dumps/**/*.sql).
Validation checks include PK/FK data integrity (see --no-fk-checks and --max-rows-per-table above).
Flags for the sample command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --percent | Sample percentage (1-100) | — |
| --rows | Sample fixed number of rows per table | — |
| --preserve-relations | Preserve FK relationships | — |
| -t, --tables | Only sample these tables (comma-separated) | all |
| -e, --exclude | Exclude these tables (comma-separated) | — |
| --root-tables | Explicit root tables for sampling | — |
| --include-global | Global table handling: none, lookups, all | lookups |
| --seed | Random seed for reproducibility | random |
| -c, --config | YAML config file for per-table settings | — |
| --max-total-rows | Maximum total rows to sample (0 = no limit) | — |
| --no-limit | Disable row limit | — |
| --strict-fk | Fail if any FK integrity issues detected | — |
| --no-schema | Exclude CREATE TABLE statements from output | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --json | Output results as JSON | — |
Flags for the shard command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file or directory | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --tenant-column | Column name for tenant identification | auto-detect |
| --tenant-value | Single tenant value to extract | — |
| --tenant-values | Multiple tenant values (comma-separated) | — |
| --root-tables | Explicit root tables with tenant column | — |
| --include-global | Global table handling: none, lookups, all | lookups |
| -c, --config | YAML config file for table classification | — |
| --max-selected-rows | Maximum rows to select (0 = no limit) | — |
| --no-limit | Disable row limit | — |
| --strict-fk | Fail if any FK integrity issues detected | — |
| --no-schema | Exclude CREATE TABLE statements from output | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --json | Output results as JSON | — |
Flags for the diff command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output file | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --schema-only | Compare schema only, skip data | — |
| --data-only | Compare data only, skip schema | — |
| --format | Output format: text, json, sql | text |
| -t, --tables | Only compare these tables (comma-separated) | all |
| -e, --exclude | Exclude these tables (comma-separated) | — |
| --max-pk-entries | Max PK entries to track (0 = no limit) | 10,000,000 |
| -v, --verbose | Show sample PK values for added/removed/modified rows | — |
| --primary-key | Override PK for tables (format: table:col1+col2) | auto-detect |
| --ignore-order | Ignore column order differences in schema comparison | — |
| --ignore-columns | Ignore columns matching glob patterns (e.g., *.updated_at) | — |
| --allow-no-pk | Compare tables without PK using all columns as key | — |
| -p, --progress | Show progress bar | — |
Diff detects schema changes (tables, columns, and indexes, including column order) and data changes (rows added, removed, or modified, keyed by primary key).
Output formats:
- text: Human-readable summary with optional PK samples
- json: Structured data for automation (includes warnings)
- sql: Migration script with ALTER/CREATE INDEX/DROP INDEX statements

Flags for the graph command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output file (html, dot, mmd, json, png, svg, pdf) | stdout |
| --format | Output format: html, dot, mermaid, json | auto-detect |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --layout | Layout direction: lr (left-right), tb (top-bottom) | lr |
| -t, --tables | Only include tables matching glob patterns | all |
| -e, --exclude | Exclude tables matching glob patterns | — |
| --table | Focus on a specific table | — |
| --transitive | Show all dependencies of focused table | — |
| --reverse | Show all tables that depend on focused table | — |
| --max-depth | Limit traversal depth | unlimited |
| --cycles-only | Only show tables in circular dependencies | — |
| --render | Render DOT to PNG/SVG/PDF using Graphviz | — |
| -p, --progress | Show progress bar | — |
| --json | Output as JSON | — |
Output formats:
- html: Interactive diagram with dark/light theme, copy Mermaid button
- dot: Graphviz DOT with ERD-style tables (columns, types, PK/FK markers)
- mermaid: Mermaid erDiagram syntax
- json: Full schema with tables, columns, relationships, and stats
- png/svg/pdf: Rendered image (requires Graphviz dot command)

Flags for the order command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| --check | Check for cycles and report order (don't write) | — |
| --dry-run | Show topological order without writing | — |
| --reverse | Reverse order (children before parents, for DROP) | — |
Flags for the redact command:

| Flag | Description | Default |
|---|---|---|
| -o, --output | Output SQL file | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| -c, --config | YAML config file for redaction rules | — |
| --generate-config | Analyze input and generate annotated YAML config | — |
| --null | Columns to set to NULL (glob patterns, comma-separated) | — |
| --hash | Columns to hash with SHA256 (glob patterns) | — |
| --fake | Columns to replace with fake data (glob patterns) | — |
| --mask | Columns to partially mask (format: pattern=column) | — |
| --constant | Column=value pairs for constant replacement | — |
| --seed | Random seed for reproducible redaction | random |
| --locale | Locale for fake data (en, de_de, fr_fr, etc.) | en |
| -t, --tables | Only redact specific tables (comma-separated) | all |
| -e, --exclude | Exclude specific tables (comma-separated) | — |
| --strict | Fail on warnings (e.g., unsupported locale) | — |
| -p, --progress | Show progress bar | — |
| --dry-run | Preview without writing files | — |
| --json | Output results as JSON | — |
| --validate | Validate config only, don't process | — |
Redaction strategies:
- null: Replace value with NULL
- constant: Replace with fixed value
- hash: SHA256 hash (deterministic, preserves FK relationships)
- mask: Partial masking with pattern (* = asterisk, X = keep, # = random digit); worked example after the generator list below
- fake: Generate realistic fake data (25+ generators)
- shuffle: Redistribute values within column (preserves distribution)
- skip: No redaction (passthrough)

Fake data generators:
email, name, first_name, last_name, phone, address, city, state, zip, country, company, job_title, username, url, ip, ipv6, uuid, date, datetime, credit_card, iban, ssn, lorem, paragraph, sentence
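As an illustration of the mask strategy above (the pattern=column form comes from the redact flag table; the column glob is hypothetical):

# keep the first two characters, asterisk the next four, randomize the last four digits
sql-splitter redact dump.sql -o safe.sql --mask "XX****####=*.phone"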
Flags for the query command:

| Flag | Description | Default |
|---|---|---|
| -f, --format | Output format: table, json, jsonl, csv, tsv | table |
| -o, --output | Write output to file instead of stdout | stdout |
| -d, --dialect | SQL dialect: mysql, postgres, sqlite, mssql | auto-detect |
| -i, --interactive | Start interactive REPL session | — |
| --disk | Use disk-based storage (for large dumps >2GB) | auto |
| --cache | Cache imported database for repeated queries | — |
| -t, --tables | Only import specific tables (comma-separated) | all |
| --memory-limit | Memory limit for DuckDB (e.g., "4GB") | — |
| --timing | Show query execution time | — |
| -p, --progress | Show import progress | — |
| --list-cache | List cached databases | — |
| --clear-cache | Clear all cached databases | — |
REPL commands:
- .tables — List all tables
- .schema [table] — Show schema (all tables or a specific table)
- .describe <table> — Describe a specific table
- .format <fmt> — Set output format (table, json, csv, tsv)
- .count <table> — Count rows in a table
- .sample <table> [n] — Show sample rows (default: 10)
- .export <file> <query> — Export query results to file
- .exit — Exit the REPL

See BENCHMARKS.md for detailed comparisons.
# Unit tests
cargo test
# Verify against real-world SQL dumps (MySQL, PostgreSQL, WordPress, etc.)
make verify-realworld
sql-splitter includes documentation optimized for AI agents.
Install the skill in Claude Code / Amp:
amp skill add helgesverre/sql-splitter
MIT — see LICENSE.md