# Solidify: CSV data consolidator
## Introduction
Solidify is a command-line tool that combines CSV/TSV files by matching records on shared columns. For example:
Input 1:

| Country | Population |
| --- | --- |
| China | 1.41B |
| India | 1.39B |
| US | 333M |

Input 2:

| Country | Area |
| --- | --- |
| Canada | 10M km² |
| US | 9.8M km² |
| China | 9.6M km² |

Output:
```
Country Population Area
China 1.41B 9.6M km²
India 1.39B N/A
US 333M 9.8M km²
Canada N/A 10M km²
```
## Installation
Install [Rust](https://www.rust-lang.org/), then run:
```
cargo install solidify
```
## Usage
### Basic usage
The [introductory example](#introduction) can be reproduced using the following command:
```
solidify -i 1.tsv 2.tsv -o out.tsv --shared 1 --filler N/A
```
Here `--shared 1` declares that the first column is [shared](#shared-columns) between `1.tsv` and `2.tsv`. The contents of this column are used to identify and match records across the files.
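To make the matching rule concrete, here is a minimal Python sketch of the same merge. It is an illustration of the behavior described above, not Solidify's actual implementation: the function name `merge_on_first_column` is hypothetical, the header line is treated as an ordinary record, and keys are assumed to be unique within each file.

```python
def merge_on_first_column(left, right, filler=""):
    """Outer-join two lists of rows on their first column (illustrative only)."""
    left_width = len(left[0]) - 1    # non-key columns contributed by `left`
    right_width = len(right[0]) - 1  # non-key columns contributed by `right`
    right_by_key = {row[0]: row[1:] for row in right}

    merged, seen = [], set()
    # Records of the first input come first, extended with their match (or filler).
    for row in left:
        seen.add(row[0])
        merged.append(row + right_by_key.get(row[0], [filler] * right_width))
    # Records that appear only in the second input are appended afterwards.
    for row in right:
        if row[0] not in seen:
            merged.append([row[0]] + [filler] * left_width + row[1:])
    return merged


input_1 = [["Country", "Population"], ["China", "1.41B"],
           ["India", "1.39B"], ["US", "333M"]]
input_2 = [["Country", "Area"], ["Canada", "10M km²"],
           ["US", "9.8M km²"], ["China", "9.6M km²"]]

result = merge_on_first_column(input_1, input_2, filler="N/A")
for row in result:
    print("\t".join(row))
```

With the introductory inputs, the printed rows correspond to the output table shown in the introduction.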
### Inputs
You can specify two or more input files to be combined using `-i` or `--inputs`:
```
-i 1.tsv 2.tsv
--inputs a.csv b.csv c.csv
```
### Output
You have to specify the output file with `-o` or `--output`:
```
-o out.tsv
--output combined.csv
```
To prevent accidental overwriting of data, the output path must differ from every input path.
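A check of this kind could look like the following Python sketch. The helper `output_collides` is hypothetical, and the sketch assumes that paths are compared after being resolved to absolute form; whether Solidify normalizes paths in this way is not stated here.

```python
from pathlib import Path


def output_collides(inputs, output):
    """Return True if `output` points at the same path as any of `inputs`."""
    target = Path(output).resolve()
    return any(Path(path).resolve() == target for path in inputs)


print(output_collides(["1.tsv", "2.tsv"], "out.tsv"))
print(output_collides(["1.tsv", "2.tsv"], "./2.tsv"))
```

Resolving first means that `2.tsv` and `./2.tsv` are treated as the same file, which a plain string comparison would miss.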
### Delimiter
Solidify does not attempt to autodetect the delimiter used in your data, so you need to specify it manually (the same delimiter is also applied to the output). If no delimiter is provided, the default is assumed: the tab character (`"\t"`). To guard against a mis-specified delimiter, Solidify exits with an error if every input file appears to have a single column (unless you explicitly [allow](#single-columned-inputs) it).
Only ASCII characters are currently accepted as delimiters. You can provide one with `-d` or `--delimiter`:
```
-d ,
--delimiter " "
```
### Shared columns
Use `-s` or `--shared` to specify which columns of your data are shared between the input files. To declare several shared columns, repeat the option once per column:
```
-s 1
--shared 3
-s 2 -s 3 -s 8
```
These columns will be used to identify which records should be matched and merged.
#### Reverse indexing
Negative values refer to columns in reverse order, that is, `-1` refers to the last column, `-2` to the second-to-last, etc. To guarantee consistency of output data, negatively indexed columns are not allowed to precede any positively indexed column in any of the input files.
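The indexing rule can be sketched in Python as follows. Both helpers are hypothetical, and the sketch reads the ordering constraint as "every negatively indexed column must lie strictly after every positively indexed one", which is an assumption about the exact rule.

```python
def resolve_column(index, width):
    """Translate a 1-based, possibly negative column index to a 0-based position."""
    if index == 0 or abs(index) > width:
        raise ValueError(f"column {index} is out of range for {width} columns")
    return index - 1 if index > 0 else width + index


def order_is_consistent(shared, width):
    """Check that no negatively indexed column precedes a positively indexed one."""
    positive = [resolve_column(i, width) for i in shared if i > 0]
    negative = [resolve_column(i, width) for i in shared if i < 0]
    if not positive or not negative:
        return True
    return max(positive) < min(negative)


print(resolve_column(-1, 4))             # position of the last of four columns
print(order_is_consistent([3, -2], 5))   # column 3, then second-to-last of five
print(order_is_consistent([3, -2], 3))   # second-to-last of three precedes column 3
```

The same pair of indices can be valid for a wide file and invalid for a narrow one, which is why the constraint has to hold in every input file.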
#### Merge all vs. merge none
If no shared columns are specified, any pair of records is considered matching (provided [multiway merge](#multiway-merge) is allowed).
For instance, running
```
solidify -i 1.tsv 2.tsv -o out.tsv --multi
```
against the [introductory example](#introduction) would produce the following output:
```
Country Population Country Area
China 1.41B Canada 10M km²
India 1.39B US 9.8M km²
US 333M China 9.6M km²
```
In contrast, if the special value `0` is passed to `-s`/`--shared`, no two records are considered matching. Running
```
solidify -i 1.tsv 2.tsv -o out.tsv -s 0 -s 1 --filler N/A
```
will hence produce:
```
Country Population N/A
China 1.41B N/A
India 1.39B N/A
US 333M N/A
Country N/A Area
Canada N/A 10M km²
US N/A 9.8M km²
China N/A 9.6M km²
```
### Single-columned inputs
To prevent any mistakes when specifying a [delimiter](#delimiter), Solidify will exit with an error if each of the input files appears to have a single column. To allow processing such inputs, pass the `--single` flag.
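The situation this check guards against can be sketched in Python. The helper `all_single_columned` is hypothetical, and it assumes that a file "appears to have a single column" when none of its lines contains the delimiter; quoting rules are ignored for simplicity.

```python
def all_single_columned(inputs, delimiter="\t"):
    """Return True if no line of any input contains the delimiter."""
    return all(delimiter not in line for lines in inputs for line in lines)


# Comma-separated data read with the default tab delimiter looks single-columned.
files = [["Country,Population", "China,1.41B"], ["Country,Area", "US,9.8M km²"]]

print(all_single_columned(files))                 # default tab delimiter
print(all_single_columned(files, delimiter=","))  # correct delimiter
```

With the wrong delimiter every line becomes one large cell, so a merge would silently produce meaningless output. Treating this as an error by default is the safer choice.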
### Multiway merge
When the data admit more than one way to match records, Solidify proceeds only if the `--multi` flag is passed. With the flag set, records are matched in the order in which they appear in the input files (see [Merge all vs. merge none](#merge-all-vs-merge-none) for an example).
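For the case where no shared columns are given, matching in order of appearance amounts to pairing records by position. The Python sketch below illustrates this with a hypothetical helper, `pair_in_order`. It assumes inputs of equal length and does not cover how leftover records are handled when the lengths differ.

```python
def pair_in_order(left, right):
    """Concatenate the i-th record of `left` with the i-th record of `right`."""
    return [l + r for l, r in zip(left, right)]


input_1 = [["Country", "Population"], ["China", "1.41B"],
           ["India", "1.39B"], ["US", "333M"]]
input_2 = [["Country", "Area"], ["Canada", "10M km²"],
           ["US", "9.8M km²"], ["China", "9.6M km²"]]

paired = pair_in_order(input_1, input_2)
for row in paired:
    print("\t".join(row))
```

Applied to the introductory inputs, this produces the same rows as the `--multi` example in [Merge all vs. merge none](#merge-all-vs-merge-none).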
### Filler
The value of `--filler` determines the content of unmatched cells (`N/A` in the [introductory example](#introduction)). If not provided, an empty string will be used.
### Warn on similar records
To catch records that fail to match because of typos, you may set `--warn-similar` to a positive integer. If the combined edit distance between a pair of records does not exceed this value, and yet the records are not identical, a warning is displayed. Only values in columns declared as [shared](#shared-columns) are compared.
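The comparison can be sketched in Python as follows. This is an illustration under two assumptions: that "edit distance" means Levenshtein distance, and that "combined" means the sum of the distances over all shared columns. The names `levenshtein` and `similar_pairs` are hypothetical.

```python
def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def similar_pairs(keys_a, keys_b, threshold):
    """Return pairs of shared-column tuples that differ by at most `threshold` edits."""
    pairs = []
    for a in keys_a:
        for b in keys_b:
            distance = sum(levenshtein(x, y) for x, y in zip(a, b))
            if 0 < distance <= threshold:  # identical records are not reported
                pairs.append((a, b, distance))
    return pairs


# "Chna" is one deletion away from "China" and would otherwise go unmatched.
warnings = similar_pairs([("China",), ("India",)], [("Chna",), ("US",)], threshold=1)
print(warnings)
```

Each key is a tuple so that several shared columns can be compared at once, with their distances added together.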
### Warn on unmatched records
When the flag `--warn-unmatched` is set, any records that could not be matched with any records in at least one of the other input files will be reported.
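The following Python sketch shows which records such a report would cover for the introductory example. The helper `unmatched_records` and the shape of its result are hypothetical; the actual format of Solidify's report is not specified here.

```python
def unmatched_records(keys_by_file):
    """List (file, key, other_file) for every key of `file` absent from `other_file`."""
    report = []
    for name, keys in keys_by_file.items():
        for other, other_keys in keys_by_file.items():
            if other == name:
                continue
            present = set(other_keys)
            report.extend((name, key, other) for key in keys if key not in present)
    return report


report = unmatched_records({
    "1.tsv": ["Country", "China", "India", "US"],
    "2.tsv": ["Country", "Canada", "US", "China"],
})
for entry in report:
    print(entry)
```

Here `India` has no counterpart in `2.tsv` and `Canada` has none in `1.tsv`; these are the records that receive the filler value in the merged output.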