# Performance ## Disclaimer One of frawk's goals is to be efficient. The abundance of such claims notwithstanding, I've found it is very hard to precisely state the degree to which an entire _programming language implementation_ is or is not efficient. I have no doubt that there are programs in frawk that are slower than equivalent programs in Rust or C, or even mawk or gawk (indeed, see the "Group By Key" benchmark). In some cases this will be due to bugs or differences in the quality of the language implementations, in others it will be due to bugs or differences in the quality of the programs themselves, while in still others it may be due to the fact that in some nontrivial sense one language allows you to write the program more efficiently than another. I've found that it can be very hard to distinguish between these three categories of explanations when doing performance analysis. As such, I encourage everyone reading this document to draw at most modest conclusions based on the numbers below. What I do hope this document demonstrates are some of frawk's strengths when it comes to some common scripting tasks on CSV and TSV data. (Awk and frawk can process other data formats as well, but in my experience larger files are usually in CSV, TSV, or some similar standardized format). ## Benchmark Setup All benchmark numbers report the minimum wall time across 5 iterations per hardware configuration, along with the composite system and user times for that invocation as reported by the `time` command. A computation running with wall time of 2.5 seconds, and CPU time of 10.2s for user and 3.4s for system is reported as "2.5s (10.2s + 3.4s)". We also report throughput numbers, which are computed as wall time divided by input file size. This doc includes measurements for both parallel and serial invocations of frawk, and it also provides numbers for frawk using its LLVM and its Cranelift backend, with all optimizations enabled. Note that frawk adaptively chooses the number of workers to launch for parallel invocations, so the ratio of CPU to wall time can vary across invocations. XSV supports parallelism but I noticed no performance benefit from this feature without first building an index of the underlying CSV file (a process which can take longer than the benchmark task itself without amortizing the cost over multiple runs). Similarly, I do not believe it is possible to parallelize the other programs using generic tools like `gnu parallel` without first partitioning the data-set into multiple sub-files. ### `-itsv` vs `-F'\t'` frawk allows you to specify the input format as TSV using the `itsv` option, but it also provides support for traditional Awk field separators using `-F` or by setting the `FS` variable. These two are not the same; they end up invoking two completely separate parsers under the hood. `-itsv` looks for escape sequences (like `\t`) and converts them to their corresponding characters. `-F'\t'`, on the other hand, does a more naive "split by tabs" algorithm. `-F'\t'` tends to perform faster than `-itsv` because parsing is less complex, and field size computations are easier to perform. The latter fact makes it easier to optimize scripts that do not parse a large suffix of columns in the input (e.g. only using columns 3 and 5 of a 30-column file). This is a feature of most of the benchmarks in this document, and, as I understand it, it's a big part of why tsv-utils is a consistent leader in performance here. Because tsv-utils, mawk and gawk are all invoked without the extra escaping behavior, the frawk invocations all use `-F'\t'` on TSV inputs. If you are curious about how frawk performs using `itsv` but don't want to run these benchmarks yourself, it's my experience that `frawk -itsv` achieves similar (albeit slightly higher) throughput to using `icsv` on an equivalent CSV file. ### Test Data These benchmarks run on CSV and TSV versions of two data-sets: * `all_train.csv` from the [HEPMASS dataset](https://archive.ics.uci.edu/ml/datasets/HEPMASS) in the UCI machine learning repository. This is a 7 million-row file and both CSV and TSV versions of the file are roughly 5.2GB, uncompressed. * `TREE_GRM_ESTN.csv` from The Forest Service (data-sets available [here](https://apps.fs.usda.gov/fia/datamart/CSV/datamart_csv.html)). This file is a little over 36 million rows; the CSV version is 8.9GB, and the TSV version is 7.9GB. ### Hardware This doc reports performance numbers from two machines, a new (late 2019) MacOS laptop, and an older (mid-2016, Broadwell-based) workstation running Ubuntu. They're called out as "MacOS" and "Linux" not because I take these numbers to be benchmarks of those operating systems, but as a shorthand for the whole configuration, software and hardware, used in those benchmarks. As the Tools section makes clear, the versions of the tools I used are all slightly different. Comparisons within the same configuration should be safe while comparisons across configurations are less so. **MacOS (Newer Hardware)** This is a 16-inch Macbook Pro running MacOS Big Sur 11.2.1 with an 8-core i9 clocked at 2.3GHz, boosting up to 4.8GHz. One thing to keep in mind about newer Apple hardware is that the SSDs are very fast: read throughput can top 2.5GB/s. None of these benchmarks are IO-bound on this machine, although various portions of the input files might be cached by the OS. **Linux (Older Hardware)** This is a dual-socket workstation with 2 8-core Xeon 2620v4 CPUs clocked at 2.2 GHz or so and boosting up to 2.9GHz on a single core, with 64GB of RAM. Reads are serviced from an NVMe SSD that is pretty fast, but not as fast as the SSD in the Mac. I do not think any of the benchmarks are IO-bound on this machine. The machine is running Ubuntu 18.04 with a 4.15 kernel. While the results are varied from benchmark to benchmark, I tend to find that while frawk has good performance overall, it does noticeably better on the newer hardware running MacOS. The absolute difference in these numbers are probably due to having a newer CPU with a much higher boost clock, but I am less sure about the relative performance differences between frawk and tsv-utils. One contributing factor might be frawk's use of AVX2 driving the clock rate down to a greater degree on the Broadwell-E CPU in Linux than the more recent CPU on the Mac configuration. frawk's LLVM backend generally performs better than Cranelift (this is as expected, as Cranelift does not implement as many optimizations as of yet), but the performance is usually pretty close. ### Tools These benchmarks report values for a number of tools, each with slightly different intended use-cases and strengths. Not all of the tools are set up well to handle each task, but each makes an appearance in some subset of the benchmarks. All benchmarks include `frawk`, of course; in this case both the `use_jemalloc` and `allow_avx2` features were enabled. * [`mawk`](https://invisible-island.net/mawk/) is an Awk implementation due to Mike Brennan. It is focused on the "Awk book" semantics for the language, and on efficiency. I used mawk v1.3.4 on MacOS and Linux. The Linux copy is compiled from source with -march=native. * [`gawk`](https://www.gnu.org/software/gawk/manual/) is GNU's Awk. It is actively maintained and includes several extensions to the Awk language (such as multidimensional arrays). It is the default Awk on many Linux machines. I used gawk v5.0.1 on MacOS and gawk v5.1 on Linux. All `gawk` invocations use the `-b` flag, which disables multibyte character support. This is a bit unfair to xsv, frawk, and tsv-utilities, which all support UTF-8 input and (e.g.) matching regular expressions in a UTF-8-aware manner. However, the benchmarks here do not make any special use of UTF-8, and gawk performs substantially better with this option enabled (sometimes >2x). * [`xsv`](https://github.com/BurntSushi/xsv) is a command-line utility for performing common computations and transformations on CSV data. I used xsv v0.13.0 on both platforms * [`tsv-utils`](https://github.com/eBay/tsv-utils) is a set of command-line utilities offering a similar feature-set to xsv, but with a focus on TSV data. I used tsv-utils `v2.1.1_linux-x86_64_ldc2` downloaded from the repo for MacOS and Linux. There are plenty of other tools (including other Awks) I could put here (see the benchmarks referenced on xsv and tsv-utils' pages for more), but I chose a small set of tools that could execute some subset of the benchmarks efficiently. We benchmark on CSV and TSV data. Some utilities are not equipped for all possible configurations, for example: * `tsv-utils` is explicitly optimized for TSV and not CSV. We only run it on the TSV variants of these datasets, but note that the bundled `csv2tsv` takes longer -though not that much longer- than many of the benchmark workloads (28.7s for TREE_GRM_ESTN and 14.6s for all_train, with the new version v2.1.1). * Benchmarks for the `TREE_GRM_ESTN` dataset are not run on CSV input with `mawk` or `gawk`, as it contains quoted fields and these utilities do not handle CSV escaping out of the box. We still run them on the TSV versions of these files using `\t` as a field separator. * `tsv-utils` and `xsv` are both limited to a specific menu of common functions. When those functions cannot easily express the computation in question, we leave them out. Scripts and output for these benchmarks are [here](https://github.com/ezrosent/frawk/blob/master/info/scripts). They do not include the raw data or the various binaries in question, but they should be straightforward to adapt for a given set of installs on another machine. Outputs from the Linux configuration lives are in the ".2" files. ## Ad-hoc number-crunching Before we start benchmarking purpose-built tools, I want to emphasize the power that Awk provides for basic number-crunching tasks. Not only can you efficiently combine common tasks like filtering, group-by, and arithmetic, but you can build complex expressions as well. Suppose I wanted to take the scaled, weighted sum of the first column, and the maximum of column 4 and column 5, when column 8 had a particular value. In Awk this is a pretty simple script: ```awk function max(x,y) { return x Note: the `+0`s ensure we are always performing a numeric comparison. In Awk > this is unnecessary if all instances of column $4 and column $5 are numeric; > in frawk this is required: max will perform a lexicographic comparison instead > of a numeric one without the explicit conversion. But I have to preprocess the data to run Awk on it, as Awk doesn't properly support quoted CSV fields by default. Supposing I didn't want to do that, the next thing I'd reach for is a short script in Python: ```python import csv import sys def parse_float(s): try: return float(s) except ValueError: return 0.0 accum = 0 with open(sys.argv[1], "r") as f: csvr = csv.reader(f) for row in csvr: if row[7] == "GS": f1 = parse_float(row[0]) f4 = parse_float(row[3]) f5 = parse_float(row[4]) accum += ((0.5*f1 + 0.5*max(f4,f5)) / 1000.0) print(accum) ``` If I was concerned about the running time for the script, I would probably write the script in Rust instead rather than spending time trying to optimize the Python. Here's one I came up with quickly using the [`csv`](https://github.com/BurntSushi/rust-csv) crate: ```rust extern crate csv; use std::fs::File; use std::io::BufReader; fn get_field_default_0(record: &csv::StringRecord, field: usize) -> f64 { match record.get(field) { Some(x) => x.parse().unwrap_or(0.0), None => 0.0, } } fn main() -> std::io::Result<()> { let args: Vec = std::env::args().collect(); let filename = &args[1]; let f = File::open(filename)?; let br = BufReader::new(f); let mut csvr = csv::Reader::from_reader(br); let mut accum = 0f64; for record in csvr.records() { let record = match record { Ok(record) => record, Err(err) => { eprintln!("failed to parse record: {}", err); std::process::abort() } }; if record.get(7) == Some("GS") { let f1: f64 = get_field_default_0(&record, 0); let f4: f64 = get_field_default_0(&record, 3); let f5: f64 = get_field_default_0(&record, 4); let max = if f4 < f5 { f5 } else { f4 }; accum += (0.5 * f1 + 0.5 * max) / 1000f64 }. } println!("{}", accum); Ok(()) } ``` This takes 8.2 seconds to compile for a release build. Depending on the setting this either is or isn't important. We'll leave it out for now, but keep in mind that frawk's and python's runtimes include the time it takes to compile a program. **MacOS** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | Python | CSV | 2m48.7s (2m47.4s + 1.3s) | 53.02 MB/s | | Rust | CSV | 25.9s (24.8s + 1.1s) | 345.57 MB/s | | frawk (cranelift) | CSV | 19.9s (18.8s + 1.1s) | 450.13 MB/s | | frawk (cranelift, parallel) | CSV | 4.9s (23.2s + 1.2s) | 1827.84 MB/s | | frawk (llvm) | CSV | 19.6s (18.5s + 1.1s) | 457.12 MB/s | | frawk (llvm, parallel) | CSV | 4.9s (22.9s + 1.2s) | 1842.90 MB/s | **Linux** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | Python | CSV | 2m17.1s (2m15.3s + 1.8s) | 65.23 MB/s | | Rust | CSV | 29.9s (28.4s + 1.6s) | 299.05 MB/s | | frawk (cranelift) | CSV | 29.4s (27.3s + 2.2s) | 304.02 MB/s | | frawk (cranelift, parallel) | CSV | 9.8s (35.5s + 3.5s) | 908.81 MB/s | | frawk (llvm) | CSV | 28.4s (26.6s + 1.8s) | 315.46 MB/s | | frawk (llvm, parallel) | CSV | 9.9s (35.2s + 3.8s) | 905.22 MB/s | frawk is a good deal faster than the other options, particularly on the newer hardware, or when run in parallel. Now, the Rust script could of course be optimized substantially (frawk is implemented in Rust, after all). But if you need to do exploratory, ad-hoc computations like this, a frawk script is probably going to be faster than the first few Rust programs you write. With that said, for many tasks you do not need a full programming language, and there are purpose-built tools for computing the particular value or transformation on the dataset. The rest of these benchmarks compare frawk and Awk to some of those. ## Sum two columns _Sum columns 4 and 5 from `TREE_GRM_ESTN` and columns 6 and 18 for `all_train`_ Programs: * Awk: `-F{,\t} {sum1 += ${6,4}; sum2 += ${18,5};} END { print sum1,sum2}` * frawk: `-i{c,t}sv {sum1 += ${6,4}; sum2 += ${18,5};} END { print sum1,sum2}` * tsv-utils: `tsv-summarize -H --sum {6,4},{18,5}` Awk was only run on the TSV version of TREE_GRM_ESTN, as it has quote-escaped columns, tsv-utils was only run on TSV versions of both files, and xsv was not included because there is no way to persuade it to _just_ compute a sum. As can be seen below, frawk in parallel mode was the fastest utility in terms of wall-time on MacOS, but it is slightly slower on the Linux hardware; on a per-core basis frawk was faster than mawk and gawk but slower than tsv-utils on both configurations. **MacOS** | Program | Format | Running Time (TREE_GRM_ESTN) | Throughput (TREE_GRM_ESTN) | Running Time (all_train) | Throughput (all_train) | | -- | -- | -- | -- | -- | -- | | mawk | CSV | NA | NA | 10.8s (9.8s + 1.1s) | 477.98 MB/s | | mawk | TSV | 42.0s (40.5s + 1.5s) | 187.81 MB/s | 10.8s (9.8s + 1.0s) | 478.38 MB/s | | gawk | CSV | NA | NA | 10.4s (9.5s + 0.9s) | 500.19 MB/s | | gawk | TSV | 14.0s (12.7s + 1.3s) | 562.60 MB/s | 10.4s (9.6s + 0.9s) | 496.07 MB/s | | tsv-utils | TSV | 5.6s (5.0s + 0.6s) | 1397.24 MB/s | 2.7s (2.2s + 0.4s) | 1953.39 MB/s | | frawk (llvm) | CSV | 18.0s (16.9s + 1.1s) | 496.91 MB/s | 3.6s (3.0s + 0.6s) | 1436.06 MB/s | | frawk (llvm) | TSV | 10.0s (9.0s + 1.0s) | 790.27 MB/s | 3.1s (2.5s + 0.6s) | 1650.23 MB/s | | frawk (llvm, parallel) | CSV | 4.9s (23.1s + 1.3s) | 1822.99 MB/s | 1.8s (4.7s + 0.7s) | 2846.86 MB/s | | frawk (llvm, parallel) | TSV | 3.4s (12.4s + 1.0s) | 2332.73 MB/s | 1.7s (4.3s + 0.7s) | 3065.98 MB/s | | frawk (cranelift) | CSV | 18.2s (17.0s + 1.1s) | 492.75 MB/s | 3.7s (3.0s + 0.6s) | 1405.66 MB/s | | frawk (cranelift) | TSV | 10.3s (9.3s + 1.0s) | 763.36 MB/s | 3.1s (2.5s + 0.6s) | 1652.34 MB/s | | frawk (cranelift, parallel) | CSV | 4.9s (23.3s + 1.3s) | 1807.89 MB/s | 1.8s (4.7s + 0.6s) | 2896.22 MB/s | | frawk (cranelift, parallel) | TSV | 3.4s (12.4s + 1.0s) | 2350.80 MB/s | 1.7s (4.4s + 0.7s) | 3071.43 MB/s | **Linux** | Program | Format | Running Time (TREE_GRM_ESTN) | Throughput (TREE_GRM_ESTN) | Running Time (all_train) | Throughput (all_train) | | -- | -- | -- | -- | -- | -- | | mawk | CSV | NA | NA | 10.4s (9.3s + 1.1s) | 498.07 MB/s | | mawk | TSV | 54.9s (53.0s + 1.9s) | 143.74 MB/s | 10.5s (9.1s + 1.4s) | 491.41 MB/s | | gawk | CSV | NA | NA | 11.5s (10.2s + 1.3s) | 451.24 MB/s | | gawk | TSV | 23.1s (21.7s + 1.4s) | 341.23 MB/s | 11.4s (10.4s + 0.9s) | 456.21 MB/s | | tsv-utils | TSV | 7.5s (6.3s + 1.2s) | 1047.75 MB/s | 3.6s (2.9s + 0.7s) | 1430.11 MB/s | | frawk (llvm) | CSV | 26.0s (24.3s + 1.8s) | 343.79 MB/s | 5.4s (4.3s + 1.0s) | 960.04 MB/s | | frawk (llvm) | TSV | 14.7s (13.2s + 1.5s) | 536.59 MB/s | 4.6s (3.8s + 0.9s) | 1119.42 MB/s | | frawk (llvm, parallel) | CSV | 9.9s (34.1s + 4.6s) | 899.67 MB/s | 4.0s (9.0s + 2.5s) | 1307.69 MB/s | | frawk (llvm, parallel) | TSV | 8.1s (21.0s + 3.0s) | 978.75 MB/s | 3.5s (8.3s + 2.0s) | 1462.83 MB/s | | frawk (cranelift) | CSV | 26.0s (24.4s + 1.6s) | 344.47 MB/s | 5.3s (4.4s + 0.9s) | 973.02 MB/s | | frawk (cranelift) | TSV | 15.3s (13.8s + 1.4s) | 516.87 MB/s | 4.8s (3.9s + 0.9s) | 1085.85 MB/s | | frawk (cranelift, parallel) | CSV | 9.8s (33.7s + 4.4s) | 913.45 MB/s | 4.2s (8.7s + 2.2s) | 1233.25 MB/s | | frawk (cranelift, parallel) | TSV | 8.4s (25.1s + 5.3s) | 943.75 MB/s | 3.8s (8.6s + 2.6s) | 1355.61 MB/s | ## Statistics _Collect the sum, mean, minimum, maximum, minimum length, maximum length and standard deviation of a numeric column, collect the maximum, minimum, maximum length and minimum length of a string column_. As in the "sum" benchmark, frawk and Awk have the same programs, but with frawk using the `icsv` and `itsv` options. This benchmark is meant to mirror `xsv`'s `summarize` command. The xsv invocation is `xsv summarize -s5,6 [-d\t]`. The Awk program is more involved: ```awk function min(x,y) { return x 1 { # f2 is numeric, f1 is a string f2=$5+0; f2Len = length($5); f1=$6; f1Len = length($6); if (NR==2) { min1=max1=f1; min2=max2=f2; min1L=max1L=f1Len; min2L=max2L=f2Len; } else { min1 = min(min1, f1) min2 = min(min2, f2) min1L = min(min1L, f1Len) min2L = min(min2L, f2Len) max1 = max(max1, f1) max2 = max(max2, f2) max1L = max(max1L, f1Len) max2L = max(max2L, f2Len) } step_sum(f2); step_stddev(f2, NR-1); } END { N=NR-1 # account for header print "field","sum","min","max","min_length","max_length","mean","stddev" print h2,SUM,min2,max2,min2L,max2L,(SUM/N), sqrt(Q/(N-1)) print h1,"NA",min1,max1,min1L,max1L,"NA","NA" } ``` I had to look up the algorithm for computing a running sample standard deviation on Wikipedia. If this is all I wanted to compute, I'd probably go with xsv. This script will not parallelize automatically: to get the correct answer we will need to (a) manually aggregate the min and max values, as they are not sums and (b) apply an algorithm (again, from Wikipedia) to combine the subcomputations of the empirical variance. With that we get the formidable: ```awk function min(x,y) { return x Note: We report parallel numbers for frawk here, but running the `select` > script in parallel mode does not preserve the original ordering of the rows in > the input file. While the ordering of some data-sets are not meaningful, the > comparison is not apples-to-apples. frawk performs this task slower than tsv-utils and faster than the other benchmark programs. While the gap between frawk in parallel mode and tsv-utils is probably in the noise on MacOS (it's still pretty large for the Linux configuration), tsv-utils unquestionably performs better per core than frawk, while preserving the input's row ordering. **MacOS** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | gawk | CSV | 7.9s (7.0s + 0.9s) | 657.00 MB/s | | gawk | TSV | 7.9s (7.1s + 0.9s) | 654.26 MB/s | | mawk | CSV | 8.4s (7.4s + 1.0s) | 612.83 MB/s | | mawk | TSV | 8.4s (7.4s + 1.0s) | 616.85 MB/s | | frawk (cranelift) | CSV | 3.8s (3.3s + 0.7s) | 1358.81 MB/s | | frawk (cranelift) | TSV | 3.3s (2.8s + 0.7s) | 1569.70 MB/s | | frawk (cranelift, parallel) | CSV | 1.9s (5.1s + 0.7s) | 2745.72 MB/s | | frawk (cranelift, parallel) | TSV | 1.7s (4.6s + 0.7s) | 2972.70 MB/s | | frawk (llvm) | CSV | 3.8s (3.2s + 0.7s) | 1373.59 MB/s | | frawk (llvm) | TSV | 3.2s (2.7s + 0.7s) | 1598.78 MB/s | | frawk (llvm, parallel) | CSV | 1.9s (5.1s + 0.7s) | 2678.96 MB/s | | frawk (llvm, parallel) | TSV | 1.8s (4.6s + 0.7s) | 2930.64 MB/s | | tsv-utils | TSV | 2.0s (1.6s + 0.4s) | 2589.22 MB/s | | xsv | CSV | 5.4s (4.8s + 0.6s) | 961.28 MB/s | | xsv | TSV | 5.3s (4.7s + 0.6s) | 970.65 MB/s | **Linux** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | gawk | CSV | 9.4s (8.3s + 1.1s) | 550.72 MB/s | | gawk | TSV | 9.4s (8.4s + 1.0s) | 551.01 MB/s | | mawk | CSV | 8.3s (7.0s + 1.3s) | 621.14 MB/s | | mawk | TSV | 8.1s (7.0s + 1.1s) | 638.68 MB/s | | frawk (cranelift) | CSV | 5.5s (4.8s + 1.0s) | 942.39 MB/s | | frawk (cranelift) | TSV | 4.7s (4.0s + 1.0s) | 1100.39 MB/s | | frawk (cranelift, parallel) | CSV | 4.0s (9.5s + 2.3s) | 1294.29 MB/s | | frawk (cranelift, parallel) | TSV | 3.5s (8.5s + 2.0s) | 1476.18 MB/s | | frawk (llvm) | CSV | 5.3s (4.6s + 1.0s) | 978.36 MB/s | | frawk (llvm) | TSV | 4.6s (4.0s + 0.9s) | 1115.08 MB/s | | frawk (llvm, parallel) | CSV | 4.1s (9.4s + 2.5s) | 1259.35 MB/s | | frawk (llvm, parallel) | TSV | 3.7s (8.8s + 2.2s) | 1406.04 MB/s | | tsv-utils | TSV | 2.9s (2.0s + 0.9s) | 1756.00 MB/s | | xsv | CSV | 8.1s (7.2s + 0.9s) | 642.73 MB/s | | xsv | TSV | 8.0s (7.2s + 0.8s) | 646.66 MB/s | ## Filter _Print out all records from the `all_train` dataset matching a simple numeric filter on two of the columns_ The Awk script computing this filter is `$4 > 0.000024 && $16 > 0.3`. Because `all_train` has no quoted fields, mawk and gawk can both run using the `-F,` and `-F\t` options. As before, frawk runs the same script with the `icsv` and `itsv` options. The tsv-utils invocation is `tsv-filter -H --gt 4:0.000025 --gt 16:0.3 ./all_train.tsv` (taken from the same benchmark on the tsv-utils repo). xsv does not support numeric filters, so it was omitted from this benchmark. > Note: We report parallel numbers for frawk here, but the same caveat > highlighted in the `select` benchmark holds here. The comparison is not > apples-to-apples because frawk will not preserve the input's ordering of the > rows when run in parallel mode. Again, frawk is slower than tsv-utils and faster than everything else when running serially. In parallel, frawk is slightly faster than tsv-utils in terms of wall time on MacOS, and a still good deal slower in the Linux configuration. **MacOS** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | gawk | CSV | 8.2s (7.3s + 0.9s) | 634.61 MB/s | | gawk | TSV | 8.2s (7.4s + 0.9s) | 630.52 MB/s | | mawk | CSV | 10.1s (9.1s + 1.0s) | 511.25 MB/s | | mawk | TSV | 10.0s (9.0s + 1.0s) | 517.69 MB/s | | frawk (cranelift) | CSV | 3.7s (3.2s + 0.7s) | 1390.93 MB/s | | frawk (cranelift) | TSV | 3.1s (2.6s + 0.7s) | 1670.46 MB/s | | frawk (cranelift, parallel) | CSV | 1.9s (5.2s + 0.8s) | 2684.52 MB/s | | frawk (cranelift, parallel) | TSV | 1.7s (4.7s + 0.7s) | 2969.29 MB/s | | frawk (llvm) | CSV | 3.6s (3.1s + 0.7s) | 1422.65 MB/s | | frawk (llvm) | TSV | 3.1s (2.6s + 0.7s) | 1684.04 MB/s | | frawk (llvm, parallel) | CSV | 1.9s (5.2s + 0.8s) | 2670.67 MB/s | | frawk (llvm, parallel) | TSV | 1.8s (4.8s + 0.8s) | 2873.72 MB/s | | tsv-utils | TSV | 2.3s (1.9s + 0.4s) | 2213.95 MB/s | **Linux** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | gawk | CSV | 9.4s (8.4s + 1.0s) | 551.01 MB/s | | gawk | TSV | 9.4s (8.4s + 0.9s) | 553.78 MB/s | | mawk | CSV | 10.1s (8.9s + 1.2s) | 512.67 MB/s | | mawk | TSV | 10.2s (9.1s + 1.1s) | 507.39 MB/s | | frawk (cranelift) | CSV | 5.3s (4.7s + 1.0s) | 980.39 MB/s | | frawk (cranelift) | TSV | 4.8s (3.8s + 1.3s) | 1083.13 MB/s | | frawk (cranelift, parallel) | CSV | 4.3s (9.5s + 2.8s) | 1204.29 MB/s | | frawk (cranelift, parallel) | TSV | 3.6s (8.5s + 2.3s) | 1432.09 MB/s | | frawk (llvm) | CSV | 5.2s (4.5s + 1.0s) | 1002.21 MB/s | | frawk (llvm) | TSV | 4.5s (4.0s + 0.9s) | 1146.69 MB/s | | frawk (llvm, parallel) | CSV | 4.4s (9.6s + 2.6s) | 1180.40 MB/s | | frawk (llvm, parallel) | TSV | 3.9s (8.8s + 2.3s) | 1337.75 MB/s | | tsv-utils | TSV | 3.4s (2.7s + 0.7s) | 1532.53 MB/s | ## Group By Key _Print the mean of field 2 grouped by the value in field 6 for TREE_GRM_ESTN_ The tsv-utils command was `tsv-summarize -H --group-by 6 --mean 2`. The Awk script, with the usual settings for gawk, mawk, and frawk, reads: ```awk BEGIN { getline; } { N[$6]++; SUM[$6]+=$2; } END { OFS="\t" for (k in N) { print k, ((SUM[k])/N[k]); } } ``` This is a workload where gawk and frawk are very close in terms of single-threaded performance. Neither program is nearly as fast as TSV-utils though. Even if parallel frawk is able to catch up on MacOS, my guess is that there are serious opportunitities for optimization of arrays here, though part of the slowdown may be due to tsv-utils's superious handling of 'wide rows' where we only read a small number of columns. **MacOS** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | gawk | TSV | 14.8s (13.5s + 1.3s) | 534.48 MB/s | | mawk | TSV | 42.1s (40.6s + 1.5s) | 187.61 MB/s | | frawk (cranelift) | CSV | 22.1s (21.0s + 1.1s) | 405.69 MB/s | | frawk (cranelift) | TSV | 15.1s (14.1s + 1.0s) | 523.84 MB/s | | frawk (cranelift, parallel) | CSV | 5.3s (29.5s + 1.3s) | 1701.62 MB/s | | frawk (cranelift, parallel) | TSV | 3.5s (16.6s + 1.0s) | 2225.50 MB/s | | frawk (llvm) | CSV | 21.8s (20.7s + 1.1s) | 410.47 MB/s | | frawk (llvm) | TSV | 14.8s (13.8s + 1.0s) | 534.66 MB/s | | frawk (llvm, parallel) | CSV | 5.2s (29.0s + 1.3s) | 1720.27 MB/s | | frawk (llvm, parallel) | TSV | 3.6s (16.5s + 1.0s) | 2221.74 MB/s | | tsv-utils | TSV | 4.9s (4.3s + 0.6s) | 1614.16 MB/s | **Linux** | Program | Format | Running Time | Throughput | | -- | -- | -- | -- | | gawk | TSV | 23.5s (21.9s + 1.6s) | 335.70 MB/s | | mawk | TSV | 50.6s (48.5s + 2.1s) | 155.86 MB/s | | frawk (cranelift) | CSV | 32.4s (30.5s + 1.9s) | 276.18 MB/s | | frawk (cranelift) | TSV | 22.9s (21.1s + 1.8s) | 343.91 MB/s | | frawk (cranelift, parallel) | CSV | 10.2s (41.5s + 5.5s) | 881.24 MB/s | | frawk (cranelift, parallel) | TSV | 7.9s (32.0s + 5.0s) | 1001.35 MB/s | | frawk (llvm) | CSV | 31.8s (30.1s + 1.7s) | 281.08 MB/s | | frawk (llvm) | TSV | 22.1s (20.5s + 1.6s) | 356.80 MB/s | | frawk (llvm, parallel) | CSV | 10.4s (42.5s + 6.0s) | 859.72 MB/s | | frawk (llvm, parallel) | TSV | 7.5s (27.3s + 2.4s) | 1054.19 MB/s | | tsv-utils | TSV | 5.9s (4.6s + 1.3s) | 1333.27 MB/s |