# Benchmarking dd

`dd` is a utility used for copying and converting files. It is often used for writing directly to devices, such as when writing an `.iso` file directly to a drive.

## Understanding dd

At its core, `dd` has a simple loop of operation. It reads `blocksize` bytes from an input, optionally performs a conversion on the bytes, and then writes `blocksize` bytes to an output file.

In typical usage, the performance of `dd` is dominated by the speed at which it can read from or write to the filesystem. For those scenarios it is best to optimize the blocksize for the devices being read from or written to. Devices typically have an optimal block size, so for maximum performance `dd` should use that block size, or a multiple of it.

For benchmarking `dd` itself we will use fast special files provided by the operating system that work out of RAM, `/dev/zero` and `/dev/null`. This reduces the time spent reading and writing files to a minimum and maximises the percentage of time spent in the `dd` tool itself, but care still needs to be taken to understand where we are benchmarking the `dd` tool and where we are just benchmarking memory performance.

The main parameter to vary for a `dd` benchmark is the blocksize, but benchmarks testing the conversions supported by `dd` could also be interesting.

`dd` has a convenient `count` argument that copies `count` blocks of data from the input to the output, which is useful for benchmarking.

## Blocksize Benchmarks

When measuring the impact of blocksize on throughput, we want to avoid testing the startup time of `dd`. `dd` itself reports the throughput once complete, but it's probably better to use an external tool, such as `hyperfine`, to measure the performance.

Benchmarks should be sized so that they run for at least a handful of seconds, to avoid measuring the startup time unnecessarily.
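
Since the size of a run is fixed by the block size and the block count, it can help to confirm how many bytes a given invocation copies before timing it. The following is a minimal sketch, assuming a `dd` is available on `PATH` (substitute `./target/release/dd` to check a local build); `copied_bytes` is a hypothetical helper name used only for illustration.

```shell
# copied_bytes is a hypothetical helper for this sketch, not part of dd.
# It assumes a `dd` on PATH and reads from /dev/zero, which always
# returns full blocks, so the result should equal bs * count.
copied_bytes() {
    # $1 = block size in bytes, $2 = number of blocks
    # dd's own report goes to stderr, so it is discarded here.
    dd if=/dev/zero bs="$1" count="$2" 2>/dev/null | wc -c | tr -d ' '
}

copied_bytes 4096 1000
```

Piping through `wc -c` adds its own overhead, so this is only suitable as a sanity check and not as a timing measurement.
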
The total time will be roughly proportional to the total bytes copied (`blocksize` x `count`).

Some useful invocations for testing would be the following:

```shell
hyperfine "./target/release/dd bs=4k count=1000000 < /dev/zero > /dev/null"
hyperfine "./target/release/dd bs=1M count=20000 < /dev/zero > /dev/null"
hyperfine "./target/release/dd bs=1G count=10 < /dev/zero > /dev/null"
```

Choosing what to benchmark depends greatly on what you want to measure. Typically you would choose a small blocksize for measuring the performance of `dd`, as that maximizes the overhead introduced by the `dd` tool. `dd` typically does a fixed amount of work per block, which depends on the size of the block only if conversions are used.

As an example, a change was made to reuse the same buffer between block copies, avoiding the need to allocate a new block of memory for each copy. That change mostly affected large block size copies, because those are the circumstances where memory performance dominated the total performance.
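
To compare block sizes on an equal footing, one option is to hold the total bytes copied constant and derive `count` from the block size. The sketch below only prints the `hyperfine` commands and does not run them. `DD_BIN`, `TOTAL_BYTES`, `count_for` and `bench_commands` are illustrative names, and the 4 GiB default is an assumption that may need adjusting so each run lasts a few seconds on your machine.

```shell
# Hypothetical helpers for this sketch; none of these names come from
# dd or hyperfine.
DD_BIN="${DD_BIN:-./target/release/dd}"
TOTAL_BYTES="${TOTAL_BYTES:-4294967296}"   # 4 GiB per run (assumed size)

count_for() {
    # Number of blocks of size $1 (in bytes) needed to copy TOTAL_BYTES.
    echo $(( $TOTAL_BYTES / $1 ))
}

bench_commands() {
    # Print one hyperfine command per block size given as an argument.
    for bs in "$@"; do
        echo "hyperfine \"$DD_BIN bs=$bs count=$(count_for "$bs") < /dev/zero > /dev/null\""
    done
}

# 4 KiB, 1 MiB and 1 GiB block sizes, each copying the same total.
bench_commands 4096 1048576 1073741824
```

The printed lines can be reviewed and then pasted into a shell. Keeping the total fixed means differences in run time reflect the per-block overhead and memory behaviour, not the amount of data copied.
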