Intended security properties ============================ - Indistinguishability from random noise (ie, IND$-CPA): An encrypted file is effectively indistinguishable from random noise of the same length. - This is achieved by having all data output by the encryption process be either random data or the output of a stream cipher. - Committing authenticity: If a file successfully decrypts to a plaintext `PT` using some password `PW`, you can conclude that only `PW` will decrypt that file, that file will only decrypt to `PT`, and someone who knew `PW` created it. - This is achieved by (a) deriving the HMAC and encryption keys via collision-resistant hash functions (`argon2i` and `blake2b`) (b) Authenticating with an HMAC over the whole file. - Any violation of this property _should_ yield a collision in either `argon2i` or `blake2b`. - A related secondary property that `scromble` aims for is that it only outputs any data if that data can be guaranteed to come from the HMAC-authenticated file, even if someone changes the underlying file while `scromble` is reading it. This is achieved via the [HMAC prefix table](#hmac-prefix-table). - Moderate length obfuscation: The exact length of the underlying file is difficult to tell given the length of the encrypted file. - This is the most dubious property, and `scromble` should not be relied upon if the exact size of your data is a _critical_ piece of information to hide. - Scromble randomly pads your file (up to 100% for small files, up to 20% for large files, see [Maximum Pad Scale](#maximum-pad-scale)) but can be given a custom random padding range if need be. - When decrypting, scromble doesn't output the padding data, so an adversary can observe how big the file if you let them observe the time or file system traffic totals of decryption. This kind of problem is basically unavoidable without implementing Oblivious RAM. - This helps somewhat when you have certain very specific file sizes, eg: - 41 byte files (git branch files, a 40-byte hex string + a newline) - exact-power-of-2 files - 1032 byte files (on my system, many wine DLLs are this size) The core spec ============= Cryptographic RNG ----------------- The cryptographic RNG used is `rand_chacha`'s `ChaCha20Rng`, initialized via `from_entropy()` (see https://rust-random.github.io/rand/rand_chacha/struct.ChaCha20Rng.html and https://rust-random.github.io/rand/rand_core/trait.SeedableRng.html#method.from_entropy for more information). Notation -------- `a||b` denotes the concatenation of byte sequences `a` and `b`. `0^n` denotes the byte sequence consisting of `n` zero bytes. Argon2 usage ------------ `Argon2` is used in two modes: `argon2i`, and `argon2id`. `argon2i` is configured to use `2^14` blocks (i.e. 16MB) of memory (`M = 1<<14`), one pass (`T = 1`), a parallelism factor of 2 (`P = 2`). `argon2id` is configured with `M = 1<<13`, `T = 2`, `P = 2`. The output size in both cases is 64 bytes. For rationale, see [Argon2 and the Root Key](#argon2-and-the-root-key). BLAKE2b usage ------------- `mac2b(key,data)` is the result of running BLAKE2b with personalization string "sCrOmB2AuThEnTiC", output length 64 bytes, and key `key` on data `data`. `key2b(L,key,data)` is the result of running BLAKE2b with personalization string "sCrOmB2EnCrYpToR", output length `L` bytes, and key `key` on data `data`. In all cases, `L = 32` or `L = 64`. XChaCha20 usage --------------- `xchacha20(key,nonce,data)` is the result of running XChaCha20 with 256-bit key `key` and 192-bit nonce `nonce` on the data stream `data`. NOTE: `xchacha20(key,nonce,xchacha20(key,nonce,data)) == data`. Salt block ---------- The Salt block `SB` is 64 bytes of data generated from a cryptographic RNG. Root key derivation ------------------- The root key `RK` is derived by RK_i := argon2i( password,key2b(64,SB,"argon2i")) RK_id := argon2id(password,key2b(64,SB,"argon2id")) RK := key2b(64,0,"root"||RK_i||RK_id) Nonce block and nonce --------------------- The nonce block `NB` is 64 bytes of data generated from a cryptographic RNG. The first 24 bytes (ie, 192 bits) of `NB` are the nonce `NONCE`. Subkeys ------- The HMAC key is derived by `HK := key2b(64,RK,"hmac")` The encryption key is derived by `EK := key2b(32,RK,"encrypt")` Plaintext --------- The underlying file is treated as a sequence of bytes `PT` with a 64-bit length (measured in bytes) denoted `length(PT)`. The length can be encoded as a little-endian 8-byte sequence `le_length(PT)`. Maximum Pad Scale ----------------- The maximum pad scale `MAXPAD` used when encrypting can be manually set with `--pad-factor`. If it is not manually set, it is chosen as follows: - `1.0`, if `length(PT) <= 2048` - `1.0 - 0.8*(length(PT)-2048)/(65536 - 2048)`, if `2048 < length(PT) <= 65536` - `0.2`, if `length(PT) > 65536` ie, `MAXPAD` starts at `1.0`, linearly goes down to `0.2` over `[2^12,2^16]` Pad length ---------- The pad length `PADLEN` is some uniformly randomly chosen value between `0` and `MAXPAD*max(64,length(PT))`, ie: - if `length(PT) < 64`, `PADLEN` is sampled from `[0,MAXPAD*64]` - otherwise, `PADLEN` is sampled from `[0,MAXPAD*length(PT)]` Padded Plaintext ---------------- The padded plaintext `PPT` of some plaintext data stream `PT` is: PPT = (PT||0^PADLEN||le_length(PT)) Where `0^PADLEN` denotes a sequence of `PADLEN` zero bytes, and . Core Ciphertext --------------- The core ciphertext `CC` is `NB||xchacha20(EK,NONCE,PPT)` MAC Block --------- The MAC block `MB` is 64 bytes of data formed by running `mac2b(HK,SB||CC)`. Non-shared file format ---------------------- A non-shared encrypted file is `SB||CC||MB`. Non-shared encryption procedure =============================== Using the definitions above, encryption of a plaintext file `PT` is then: - Read the password `password`. - Generate `SB` and `NB`. - Calculate the root key from `password` and `SB`. - Calculate `HK` and `EK` from `RK`. - Calculate `MAXPAD` from `length(PT)` and generate `PADLEN`. - Generate the core ciphertext `CC` from this data and the plaintext `PT`. - Calculate the MAC block `MB` from `HK` and `SB||CC` - Output the encrypted file. The actual implementation does this in a streaming manner. Non-shared decryption procedure =============================== Decryption uses most of the same building blocks as encryption, but there are a few extra principles decryption must follow: - Under no circumstance will we do anything with the ciphertext other than MAC checking before the MAC has been checked. For some reasonable commentary about why this is an important design principle, see [this blog post] (https://moxie.org/2011/12/13/the-cryptographic-doom-principle.html). - No data inconsistent with the MAC will be output to stdout. - No properties of the underlying data will be checked by `scromble`. The second principle requires a secondary data structure I refer to as the "HMAC prefix table". See the [HMAC prefix table](#hmac-prefix-table) section for more details. Using the definitions above, decryption of a ciphertext file is then: Pass 1: - Read the password `password`. - Read `SB`. - Calculate the root key from `password` and `SB`. - Calculate `HK` from `RK`. - Read the file as `SB||CC||MB`, build the HMAC prefix table, save the last 8 bytes of `CC` as `enc_supposed_len`, and check that `length(CC) >= 8` and `MB == mac2b(HK,SB||CC)`. If either check fails, exit with an error. From this point on, all operations involving data read from the file will be checked against the HMAC prefix table before any further processing is done on them. See [HMAC prefix table](#hmac-prefix-table). Pass 2: - Calculate `EK` from `RK`. - Read `SB` and `NB`, and parse `NONCE` out of `NB`. - Calculate `supposed_len` by decrypting `enc_suppposed_len` with `EK` and `NONCE` and interpreting it as a 64-bit little endian integer. This decryption takes advantage of the fact that `XChaCha20` is seekable. - Calculate `length(PT) := min(supposed_len,length(CC)-8)`. - Read `CC`, checking against the HMAC prefix table before doing any processing. If any check fails, exit with an error. - Output the first `length(PT)` bytes of `xchacha20(EK,NONCE,CC)`. - Finally, check that the file ends with `MB`. If not, exit with an error. HMAC prefix table ================= Once `scromble` has read the file and checked its HMAC, it must read through the file a second time in order to actually decrypt the file. However, this leads to a potential issue: what if the file changes between the HMAC check and when scromble reads some particular piece of the file? Then, potentially arbitrary data will get through, even though the user believes the data has been certified as authentic. This does not seem like an easily exploitable flaw, but who knows what crazy situation someone might be using this tool in? It's better to build a tool with rock solid correctness properties than to brush a clear issue aside with "that seems too hard to exploit". `scromble` builds an in-memory data structure during its first pass so that every data read on the second pass can be checked for authenticity, at the cost of some extra memory usage (at most `sqrt(N)*17` bytes of memory usage for an N-byte file). The HMAC prefix table consists of a block size `block_size` and a sequence of "prefix HMACs" `hmac_prefs`. For each `i`, `hmac_prefs[i] == mac2b(HK,data[..(i*block_size)])`. During the HMAC-check pass, the current HMAC result will be inserted at the end of `hmac_prefs` every `block_size` bytes. During the decryption pass, data is buffered until `block_size` bytes have been processed and the HMAC prefix containing those bytes is checked. To minimize the total memory overhead from the HMAC prefix table and the buffer required to check against it, `chunk_size` gets updated incrementally during the HMAC-check pass. If `hmac_prefs.len()*HMAC_LEN >= 2*block_size` (here `HMAC_LEN == 64`), then `block_size` gets updated to `block_size*2` and every even entry in `hmac_prefs` gets deleted -- ie: new_block_size == block_size*2; new_hmac_prefs.len() == hmac_prefs.len()/2; for each i such that 0 <= i and 2*i+1 < hmac_prefs.len(): new_hmac_prefs[i] == hmac_prefs[2*i+1]; This scheme approximates the memory-usage optimal HMAC prefix table size, which is the solution to `(file_size/block_size)*HMAC_LEN == block_size` -- ie, `block_size == sqrt(file_size*HMAC_LEN)`. The computational cost required to double the block size is `O(1)` amortized (ie, every `2^n`th step requires an extra `O(2^n)` work), and the worst-case memory required is `block_size + (file_size/block_size)*HMAC_LEN`. The memory usage is highest when block_size + (file_size/block_size)*HMAC_LEN == 2*block_size + (file_size/(2*block_size))*HMAC_LEN i.e., block_size*block_size == 32*file_size block_size == 4*sqrt(2)*sqrt(file_size) total_mem <= block_size + (file_size/block_size)*HMAC_LEN total_mem <= block_size + 2*block_size total_mem <= 3*4*sqrt(2)*sqrt(file_size) total_mem <= (12*sqrt(2))*sqrt(file_size) Running some numbers, this means that for this data structure, decrypting a: - 1 mebibyte file requires <=17KiB of memory - 1 gibibyte file requires <=544KiB of memory - 1 tebibyte file requires <=16MiB of memory - 1 pebibyte file requires <=544MiB of memory - 1 exbibyte file requires <=17GiB of memory Which seems like pretty reasonable memory overhead to me. NOTE: for speed, `scromble` double-buffers the HMAC prefix table's read buffer, so the actual memory usage is more like `4*block_size`, ie, `(4/3)*total_mem`. So if you want to decrypt a 1 exbibyte file, you'll need to use a computer with ~23GiB of memory. Argon2 and the Root Key ======================= `scromble`'s usage of `Argon2` is not, to the author's knowledge, in compliance with the recommendations of RFC9106 (https://datatracker.ietf.org/doc/html/rfc9106), but there are several contributing reasons for that. Security considerations ----------------------- There are two distinct threats `scromble`'s key derivation is susceptible to. First, an encrypted file may be attacked in an offline fashion by an adversary with large amounts of computing resources. This threat is best mitigated by using a memory-hard KDF, since it is commonly believed that the limiting factor of large-scale compute is memory bandwidth. For the purposes of memory-hardness, `Argon2d` appears to be far better than `Argon2i` -- multiple works have found significant memory-usage savings in versions of `Argon2i` (https://eprint.iacr.org/2016/027.pdf, https://eprint.iacr.org/2016/759.pdf), while `Argon2d` does not have any known such attacks. Second, an adversary with access to fine-grained timing data might discern information about either the password or the derived key. `Argon2d` accesses memory in a data-dependent way, and thus is potentially vulnerable to cache timing side-channel analysis. Such analysis has not been reported, but cache timing attacks were also a purely theoretical problem for AES for many years before the exploit was demonstrated. `Argon2id` is a hybrid algorithm, which does data-independent `Argon2i` during an initial phase, then switches to `Argon2d`. This prevents timing side channel attacks from learning a significant amount about the _input_ of the KDF, but there is no clear reason to believe that side channel analysis cannot reveal the _output_ of the KDF, which is at least as important. In light of that, this recommendation from RFC9106, which was presented without any justification, seems patently absurd: > If you do not know the difference between the types or you consider > side-channel attacks to be a viable threat, choose Argon2id. To address both offline attacks and side-channel analysis, `scromble` uses both `Argon2id` and `Argon2i`, then mixes them to derive the final root key. A side-channel analysis will only be able to attack the `Argon2d` component of `Argon2id`, and the offline attack will only have a low-memory attack against the `Argon2i` portion. Additionally, RFC9106 recommends 128-bit keys and salts. `scromble` aims for a classical security level of 256 bits, so this is not acceptable. Argon2 parameter choices ------------------------ In order to maximize the benefits of the `Argon2d` phase of `Argon2id`, a minimum value of `T = 2` is used for `Argon2id`. Per RFC9106, a minimum value of `T = 1` is used for `Argon2i`. In order to keep `scromble` reasonably performant on low-end devices, a maximum value of `P = 2` is chosen for both. Since `scromble`'s maximum speed is ~200MB/s, a target time of ~0.1s at ~1500MHz was chosen, so key derivation will be less than half the time spent for any file above 20MB. Setting `M = 1<<14` for `Argon2i` and `M = 1<<13` for `Argon2id` takes roughly 0.092s total at 1.5GHz on my machine. After turning off throttling (max frequency of ~4.1GHz), the time with those settings is ~0.033s.