# ML-KEM Intermediate values https://c2sp.org/CCTV/ML-KEM This directory collects resources for testing (and developing) ML-KEM implementations, as specified in FIPS 203. In particular, it provides: * Intermediate values for testing and debugging each intermediate step and partial algorithm. * Negative test vectors for invalid encapsulation keys. * "Unlucky" vectors that require an unusually large number of XOF reads. * Vectors that fail if `strcmp()` is used in ML-KEM.Decaps. * Accumulated vectors (derived from the reference pq-crystals implementation) for testing randomly reachable edge cases without checking in large amounts of data, including an extended run of one million tests. * References to other test vectors. All test vectors are made available under the terms of the [CC0 1.0](http://creativecommons.org/publicdomain/zero/1.0). Implementers might also be interested in ["Enough Polynomials and Linear Algebra to Implement Kyber"](https://words.filippo.io/kyber-math/). ### Changes from the FIPS 203 draft Like the [official intermediate values][NIST vectors] from October 2023, all the vectors in this directory implement the following two changes: 1. The order of the input i and j to the XOF at step 6 in Algorithm 12 (K-PKE.KeyGen) is switched. 2. The order of the input i and j to the XOF at step 6 in Algorithm 13 (K-PKE.Encrypt) is switched. This reverts [an unintentional change][pqc-forum discussion] that will probably be reverted in the final document and makes K-PKE consistent with Kyber round 3. [NIST vectors]: https://csrc.nist.gov/Projects/post-quantum-cryptography/post-quantum-cryptography-standardization/example-files [pqc-forum discussion]: https://groups.google.com/a/list.nist.gov/g/pqc-forum/c/s-C-zIAeKfE/m/eZJmXYsSAQAJ ## Intermediate values The files in the `intermediate/` folder provide vectors for developing, debugging, and testing ML-KEM step-by-step. Each file lists every intermediate value of the ML-KEM.KeyGen, K-PKE.KeyGen, ML-KEM.Encaps, K-PKE.Encrypt, ML-KEM.Decaps, and K-PKE.Decrypt algorithms, all executed on the same set of keys and messages. Byte strings are encoded in hex. Polynomials, NTT representatives, vectors, and matrixes are encoded with ByteEncode12 and then in hex. Some polynomials are also presented as an array of decimal coefficients to aid in the implementation of ByteEncode, NTT, and Compress. Where values appear multiple times across algorithms, they are not repeated in the test files. uᵈ and vᵈ are the u and v values from K-PKE.Decrypt, after they went through a Compress/Decompress cycle. (Props to the spec for maintaining a consistent lexical scope across algorithms! The one exception is that r is reused for the 32-byte K-PKE.Encrypt input and for the vector of polynomials sampled from it. The two are easily distinguished.) ## Bad encapsulation keys Section 6.2 of FIPS 203 ipd (ML-KEM Encapsulation) requires input validation on the encapsulation key, checking that all encoded polynomial coefficients are reduced modulo the field prime (the "*Modulus check*"). The files in the `modulus/` folder provide invalid ML-KEM.Encaps inputs, hex-encoded, one per line. Every value in the range q to 2¹²-1 and every position in the key is tested individually. The vectors share most of the coefficients so that they compress from 1–3 MiB down to 12–28 KiB. ## Unlucky NTT sampling vector The SampleNTT algorithm reads a variable number of bytes from an Extendable Output Function to perform rejection sampling. The files in the `unlucky/` folder provide test vectors that cause many more rejections than usual. In particular, these vectors require reading more than 575 bytes from the SHAKE-128 XOF in SampleNTT, which would ordinarily happen [with probability 2⁻³⁸](https://www.wolframalpha.com/input?i=binomcdf%28384%2C+3329%2F4096%2C+255%29). Note that these vectors can be run through a regular deterministic ML-KEM testing API (i.e. one that injects the `d`, `z`, `m` random values) since they were bruteforced at the level of the `d` value. If for some reason an implementation needs to draw a fixed amount of bytes from the XOF, at least 704 bytes are necessary for [a negligible probability (~ 2⁻¹²⁸)](https://www.wolframalpha.com/input?i=binomcdf%28469%2C+3329%2F4096%2C+255%29) of failure. ## `strcmp` vectors In ML-KEM.Decaps the ciphertext is compared with the output of K-PKE.Encrypt for implicit rejection. If an implementation were to use `strcmp()` for that comparison it would fail to reject some ciphertexts if a zero byte terminates the comparison early. The files in the `strcmp/` folder provide test vectors that exercise this edge case. The chance of it occurring randomly is 2⁻¹⁶, and it is not covered by the pq-crystals vectors. ## Accumulated pq-crystals vectors The `ref/test/test_vectors.c` program in the *standard* branch of github.com/pq-crystals/kyber produces 10 000 randomly generated tests. Thanks to the limited range of fundamental integer types (at most 0–4096), this is sufficient to hit a lot of edge cases that don't need to be deliberately targeted with specific test vectors. The output of the three `test_vectors.c` programs amounts to 300MB. Instead of checking in such a large amount of data, or running a binary as part of testing, implementations can generate the test inputs from the deterministic RNG, and check that the test outputs hash to the expected value. The input format, output format, and output hash are provided below. The deterministic RNG is a single SHAKE-128 instance with an empty input. (The RNG stream starts with `7f9c2ba4e88f827d616045507605853e`.) For each test, the following values are drawn from the RNG in order: * `d` for K-PKE.KeyGen * `z` for ML-KEM.KeyGen * `m` for ML-KEM.Encaps * `ct` as an invalid ciphertext input to ML-KEM.Decaps Then, the following values are written to a running SHAKE-128 instance in order: * `ek` from ML-KEM.KeyGen * `dk` from ML-KEM.KeyGen * `ct` from ML-KEM.Encaps * `k` from ML-KEM.Encaps (which should be checked to match the output of ML-KEM.Decaps when provided with the correct `ct`) * `k` from ML-KEM.Decaps when provided with the random `ct` The resulting hashes for 10 000 consecutive tests are: * ML-KEM-512: `845913ea5a308b803c764a9ed8e9d814ca1fd9c82ba43c7b1e64b79c7a6ec8e4` * ML-KEM-768: `f7db260e1137a742e05fe0db9525012812b004d29040a5b606aad3d134b548d3` * ML-KEM-1024: `47ac888fe61544efc0518f46094b4f8a600965fc89822acb06dc7169d24f3543` The resulting hashes for 1 000 000 consecutive tests are: * ML-KEM-512: `578eeaa1156848cbf7a15bafef963b4ccabe3308ddfb7dbdd20ad965f634e81d` * ML-KEM-768: `70090cc5842aad0ec43d5042c783fae9bc320c047b5dafcb6e134821db02384d` * ML-KEM-1024: `7ccc6d803739d3db3c5ce39c7130f459db32a199c6605e3be210e5a89d4c4b95` ## Other Known Answer Tests The following vectors also target FIPS 203 ipd with the  fix described above. * [NIST's Intermediate Values](https://csrc.nist.gov/Projects/post-quantum-cryptography/post-quantum-cryptography-standardization/example-files) * Random values (such as d, z, and m) are equal. This is not spec compliant. * [pq-crystals](https://github.com/pq-crystals/kyber), *standard* branch * `ref/test/test_vectors.c` generates 10 000 vectors randomly. * Accumulated vectors are available above. * [post-quantum-cryptography/KAT](https://github.com/post-quantum-cryptography/KAT/tree/main/MLKEM) * Each file contains 100 randomly generated vectors. The s2n-tls project includes [vectors](https://github.com/aws/s2n-tls/tree/a6517c5fe97b1aa1898f2233498613dd53735bd8/tests/unit/kats) for Kyber round 3 as well as some of the hybrid KEMs, including those used in the TLS draft.