% copyright 2015 David Roundy daveroundy@gmail.com % All rights reserved. You have permission to distribute this paper % without modification in either source code format, or in PDF format. \documentclass[letterpaper,twocolumn,amsmath,amssymb,pre,aps,10pt]{revtex4-1} \usepackage{graphicx} \usepackage{listings} \begin{document} \title{The Onion Salt Layered Encryption Scheme} \author{David Roundy} \affiliation{Department of Physics, Oregon State University, Corvallis, OR 97331} \begin{abstract} In this paper, I introduce a layered ``onion'' encryption scheme based on the NaCl library. The onion salt protocol uses a slightly modified NaCl decryption for each layer, with a padding scheme that ensures that no intermediate agent can determine their own position in the sequence. Nevertheless, each agent is able to ensure that the message was transmitted without modification of any routing information, although the payload intended for the recipient is only authenticated by the recipient. The protocol allows for a round-trip onion with the recipient including a response to the message, with no router able to discern either who originated the messare or who was the recipient, and with each router only knowing the addresses of the routers before and after them in the sequence. \end{abstract} \maketitle Onion routing was introduced in the nineties as a means to enable anonymity over the network~\cite{reed1998onionrouting}. Since then Tor has moved from a system using self-contained onions, to a telescoping set of channels~\cite{dingledine2004tor}. Nevertheless, I see value in a cryptographic primitive that enables the secure construction of cryptographic onions to enable anonymizing routing of short messages without the latency introduced by handshakes with each intermediate router, and without that state be shared between the routers and the sender. By requiring multiple packets sent back and forth, besides increasing latency, the use of sequential handshakes introduces opportunities for traffic analysis attacks. The use of shared state introduces an additional per-connection cost to each router, as well as the possibility of denial of service (DOS) attacks in which an attacker opens numerous connections. \newcommand\Nrouter{\ensuremath{N_{\text{router}}}} \section{Properties of onion salt} I introduce in this paper the \emph{onion salt} encryption scheme, which uses the Networking and Cryptography library (NaCl)~\cite{bernstein2009cryptography}, with one small modification, for its grunt work. This protocol creates an onion that is encrypted to $\Nrouter$ ``routers,'' one of which is the ``recipient.'' The onion contains plaintext designated for each party: routing information for each router, and in addition a secret message (``payload'') for the recipient. The original onion can only be decrypted by the first router, who thus obtains its routing information and a message designated for the next router. The recipient in general is not the final receiver of the message, but rather has the ability to attach a response payload to the message, which can be received and read by originator of the onion, provided the originator gives the final router his own address as the next router. The onion salt encryption scheme has the following properties. \begin{description} \item[Secrecy] No eavesdropper without access to a given recipient's secret key may determine the plaintext intended for that recipient, even after intercepting every transmitted message. \item[Authenticity] Each recipient can determine that the message received was not modified in any way, with the exception of the \emph{payload}, which is separately authenticated but only by the recipient (and the sender, in case of a response payload). \item[Anonymity] No recipient can determine from the content of the message they received the identity of any other party involved, except for the routing information they may have been provided. Of course, they most likely will also be able to identify who sent them the message by examining the transport used. \item[Ordering ambiguity] No router can determine their place in the sequence of onion layers. This last poses the challenge that motivates this work. The previous three properties can be achieved simply by nesting NaCl encryption. However, that would make each message larger than the previous one, making it easy for routers to establish a relative ordering between their messages. \end{description} In the following paragraphs I will explain the relevance of each of these properties. \emph{Secrecy} and \emph{authenticity} are at the core of any secure communication scheme. These are the two properties provided by the NaCl \verb!crypto_box! function which we use. However, we actually use a modified version, which does \emph{not} authenticate the payload content until its reception by its recipient. This is needed in order to enable the recipient to place information in the payload to be relayed on. \begin{figure*} \begin{center} \includegraphics[width=\textwidth]{decryption-0} \end{center} \caption{A diagram of the decryption process removing one layer of the onion---here the first layer. Blocks of memory are represented by rectangles, and as those blocks are encrypted they are filled with colored hash lines corresponding to each encryption applied, which are removed with each decryption. In step 1, the sender's public key is extracted, and the message is padded with zeros on the left and before the payload. In step 2, the message is decrypted, which at the same time encrypts the zero padding in the middle. Finally, once the routing information $R0$ has been read, the message is truncated to the same size as the original message, and is ready to be passed on.}\label{fig:decryption} \end{figure*} The authenticity property as we implement it eliminates any potential attackes that involve cleverly modifying an intercepted message (e.g. flipping a few specific bits) and observing the resulting behavior of its recipient, and ultimately compromising the secrecy of the message. A simpler approach would separately authenticate to each router just the routing information for that router. However, this would leave open an attack in which one compromised router communicates with a later compromised router in the sequence by modifying a few bits of its routing information, enabling it to identify packets that match. By having each router authenticate \emph{all} the routing information, an intermediate uncompromised router would refuse to relay on such a modified packet. The \emph{anonymity} of the sender is primarily ensured by generating random key pairs for each recipient and transmitting the generated public key along with the message. Because each key pair is used only once, although an eavesdropper can extract the public key from each message, that information cannot be correlated with the sender. In addition, because each key pair is only used once, a fixed nonce may be safely used. Naturally, the secrecy of each router's message is also necessary in order to ensure the sender's anonymity. Also note that my definition of anonymity explicitly made no claim regarding traffic analysis attacks. Naturally, if a user's threat model includes an observer of network patterns (which might, for instance, be performed by compromised routers), then additional care care---such as batching or random delays in transmission---should be used to protect against traffic analysis attacks. Note that the sender may choose whether to remain anonymous to the recipient. To remain anonymous, the sender would generate a random key pair for the payload, while to authenticate her identity, the sender would encrypt the payload itself using her known public key. Finally, \emph{ordering ambiguity} while not strictly necessary, provides an additional measure of protection of anonymity in the presence of malicious routers. Simple padding with random data would provide a limited ordering ambiguity that could prevent a network observer from determining ordering. However, if each router knew how much authenticated data remained, traffic analysis by compromised nodes could operate far more effectively when the ordering of messages can be determined or estimated, easily eliminating half of the false positives. In the extreme case---where latency and ordering are precisely known---a compromise of just a few routers in the sequence could effectively remove the anonymity of a message. This property presents a challenge, because each router must be able to believe that the entire routing portion of the onion consists of actual uncompromised data. Our solution is that each router must pad the data it sends on with pseudorandom padding that is known to the original sender, but not knowable to any other party. Then the sender can create authentication data for each recipient that authenticates not only the actual important content, but also the padding. \section{Decryption as a router} I will begin with the decryption algorithm for an ordinary router, illustrated in Fig.~\ref{fig:decryption}, which is simpler than encryption. Decryption consists of just three steps. \begin{enumerate} \item First the randomly-generated public key (which occupies the first 32 bytes) is read and removed from the message. The message is then padded at the beginning with 16 bytes of zeros (as required by NaCl), and is zero-padded just before the payload with 48 bytes plus the size of the routing information. \item The padded message is passed to a modified version of the NaCl function \verb!crypto_box_open! with a zero nonce, which decrypts the entire message, and authenticates all but the final payload portion of the message, in the process encrypting the zero padding that was inserted in the middle. We note that in practice we modify the TweetNaCl implementation of NaCl for simplicity~\cite{bernstein2014tweetnacl}. \item Finally, the decrypted routing information is read and stripped off, leaving a message of the same size as the original, padded with pseudorandom data in the center, and with the next randomly-generated public key at the beginning. \end{enumerate} One unusual feature of this scheme is the the \emph{ciphertext} is padded with zeros in its interior prior to decryption and authentication. Also unusual is that our authentication does not apply to the final portion of the message. \begin{figure*} \begin{center} \includegraphics[width=\textwidth]{encryption} \end{center} \caption{A diagram of the encryption process, for a three-layer onion. The steps are labelled by numbers in circles along the left-hand side. Blocks of memory are represented by rectangles, and as those blocks are encrypted they are given nested colored layers corresponding to each encryption applied. Steps 0-4 construct the padding needed for the innermost encryption, the core of the onion. Step 5 inserts the secret information into the core. Steps 6-10 encrypt that content in layers, along with routing information (the addresses $a_i$) at each outer layer of the onion. In addiiton, session public keys $P_i$ are included along with the ciphertext sent to each recipient. Finally, I note the authentication data $A_i$ at each level, which is the final contribution to the space overhead introduced with each layer.}\label{fig:algorithm} \end{figure*} \section{The encryption algorithm} The first step in creating an onion salt is to generate one ephemeral key pair for each recipient. These keys will be used for each encryption. A cartoon of the process is shown in Fig.~\ref{fig:algorithm}. The encryption process is naturally more complicated than decryption, since it must deal with all layers. The encryption proceeds in two stages. First, the padding is constructed, and then the plaintext for all the recipients is inserted and encrypted. The first stage consists of the following steps, starting with a message full of zeros. \begin{enumerate} \item Shift the routing information to the left by 48 bytes plus the size of the routing information, and zero-fill the gap that was thus opened up. \item Encrypt the resulting plaintext to the next recipient. \item Repeat this process once for each router, including the recipient. \end{enumerate} Once we have generated the padding, we need to insert the plaintext content (the secret message payload and the routing information) and encrypt. This process involves: \begin{enumerate} \item Insert the plaintext routing information for the current router. If this is the recipient, then also insert the payload, \emph{after} encrypting it to them with a user-specified key pair, using the first 24 bytes of the randomly-generated emphemeral public key as a nonce. The encryption of the payload uses a potentially different key pair (which is only ephemeral if the message is to be anonymous to the recipient), and uses the randomly-generated and ephemeral public key used for routing as the nonce. This is intended to statistically ensure the uniqueness of the nonce in case the keys used are static, to avoid reusing a nonce. \item The message is then encrypted to the recipient of the currently-constructed message. If no mistakes have been made, this encryption will result in zeros at the end of the routing information. \item Shift the routing information to the right to eliminate the gap of zeros, and then add the plaintext content consisting of the routing information for the next recipient and the ephemeral public key that recipient will use when decrypting. \item Return to step 1, if there is another router. \end{enumerate} As in the case of decryption, the NaCl \texttt{crypto\_box} encryption routine (even modified to not authenticate the payload portion of the message) requires zero padding at the beginning of the message in addition to that which we add in the middle. \begin{figure*} \begin{center} Router 1\\ \includegraphics[width=\textwidth]{decryption-1} Recipient\\ \includegraphics[width=\textwidth]{decryption-2} Router 3\\ \includegraphics[width=\textwidth]{decryption-3} Router 4\\ \includegraphics[width=\textwidth]{decryption-4} Router 5\\ \includegraphics[width=\textwidth]{decryption-5} \end{center} \caption{A diagram of the decryption and responding process.}\label{fig:decryption-and-responding} \end{figure*} \begin{figure*} \begin{center} \includegraphics[width=\textwidth]{return-key} \end{center} \caption{Finding the return key.}\label{fig:return-key} \end{figure*} \section{Decryption and responding as recipient} The purpose of \emph{not} authenticating the payload at each layer of the onion is to enable the recipient to send a response without giving the recipient any information about the sender beyond what is included in the payload, and moreover without allowing any other router to determine if it precedes or follows the recipient. Specifically, the recipient cannot determine who the sender is (unless such information is included in the payload). Moreover, no party other than the recipient (which includes the other routers, and any network observer) can determine which router is the recipient, provided care is taken to avoid timing attacks, for instance by incorporating a fixed (or randomized) delay in the forwarding of messages, such that the recipient has time to read the payload and encrypt the response before the packet would otherwise be expected to be sent to the following router. Figure~\ref{fig:decryption-and-responding} illustrates the entire sequence of messages passed between all routers following the first one, which was already shown in Fig.~\ref{fig:decryption}. Note that the colored patterns representing encryption with various keys are consistent across Figs.~\ref{fig:decryption}-\ref{fig:decryption-and-responding}, representing the encryption of a single onion, followed by transmision and decryption through the circuit. The recipient has a somewhat more convoluted process than would otherwise be required to insert the payload, because it is assumed that the recipient will not recognize her role as recipient until after decrypting and reading the routing information, which may be presumed to include a flag indicating that she is the recipient, and should therefore examine the payload. An alternative approach that would be more resistent to timing attacks---but less resistent to denial-of-service (DOS) attacks---would be have each router attempt to decrypt the payload to its secret key, and use the authentication of that decryption to determine whether they are the recipient. On the whole, this process does not seem warranted, when the routers can defeat timing attacks more efficiently by simply sleeping until a determined time (e.g. a random interval after receiving the packet, or a time specified in the routing information). The reciever, once she has identified herself as such, can immediately examine the payload, which presumably is itself encrypted and authenticated to her public key. Having done so, she constructs her response, which is also encrypted and authenticated. The response is encrypted to the same key that encrypted the original payload, but with a random nonce, which is included at the beginning of the response (along with an extra random byte). Because the response payload is encrypted and indistinguishable from random, we simply overwrite the original payload without applying an additional layer of encryption. For simplicity, this encryption is not reflected in the figure. The subsequent routers (routers 3, 4, and 5 in our example) are unaware that they are relaying a response rather than the original payload. In the usual case, the final router is given the sender's address. The sender will decrypt the response by encrypting it (or decrypting without authentication) to each of the keys of the routers following the recipient. This will restore the response to what the recipient inserted, which can then be decrypted and authenticated. \section{Extensions} An obvious extension to this scheme is to enable different routers to recieve and respond to different portions of the payload. To achieve this, the routing information would inform each router which randomized portion of the payload to read and respond to. This could enable collection of information from an entire circuit in one message. The procedure would be similar in essence to the single-recipient case, and we will consider it no further in this paper. Another alternative is that the circuit need not return to the original sender. One could either use the protocol to send information in a single direction, or the sender could cause the recipient to relay information to a third party. The latter would require a second message to inform the third party of the shared secrets needed to decrypt the ``response.'' \section{Analysis} Most of the security properties claimed for onion salt derive from those of the NaCl \texttt{crypto\_box} function (secrecy and authentication), and from the generation of new random key pairs for each recipient (anonymity). The ordering ambiguity derives from the padding scheme. There are, however, a few features of this scheme that bear examination lest they violate the preconditions needed for NaCl to provide the desired secrecy and authentication. One such feature is the use of a zero nonce. I believe this is safe because we only use each key pair to send a single message. Similarly, the use of the ephemeral public key as nonce for the encryption of the payload should be safe because it is an essentially random sequence of 32 bytes, which has negligible chance of collision. Finally, the use of a purely random nonce for the encryption of the response should be safe for the same reason. There are two more challenging risks in this scheme, because they are significant deviations from the standard NaCl scheme. The first is the unusual use of padding to ensure that a block of \emph{ciphertext} is zero. The second is the partial authentication that allows modification of the payload block by the responder. The zero padding means that we encrypt twice with the same key pair and nonce, which has the potential for danger when using stream ciphers. However, the only bytes that are encrypted twice which we use are those that were originally zero. The recipient of padded bytes generated using this encryption gains the same knowledge about our stream cipher that an opponent would gain using a known-plaintext attack. Because I believe the Salsa20 stream cipher is resistent to known-plaintext attacks, I also believe that there should not be an exploit that can take advantage of the nature of this padding. The second risk is in the modification to not authenticate the payload portion of the message. This could lead to an attack in which a router modifies the payload. One such attack would be an attack involving a malicious recipient cooperating with a malicious router. The router xors the payload with a given byte sequence. When the recipient observes that the payload fails to authenticate, then the recipient xors the payload with that byte sequence, and when the message does authenticate, has confirmed that this was the same message that had been forwarded by the router. Thus in spite of one or more honest routers in between them, they could connect a payload sent to the recipient with the node that sent the message to the first router. This is a variant of the same attack that authenticating the total routing information was intended to prevent. However, this attack requires that an attacker control the recipient, which statistically gives considerably less power than the router-router version. This attack does require that the malicious router modify the payload prior to knowing with confidence that the message might be interesting. Moreover, attempts to perform this attack can be detected, since if the recipient turns out to not be malicious, the message will not be received, that the sender will be forced to assume (correctly) that one of the routers was unreliable. Thus it is possible that this attack will be alleviated by having nodes evaluate the reliablility of routers prior to sending sensitive information. Another final question is that of timing attacks. NaCl itself is constructed to be resistant to such attacks, because branches never depend on secret data, and array indexing never depends on secret data. A timing attack on the creation of an onion could certainly reveal how many layers it has, and possibly some information about the slowness of the random byte generator, but should not reveal any more information. Decryption of a single layer by a router should happen in fixed time, independent of the data involved, and thus should be resistant to timing attacks, except that a recipient will need to spend more time to analyze data and respond. Thus a protocol using onion salt encryption should implement additional protection from timing attacks. \section{Computational cost} The cost to create an onion costs approximately $\Nrouter+1$ times the cost to encrypt an equal-sized message to each recipient. A router's decryption costs essentially the same as an NaCl \verb!crypto_box_open! decryption, which is quite fast. The recipient performs one additional decryption, plus \verb!crypto_box! encryption of the response. I anticipate that these computational costs will be entirely within reason, and will be dwarfed by the network costs of relaying a message through multiple routers. \section{Conclusions} I have introduced a new encryption protocol, onion salt, which creates a nested sequence of encryptions suitable for onion-style routing of fixed-size packets without handshake or prior shared secrets. This protocol has the properties of secrecy, authentication of each layer, anonymity, and order ambiguity. Moreover, it allows for a response to be relayed to the original sender without the recipient being able to learn the identity or network address of the sender. Order ambiguity means that routers are unable to determine their position in the sequence, and in particular cannot determine which router is the sender or recipient. The cost of this order ambiguity is in padding of data, as well as increasing the amount of data that need be encrypted. I believe this scheme could be valuable for low-bandwidth anonymous communication, in which the overhead of a hand-shake protocol would both dominate the network traffic and increase vulnerability to attacks by compromised routers or a network observer. %% Tarzan is a low-latency peer-to-peer anonymizing layer that acts at %% the IP level~\cite{freedman2002tarzan}. The second-generation Tor %% router uses a telescaping connection-based encryption %% scheme~\cite{dingledine2004tor} rather than the ``onion-based'' system %% of the original onion routing~\cite{reed1998onionrouting}. Note that %% the latter paper (the old one) is actually a very nice %% read~\cite{reed1998onionrouting}. Aqua is an interesting and recent %% high-bandwidth anonymizing network~\cite{leblond2013towards}. %% Here is a nice paper to read on distributed hash tables and security %% considerations involved~\cite{sit2002security}. And here is a nice %% one about how you need a critical mass in order to ensure %% anonymity~\cite{dingledine2006anonymity}. \bibliography{onionsalt}% Produces the bibliography via BibTeX. %% \appendix %% \begin{widetext} %% \section{onionsalt.h} %% \lstinputlisting[language=C]{../src/onionsalt.h} %% \section{onionsalt.c} %% \lstinputlisting[language=C]{../src/onionsalt.c} %% \end{widetext} \end{document}