\mainsection{System and User Defined Binary Circuits} \label{sec:GC} SCALE-MAMBA allows one to call user defined circuits, both from the byte-code and the MAMBA language. In this section we explain how this is done. We also explain the circuits we provide: These currently are for some symmetric cryptographic operations such as AES, SHA-2 and SHA-3, as well as IEEE floating point arithmetic. \subsection{Defining Circuits} The first task is to define the actual circuit. For this we use a flat file format which we call ``Bristol Fashion''. For the precise details of this format see the file \begin{center} \verb|src/GC/Circuit.h| \end{center} To create such files you can use any tool you want, but here is how we do ours.... If you look in the directory \verb|Circuits/VHDL| you will find VHDL code for various functions. Write your VHDL code and compile it to a netlist using whatever tool chain you have. The key point is that {\em inputs} and {\em outputs} to the function must be a multiple of 64-bits in length. If this is not the case you just need to pad the input/output to the correct size. You then need to convert the netlist into a simple ``Bristol Fashion''. To do this we use a small program called \verb|convert.x| which lives in the \verb|Circuits| directory. This takes a \verb|.net| file in the \verb|VHDL| subdirectory and turns it into a \verb|.txt| file in the \verb|Bristol| subdirectory. {\bf Note 1:} Some optimizations are applied to the circuit on this transfer, which include removing unused wires etc. If your original function needed padding to make 64-bit multiple inputs/outputs you need to stop this optimization. To do this just comment out the line \verb|SC.Simplify()| in the program \verb|convert.cpp|. {\bf Note 2:} The circuits in the subdirectory \verb|Bristol| are in the simple Bristol fashion format, with no MAND gates. You should now have a Bristol Fashion file with an extension \verb|.txt| in the directory \verb|Circuits/Bristol|. To test this file does what you expect it to do you can use similar code to that which is given in the program \verb|Test/Test-Circuit.cpp| or \verb|Test/Test-Convert.cpp|. \subsection{Adding Circuits into the RunTime Engine} So given your new circuit, which we will pretend is called \verb|Foo.txt|, you now need to register this with the run time system. All circuits are given a number by the system, the user defined numbers are from 65536 upwards (inclusive), with numbers for the developers being those less than 65536. So {\em do not use a number less than 65536!!!!!}. Circuits which are used to operate on the basic \verb|sregint| data type to enable 64-bit secret arithmetic are given numbers less than 100. Numbers in the range 100 to 65535 are used to add extra cool (well we think it is cool) functionality into the system. We have added a number of such circuits already into the system, which include AES operations (for AES-128, AES-192 and AES-256), the SHA-256 and SHA-512 compression functions, a circuit for the Keccak-f function, plus circuits for IEEE floating point arithmetic\footnote{If you have a cool circuit which you think others might find cool to use in SCALE-MAMBA, just send it to us and we might include it in a future release with a circuit number less than 65536.}. To register your circuit with the runtime you edit the file \verb|src/GC/Base_Circuits.cpp|. At the bottom of the initialize member function you can add your own circuits. Thus to add your \verb|Foo.txt| circuit you would add the lines \begin{lstlisting} loaded.insert(make_pair(65536, false)); location.insert(make_pair(65536, "Circuits/Bristol/Foo.txt")); \end{lstlisting} What this does is tell the runtime that we are going to use a new circuit numbered 65536, that it is not yet loaded into memory, and that when it does need to be loaded where to get it from. \subsection{Byte Code Operation} The circuit is executed by the \verb|GC| byte-code. This takes as input a single integer which defines the circuit to be evaluated. The arguments to the circuit, as \verb|sregint| values, are then popped off the \verb|sregint|-stack and then passed into the binary circuit engine. On return the resulting result values are then pushed onto the \verb|sregint|-stack. Since a \verb|sregint| register is 64-bits long, this explains why the circuits take inputs/outputs a multiple of 64-bits. When this instruction is executed, if the circuit has not yet been loaded it is loaded into memory. Then the binary circuit engine is called with the given input values to produce the output. Note, that the first time a GC operation is met, the runtime might give a small delay as the various pre-processing queues need to fill up, this is especially true for the HSS based binary circuit engine. In subsequent GC operations this stall disappears (unless you are doing a lot of GC operations one after another). {\bf Note:} When using the binary circuit engine for Shamir and Replicated sharing when loading a {\em user define circuit} for the first time there is also a delay as the circuit gets `optimized' on the fly by adding MAND gate instead of many single AND gates. This can take a while for large circuits. Thus if timing GC operations always do a dummy execution first before executing any timing. Due to the way the queues are produced if you pass in a VERY big circuit the runtime might abort. If this happens please let us know and we will fix this. But as we never meet this limit we have a cludge in place to just catch the issue and throw an exception. The exception says to contact us, so you will know which one it is. \subsection{Using Circuits From MAMBA} MAMBA calling of the circuits follows much like the \verb|GC| byte-code. In the context of executing AES-128 operation with the key \verb+0x00000000FFFFFFFF+ and the message \verb+0xFFFFFFFF00000001+, this becomes the code \begin{lstlisting} AES_128=100 def push_data(stuff,n): for i in range(n): sregint.push(stuff[i]) def pop_data(stuff,n): for i in range(n): stuff[n-i-1]=sregint.pop() # Set key key = [sregint(0), sregint(-1)] mess = [sregint(-1), sregint(1)] ciph = [sregint() for _ in range(2)] print_ln("AES-128") push_data(key,2) push_data(mess,2) # Op GC(AES_128) pop_data(ciph,2) # Now open the values to check all is OK m = [None] * 2 k = [None] * 2 c = [None] * 2 for i in range(2): m[i] = mess[i].reveal() k[i] = key[i].reveal() c[i] = ciph[i].reveal() print_ln("Key") k[1].public_output() k[0].public_output() print_ln("Message") m[1].public_output() m[0].public_output() print_ln("Ciphertext") c[1].public_output() c[0].public_output() \end{lstlisting} Note, that the AES-128 operation takes four input registers (two for the key and two for the message), and produces two output registers as output. Also note bit ordering of the inputs and outputs. The correct output of AES-128 for the above example is \begin{center} \verb|406bab6335ce415f4f943dc8966682aa| \end{center} \subsection{Current System Defined Circuits} The current list of system defined circuits are \begin{center} \begin{tabular}{c|l} Number & Function \\ \hline 100 & AES-128 \\ 101 & AES-192 \\ 102 & AES-256 \\ 103 & Keccak-f \\ 104 & SHA-256 \\ 105 & SHA-512 \\ \hline 120 & IEEE floating point (double) addition \\ 121 & IEEE floating point (double) multiplication \\ 122 & IEEE floating point (double) division \\ 123 & IEEE floating point (double) equality \\ 124 & IEEE floating point (double) to sregint \\ 125 & sregint to IEEE floating point (double) \\ 126 & IEEE floating point (double) sqrt \\ 127 & IEEE floating point (double) lt \\ 128 & IEEE floating point (double) floor \\ 129 & IEEE floating point (double) ceil \\ \hline \end{tabular} \end{center} \subsection{IEEE Floating Point Arithmetic} \label{sec:ieee} As a recap an IEEE double is held in 64-bits as one sign bit, 11 bits to represent the signed exponent, and then 52 bits to represent the normalised mantissa. By normalization one means that there is a hidden 53 bit which is always set to one, namely the most significant bit of the number. Thus we obtain 53 bits of binary precision. So for example the decimal number $3.125$ gets represented as the 64-bit number \begin{lstlisting} 0 10000000000 1001000000000000000000000000000000000000000000000000 \end{lstlisting} In MAMBA we would hold this as the 64-bit integer $4614219293217783808$; notice the bit order! To convert a floating point value to this representation, and then print the value to the standard output device, we have the following commands \begin{lstlisting} r0=convert_to_float("3.125") print_ieee_float(r0) \end{lstlisting} where here \verb|r0| will be a \verb|regint| type. Note, that if you perform arithmetic on the associated \verb|regint| you will not get the equivalent of double arithmetic. There is currently no way to perform arithmetic on `clear' floating point values. However, we can perform arithmetic on `secret' floating point values using the Garbled Circuit engine\footnote{Note one thing we have not yet worked out is how for users to `enter' a secret floating point value, i.e. do the conversion hidden behind the conversion operation above. One presumes an application could do this themselves as it is not that hard programmatically.}. We first need to convert the \verb|regint| to an \verb|sregint|, which is done using the standard type conversion operation. Then to perform arithmetic we use the stack and the garbled circuits defined above. Clearly, as the Garbled Circuit engine is stack based one can be more efficient by transforming any expression to Reverse Polish notation and then executing the expression as this. The following code example, from the example program \verb|Programs/GC_Float|, illustrates how to use the operations. \begin{lstlisting} FP_add=120 FP_mul=121 FP_div=122 FP_eq=123 FP_f2i=124 FP_i2f=125 FP_sqrt=126 r0=convert_to_float("3.125") r1=convert_to_float("1.25") r2=convert_to_float("-6.535353") r3=convert_to_float("199.3231") s0=sregint(r0) s1=sregint(r1) s2=sregint(r2) s3=sregint(r3) # First compute # s4 = (s0+s1)*(s2+s3) # # To do this we convert to reverse polish notation # # s0 s1 + s2 s3 + * # # Then we compute this by executing this as a stack programm... # sregint.push(s0) sregint.push(s1) GC(FP_add) sregint.push(s2) sregint.push(s3) GC(FP_add) GC(FP_mul) # Now test the result s4=sregint.pop() r4=s4.reveal() print_ieee_float(r4) print_ln("\nThe last number should be 843.446393125\n") # Now we are going to create -3.125 from 3.125 # From this it is easy to do subtraction s5=s0 ^ 0x8000000000000000; r5=s5.reveal() print_ieee_float(r5) print_ln("\nThe last number should be -3.125\n") # Now we divide s1 by s2 sregint.push(s1) sregint.push(s2) GC(FP_div) s6=sregint.pop() r6=s6.reveal() print_ieee_float(r6) print_ln("\nThe last number should be 1.25/-6.535353 = -0.191267403612322\n") # Now sqrt of 3.125 sregint.push(s0) GC(FP_sqrt) s7=sregint.pop() r7=s7.reveal() print_ieee_float(r7) print_ln("\nThe last number should be sqrt(3.125) = 1.76776695\n") # Now sqrt of -3.125 sregint.push(s5) GC(FP_sqrt) s8=sregint.pop() r8=s8.reveal() print_ieee_float(r8) print_ln("\nThe last number should be sqrt(-3.125) = NaN\n") # Now conversion between integers and floats # Take a big integer and convert it to a float sa=sregint(9223372036430532566) sregint.push(sa) GC(FP_i2f) sf=sregint.pop() sregint.push(sf) GC(FP_f2i) sb=sregint.pop() a=sa.reveal() b=sb.reveal() f=sf.reveal() print_ln("Conversions...") print_int(a); print_ln("") print_ieee_float(f); print_ln("") print_int(b); print_ln("") \end{lstlisting} Note how negation can be obtained by a bit flip in the sign position. \subsection{What Algorithms are Used} There are two different binary circuit engines. One for Shamir/Replicated sharings and one for Full Threshold/Q2 MSP sharings. In theory one could use the Shamir/Replicated sharing for the Q2 MSP sharings, but currently there is a bug in the conversion which we need to iron out. \begin{itemize} \item It has something to do with creating the correct complete access structure from the Q2 MSP, in order to generate the associated replicated sharing. \end{itemize} \paragraph{Shamir/Replicated Sharings} The base triples are produced using Maurer's simple protocol \cite{Maurer} using the underlying replicated sharing method. Thus this is use a PRSS to generate $a$ and $b$, and then generate $c$ via Schur-Multiplication followed by resharing. The triples are then verified to be correct using Protocol 3.1 from \cite{DBLP:conf/sp/ArakiBFLLNOWW17}. This is all done in a single offline thread. {\bf Note:} This entire method (including the online phase) requires a sufficiently small Replicated representation of the base access structure. Thus for Shamir sharings for which one cannot implement a PRSS (as the size of the associated Replicated scheme is too large) we default to the HSS based method for Full Threshold sharings. As the underlying access structure is Q2 the shares are automatically self-authenticating (see \cite{SW18}), one can run the MPC protocol from \cite{SW18} to evaluate the circuit. Since this protocol is not constant round we need to `massage' the Bristol Fashion circuit into its extended format, which is AND-depth oriented and includes the MAND gates. This results in a delay when loading a circuit into SCALE for the first time. The protocol for multiplication within the circuit makes use of the reduced communication methodology of \cite{KRSW}. With authentication performed by hashing the opened sharings, and then comparing the hash value at appropriate moments. \paragraph{Full Threshold/Q2 MSP Sharings} Authenticated bits (called aBits) and authenticated triples (called aANDs) are generated with BDOZ style MACs using an OT based pre-processing. The underlying OT protocol is from \cite{CO15}. The base random OTs are converted into a large number of correlated COT's, for which we use \cite{AC:FKOS15}[Full Version, Figure 19] and \cite{C:KelOrsSch15}[Full Version, Figure 7]. These correlated OTs are then converted into random sharings of authenticated bits (so called aShares/aBits), for this step we use \cite{AC:HazSchSor17}[Full Version, Figure 16]. Finally these aBits are converted into aANDs using \cite{CCS:WanRanKat17b}[Full Version, Figures 16, 8 and 18 in order]. The hash function in the protocol to generated HaANDs from \cite{CCS:WanRanKat17b} is implemented using the CCR-secure MMO construction from \cite{GKWY19}. Once the aBits and the aANDs are produced (in the offline threads) the protocol to process a circuit utilizes the garbled-circuit BMR approach of \cite{AC:HazSchSor17}.