\mainsection{System and User Defined Binary Circuits}
\label{sec:GC}
SCALE-MAMBA allows one to call user defined circuits, both
from the byte-code and the MAMBA language.
In this section we explain how this is done.
We also explain the circuits we provide: These currently
are for some symmetric cryptographic operations such as
AES, SHA-2 and SHA-3, as well as IEEE floating point
arithmetic.


\subsection{Defining Circuits}
The first task is to define the actual circuit.
For this we use a flat file format which we call 
``Bristol Fashion''.
For the precise details of this format see the
file 
\begin{center}
\verb|src/GC/Circuit.h|
\end{center}
To create such files you can use any tool you want,
but here is how we do ours....

If you look in the directory \verb|Circuits/VHDL| you
will find VHDL code for various functions. Write your
VHDL code and compile it to a netlist using whatever
tool chain you have.
The key point is that {\em inputs} and {\em outputs}
to the function must be a multiple of 64-bits in length.
If this is not the case you just need to pad the
input/output to the correct size.

You then need to convert the netlist into a
simple ``Bristol Fashion''. To do this we use a small program
called \verb|convert.x| which lives in the 
\verb|Circuits| directory. This takes a \verb|.net|
file in the \verb|VHDL| subdirectory and turns it
into a \verb|.txt| file in the \verb|Bristol| subdirectory.


{\bf Note 1:} Some optimizations are applied to the
circuit on this transfer, which include removing unused
wires etc. If your original function needed padding to
make 64-bit multiple inputs/outputs you need to stop
this optimization. To do this just comment out the
line \verb|SC.Simplify()| in the program \verb|convert.cpp|.

{\bf Note 2:} The circuits in the subdirectory
\verb|Bristol| are in the simple Bristol fashion format,
with no MAND gates.

You should now have a Bristol Fashion file with an 
extension \verb|.txt| in the directory \verb|Circuits/Bristol|.
To test this file does what you expect it to do you
can use similar code to that which is given in the
program \verb|Test/Test-Circuit.cpp| or
\verb|Test/Test-Convert.cpp|.

\subsection{Adding Circuits into the RunTime Engine}
So given your new circuit, which we will pretend is
called \verb|Foo.txt|, you now need to register this
with the run time system.
All circuits are given a number by the system,
the user defined numbers are from 65536 upwards (inclusive),
with numbers for the developers being those less
than 65536.
So {\em do not use a number less than 65536!!!!!}.

Circuits which are used to operate on the basic \verb|sregint|
data type to enable 64-bit secret arithmetic are
given numbers less than 100.
Numbers in the range 100 to 65535 are used to add extra
cool (well we think it is cool) functionality into the
system.
We have added a number of such circuits already into the
system, which include AES operations (for AES-128, AES-192
and AES-256), the SHA-256 and SHA-512 compression functions,
a circuit for the Keccak-f function, plus circuits for
IEEE floating point arithmetic\footnote{If
you have a cool circuit which you think others might find
cool to use in SCALE-MAMBA, just send it to us and we
might include it in a future release with a circuit
number less than 65536.}.

To register your circuit with the runtime you edit the
file \verb|src/GC/Base_Circuits.cpp|. At the bottom of
the initialize member function you can add your
own circuits.
Thus to add your \verb|Foo.txt| circuit you would add
the lines
\begin{lstlisting}
  loaded.insert(make_pair(65536, false));
  location.insert(make_pair(65536, "Circuits/Bristol/Foo.txt"));
\end{lstlisting}
What this does is tell the runtime that we are going to
use a new circuit numbered 65536, that it is not yet loaded
into memory, and that when it does need to be loaded
where to get it from.

\subsection{Byte Code Operation}
The circuit is executed by the \verb|GC| byte-code.
This takes as input a single integer which defines the
circuit to be evaluated.
The arguments to the circuit, as \verb|sregint| values,
are then popped off the \verb|sregint|-stack and then passed
into the binary circuit engine.
On return the resulting result values are then pushed
onto the \verb|sregint|-stack.
Since a \verb|sregint| register is 64-bits long, this
explains why the circuits take inputs/outputs a multiple
of 64-bits.

When this instruction is executed, if the circuit has
not yet been loaded it is loaded into memory. Then the
binary circuit engine is called with the given input
values to produce the output.
Note, that the first time a GC operation is met, the
runtime might give a small delay as the various pre-processing
queues need to fill up, this is especially true for the
HSS based binary circuit engine.
In subsequent GC operations this stall disappears (unless
you are doing a lot of GC operations one after another).

{\bf Note:} When using the binary circuit engine for
Shamir and Replicated sharing when loading a 
{\em user define circuit} for the first time there is
also a delay as the circuit gets `optimized' on the
fly by adding MAND gate instead of many single AND
gates. This can take a while for large circuits.
Thus if timing GC operations always do a dummy execution
first before executing any timing.

Due to the way the queues are produced if you pass in a
VERY big circuit the  runtime might abort. If this happens
please let us know and we will fix this. But as we never
meet this limit we have a cludge in place to just catch
the issue and throw an exception. The exception says
to contact us, so you will know which one it is.

\subsection{Using Circuits From MAMBA}
MAMBA calling of the circuits follows much like the
\verb|GC| byte-code.
In the context of executing AES-128 operation with
the key \verb+0x00000000FFFFFFFF+ and the message
\verb+0xFFFFFFFF00000001+, this becomes the code
\begin{lstlisting}
AES_128=100

def push_data(stuff,n):
  for i in range(n):
    sregint.push(stuff[i])

def pop_data(stuff,n):
  for i in range(n):
    stuff[n-i-1]=sregint.pop()

# Set key
key = [sregint(0), sregint(-1)]
mess = [sregint(-1), sregint(1)]
ciph = [sregint() for _ in range(2)]

print_ln("AES-128")
push_data(key,2)
push_data(mess,2)
# Op
GC(AES_128)
pop_data(ciph,2)

# Now open the values to check all is OK
m = [None] * 2
k = [None] * 2
c = [None] * 2
for i in range(2):
	m[i] = mess[i].reveal()
	k[i] = key[i].reveal()
	c[i] = ciph[i].reveal()

print_ln("Key")
k[1].public_output()     
k[0].public_output()

print_ln("Message")
m[1].public_output()     
m[0].public_output()

print_ln("Ciphertext")
c[1].public_output()     
c[0].public_output()
\end{lstlisting}

Note, that the AES-128 operation takes four
input registers (two for the key and two for
the message), and produces two output registers
as output.
Also note bit ordering of the inputs and outputs.
The correct output of AES-128 for the above
example is
\begin{center}
  \verb|406bab6335ce415f4f943dc8966682aa|
\end{center}


\subsection{Current System Defined Circuits}
The current list of system defined circuits are
\begin{center}
\begin{tabular}{c|l}
Number & Function \\
\hline
100 & AES-128 \\
101 & AES-192 \\
102 & AES-256 \\
103 & Keccak-f \\
104 & SHA-256 \\
105 & SHA-512 \\
\hline
120 & IEEE floating point (double) addition \\
121 & IEEE floating point (double) multiplication \\
122 & IEEE floating point (double) division \\
123 & IEEE floating point (double) equality \\
124 & IEEE floating point (double) to sregint \\
125 & sregint to IEEE floating point (double) \\
126 & IEEE floating point (double) sqrt \\
127 & IEEE floating point (double) lt \\
128 & IEEE floating point (double) floor \\
129 & IEEE floating point (double) ceil \\
\hline
\end{tabular}
\end{center}

\subsection{IEEE Floating Point Arithmetic}
\label{sec:ieee}
As a recap an IEEE double is held in 64-bits as one sign bit,
11 bits to represent the signed exponent, and then 52
bits to represent the normalised mantissa. By normalization
one means that there is a hidden 53 bit which is always set
to one, namely the most significant bit of the number.
Thus we obtain 53 bits of binary precision.
So for example the decimal number $3.125$ gets represented as
the 64-bit number
\begin{lstlisting}
   0 10000000000 1001000000000000000000000000000000000000000000000000
\end{lstlisting}
In MAMBA we would hold this as the 64-bit integer $4614219293217783808$; notice
the bit order!
To convert a floating point value to this representation, and then print
the value to the standard output device, we have the following
commands
\begin{lstlisting}
  r0=convert_to_float("3.125")
  print_ieee_float(r0)
\end{lstlisting}
where here \verb|r0| will be a \verb|regint| type.
Note, that if you perform arithmetic on the associated \verb|regint|
you will not get the equivalent of double arithmetic.
There is currently no way to perform arithmetic on `clear'
floating point values.

However, we can perform arithmetic on `secret' floating point values
using the Garbled Circuit engine\footnote{Note one thing we have not
yet worked out is how for users to `enter' a secret floating point
value, i.e. do the conversion hidden behind the conversion
operation above.
One presumes an application could do this themselves as it is not
that hard programmatically.}.
We first need to convert the \verb|regint| to an \verb|sregint|, which
is done using the standard type conversion operation.
Then to perform arithmetic we use the stack and the garbled circuits
defined above.

Clearly, as the Garbled Circuit engine is stack based one can
be more efficient by transforming any expression to Reverse
Polish notation and then executing the expression as this.
The following code example, from the example program
\verb|Programs/GC_Float|, illustrates how to use the operations.
\begin{lstlisting}
FP_add=120
FP_mul=121
FP_div=122
FP_eq=123
FP_f2i=124
FP_i2f=125
FP_sqrt=126

r0=convert_to_float("3.125")
r1=convert_to_float("1.25")
r2=convert_to_float("-6.535353")
r3=convert_to_float("199.3231")

s0=sregint(r0)
s1=sregint(r1)
s2=sregint(r2)
s3=sregint(r3)

# First compute 
#   s4 = (s0+s1)*(s2+s3)
#
# To do this we convert to reverse polish notation 
#
#   s0 s1 + s2 s3 + *
#
# Then we compute this by executing this as a stack programm...
#

sregint.push(s0)
sregint.push(s1)
GC(FP_add)
sregint.push(s2)
sregint.push(s3)
GC(FP_add)
GC(FP_mul)

# Now test the result
s4=sregint.pop()
r4=s4.reveal()
print_ieee_float(r4)
print_ln("\nThe last number should be 843.446393125\n")

# Now we are going to create -3.125 from 3.125
#   From this it is easy to do subtraction
s5=s0 ^ 0x8000000000000000;
r5=s5.reveal()
print_ieee_float(r5)
print_ln("\nThe last number should be -3.125\n")

# Now we divide s1 by s2
sregint.push(s1)
sregint.push(s2)
GC(FP_div)
s6=sregint.pop()
r6=s6.reveal()
print_ieee_float(r6)
print_ln("\nThe last number should be 1.25/-6.535353 = -0.191267403612322\n")

# Now sqrt of 3.125
sregint.push(s0)
GC(FP_sqrt)
s7=sregint.pop()
r7=s7.reveal()
print_ieee_float(r7)
print_ln("\nThe last number should be sqrt(3.125) = 1.76776695\n")

# Now sqrt of -3.125
sregint.push(s5)
GC(FP_sqrt)
s8=sregint.pop()
r8=s8.reveal()
print_ieee_float(r8)
print_ln("\nThe last number should be sqrt(-3.125) = NaN\n")

# Now conversion between integers and floats
# Take a big integer and convert it to a float
sa=sregint(9223372036430532566)
sregint.push(sa)
GC(FP_i2f)
sf=sregint.pop()
sregint.push(sf)
GC(FP_f2i)
sb=sregint.pop()

a=sa.reveal()
b=sb.reveal()
f=sf.reveal()
print_ln("Conversions...")
print_int(a);        print_ln("")
print_ieee_float(f); print_ln("")
print_int(b);        print_ln("")
\end{lstlisting}
Note how negation can be obtained by a bit flip in the sign position.


\subsection{What Algorithms are Used}
There are two different binary circuit engines. One for Shamir/Replicated
sharings and one for Full Threshold/Q2 MSP sharings.
In theory one could use the Shamir/Replicated sharing for the Q2 MSP
sharings, but currently there is a bug in the conversion which we need
to iron out.
\begin{itemize}
\item It has something to do with creating the correct complete access
structure from the Q2 MSP, in order to generate the associated
replicated sharing.
\end{itemize}

\paragraph{Shamir/Replicated Sharings}
The base triples are produced using Maurer's simple protocol
\cite{Maurer} using the underlying replicated sharing method.
Thus this is use a PRSS to generate 
$a$ and $b$, and then generate $c$ via Schur-Multiplication
followed by resharing.
The triples are then verified to be correct using Protocol 3.1
from \cite{DBLP:conf/sp/ArakiBFLLNOWW17}.
This is all done in a single offline thread.

{\bf Note:} This entire method (including the online phase)
requires a sufficiently small Replicated representation of 
the base access structure. Thus for Shamir sharings for which 
one cannot implement a PRSS (as the size of the associated 
Replicated scheme is too large) we default to the HSS based
method for Full Threshold sharings. 

As the underlying access structure is Q2 the shares are automatically
self-authenticating (see \cite{SW18}), one can run the 
MPC protocol from \cite{SW18} to evaluate the circuit.
Since this protocol is not constant round we need to `massage'
the Bristol Fashion circuit into its extended format, which
is AND-depth oriented and includes the MAND gates. This
results in a delay when loading a circuit into SCALE for the
first time.
The protocol for multiplication within the circuit
makes use of the reduced communication methodology of
\cite{KRSW}.
With authentication performed by hashing the opened sharings,
and then comparing the hash value at appropriate moments.

\paragraph{Full Threshold/Q2 MSP Sharings}
Authenticated bits (called aBits) and authenticated triples (called aANDs) 
are generated with BDOZ style MACs using an OT based pre-processing.
The underlying OT protocol is from \cite{CO15}.
The base random OTs are  converted into a large number of
correlated COT's, for which we use \cite{AC:FKOS15}[Full Version, Figure 19]
and \cite{C:KelOrsSch15}[Full Version, Figure 7].
These correlated OTs are then converted into random sharings of authenticated
bits (so called aShares/aBits), for this step we use \cite{AC:HazSchSor17}[Full Version, Figure 16].
Finally these aBits are converted into aANDs using
\cite{CCS:WanRanKat17b}[Full Version, Figures 16, 8 and 18 in order].
The hash function in the protocol to generated HaANDs from \cite{CCS:WanRanKat17b}
is implemented using the CCR-secure MMO construction from \cite{GKWY19}.
Once the aBits and the aANDs are produced (in the offline threads) 
the protocol to process a circuit utilizes the garbled-circuit BMR approach of
\cite{AC:HazSchSor17}.