# F-BLEAU
[![Build Status](https://travis-ci.org/gchers/fbleau.svg?branch=master)](https://travis-ci.org/gchers/fbleau) ![Version](https://img.shields.io/crates/v/fbleau.svg)

F-BLEAU is a tool for estimating the leakage of a system about its secrets in
a black-box manner (i.e., by only looking at examples of secret inputs and
respective outputs). It treats a generic system as a black box, which takes
secret inputs and returns outputs accordingly, and it measures how much the
outputs "leak" about the inputs. It was proposed in [2].

F-BLEAU is based on the equivalence between estimating the error of a Machine
Learning model of a specific class and estimating information leakage
[1,2,3].

This code was also used for the experiments of [2] on the following
evaluations: Gowalla, e-passport, and a side-channel attack on finite field
exponentiation.

# Getting started

F-BLEAU is provided as a command line tool, `fbleau`. Python bindings also
exist (see below).

`fbleau` takes as input CSV data containing examples of the system's inputs
and outputs. It currently requires two CSV files: a _training_ file and a
_validation_ (or _test_) file, such as:

```
0, 0.1, 2.43, 1.1
1, 0.0, 1.22, 1.1
1, 1.0, 1.02, 0.1
...
```

where the first column specifies the secret, and the remaining ones indicate
the output vector.

`fbleau` runs a chosen method for estimating the Bayes risk (the smallest
probability of error of an adversary at predicting a secret given the
respective output), and derives related security measures from it. The
general syntax is:

```
fbleau <estimate> [--knn-strategy=<strategy>] [options] <train> <eval>

Arguments:
    estimate:
        nn              Nearest Neighbor. Converges only if the observation
                        space is finite.
        knn             k-NN rule. Converges for finite/continuous
                        observation spaces.
        frequentist     Frequentist estimator. Converges only if the
                        observation space is finite.
    knn-strategy:
        ln              k-NN with k = ln(n).
        log10           k-NN with k = log10(n).
    train               Training data (.csv file).
    eval                Validation data (.csv file).
```

## Example

This example considers 100K observations generated according to a Geometric
distribution with privacy level `nu=4` (see [2] for details); the true value
of the Bayes risk, computed analytically, is `R*=0.456`. The observations are
split into a training set (80%) and a test set
(`examples/geometric-4.train.csv` and `examples/geometric-4.test.csv`
respectively).

One can run `fbleau` to compute the `knn` estimate with the `ln` strategy
(see below for details about estimation methods) as follows:

```console
$ fbleau knn --knn-strategy ln examples/geometric-4.train.csv examples/geometric-4.test.csv
Random guessing error: 0.913
Estimating leakage measures...
Minimum estimate: 0.473
Multiplicative Leakage: 6.057471264367819
Additive Leakage: 0.44000000000000006
Bayes security measure: 0.5180722891566265
Min-entropy Leakage: 2.5987156557884865
```

NOTE: depending on your machine's specs this may take a while.

By default, F-BLEAU runs the estimator on an increasing number of training
examples, and it computes the estimate at every step. The returned estimate
of R* (here, 0.473) is the smallest one observed in this process. To log the
estimates at every step, specify a log file with `--logfile <logfile>`.
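For intuition, the printed measures can all be derived from the Bayes risk
estimate and the random guessing error. The following Python sketch (our own
illustration; the variable names are not part of `fbleau`) recomputes the
numbers above, consistently with the definitions in [2]:

```python
import math

# Values printed by fbleau in the example above.
r_star = 0.473  # minimum Bayes risk estimate
g = 0.913       # random guessing error (error of an adversary ignoring outputs)

bayes_security = r_star / g                     # ~0.51807
additive_leakage = g - r_star                   # ~0.44
multiplicative_leakage = (1 - r_star) / (1 - g) # ~6.05747
min_entropy_leakage = math.log2(multiplicative_leakage)  # ~2.59872
```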
## Estimates

In principle, one should try as many estimation methods as possible, and
select the one that produces the smallest estimate [2]. However, some
estimators are better suited to certain cases.

The following table shows: i) when an estimator is guaranteed to converge to
the correct value (provided enough data), and ii) whether it is suited to
small or large systems. Indicatively, a small system has up to 1K possible
output values; a large system may have a much larger output space.

| Estimate | Options | Convergence | Use cases |
| -------- | ---------------- | ----------- | --------- |
| **frequentist** | | If the output space is finite | Small systems |
| **nn** | | If the output space is finite | Small/large systems |
| **knn** | `--knn-strategy` | Always | Small/large systems |
| **nn-bound** | | Always (note, however, that this is a lower bound) | Small/large systems |

For example:

```
fbleau nn <train> <eval>
```

Further details are in [2].

### k-NN strategies

k-NN estimators also require defining a "strategy". Currently implemented
strategies are:

- **ln**: k-NN estimator with `k = ln(n)`, where `n` is the number of
  training examples.
- **log10**: k-NN estimator with `k = log10(n)`, where `n` is the number of
  training examples.

For example, you can run:

```
fbleau knn --knn-strategy log10 <train> <eval>
```

## Further options

By default, `fbleau` runs on all the training data. However, one can specify
a stopping criterion, in the form of (delta, q)-convergence: `fbleau` stops
when the estimate's value has not changed by more than delta (`--delta`),
either in a relative (default) or absolute (`--absolute`) sense, for at
least q steps (`--qstop`).
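As an illustration, here is a minimal sketch of one way to read this stopping
criterion (this is not `fbleau`'s actual implementation, and the function
name is ours):

```python
def has_converged(estimates, delta, q, absolute=False):
    """True if, over each of the last q steps, the estimate changed by at
    most delta (relative change by default, absolute if requested)."""
    if len(estimates) < q + 1:
        return False
    recent = estimates[-(q + 1):]
    for prev, curr in zip(recent, recent[1:]):
        change = abs(curr - prev)
        if not absolute:
            change = change / prev if prev else float("inf")
        if change > delta:
            return False
    return True
```

Under this reading, `has_converged(estimates, delta=0.01, q=100)` would
correspond to running `fbleau` with `--delta 0.01 --qstop 100`.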
`fbleau` can scale the individual values of the system's output ("features")
to the `[0,1]` interval via the `--scale` flag. An option `--distance` is
available to select the desired distance metric for nearest neighbor
methods.

Further options are shown in the help page:

```console
fbleau -h
```

# Installation

The code is written in `Rust`, but it is meant to be used as a standalone
command line tool. Install [rustup](https://rustup.rs), which will make
`cargo` available on your path, then run:

```
cargo install fbleau
```

You should now find the binary `fbleau` in your `$PATH` (if not, check out
[rustup](https://rustup.rs) again).

If `rustup` is not available on your system (e.g., some \*BSD systems), you
should still be able to install `cargo` with the system's package manager,
and then install `fbleau` as above. If this doesn't work, please open a
ticket.

# Python bindings

If you prefer using F-BLEAU via Python, we now provide basic functionality
via a Python module. To install:

```console
pip install fbleau
```

Usage:

```console
>>> import fbleau
>>> fbleau.run_fbleau(train_x, train_y, test_x, test_y, estimate,
...                   knn_strategy, distance, log_errors,
...                   log_individual_errors, delta, qstop, absolute, scale)
```

where the parameters follow the above conventions:

```
train_x               : training observations (2d numpy array)
train_y               : training secrets (1d numpy array)
test_x                : test observations (2d numpy array)
test_y                : test secrets (1d numpy array)
estimate              : estimate, value in ("nn", "knn", "frequentist",
                        "nn-bound")
knn_strategy          : if estimate is "knn", specify one in ("ln", "log10")
distance              : the distance used for NN or k-NN
log_errors            : if `True`, also return the estimate's value (error)
                        at each step
log_individual_errors : if `True`, log the individual errors for each test
                        object, for the best estimator (i.e., for the
                        smallest error estimate)
delta                 : use to stop fbleau when it reaches
                        (delta, qstop)-convergence
qstop                 : use to stop fbleau when it reaches
                        (delta, qstop)-convergence
absolute              : measure absolute instead of relative convergence
scale                 : scale observations' features in [0,1]
```

The function `run_fbleau()` returns a dictionary containing:

- *min-estimate*: the minimum Bayes risk estimate (the one that should be
  used)
- *last-estimate*: the estimate computed on the full training data
- *random-guessing*: an estimate of the random guessing error (~baseline,
  see [2])
- *estimates*: (if `log_errors=True`) a vector containing the value of the
  estimate at every step
- *min-individual-errors*: (if `log_individual_errors=True`) a vector
  containing the individual errors (`True` if error, `False` otherwise) for
  each test object, corresponding to the best (i.e., smallest) estimate

Simple example:

```python
fbleau.run_fbleau(train_x, train_y, test_x, test_y, estimate='knn',
                  knn_strategy='ln', distance='euclidean', log_errors=False,
                  log_individual_errors=False, delta=None, qstop=None,
                  absolute=False, scale=False)
```

# TODO

Currently, the code provided here:

- is based on frequentist and nearest neighbor methods; in the future we
  hope to extend it to other ML methods; note that this does not affect the
  generality of the results, which hold for any "universally consistent"
  classifier;
- computes one estimate at a time (i.e., to compute multiple estimates one
  needs to run `fbleau` several times); this may change in the future.

### Short term

- [x] return various leakage measures (instead of just R*)
- [ ] resubstitution estimate

### Mid term

- [ ] predictions for multiple estimators at the same time
- [ ] get training data from standard input (on-line mode)

### Maybe

- [ ] other ML methods (e.g., SVM, neural network)
- [x] Python bindings

# Hacking

If you want to play with this code, you can compile it (after cloning the
repo) with:

```
cargo build
```

To compile the Python module, you need to enable the optional feature
`python-module`; this requires nightly Rust. Install maturin
(`pip install maturin`), and then compile with:

```
maturin build --cargo-extra-args="--features python-module"
```

# References

[1] 2017, "Bayes, not Naïve: Security Bounds on Website Fingerprinting
Defenses". _Giovanni Cherubin_.

[2] 2018, "F-BLEAU: Fast Black-Box Leakage Estimation". _Giovanni Cherubin,
Konstantinos Chatzikokolakis, Catuscia Palamidessi_.

[3] (Blog) "Machine Learning methods for Quantifying the Security of
Black-boxes". https://giocher.com/pages/bayes.html