# BoolNetEvo Evolve populations of boolean networks to approximate bitstring functions and their (unknown) inverses.
Machinery (click to show/hide)

```text Layer k+1 ● ● ↑ ┌───────┼───────┐ ┌───────┼───────┐ ↑ ○ ○ ○ ○ ○ ○ ↑ / | \ / | \ / | \ / | \ / | \ / | \ Layer k ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ``` [Boolean network](https://en.wikipedia.org/wiki/Boolean_network), or *boolnet*, is a well-known particular case of an [artificial neural network](https://en.wikipedia.org/wiki/Artificial_neural_network) where inputs/outputs of each cell, or *boolon*, are bits — 0 (false) and 1 (true) — rather than values from a range, say, (-∞; +∞). Output of a boolon is a boolean function of its inputs, which are outputs of boolons from previous layer. In this implementation, boolean functions have 3 variables and are encoded by `u8` values, which represent the truth tables of such functions. Each boolon has input tree, whose leaves are its input boolons, and then, at each level of that tree, a boolean function of 3 values at current level gives 1 value at next level. Single value at root level is the output of this boolon. On the figure above, input tree has height 2, so each boolon's output is affected by 32 = 9 outputs of boolons from previous layer (some may coincide). Number of boolons is the same in all layers, and the signal can propagate through a boolnet for few times, called *cycles*: at the end of each cycle, outputs of the last layer become inputs for the first layer. Under these constraints, layers can be viewed as steps residing in time rather than space, if you identify i-th boolon of all layers, which changes its input tree at every step within cycle. Layers may have *rands* boolons whose outputs are random. Enter robustness and free (unpredictable) will... at the cost of slower learning. Array of 256 possible 3-variable boolean functions is used to perform simultaneous calculation for 128 data flows, in a *batch*. Several boolnets of the same architecture make a *population*, which evolves in time by means of the [genetic algorithm](https://en.wikipedia.org/wiki/Genetic_algorithm). At each *epoch*: - all boolnets are scored w.r.t. how they calculate some bitstring function or its inverse, on the average (the more wrong bits, the less the score), - boolnets with the lowest scores are replaced by inheritants of boolnets with the highest scores. Inheritance, as usually, is [crossover](https://en.wikipedia.org/wiki/Crossover_(genetic_algorithm)) and [mutation](https://en.wikipedia.org/wiki/Mutation_(genetic_algorithm)), in this case for 1) indices of input boolons and 2) boolean functions in the input tree, for each boolon. Note that a single boolnet does not evolve; a population does. The purpose of the crate is to provide controls for this process and make it more interactive.
## Quick Start **Warning**: use `release` profile, without optimizations the execution is 20–30 times slower. In your `code.rs`, bring `BitransformRegister` and `Evolver` structs into scope: ```rust extern crate boolnetevo; use boolnetevo::{ BitransformRegister, Evolver }; ``` and create their default instances: ```rust fn app() -> Result<(), String> { let register = BitransformRegister::new_default(); let mut evolver = Evolver::new_default(®ister)?; ... } ``` Now, you can run the built-in shell... ```rust evolver.shell(®ister)?; ``` and get something like this (with colors) on your terminal: ```text BITRANSFORM: InvXor { direct: Xor } POPULATION: 250 ARCHITECTURE: height = 1, size = 48, rands = 2, layers = 1, cycles = 1 PARAMETERS: batches = 8, replace_ratio = 0.7, par_ratio = 0.8, parents = 2, par_sw_prob = 0.9, mutat_prob = 0.1 SETTINGS: log off, test new, print improve Type "?" to see the list of available commands... $ evol Press any key to stop... Epoch 1 ( 41 ms): min = -9.0586, average = -7.9585, max = -6.8008 Epoch 2 ( 29 ms): min = -8.9746, average = -7.7293, max = -6.7021 Epoch 3 ( 29 ms): min = -8.9043, average = -7.5695, max = -6.4512 Epoch 4 ( 18 ms): min = -8.9697, average = -7.4082, max = -5.7324 Epoch 6 ( 17 ms): min = -8.7197, average = -7.1513, max = -5.7129 ... Epoch 58 ( 29 ms): min = -5.0742, average = -1.9552, max = -0.4736 Epoch 64 ( 16 ms): min = -4.5156, average = -1.8883, max = -0.2500 Epoch 65 ( 17 ms): min = -5.0068, average = -1.8879, max = 0.0000 BITRANSFORM: InvXor { direct: Xor } POPULATION: 250 ARCHITECTURE: height = 1, size = 48, rands = 2, layers = 1, cycles = 1 PARAMETERS: batches = 8, replace_ratio = 0.7, par_ratio = 0.8, parents = 2, par_sw_prob = 0.9, mutat_prob = 0.1 SETTINGS: log off, test new, print improve $ run 0 a7b8 e9924e2a ``` (To verify that `0xe992 XOR 0x4e2a = 0xa7b8`, ```text $ dir e9924e2a a7b8 ``` or use any calculator in programming mode.) ...*OR* you can call `Evolver`'s methods directly: ```rust evolver.bitran(®ister, "inv-sha1", &[1, 4])?; // 1 round of SHA-1 with 4-byte message evolver.arch(2, 240, 8, 2, 1)?; evolver.par("mutat_prob", "0.2")?; evolver.evol(100)?; evolver.par("mutat_prob", "0.1")?; evolver.evol(50)?; evolver.save("evolvers/invsha1-demo")?; ``` See `src/bin/shell.rs` and `src/bin/script.rs`. ## Custom Bitransforms The evolution optimizes a population towards precise calculation of some bitstring function. The related methods — constructing the function's parameters block, providing input and output size in bits, scoring the input-output batch pair in regard to the function — are gathered in `Bitransform` trait. The crate implements it for `xor`, `add`, `mult`, [`CRC32`](https://en.wikipedia.org/wiki/Cyclic_redundancy_check#CRC-32_algorithm), [`MD5`](https://en.wikipedia.org/wiki/MD5), [`SHA1`](https://en.wikipedia.org/wiki/SHA-1), [`SHA2`](https://en.wikipedia.org/wiki/SHA-2), [`SHA3/Keccak`](https://en.wikipedia.org/wiki/SHA-3) functions and their inverses, and you can implement it for your own functions. As an example, see `src/bitransform/xor.rs` and `src/bitransform/sha1.rs`. The difference between direct and inverse bitransforms is that in the former case, the *function of input* is compared bitwise with the *output*, while in the latter case, the same *function of output* is compared with the *input*. In other words, you do not need explicit inverse function to score something that tries to perform inversion... actually, "something" should become an inverter. Assume you have done it for `Func` in `func` module, then you add it to register: ```rust mod func; use func::{Func, InvFunc}; fn app() -> Result<(), String> { let mut register = BitransformRegister::new_default(); register.add(vec![ Func::reg_entry(), InvFunc::reg_entry() ])?; ... } ``` and now you can use it with evolver: ```rust let mut evolver = Evolver::new_default(®ister)?; evolver.bitran(®ister, "inv-func", &[7, 5, 3])?; evolver.evol(1000)?; ``` (Here `7, 5, 3` are your function's parameters.) ## Modifying the crate Well, copy its source and do whatever you like: extend architecture, apply another genetic algorithm, optimize performance, ... It is something else then, and you should give it a different name. ## Kin Some resembling frameworks, packages, projects: - [Boolean-Circuit-Evolution](https://github.com/JBQuim/Boolean-Circuit-Evolution) - [BoolNet](https://cran.r-project.org/web/packages/BoolNet/) - [PyBoolNet](https://github.com/hklarner/PyBoolNet) - [revonet](https://crates.io/crates/revonet) - [rsgenetic](https://crates.io/crates/rsgenetic) There are probably more... When searching for those, keep in mind that the "evolution" term in this area sometimes means change of a single boolnet state in time rather than optimization of a boolnet population. ## A*s*hievements So far, well... While populations relatively quickly learn to invert simple bitstring functions like bitwise XOR, they struggle with more complex ones, especially those created to prevent inversion, like rounds of cryptographic hashes SHA-x. Even ADD, due to carry, requires more "advanced" boolnet architecture (more layers etc.), which makes learning significantly slower. There seem to be non-zero bounds for the best score, and, of course, there is no guarantee to reach score 0, that is, to breed a boolnet that *always* calculates a function precisely. There may be a *proof* somewhere that this approach has insurmountable limitations when a function's complexity exceeds certain threshold, and then to apply it would be like trying to make a bridge of ashes. On the other hand, it can be a part of some more sophisticated scheme. ## A glitch **Case A.** Set the following evolver: - bitransform is `inv-sha2` with 64 rounds (i.e. full SHA2-256) and message length 60 bytes (480 bits) *OR* `inv-keccak` with hash length 256, 24 rounds (i.e. full SHA3-256), and the same message length; - population is 500; - architecture is `height = 1`, `size = 8000` (yes, most will be unused), `rands = 0`, `layers = 1` (single layer...), `cycles = 1`; - evolution parameters are default except for `batches = 16` (or `batches = 32`) and `mutat_prob = 0.001`; - settings: change `test` to `all` and `print` to `all`. Now, start the evolution, `$ e` in shell, and watch the `average` score column. It remains very close to `-128.0` however many epochs pass, you should never get more than `-127.95` or so (the more the `batches`, the more precision the law of large numbers provides). *This* outcome is quite expectable: not a single network of such simple architecture, — where each output bit is a function of 3 or less hash bits, — is able to invert these *preimage-resistant* cryptographic hashes statistically better than guessing a message (and thus its hash) randomly. The population as a whole shows the absence of improvement as well, of course. **Case B.** Same as Case A with a single difference: decrease message length to 40 bytes (320 bits — notice it is still larger than hash length); do not forget to `reset` the population after. Now, start the evolution again. After 100 or so epochs, the `average` score increases to nearly `-127.0` and remains there. It seemingly begins to improve at 20–30th epoch, when some networks gain (?) a very, very tiny ability to invert hashes better (?) than a random guessing; then their descendants spread in the population. Reset populations and repeat evolutions from Cases A and B several times to verify their stability. Or try it for SHA1, MD5 with similar results. What would that 1 bit (which is >10σ-deviation from mean of approximated normal distribution of the score) mean, were it not the header of this section... Hint: consider the inverter that, given a hash, tries several (even 2) fixed messages and selects one whose hash has maximum number of coinciding bits.