[[enprot]] == Engyon: enprot image:https://github.com/riboseinc/enprot/workflows/tests/badge.svg["Build Status", link="https://github.com/riboseinc/enprot/actions?workflow=tests"] Enprot is a confidentiality processor for text and source code files. === Introduction and Tutorial Engyon Protected Text (EPT) is a human-editable annotation method that allows a text format document to contain finely grained cryptographic confidentiality and integrity features. Enprot requires a recent version of https://github.com/randombit/botan[botan]. Use `cargo build --release` to build the release binary `target/release/enprot` or `cargo run --` to build and run the debug version. Some dependencies may require the latest compiler. There is a simple help built-in with `-h` flag: [source,sh] ---- enprot$ cargo run -- -h Finished dev [unoptimized + debuginfo] target(s) in 0.04s Running `target/debug/enprot -h` enprot USAGE: enprot [FLAGS] [OPTIONS] ... FLAGS: -v, --verbose Produce more verbose output -q, --quiet Suppress unnecessary output -h, --help Prints help information -V, --version Prints version information OPTIONS: -l, --left-separator Specify left separator in parsing -r, --right-separator Specify right separator in parsing -s, --store ... Store (unencrypted) WORD segments to CAS -f, --fetch ... Fetch (unencrypted) WORD segments to CAS -k, --key ... Specify a secret PASSWORD for WORD -e, --encrypt ... Encrypt WORD segments -E, --encrypt-store ... Encrypt and store WORD segments -d, --decrypt ... Decrypt WORD segments -c, --casdir Directory for CAS files (default "cas" if exists, else ".") -p, --prefix Use PREFIX for output filenames -o, --output ... Specify output file for previous input ARGS: ... The input file(s) enprot$ ---- All of the commands also have have long variants; see `src/main.rs`. A simple example is contained in `sample/test.ept`: ---- hello, this is a test file // <( BEGIN GEHEIM )> Secret line 1 Secret line 2 // <( BEGIN Agent_007 )> James Bond // <( END Agent_007 )> // <( END GEHEIM )> // <( BEGIN Agent_007 )> Super secret line 3 // <( END Agent_007 )> ---- As can be seen, the most elementary EPT markup is in comments of the "`host`" language (which is C or AsciiDoc in this case) and consists of BEGIN..END segments. Each such segment has a keyword (or WORD) associated with it. Keywords can be used inside other keywords to form a tree-like structure. When invoke `enprot` on a syntactically correct file such as this one, nothing happens: [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept enprot$ ---- In fact the markup has been chosen specifically in a way that most input files would be syntactically correct without modification. The markup is contained between left and right separators that can be specified with `-l` and `-r` flags. Here are some comment-hiding suggestions for different languages: |=== | Format | `LEFT_SEP` | `RIGHT_SEP` | Raw text format | `"<("`, | `")>"` | AsciiDoc and C++ code | `"// <("` | `")>"` | MarkDown, XML, HTML | `"<-- <("` | `")> -->"` | (La)TeX and similar | `"\n% <("` | `")>"` |=== Note that the left separator must start the line (after whitespace). The right separator must currently also be on the same line. Adding verbosity with `-v` reveals what the system is doing: [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -v LEFT_SEP='// <(' RIGHT_SEP=')>' casdir = 'cas' Reading sample/test.ept Transforming sample/test.ept Writing sample/test.ept enprot$ ---- We see the default setting for left and right separators. Furthermore we see that the file is read in (parsed), transformed (which is a no-op in this case), and then written back (synthesized), overwriting the original file. This transformation behavior is controlled by a multitude of flags (as seen before). ==== Content Addressed Storage In above example we see `casdir = 'cas'`. This means that the default directory for CAS (Content Addressed Storage) is `cas` under current directory. If you do not have a `cas` directory set up, these work files are written to the current directory. This is not a problem, but creates clutter. We suggest that you do a `mkdir cas`. The operation of CAS can be demonstrated simply by using the `-s` (store) flag, which hides away (sanitizes) a part of the text file. ---- enprot$ ./target/debug/enprot sample/test.ept -s GEHEIM enprot$ cat sample/test.ept hello, this is a test file // <( STORED GEHEIM cea67c3ef34ff899793b557e9178c1b97bbcfe9722df2f6d35d2d0c91d2c1fe4 )> // <( BEGIN Agent_007 )> Super secret line 3 // <( END Agent_007 )> enprot$ ls cas cea67c3ef34ff899793b557e9178c1b97bbcfe9722df2f6d35d2d0c91d2c1fe4 enprot$ ---- We can see that the entire section between BEGIN GEHEIM and END GEHEIM has disappeared and has been replaced with a single STORED GEHEIM. The contents are actually stored in a file with a long hexadecimal filename. The properties of cryptographic functions guarantee that no two different contents can have the same hash. This removes much of the problem of version control as the file can be referred directly by it's content. One could now send this file for editing, and new text could be added around the GEHEIM sections. If the original CAS files are around, the same hidden part can be recovered simply with a `-f` (fetch) instruction: [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -f GEHEIM ---- Now the contents of `test.ept` are exactly as they were before, and the GEHEIM section is again contained in a BEGIN.. END enclosure. With all parameters, multiple keywords can be joined with a comma: [source,sh] ---- enprot sample/test.ept -s Agent_007,GEHEIM enprot$ cat sample/test.ept hello, this is a test file // <( STORED GEHEIM cea67c3ef34ff899793b557e9178c1b97bbcfe9722df2f6d35d2d0c91d2c1fe4 )> // <( STORED Agent_007 575d69f5b0034279bc3ef164e94287e6366e9df76729895a302a66a8817cf306 )> enprot$ ---- We see the the first GEHEIM is again stored under the same filename. In fact it was not even overwritten because the system checked that a file with that name already existed in the CAS directory, so there is no need. Such determinism is a important property of the CAS. Even if you lose the CAS files related to some sanitized version of the document, you may regenerate the exactly same ones if you have the original unsanitized document. Now the original document can be restored with [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -f Agent_007 -f notexistent,GEHEIM ---- You see that `-f` parameter can be given multiple times. In fact it is possible to even mix `-s` and `-f` statements on the same command if you want to sanitize some keywords while unsanitizing others. However specifying both `-s` and `-f` for the _same_ keyword isn't very helpful; the keyword will be unsanitized and resanitized on alternative runs. ==== Encryption and Decryption We may encrypt sections in a way that keeps the ciphertext entirely in the document itself. Assuming that `sample/test.ept` is at it's original state: [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -e Agent_007 Password for Agent_007: Repeat password for Agent_007: enprot$ cat sample/test.ept hello, this is a test file // <( BEGIN GEHEIM )> Secret line 1 Secret line 2 // <( ENCRYPTED Agent_007 )> // <( DATA lEsVpN3ES6rj0sbxrDm30EgMpYCc+yKM2i2Z )> // <( END Agent_007 )> // <( END GEHEIM )> // <( ENCRYPTED Agent_007 )> // <( DATA C0nBhV6V5yVExLOgvpK8xzUluc08lsr7wwBhx4ENMDrJU3pA )> // <( END Agent_007 )> enprot$ ---- In the above example I entered "bond" in both password prompts. Keys can also be passed from command line with the `-k` flag: [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -e GEHEIM -k GEHEIM=james enprot$ cat sample/test.ept hello, this is a test file // <( ENCRYPTED GEHEIM )> // <( DATA 4reYea85vTqNzzf7eon3x/LHs6iLy3GPgSZvsX7l0MhqdVnuIe5y3poxqvQxFqYT )> // <( DATA B1np55+m8WlPDtzMt+SMPEyfPIKAeqo+tAWS7ftfJmAqSswibIqRJh0jXO6nBDvK )> // <( DATA 4EclPifsb89G2i5vu8dfFkmQT8uj2o71UAohLPeY8vX2qksDJGm99pzZwm5hoXUm )> // <( DATA VVYf )> // <( END GEHEIM )> // <( ENCRYPTED Agent_007 )> // <( DATA C0nBhV6V5yVExLOgvpK8xzUluc08lsr7wwBhx4ENMDrJU3pA )> // <( END Agent_007 )> enprot$ ---- Decryption can be performed exactly the same way using the `-d` command: [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -d Agent_007,GEHEIM -k GEHEIM=james -k Agent_007=bond enprot$ cat sample/test.ept hello, this is a test file // <( BEGIN GEHEIM )> Secret line 1 Secret line 2 // <( ENCRYPTED Agent_007 )> // <( DATA lEsVpN3ES6rj0sbxrDm30EgMpYCc+yKM2i2Z )> // <( END Agent_007 )> // <( END GEHEIM )> // <( BEGIN Agent_007 )> Super secret line 3 // <( END Agent_007 )> enprot% ---- We see that only one layer of encryption was removed from GEHEIM. You may use the exactly same command for second iteration to reveal the original file. ==== Working on Source Code The system allows one work on text-format documents, but also on program source code. For example the source code of Enprot has an encrypted portion in its help message: [source,sh] ---- enprot$ ./target/debug/enprot -d AUTHOR -k AUTHOR=markku src/lib.rs enprot$ cargo run -- -h Compiling enprot v0.1.0 (file:///home/mjos/Desktop/lab/enprot) Finished dev [unoptimized + debuginfo] target(s) in 2.17s Running `target/debug/enprot -h` Written 2018 by Markku-Juhani O. Saarinen [...] enprot$ ---- Notice how that authorship information appeared at the end of help text when cargo recompiled the source code (since it was "touched"). This is because some source lines originally read: ---- // <( ENCRYPTED AUTHOR )> // <( DATA X417HVMRRAs6Z1xGo5yY4TxUQ2tpAHEKQ1sg9+kfku5uUikK3y2tODtsUiGqfRGW )> // <( DATA xUCGYFu02BCdqPM7uuX5UNvbfrLvKkj6gLYwg/cr42PJmr4o5xnw1qo= )> // <( END AUTHOR )> ---- Which was decrypted to ---- // <( BEGIN AUTHOR )> println!("Written 2018 by Markku-Juhani O. Saarinen "); // <( END AUTHOR )> ---- Without modifying anything else in the source code. ==== Encrypted Stashing If we combine encryption `-e WORD` and CAS storage `-s WORD`, the ciphertext is stored into CAS in encryption form. One may use `-E` flag to specify both predicates at once. [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -E GEHEIM Password for GEHEIM: Repeat password for GEHEIM: enprot$ cat sample/test.ept hello, this is a test file // <( ENCRYPTED GEHEIM 12d24bf3dbebfe5feb7684efdb1d98391c4b0afd809a8bc87f3f8e6f75e59651 )> // <( BEGIN Agent_007 )> Super secret line 3 // <( END Agent_007 )> enprot$ ---- Here I left out the `-k` definition so Enprot asked me to enter a password. The `-d` flag will work the same way when the ciphertext is in CAS or in local DATA clauses. [source,sh] ---- enprot$ ./target/debug/enprot sample/test.ept -d GEHEIM Password for GEHEIM: enprot$ ---- ==== Multi-File Processing Since files are transformed in place, you can use wildcards to process a large number of files at once. You will be asked for passwords only once. To process a file and output to a different filename, use `-o`: [source,sh] ---- enprot$ ./target/debug/enprot input.ept -o output.ept ---- To direct output to an another directory, or to add a prefix flag `-p PREFIX`. The PREFIX is literally added before each output file. Note that if input filenames have a relative path, that remains unchanged. ---- enprot$ ./target/debug/enprot -p output/ file.* ---- Will read files `file.1`, `file.2`, etc and write them into directory `output` (if it exists). However [source,sh] ---- enprot$ ./target/debug/enprot -p output file.* ---- Will produce files `outputfile.1`, `outputfile.2`, etc. ==== Cryptography: Symmetric Authenticated Encryption Due to its minimal message expansion and non-sequential nature of data being encrypted, a nonce-reuse/misuse resistant Authenticated Encryption with Associated Data (AEAD) mechanism is used. We have chosen to use AES-256 in SIV (Synthetic Initialization Vector) mode [RFC5297]. A SIV ciphertext is always 16 bytes larger than plaintext and the 16-byte authentication tag also serves as the "`synthetic IV`". All hash function computations for CAS utilize SHA-3 [FIPS202] variants. It is also used to derive keying material from passwords.