steg86
======

![license](https://raster.shields.io/badge/license-MIT%20with%20restrictions-green.png)
[![CI](https://github.com/woodruffw/steg86/actions/workflows/ci.yml/badge.svg)](https://github.com/woodruffw/steg86/actions/workflows/ci.yml)

*steg86* is a format-agnostic [steganographic](https://en.wikipedia.org/wiki/Steganography)
tool for x86 and AMD64 binaries. You can use it to hide information in compiled programs,
regardless of executable format (PE, ELF, Mach-O, raw, &c). It has no performance *or* size
impact on the files that it modifies (adding a message does *not* increase binary size or
decrease execution speed).

For more details on how *steg86* works, see the [Theory of Operation](#theory-of-operation)
section.

## Installation

`steg86` can be installed via `cargo`:

```bash
$ cargo install steg86
```

Alternatively, you can build it in this repository with `cargo build`:

```bash
$ cargo build
```

## Usage

See `steg86 --help` for a full list of flags and subcommands.

### Profiling

To profile a binary for steganographic suitability:

```bash
$ steg86 profile /bin/bash
Summary for /bin/bash:
  175828 total instructions
  27957 potential semantic pairs
  19 potential commutative instructions
  27944 bits of information capacity (3493 bytes, approx. 3KB)
```

### Embedding

To embed a message into a binary:

```bash
$ steg86 embed /bin/bash ./bash.steg <<< "here is my secret message"
```

By default, `steg86 embed` writes its output to `$input.steg`. For example,
`/lib64/ld-linux-x86-64.so.2` would become `/lib64/ld-linux-x86-64.so.2.steg`.

`steg86 embed` will exit with a non-zero status if the message cannot be embedded
(e.g., if it's too large).

### Extraction

To extract a message from a binary:

```bash
$ steg86 extract bash.steg > my_message
$ cat my_message
here is my secret message
```

`steg86 extract` will exit with a non-zero status if a message cannot be extracted
(e.g., if it can't find one).

## Theory of Operation

*steg86* takes advantage of one of x86's encoding peculiarities: the R/M field of the
ModR/M byte:

```
  7   6   5   4   3   2   1   0
---------------------------------
|  MOD  |    REG    |    R/M    |
---------------------------------
```

The ModR/M byte is normally used to support both register-to-memory and memory-to-register
variants of the same instruction. For example, the `MOV` instruction has the following
variants (among many others):

| opcode  | mnemonic        |
|---------|-----------------|
| `89 /r` | `MOV r/m32,r32` |
| `8B /r` | `MOV r32,r/m32` |

Because the R/M field can encode *either* a memory addressing operation *or* a bare register,
opcodes that support both register-to-memory and memory-to-register operations *also* support
multiple encodings of register-to-register operations.

For example, `mov eax, ebx` can be encoded as *either* `89 d8` *or* `8b c3` *without any
semantic changes*. This gives us one bit of information per instruction that has such a
dual encoding. Given enough register-to-register instructions with multiple encodings, we
can hide entire messages with those bits.

Additionally, because these semantically identical encodings are frequently the same size,
we can modify *preexisting* binaries without having to fix relocations or RIP-relative
addressing.

*steg86* does primitive [binary translation](https://en.wikipedia.org/wiki/Binary_translation)
to accomplish these goals. It uses [iced-x86](https://github.com/0xd4d/iced) for encoding and
decoding, and [goblin](https://github.com/m4b/goblin) for binary format wrangling.

### Prior work

The inspiration for *steg86* came from [@inventednight](https://github.com/inventednight),
who described it as an adaptation of a similar idea (also theirs) for RISC-V binaries.

The technique mentioned above is discussed in detail in
[*Hydan: Hiding Information in Program Binaries*](http://web4.cs.columbia.edu/~angelos/Papers/hydan.pdf)
(2004).

*steg86* constitutes a separate discovery of Hydan's technique and was written entirely
independently; the refinements discussed in the paper may or may not be better than the
ones implemented in *steg86*.

### Future improvements

* *steg86* currently limits the embedded message to 16KB. This is a purely artificial
  limitation that could be resolved with some small format changes.

* Both x86 and AMD64 have multi-byte NOPs for alignment purposes. Additional information
  can be hidden in these in a few ways:
  * The `0F 1F /0` multi-byte NOP can be up to 9 bytes, of which up to 5 are free
    (SIB + 4-byte displacement).
  * There are longer NOPs (11, 15 bytes) that may also be usable.

* Going beyond register-to-register duals and rewriting `add`/`sub`, as Hydan does.
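
### Worked example

The dual-encoding idea from the Theory of Operation section can be made concrete with a small, dependency-free sketch. It is not taken from *steg86*'s source: the function names are hypothetical, and the choice that opcode `89` stands for a 0 bit and `8B` for a 1 bit is an assumption made purely for illustration. It also handles only the plain two-byte, 32-bit register-to-register `MOV`, ignoring prefixes and every other instruction with a dual encoding.

```rust
// Register numbers as they appear in the REG and R/M fields.
const EAX: u8 = 0b000;
const EBX: u8 = 0b011;

/// Build a ModR/M byte with MOD = 0b11, i.e. both operands are registers.
fn modrm_register_direct(reg: u8, rm: u8) -> u8 {
    0b1100_0000 | ((reg & 0b111) << 3) | (rm & 0b111)
}

/// Encode `mov dst, src` for 32-bit registers, picking the opcode from `bit`.
/// ASSUMPTION: 0x89 stands for 0 and 0x8B for 1; steg86's real mapping may differ.
fn encode_mov(dst: u8, src: u8, bit: bool) -> [u8; 2] {
    if bit {
        // 8B /r is MOV r32, r/m32: destination in REG, source in R/M.
        [0x8B, modrm_register_direct(dst, src)]
    } else {
        // 89 /r is MOV r/m32, r32: destination in R/M, source in REG.
        [0x89, modrm_register_direct(src, dst)]
    }
}

/// Recover `(dst, src, bit)` from a two-byte register-to-register MOV.
/// Returns `None` for other opcodes or for memory operands (MOD != 0b11).
fn decode_mov(bytes: [u8; 2]) -> Option<(u8, u8, bool)> {
    let [opcode, modrm] = bytes;
    if modrm >> 6 != 0b11 {
        return None;
    }
    let reg = (modrm >> 3) & 0b111;
    let rm = modrm & 0b111;
    match opcode {
        0x89 => Some((rm, reg, false)),
        0x8B => Some((reg, rm, true)),
        _ => None,
    }
}

fn main() {
    let zero = encode_mov(EAX, EBX, false);
    let one = encode_mov(EAX, EBX, true);
    assert_eq!(zero, [0x89, 0xD8]);
    assert_eq!(one, [0x8B, 0xC3]);
    // Both encodings decode to the same operands; only the hidden bit differs.
    assert_eq!(decode_mov(zero), Some((EAX, EBX, false)));
    assert_eq!(decode_mov(one), Some((EAX, EBX, true)));
    println!("mov eax, ebx -> {:02x?} (bit 0) or {:02x?} (bit 1)", zero, one);
}
```

Both byte sequences are two bytes long, which is why swapping one for the other leaves every following offset in the binary unchanged.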