A re-imagined OCI image builder.

- Take advantage of the native snapshot/diff/overlay functionality of filesystems. Cheap calculation of multiple changesets/layers in a build history enables more granular layers.
- Parallel builds based on a dataflow graph.
- Selectively add, _remove_, and mix-and-match arbitrary base layers below the current build task. Forget about amalgamation images to support your mixed toolchains; apply tools from multiple pre-built images one after another.
- Define custom image manifests. Unlocked via flexible build tool layers, manifest files are produced by the configuration as 'just another' step in a task. Select layers with your own code logic, cross-build multi-platform images to your heart's content, and more.

## How to use

1. You will need a fresh Btrfs subvolume mounted and owned by your current user. Additionally, `unprivileged_userns_clone` should be enabled and the kernel compiled with user namespace support (`CONFIG_USER_NS`).

   ```bash
   # mount -t btrfs -o rw,space_cache,user_subvol_rm_allowed,noacl,noatime,subvol=/stromatekt /dev/sdx /home/stromatekt
   btrfs filesystem df /home/stromatekt
   cat /proc/sys/kernel/unprivileged_userns_clone | grep 1
   cat /proc/config.gz | gunzip -c | grep CONFIG_USER_NS=y
   ```

2. Create `~/.config/stromatekt/config.json`, with the path to the subvolume mount adjusted accordingly. It should look similar to:

   ```json
   {
     "btrfs_root": "/home/stromatekt"
   }
   ```

3. Prepare the example binary:

   ```bash
   pushd examples/prime && cargo build --release && popd
   ```

4. Execute the example build:

   ```bash
   cargo run -- ./examples/parallel-dependency.json --no-dry-run
   ```

## Motivation

`docker build` is slow. The structure of a `Dockerfile` only permits a linear sequence of instructions. Moreover, `docker compose` is even slower: it sends, unpacks, and repacks layers of images and the local file system a _lot_. This can take a significant amount of time.
The author has observed builds whose `Dockerfile` contained a single instruction, adding one link in the file system, take more than 4 minutes. This is unacceptable as a development latency. Further, caching of layers is inherently poor due to the linear sequence logic. Let's address both.

## Structure of an OCI file

The main data within an OCI container is an ordered collection of layers. Each layer is essentially a _diff_ against the previous one, usually in the form of a `tar` archive. (For slightly surprising reasons, a deletion is encoded as a file with special naming rules.) When running a build, the builder will check out the layers of the underlying container, run its commands, and finally compute the diff to encode into a new layer.

The two highly expensive filesystem tasks, checkout and diff, can be implemented much more efficiently if we can utilize the checkpoint and incremental diff logic of the filesystem itself. Furthermore, this work is probably IO-bound, meaning we _should_ seek to perform as much of it as possible in parallel.

Note that the layer sequence of an OCI image is not commutative. However, as long as the task definition itself opts in by providing a canonical recombination order, there shouldn't be any reproducibility problem in creating the layers in a _different_ order. Example:

- `A --(proc0)-> B0` yielding diff `C0`
- `A --(proc1)-> B1` yielding diff `C1`
- => export layers as: `[A, C0, C1]`

Actually, we could even allow swapping `A` for a totally unrelated `A*`, as long as the build manifest makes this explicit; for instance, to apply a security patch to an _underlying_ layer. Also, `proc0` and `proc1` can be executed with _entirely different_ underlying technologies (e.g. one as an x86 process, the other as a WASI executable).

## Planned extensions

1. Library files for build dependencies and maintainability.
   Define additional tasks in a separate file, then import specific changesets they define into another specification, and let the dataflow resolver figure out a solution.
2. Reproducibility assertions via hashes, used for incremental builds.
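
---

The `[A, C0, C1]` recombination example from the structure section above can be played through with plain directories. This is only a toy model (plain `cp` stands in for real tar layers, and all file names are made up), but it shows why a canonical apply order keeps the result reproducible even when `proc0` and `proc1` run in parallel:

```bash
set -eu
# Parent layer A, shared by both processes.
mkdir -p A && echo parent > A/base.txt

# proc0 and proc1 run independently; neither sees the other's output.
mkdir -p B0 B1
cp -r A/. B0/ && echo tool0 > B0/tool0.txt   # A --(proc0)-> B0
cp -r A/. B1/ && echo tool1 > B1/tool1.txt   # A --(proc1)-> B1

# Diff Ci = Bi minus A; here each diff is just the one added file.
mkdir -p C0 C1
cp B0/tool0.txt C0/
cp B1/tool1.txt C1/

# Export as [A, C0, C1]: apply the diffs in the canonical order.
# The result is the same no matter which process finished first.
mkdir -p result
cp -r A/.  result/
cp -r C0/. result/
cp -r C1/. result/
```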
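The "special naming rules" for deletions mentioned above are whiteout files: the OCI image spec marks a deleted path with a `.wh.`-prefixed entry in the next layer's archive. The sketch below builds such a diff layer between two directory snapshots using plain coreutils; the directory and file names (`base`, `upper`, `layer`, etc.) are purely illustrative and say nothing about stromatekt's actual implementation:

```bash
set -eu
# Two snapshots of a rootfs: `base` before the build step, `upper` after.
mkdir -p base/etc upper
echo v1 > base/etc/app.conf
echo hello > base/removed.txt
echo keep > base/keep.txt
cp -r base/. upper/
echo v2 > upper/etc/app.conf   # modified file
rm upper/removed.txt           # deleted file

mkdir -p layer
# Added or changed files go into the layer verbatim.
(cd upper && find . -type f) | while read -r f; do
  if ! cmp -s "base/$f" "upper/$f" 2>/dev/null; then
    mkdir -p "layer/$(dirname "$f")"
    cp "upper/$f" "layer/$f"
  fi
done
# A deletion becomes an empty `.wh.<name>` whiteout entry (OCI convention).
(cd base && find . -type f) | while read -r f; do
  if [ ! -e "upper/$f" ]; then
    d=$(dirname "$f")
    mkdir -p "layer/$d"
    : > "layer/$d/.wh.$(basename "$f")"
  fi
done
tar -cf layer.tar -C layer .
tar -tf layer.tar   # contains ./etc/app.conf and the whiteout ./.wh.removed.txt
```

Unchanged files (`keep.txt`) never enter the layer, which is exactly the property that makes a filesystem-level incremental diff such a good fit for this step.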