| Crates.io | embeddenator |
| lib.rs | embeddenator |
| version | 0.20.0-alpha.1 |
| created_at | 2026-01-09 23:04:02.043032+00 |
| updated_at | 2026-01-09 23:04:02.043032+00 |
| description | Sparse ternary VSA holographic computing substrate |
| homepage | https://github.com/tzervas/embeddenator |
| repository | https://github.com/tzervas/embeddenator |
| max_upload_size | |
| id | 2033120 |
| size | 20,682,828 |
Version 0.20.0 | Production Rust implementation of sparse ternary VSA (Vector Symbolic Architecture) holographic filesystem and computing substrate.
Author: Tyler Zervas tz-dev@vectorweight.com
License: MIT
Embeddenator has been refactored into a modular component architecture with 6 independent library crates:
📚 Documentation: Component Architecture | Local Development | Versioning
🐳 Docker: Multi-arch images available at ghcr.io/tzervas/embeddenator (amd64 + arm64)
- .engram files (holographic root state)
- --max-chunks-per-node cap for bounded per-node indexing cost

Embeddenator uses sparse ternary vectors to represent data holographically:
- (A ⊗ B) ⊗ C ≈ A ⊗ (B ⊗ C) (associativity)
- A ⊗ A ≈ I (self-inverse)

The ternary representation {-1, 0, +1} is hardware-optimized for 64-bit CPUs:
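These identities follow directly if binding is implemented as elementwise multiplication over {-1, 0, +1}. A minimal sketch, assuming a dense `i8` layout and an illustrative `bind` function (not the crate's actual API, which uses sparse pos/neg index sets):

```rust
// Illustrative sketch: binding as elementwise ternary multiplication.
// `bind` and the dense Vec<i8> layout are assumptions for demonstration.
fn bind(a: &[i8], b: &[i8]) -> Vec<i8> {
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

fn main() {
    let a = vec![1i8, -1, 0, 1, -1];
    let b = vec![-1i8, 1, 1, 0, -1];
    let c = vec![1i8, 1, -1, -1, 0];

    // Associativity: (A ⊗ B) ⊗ C == A ⊗ (B ⊗ C)
    assert_eq!(bind(&bind(&a, &b), &c), bind(&a, &bind(&b, &c)));

    // Self-inverse on nonzero coordinates: A ⊗ A is +1 wherever A is nonzero
    let aa = bind(&a, &a);
    for (x, y) in a.iter().zip(&aa) {
        if *x != 0 {
            assert_eq!(*y, 1);
        }
    }
    println!("binding algebra holds");
}
```

Note that with sparse vectors the self-inverse holds exactly on the nonzero support; zero coordinates stay zero, which is why the identity is approximate (≈) rather than exact.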
Scalability through Adaptive Sparsity:
An engram is a holographic encoding of an entire filesystem or dataset:
Security: The codebook does NOT store plaintext data. Chunks are encoded using a VSA-lens reversible encoding mechanism that is:
See ADR-007 for details on the VSA-as-a-lens security model.
Package factoralization enables selective manipulation of packages within holographic containers:
See ADR-005 for technical details on hologram factoralization, balanced ternary encoding, and 64-bit register optimization.
# Clone the repository
git clone https://github.com/tzervas/embeddenator.git
cd embeddenator
# Build with Cargo
cargo build --release
# Or use the orchestrator
python3 orchestrator.py --mode build --verbose
# Ingest a directory into an engram
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v
# Extract from an engram
cargo run --release -- extract -e root.engram -m manifest.json -o ./output -v
# Query similarity
cargo run --release -- query -e root.engram -q ./test_file.txt -v
The orchestrator provides unified build, test, and deployment workflows:
# Quick start: build, test, and package everything
python3 orchestrator.py --mode full --verbose -i
# Run integration tests
python3 orchestrator.py --mode test --verbose
# Build Docker image
python3 orchestrator.py --mode package --verbose
# Display system info
python3 orchestrator.py --mode info
# Clean all artifacts
python3 orchestrator.py --mode clean
Embeddenator provides the following commands for working with holographic engrams:
Get comprehensive help information with embeddenator --help:
# Show main help with examples
embeddenator --help
# Show detailed help for a specific command
embeddenator ingest --help
embeddenator extract --help
embeddenator query --help
embeddenator query-text --help
embeddenator bundle-hier --help
ingest - Create Holographic Engram
Process one or more files and/or directories and encode them into a holographic engram.
embeddenator ingest [OPTIONS] --input <PATH>...
Required:
-i, --input <PATH>... Input file(s) and/or directory(ies) to ingest
Options:
-e, --engram <FILE> Output engram file [default: root.engram]
-m, --manifest <FILE> Output manifest file [default: manifest.json]
-v, --verbose Enable verbose output with progress and statistics
-h, --help Print help information
Examples:
# Basic ingestion
embeddenator ingest -i ./myproject -e project.engram -m project.json
# Mix files and directories (repeat -i/--input)
embeddenator ingest -i ./src -i ./README.md -e project.engram -m project.json
# With verbose output
embeddenator ingest -i ~/Documents -e docs.engram -v
# Custom filenames
embeddenator ingest --input ./data --engram backup.engram --manifest backup.json
What it does:
extract - Reconstruct Files
Bit-perfect reconstruction of all files from an engram.
embeddenator extract [OPTIONS] --output-dir <DIR>
Required:
-o, --output-dir <DIR> Output directory for reconstructed files
Options:
-e, --engram <FILE> Input engram file [default: root.engram]
-m, --manifest <FILE> Input manifest file [default: manifest.json]
-v, --verbose Enable verbose output with progress
-h, --help Print help information
Examples:
# Basic extraction
embeddenator extract -e project.engram -m project.json -o ./restored
# With default filenames
embeddenator extract -o ./output -v
# From backup
embeddenator extract --engram backup.engram --manifest backup.json --output-dir ~/restored
What it does:
query - Similarity Search
Compute cosine similarity between a query file and engram contents.
embeddenator query [OPTIONS] --query <FILE>
Required:
-q, --query <FILE> Query file or pattern to search for
Options:
-e, --engram <FILE> Engram file to query [default: root.engram]
--hierarchical-manifest <FILE> Optional hierarchical manifest (selective unfolding)
--sub-engrams-dir <DIR> Directory of `.subengram` files (used with --hierarchical-manifest)
--k <K> Top-k results to print for codebook/hierarchical search [default: 10]
-v, --verbose Enable verbose output with similarity details
-h, --help Print help information
Examples:
# Query similarity
embeddenator query -e archive.engram -q search.txt
# With verbose output
embeddenator query -e data.engram -q pattern.bin -v
# Using default engram
embeddenator query --query testfile.txt -v
What it does:
If --hierarchical-manifest and --sub-engrams-dir are provided, it also runs a store-backed hierarchical query and prints the top hierarchical matches.
Similarity interpretation:
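For ternary vectors, cosine similarity is the dot product over the product of the vector norms, and since every nonzero entry is ±1, each norm is simply the square root of the nonzero count. A hedged sketch with a dense layout (the actual crate stores pos/neg index sets; the `cosine` name is illustrative):

```rust
// Illustrative sketch of cosine similarity between ternary vectors.
// The real crate operates on sparse pos/neg index sets; this dense
// version shows the same arithmetic.
fn cosine(a: &[i8], b: &[i8]) -> f64 {
    let dot: i64 = a.iter().zip(b).map(|(x, y)| (*x as i64) * (*y as i64)).sum();
    // For ternary vectors, ||v|| = sqrt(number of nonzero entries)
    let na = (a.iter().filter(|x| **x != 0).count() as f64).sqrt();
    let nb = (b.iter().filter(|x| **x != 0).count() as f64).sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot as f64 / (na * nb)
    }
}

fn main() {
    let a = vec![1i8, -1, 0, 1];
    // Identical content scores ~1.0
    assert!((cosine(&a, &a) - 1.0).abs() < 1e-12);
    // Fully negated content scores ~-1.0
    let neg: Vec<i8> = a.iter().map(|x| -x).collect();
    assert!((cosine(&a, &neg) + 1.0).abs() < 1e-12);
    println!("identical ≈ 1.0, negated ≈ -1.0, unrelated ≈ 0.0");
}
```

Under this arithmetic, scores near 1.0 indicate near-identical content, scores near 0.0 indicate unrelated content, and negative scores indicate anti-correlated patterns.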
query-text - Similarity Search (Text)
Encode a literal text string as a query vector and run the same retrieval path as query.
embeddenator query-text -e root.engram --text "search phrase" --k 10
# With hierarchical selective unfolding:
embeddenator query-text -e root.engram --text "search phrase" \
--hierarchical-manifest hier.json --sub-engrams-dir ./sub_engrams --k 10
bundle-hier - Build Hierarchical Retrieval Artifacts
Build a hierarchical manifest and a directory of sub-engrams from an existing flat root.engram + manifest.json. This enables store-backed selective unfolding queries.
embeddenator bundle-hier -e root.engram -m manifest.json \
--out-hierarchical-manifest hier.json \
--out-sub-engrams-dir ./sub_engrams
# Optional: deterministically shard large nodes (bounds per-node indexing cost)
embeddenator bundle-hier -e root.engram -m manifest.json \
--max-chunks-per-node 2000 \
--out-hierarchical-manifest hier.json \
--out-sub-engrams-dir ./sub_engrams
docker build -f Dockerfile.tool -t embeddenator-tool:latest .
# Ingest data
docker run -v $(pwd)/input_ws:/input -v $(pwd)/workspace:/workspace \
embeddenator-tool:latest \
ingest -i /input -e /workspace/root.engram -m /workspace/manifest.json -v
# Extract data
docker run -v $(pwd)/workspace:/workspace -v $(pwd)/output:/output \
embeddenator-tool:latest \
extract -e /workspace/root.engram -m /workspace/manifest.json -o /output -v
Build a container from an engram:
# First, create an engram of your desired filesystem
cargo run --release -- ingest -i ./rootfs -e workspace/root.engram -m workspace/manifest.json
# Build the holographic container
docker build -f Dockerfile.holographic -t my-holographic-os:latest .
Embeddenator provides pre-built holographic OS images with a dual versioning strategy:
LTS Stable Releases (Long-Term Support):
- Tagged with -lts suffix (e.g., v0.2.0-lts)

Testing/Sid/Rolling Releases (Bleeding Edge):
- Tagged with -nightly suffix (e.g., v0.2.0-nightly-20250115)

Pull images:
# LTS stable images
docker pull ghcr.io/tzervas/embeddenator-holographic-debian-stable-amd64:latest
docker pull ghcr.io/tzervas/embeddenator-holographic-ubuntu-stable-arm64:latest
# Nightly bleeding-edge images
docker pull ghcr.io/tzervas/embeddenator-holographic-debian-testing-amd64:nightly
docker pull ghcr.io/tzervas/embeddenator-holographic-ubuntu-rolling-arm64:nightly
# Specific dated nightly
docker pull ghcr.io/tzervas/embeddenator-holographic-debian-sid-amd64:v0.2.0-nightly-20250115
Available OS Configurations:
| OS | Version | LTS | Nightly | Architectures |
|---|---|---|---|---|
| Debian | 12 Bookworm | ✅ | ❌ | amd64, arm64 |
| Debian | Testing | ❌ | ✅ | amd64, arm64 |
| Debian | Sid | ❌ | ✅ | amd64, arm64 |
| Ubuntu | 24.04 LTS | ✅ | ❌ | amd64, arm64 |
| Ubuntu | Devel | ❌ | ✅ | amd64, arm64 |
| Ubuntu | Rolling | ❌ | ✅ | amd64, arm64 |
Embeddenator guarantees:
Typical performance characteristics:
SparseVec: Sparse ternary vector implementation
pos: Indices with +1 value
neg: Indices with -1 value
EmbrFS: Holographic filesystem layer
CLI: Command-line interface
Comprehensive architectural documentation is available in docs/adr/:
ADR-001: Sparse Ternary VSA
ADR-002: Multi-Agent Workflow System
ADR-003: Self-Hosted Runner Architecture
ADR-004: Holographic OS Container Design
ADR-005: Hologram-Based Package Isolation
ADR-006: Dimensionality and Sparsity Scaling
ADR-007: Codebook Security and Reversible Encoding
See docs/adr/README.md for the complete ADR index.
Engram (.engram):
Manifest (.json):
Comprehensive API documentation is available:
# Generate and open documentation locally
cargo doc --open
# Or use the automated script
./generate_docs.sh
# View online (after publishing)
# https://docs.rs/embeddenator
The documentation includes:
# Recommended: everything Cargo considers testable (lib/bin/tests/examples/benches)
cargo test --workspace --all-targets
# Doc tests only
cargo test --doc
# Optimized build tests (useful before benchmarking)
cargo test --release --workspace --all-targets
# Feature-gated correctness/perf gates
cargo test --workspace --all-targets --features "bt-phase-2 proptest"
# Long-running/expensive tests are explicitly opt-in:
# - QA memory scaling (requires env var + ignored flag)
EMBEDDENATOR_RUN_QA_MEMORY=1 cargo test --features qa --test memory_scaled -- --ignored --nocapture
# - Multi-GB soak test (requires env var + ignored flag)
EMBEDDENATOR_RUN_SOAK=1 cargo test --release --features soak-memory --test soak_memory -- --ignored --nocapture
# Integration tests via orchestrator
python3 orchestrator.py --mode test --verbose
# Full test suite
python3 orchestrator.py --mode full --verbose
Notes:
- Seeing tests skipped under cargo bench is expected: Cargo runs the unit test harness in libtest's --bench mode, which skips normal #[test] functions (it prints i for each). Use cargo test (commands above) to actually execute tests.
- cargo test --workspace --all-targets will also compile/run Criterion benches in a fast "smoke" mode (they print Testing ... Success). This is intended to catch broken benches early.

The project uses separated CI/CD workflows for optimal performance and reliability:
# Test CI build locally with monitoring
./ci_build_monitor.sh linux/amd64 build 300
# Monitor for specific timeout (in seconds)
./ci_build_monitor.sh linux/amd64 full 900
CI Workflow Structure:
Three separate workflows eliminate duplication and provide clear responsibilities:
CI Features:
- Build parallelism controlled via CARGO_BUILD_JOBS

Architecture Support:
| Architecture | Status | Runner Type | Trigger | Notes |
|---|---|---|---|---|
| amd64 (x86_64) | ✅ Production | GitHub-hosted (ubuntu-latest) | Every PR (required check) | Stable, 5-7min |
| arm64 (aarch64) | 🚧 Ready | Self-hosted (pending deployment) | Manual only | Will enable on merge to main |
ARM64 Deployment Roadmap:
- Runner labels: ["self-hosted", "linux", "ARM64"]

Why Self-Hosted for ARM64?
See .github/workflows/README.md for complete CI/CD documentation and ARM64 setup guide.
Embeddenator includes a comprehensive Python-based automation system for managing GitHub Actions self-hosted runners with complete lifecycle management and multi-architecture support:
Features:
Supported Architectures:
Quick Start:
# 1. Copy and configure environment file
cp .env.example .env
# Edit .env and set GITHUB_REPOSITORY and GITHUB_TOKEN
# 2. Run in auto mode (registers, starts, monitors, auto-deregisters when idle)
python3 runner_manager.py run
# 3. Or use manual mode (keeps running until stopped)
RUNNER_MODE=manual python3 runner_manager.py run
Multi-Architecture Examples:
# Deploy ARM64 runners on x86_64 hardware (with emulation, auto-detect runtime)
RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Deploy runners for all architectures
RUNNER_TARGET_ARCHITECTURES=x64,arm64,riscv64 RUNNER_COUNT=6 python3 runner_manager.py run
# Deploy with automatic QEMU installation (requires sudo)
RUNNER_EMULATION_AUTO_INSTALL=true RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Use specific emulation method (docker, podman, or qemu)
RUNNER_EMULATION_METHOD=podman RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run
# Use Docker for emulation
RUNNER_EMULATION_METHOD=docker RUNNER_TARGET_ARCHITECTURES=arm64,riscv64 python3 runner_manager.py run
Individual Commands:
# Register runner(s)
python3 runner_manager.py register
# Start runner service(s)
python3 runner_manager.py start
# Monitor and manage lifecycle
python3 runner_manager.py monitor
# Check status
python3 runner_manager.py status
# Stop and deregister
python3 runner_manager.py stop
Advanced Usage:
# Deploy multiple runners
python3 runner_manager.py run --runner-count 4
# Custom labels
python3 runner_manager.py register --labels self-hosted,linux,ARM64,large
# Auto-deregister after 10 minutes of inactivity
RUNNER_IDLE_TIMEOUT=600 python3 runner_manager.py run
Configuration Options:
Key environment variables (see .env.example for full list):
- GITHUB_REPOSITORY - Repository to register runners for (required)
- GITHUB_TOKEN - Personal access token with repo scope (required)
- RUNNER_MODE - Deployment mode: auto (default) or manual
- RUNNER_IDLE_TIMEOUT - Auto-deregister timeout in seconds (default: 300)
- RUNNER_COUNT - Number of runners to deploy (default: 1)
- RUNNER_LABELS - Comma-separated runner labels
- RUNNER_EPHEMERAL - Enable ephemeral runners (deregister after one job)
- RUNNER_TARGET_ARCHITECTURES - Target architectures: x64, arm64, riscv64 (comma-separated)
- RUNNER_ENABLE_EMULATION - Enable QEMU emulation for cross-architecture (default: true)
- RUNNER_EMULATION_METHOD - Emulation method: auto, qemu, docker, podman (default: auto)
- RUNNER_EMULATION_AUTO_INSTALL - Auto-install QEMU if missing (default: false, requires sudo)

See .env.example for complete configuration documentation.
Deployment Modes:
Auto Mode (default): Runners automatically deregister after being idle for a specified timeout
Manual Mode: Runners keep running until manually stopped
See .github/workflows/README.md for complete CI/CD documentation and ARM64 setup guide.
embeddenator/
├── Cargo.toml               # Rust dependencies
├── src/
│   └── main.rs              # Complete implementation
├── tests/
│   ├── e2e_regression.rs    # 6 E2E tests (includes critical engram modification test)
│   ├── integration_cli.rs   # 7 integration tests
│   └── unit_tests.rs        # 11 unit tests
├── Dockerfile.tool          # Static binary packaging
├── Dockerfile.holographic   # Holographic OS container
├── orchestrator.py          # Unified build/test/deploy
├── runner_manager.py        # Self-hosted runner automation entry point (NEW)
├── runner_automation/       # Runner automation package (NEW)
│   ├── __init__.py          # Package initialization (v1.1.0)
│   ├── config.py            # Configuration management
│   ├── github_api.py        # GitHub API client
│   ├── installer.py         # Runner installation
│   ├── runner.py            # Individual runner lifecycle
│   ├── manager.py           # Multi-runner orchestration
│   ├── emulation.py         # QEMU emulation for cross-arch (NEW)
│   ├── cli.py               # Command-line interface
│   └── README.md            # Package documentation
├── .env.example             # Runner configuration template (NEW)
├── ci_build_monitor.sh      # CI hang detection and monitoring
├── generate_docs.sh         # Documentation generation
├── .github/
│   └── workflows/
│       ├── ci-pre-checks.yml        # Pre-build validation (every PR)
│       ├── ci-amd64.yml             # AMD64 build (required for merge)
│       ├── ci-arm64.yml             # ARM64 build (self-hosted, pending)
│       ├── build-holographic-os.yml # OS container builds
│       ├── build-push-images.yml    # Multi-OS image pipeline
│       ├── nightly-builds.yml       # Nightly bleeding-edge builds
│       └── README.md                # Complete CI/CD documentation
├── input_ws/                # Example input (gitignored)
├── workspace/               # Build artifacts (gitignored)
└── README.md                # This file
We welcome contributions to Embeddenator! Here's how you can help:
git clone https://github.com/YOUR_USERNAME/embeddenator.git
cd embeddenator
git checkout -b feature/my-new-feature
- Unit tests: src/ modules
- Integration tests: tests/integration_*.rs
- End-to-end tests: tests/e2e_*.rs

# Run all Rust tests
cargo test
# Run integration tests via orchestrator
python3 orchestrator.py --mode test --verbose
# Run full validation suite
python3 orchestrator.py --mode full --verbose
# Run Clippy linter (zero warnings required)
cargo clippy -- -D warnings
# Format code
cargo fmt
# Check Python syntax
python3 -m py_compile *.py
# Build Docker images
docker build -f Dockerfile.tool -t embeddenator-tool:test .
# Test on different architectures
python3 orchestrator.py --platform linux/arm64 --mode test
- Format code before committing (cargo fmt)
- Avoid .unwrap() in library code

We especially welcome contributions in these areas:
When reporting bugs, please include:
- Embeddenator version (embeddenator --version)
- Rust version (rustc --version)
- Verbose output (run with the --verbose flag)

Thank you for contributing to Embeddenator! 🎉
Modify chunk_size in EmbrFS::ingest_file for different trade-offs:
let chunk_size = 8192; // Larger chunks = better compression, slower reconstruction
For very large datasets, implement multi-level engrams:
// Level 1: Individual files
// Level 2: Directory summaries
// Level 3: Root engram of all directories
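Assuming bundling is a per-coordinate majority vote (a common VSA choice; the crate's exact rule may differ), the levels above compose like this:

```rust
// Illustrative sketch: bundling as a per-coordinate signed sum clipped
// to {-1, 0, +1}. `bundle` here is a stand-in, not the crate's API.
fn bundle(vs: &[Vec<i8>]) -> Vec<i8> {
    let dim = vs[0].len();
    (0..dim)
        .map(|i| {
            let s: i32 = vs.iter().map(|v| v[i] as i32).sum();
            s.signum() as i8 // majority vote per coordinate; ties become 0
        })
        .collect()
}

fn main() {
    // Level 1: per-file vectors
    let file_a = vec![1i8, -1, 0, 1];
    let file_b = vec![1i8, 1, -1, 0];
    // Level 2: a directory summary bundles its files
    let dir = bundle(&[file_a, file_b]);
    assert_eq!(dir, vec![1, 0, -1, 1]);
    // Level 3: the root bundles directory summaries the same way
    let root = bundle(&[dir.clone()]);
    assert_eq!(root, dir);
}
```

Because bundling is a superposition, each level remains similar (in cosine terms) to its members, which is what makes top-down selective unfolding queries possible.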
Combine multiple engrams:
let combined = engram1.root.bundle(&engram2.root);
// Now combined contains both datasets holographically
Reduce chunk size or process files in batches:
# Process directories separately
for dir in input_ws/*/; do
cargo run --release -- ingest -i "$dir" -e "engrams/$(basename $dir).engram"
done
Verify manifest and engram are from the same ingest:
# Check manifest metadata
jq '.total_chunks' workspace/manifest.json
# Re-ingest if needed
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v
- Use cargo build --release: release builds are 10-100x faster
- Enable SIMD with --features simd and RUSTFLAGS="-C target-cpu=native"
# Build with SIMD optimizations
RUSTFLAGS="-C target-cpu=native" cargo build --release --features simd
See docs/SIMD_OPTIMIZATION.md for details on the 2-4x query speedup.

MIT License - see LICENSE file for details.
- Built-in help (embeddenator --help)
- examples/ directory (coming soon) for usage patterns

Q: What file types are supported?
A: All file types - text, binary, executables, images, etc. Embeddenator is file-format agnostic.
Q: Is the reconstruction really bit-perfect?
A: Yes! All files are reconstructed exactly byte-for-byte. We have 23 tests verifying this.
Q: Can I combine multiple engrams?
A: Yes! Use VSA bundle operations to create holographic superpositions. See "Algebraic Operations" in the README.
Q: What's the maximum data size?
A: Theoretically unlimited with hierarchical encoding. Tested with datasets up to 1M+ tokens.
Q: How does this compare to compression?
A: Embeddenator focuses on holographic representation, not compression. Engram sizes are typically 40-50% of original data, but the key benefit is algebraic operations on encoded data.
If you discover a security vulnerability, please email security@embeddenator.dev (or create a private security advisory on GitHub) rather than opening a public issue.
Built with ❤️ using Rust, Docker, and holographic computing principles.