embeddenator

Crate: embeddenator (crates.io | lib.rs)
Version: 0.20.0-alpha.1
Created: 2026-01-09 23:04:02 UTC
Updated: 2026-01-09 23:04:02 UTC
Description: Sparse ternary VSA holographic computing substrate
Homepage: https://github.com/tzervas/embeddenator
Repository: https://github.com/tzervas/embeddenator
Size: 20,682,828 bytes
Author: Tyler Zervas (tzervas)
Documentation: https://docs.rs/embeddenator
README

Embeddenator — Holographic Computing Substrate

Version 0.20.0 | Production Rust implementation of sparse ternary VSA (Vector Symbolic Architecture) holographic filesystem and computing substrate.

Author: Tyler Zervas tz-dev@vectorweight.com
License: MIT


Component Architecture

Embeddenator has been refactored into a modular component architecture with 6 independent library crates:

📚 Documentation: Component Architecture | Local Development | Versioning

🐳 Docker: Multi-arch images available at ghcr.io/tzervas/embeddenator (amd64 + arm64)

Features

  • Native Engram Operations: Work directly on .engram files (holographic root state)
  • Bit-Perfect Reconstruction: 100% ordered text and binary file recovery
  • Pure Algebraic Mutations: Bundle/bind/scalar operations on single root engram
  • Hierarchical Chunked Encoding: Designed for TB-scale data
  • SIMD Acceleration: Optional AVX2/NEON optimizations for 2-4x query speedup
  • CLI + Docker: Complete toolchain with multi-arch container support
  • Holographic OS Containers: Full Debian and Ubuntu distributions encoded as engrams
  • Dual Versioning: LTS stable releases + nightly bleeding-edge builds
  • Production-Grade: Comprehensive test suite with zero clippy warnings
  • Multi-Architecture: amd64 supported; arm64 supported via self-hosted runners (CI validation pending)
  • Test Runner: Intelligent validation with debug logging (v0.2.0)
  • AI Assistant Integration: Architecture for specialized coding and research assistants with embeddenator-enhanced retrieval

What's New in v0.3.0

  • 🎯 Deterministic hierarchical artifacts - Stable manifest/sub-engram generation with sorted iteration
  • 📊 Optional node sharding - --max-chunks-per-node cap for bounded per-node indexing cost
  • 📂 Multi-input ingest - Ingest files and/or multiple directories with automatic namespacing
  • ⚡ Query performance - Reusable codebook index across shift-sweep + increased candidate pool
  • 🧪 Expanded test coverage - New determinism and E2E hierarchical artifact tests
  • 📚 Updated documentation - CLI reference, hierarchical format, and selective unfolding guides

What's New in v0.2.0

  • ✨ 6 comprehensive E2E regression tests including critical engram modification test
  • 🧪 Comprehensive test suite (unit + integration + e2e + doc tests)
  • 🔍 Intelligent test runner with accurate counting and debug mode
  • 📦 Dual versioning strategy for OS builds (LTS + nightly)
  • 🎯 Zero clippy warnings (29 fixes applied)
  • 🐧 Extended OS support: Debian 12 LTS, Debian Testing/Sid, Ubuntu 24.04 LTS, Ubuntu Devel/Rolling
  • 🚀 Native amd64 CI (required pre-merge check) + arm64 ready for self-hosted runners
  • 📚 Automated documentation with rustdoc and 9 doc tests

Core Concepts

Vector Symbolic Architecture (VSA)

Embeddenator uses sparse ternary vectors to represent data holographically:

  • Bundle (⊕): Associative superposition - (A ⊕ B) ⊕ C ≈ A ⊕ (B ⊕ C)
  • Bind (⊙): Non-commutative composition - A ⊙ A ≈ I (self-inverse)
  • Cosine Similarity: Algebraic cleanup - correct match >0.75, noise <0.3
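These operations can be sketched on a toy sparse ternary type. This is a minimal illustration only, assuming nothing about the crate's real API: `TernaryVec`, `tv`, the clipped-sum bundling rule, and the elementwise-product stand-in for binding are all hypothetical names and choices.

```rust
// Toy sparse ternary VSA ops (hypothetical, NOT the crate's API).
// A vector is two index sets: positions holding +1 and positions holding -1.
use std::collections::BTreeSet;

#[derive(Clone, Default, PartialEq, Debug)]
struct TernaryVec {
    pos: BTreeSet<usize>, // indices with value +1
    neg: BTreeSet<usize>, // indices with value -1
}

fn tv(pos: &[usize], neg: &[usize]) -> TernaryVec {
    TernaryVec {
        pos: pos.iter().copied().collect(),
        neg: neg.iter().copied().collect(),
    }
}

impl TernaryVec {
    fn get(&self, i: usize) -> i32 {
        if self.pos.contains(&i) { 1 } else if self.neg.contains(&i) { -1 } else { 0 }
    }

    fn support(&self) -> BTreeSet<usize> {
        self.pos.union(&self.neg).copied().collect()
    }

    // Bundle (⊕): elementwise sum clipped back to {-1, 0, +1} (superposition).
    fn bundle(&self, other: &Self) -> Self {
        let (sa, sb) = (self.support(), other.support());
        let mut out = Self::default();
        for &i in sa.union(&sb) {
            match self.get(i) + other.get(i) {
                s if s > 0 => { out.pos.insert(i); }
                s if s < 0 => { out.neg.insert(i); }
                _ => {}
            }
        }
        out
    }

    // Bind (⊙): elementwise product stands in for binding here; a vector
    // bound with itself is +1 on its whole support, so A ⊙ A acts as identity.
    fn bind(&self, other: &Self) -> Self {
        let (sa, sb) = (self.support(), other.support());
        let mut out = Self::default();
        for &i in sa.intersection(&sb) {
            if self.get(i) * other.get(i) > 0 { out.pos.insert(i); } else { out.neg.insert(i); }
        }
        out
    }

    // Cosine similarity over the sparse supports ("algebraic cleanup").
    fn cosine(&self, other: &Self) -> f64 {
        let (sa, sb) = (self.support(), other.support());
        let dot: i32 = sa.intersection(&sb).map(|&i| self.get(i) * other.get(i)).sum();
        let norm = |v: &Self| ((v.pos.len() + v.neg.len()) as f64).sqrt();
        let (na, nb) = (norm(self), norm(other));
        if na == 0.0 || nb == 0.0 { 0.0 } else { dot as f64 / (na * nb) }
    }
}

fn main() {
    let a = tv(&[0, 2], &[5]);
    let b = tv(&[1, 2], &[7]);
    let ab = a.bundle(&b);        // superposition keeps both components visible
    assert!(a.cosine(&ab) > 0.3); // the bundled vector still resembles a
    let aa = a.bind(&a);          // self-inverse: +1 across a's support
    assert_eq!(aa, tv(&[0, 2, 5], &[]));
    println!("cos(a, a⊕b) = {:.3}", a.cosine(&ab));
}
```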

The ternary representation {-1, 0, +1} is hardware-optimized for 64-bit CPUs:

  • 39-40 trits encode optimally in a 64-bit register (39 for signed, 40 for unsigned)
  • No SIMD extensions required (AVX/AVX2 optional for acceleration)
  • Based on balanced ternary mathematics for efficient computation
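The register-packing claim above is easy to verify numerically; this small check just compares powers of three against the 64-bit ranges (u128 arithmetic avoids overflow):

```rust
// Verify how many base-3 digits (trits) fit in a 64-bit word.
fn main() {
    let pow3 = |n: u32| 3u128.pow(n);

    // 40 trits fit in an unsigned 64-bit register...
    assert!(pow3(40) <= u128::from(u64::MAX) + 1);
    // ...but 41 do not.
    assert!(pow3(41) > u128::from(u64::MAX) + 1);
    // For a signed register, only 39 trits fit in the non-negative half.
    assert!(pow3(39) <= u128::from(i64::MAX as u64) + 1);
    assert!(pow3(40) > u128::from(i64::MAX as u64) + 1);

    println!("3^40 = {} <= 2^64 = {}", pow3(40), 1u128 << 64);
}
```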

Scalability through Adaptive Sparsity:

  • Current: 10,000 dimensions @ ~1% sparsity (200 non-zero elements)
  • Balanced: 50,000 dimensions @ 0.4% sparsity (200 non-zero, 100× better collision resistance)
  • High-precision: 100,000 dimensions @ 0.2% sparsity (200 non-zero, 10,000× better collision resistance)
  • Key insight: Constant non-zero elements → constant computational cost regardless of dimensionality
  • See ADR-006 for detailed analysis
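The constant-cost insight follows from sparse operations touching only non-zero entries. A minimal sketch (a hypothetical `sparse_dot`, not the crate's implementation): a merge-join over two index-sorted sparse vectors runs in O(nnz) regardless of the ambient dimensionality.

```rust
// Sparse dot product by merge-join over index-sorted (index, value) pairs.
// Cost depends on the number of non-zero entries, never on the dimension.
fn sparse_dot(a: &[(usize, i8)], b: &[(usize, i8)]) -> i64 {
    let (mut i, mut j, mut dot) = (0usize, 0usize, 0i64);
    while i < a.len() && j < b.len() {
        match a[i].0.cmp(&b[j].0) {
            std::cmp::Ordering::Less => i += 1,
            std::cmp::Ordering::Greater => j += 1,
            std::cmp::Ordering::Equal => {
                dot += (a[i].1 as i64) * (b[j].1 as i64);
                i += 1;
                j += 1;
            }
        }
    }
    dot
}

fn main() {
    // Same 3 non-zero entries whether the ambient space is 10k or 100k dims.
    let a = [(5usize, 1i8), (70_000, -1), (99_999, 1)];
    let b = [(5usize, 1i8), (42, 1), (99_999, -1)];
    assert_eq!(sparse_dot(&a, &b), 0); // +1 at index 5, -1 at index 99_999
    println!("dot = {}", sparse_dot(&a, &b));
}
```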

Engrams

An engram is a holographic encoding of an entire filesystem or dataset:

  • Single root vector containing superposition of all chunks
  • Secure codebook with VSA-lens encoded data (not plaintext)
  • Manifest tracking file structure and metadata

Security: The codebook does NOT store plaintext data. Chunks are encoded using a VSA-lens reversible encoding mechanism that is:

  • Mathematically trivial to decode WITH the master key
  • Computationally infeasible without the master key
  • Quantum resistant (no algebraic structure for quantum algorithms)
  • Enables selective decryption (decrypt only needed chunks)

See ADR-007 for details on the VSA-as-a-lens security model.

Hologram Package Isolation (Advanced)

Package factoralization enables selective manipulation of packages within holographic containers:

  • Isolate packages: Extract individual packages without full reconstruction
  • Complementary bundling: Bundle everything except target package(s)
  • Compact encoding: Balanced ternary representation (~39× compression)
  • Selective updates: Update packages without touching the rest of the system
  • Differential distribution: Ship only updated packages as compact holograms

See ADR-005 for technical details on hologram factoralization, balanced ternary encoding, and 64-bit register optimization.
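The compact-encoding idea can be illustrated by packing a run of balanced-ternary digits into a single machine word, up to 40 trits per u64 as noted above. A minimal sketch with illustrative names (`pack_trits`/`unpack_trits` are not the crate's API):

```rust
// Pack balanced-ternary digits {-1, 0, +1} into one u64 (max 40 trits).
fn pack_trits(trits: &[i8]) -> u64 {
    assert!(trits.len() <= 40, "at most 40 trits fit in a u64");
    // Horner's rule in base 3; each trit is shifted from {-1,0,+1} to {0,1,2}.
    trits.iter().rev().fold(0u64, |acc, &t| acc * 3 + (t + 1) as u64)
}

fn unpack_trits(mut word: u64, n: usize) -> Vec<i8> {
    (0..n)
        .map(|_| {
            let t = (word % 3) as i8 - 1; // back from {0,1,2} to {-1,0,+1}
            word /= 3;
            t
        })
        .collect()
}

fn main() {
    let trits: Vec<i8> = vec![1, -1, 0, 1, 1, -1, 0, 0, 1, -1];
    let word = pack_trits(&trits);
    assert_eq!(unpack_trits(word, trits.len()), trits); // lossless round trip
    println!("{} trits packed into one u64: {}", trits.len(), word);
}
```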

Quick Start

Installation

# Clone the repository
git clone https://github.com/tzervas/embeddenator.git
cd embeddenator

# Build with Cargo
cargo build --release

# Or use the orchestrator
python3 orchestrator.py --mode build --verbose

Basic Usage

# Ingest a directory into an engram
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v

# Extract from an engram
cargo run --release -- extract -e root.engram -m manifest.json -o ./output -v

# Query similarity
cargo run --release -- query -e root.engram -q ./test_file.txt -v

Using the Orchestrator

The orchestrator provides unified build, test, and deployment workflows:

# Quick start: build, test, and package everything
python3 orchestrator.py --mode full --verbose -i

# Run integration tests
python3 orchestrator.py --mode test --verbose

# Build Docker image
python3 orchestrator.py --mode package --verbose

# Display system info
python3 orchestrator.py --mode info

# Clean all artifacts
python3 orchestrator.py --mode clean

CLI Reference

Embeddenator provides the following commands for working with holographic engrams:

embeddenator --help

Get comprehensive help information:

# Show main help with examples
embeddenator --help

# Show detailed help for a specific command
embeddenator ingest --help
embeddenator extract --help
embeddenator query --help
embeddenator query-text --help
embeddenator bundle-hier --help

ingest - Create Holographic Engram

Process one or more files and/or directories and encode them into a holographic engram.

embeddenator ingest [OPTIONS] --input <PATH>...

Required:
  -i, --input <PATH>...   Input file(s) and/or directory(ies) to ingest

Options:
  -e, --engram <FILE>     Output engram file [default: root.engram]
  -m, --manifest <FILE>   Output manifest file [default: manifest.json]
  -v, --verbose           Enable verbose output with progress and statistics
  -h, --help             Print help information

Examples:
  # Basic ingestion
  embeddenator ingest -i ./myproject -e project.engram -m project.json

  # Mix files and directories (repeat -i/--input)
  embeddenator ingest -i ./src -i ./README.md -e project.engram -m project.json

  # With verbose output
  embeddenator ingest -i ~/Documents -e docs.engram -v

  # Custom filenames
  embeddenator ingest --input ./data --engram backup.engram --manifest backup.json

What it does:

  • Recursively scans any input directories
  • Ingests any input files directly
  • Chunks files (4KB default)
  • Encodes chunks using sparse ternary VSA
  • Creates holographic superposition in root vector
  • Saves engram (holographic data) and manifest (metadata)
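The chunking step above can be sketched in a few lines using the 4 KB default; in the real pipeline each chunk is then VSA-encoded and bundled into the root vector, which this illustration omits:

```rust
// Split a byte stream into fixed-size 4 KB chunks (shorter final chunk).
const CHUNK_SIZE: usize = 4096;

fn chunk_bytes(data: &[u8]) -> Vec<&[u8]> {
    data.chunks(CHUNK_SIZE).collect()
}

fn main() {
    let data = vec![0u8; 10_000]; // e.g. a 10 kB file
    let chunks = chunk_bytes(&data);
    assert_eq!(chunks.len(), 3); // 4096 + 4096 + 1808 bytes
    assert_eq!(chunks[2].len(), 10_000 - 2 * 4096);
    println!("{} chunks", chunks.len());
}
```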

extract - Reconstruct Files

Bit-perfect reconstruction of all files from an engram.

embeddenator extract [OPTIONS] --output-dir <DIR>

Required:
  -o, --output-dir <DIR>  Output directory for reconstructed files

Options:
  -e, --engram <FILE>     Input engram file [default: root.engram]
  -m, --manifest <FILE>   Input manifest file [default: manifest.json]
  -v, --verbose           Enable verbose output with progress
  -h, --help             Print help information

Examples:
  # Basic extraction
  embeddenator extract -e project.engram -m project.json -o ./restored

  # With default filenames
  embeddenator extract -o ./output -v

  # From backup
  embeddenator extract --engram backup.engram --manifest backup.json --output-dir ~/restored

What it does:

  • Loads engram and manifest
  • Reconstructs directory structure
  • Algebraically unbinds chunks from root vector
  • Writes bit-perfect copies of all files
  • Preserves file hierarchy and metadata

query - Similarity Search

Compute cosine similarity between a query file and engram contents.

embeddenator query [OPTIONS] --query <FILE>

Required:
  -q, --query <FILE>      Query file or pattern to search for

Options:
  -e, --engram <FILE>     Engram file to query [default: root.engram]
  --hierarchical-manifest <FILE>  Optional hierarchical manifest (selective unfolding)
  --sub-engrams-dir <DIR>         Directory of `.subengram` files (used with --hierarchical-manifest)
  --k <K>              Top-k results to print for codebook/hierarchical search [default: 10]
  -v, --verbose           Enable verbose output with similarity details
  -h, --help             Print help information

Examples:
  # Query similarity
  embeddenator query -e archive.engram -q search.txt

  # With verbose output
  embeddenator query -e data.engram -q pattern.bin -v

  # Using default engram
  embeddenator query --query testfile.txt -v

What it does:

  • Encodes query file using VSA
  • Computes cosine similarity with engram
  • Returns similarity score

If --hierarchical-manifest and --sub-engrams-dir are provided, it also runs a store-backed hierarchical query and prints the top hierarchical matches.

Similarity interpretation:

  • >0.75: Strong match, likely contains similar content
  • 0.3-0.75: Moderate similarity, some shared patterns
  • <0.3: Low similarity, likely unrelated content

query-text - Similarity Search (Text)

Encode a literal text string as a query vector and run the same retrieval path as query.

embeddenator query-text -e root.engram --text "search phrase" --k 10

# With hierarchical selective unfolding:
embeddenator query-text -e root.engram --text "search phrase" \
  --hierarchical-manifest hier.json --sub-engrams-dir ./sub_engrams --k 10

bundle-hier - Build Hierarchical Retrieval Artifacts

Build a hierarchical manifest and a directory of sub-engrams from an existing flat root.engram + manifest.json. This enables store-backed selective unfolding queries.

embeddenator bundle-hier -e root.engram -m manifest.json \
  --out-hierarchical-manifest hier.json \
  --out-sub-engrams-dir ./sub_engrams

# Optional: deterministically shard large nodes (bounds per-node indexing cost)
embeddenator bundle-hier -e root.engram -m manifest.json \
  --max-chunks-per-node 2000 \
  --out-hierarchical-manifest hier.json \
  --out-sub-engrams-dir ./sub_engrams

Docker Usage

Build Tool Image

docker build -f Dockerfile.tool -t embeddenator-tool:latest .

Run in Container

# Ingest data
docker run -v $(pwd)/input_ws:/input -v $(pwd)/workspace:/workspace \
  embeddenator-tool:latest \
  ingest -i /input -e /workspace/root.engram -m /workspace/manifest.json -v

# Extract data
docker run -v $(pwd)/workspace:/workspace -v $(pwd)/output:/output \
  embeddenator-tool:latest \
  extract -e /workspace/root.engram -m /workspace/manifest.json -o /output -v

Holographic Container

Build a container from an engram:

# First, create an engram of your desired filesystem
cargo run --release -- ingest -i ./rootfs -e workspace/root.engram -m workspace/manifest.json

# Build the holographic container
docker build -f Dockerfile.holographic -t my-holographic-os:latest .

Holographic OS Images - Dual Versioning Strategy

Embeddenator provides pre-built holographic OS images with a dual versioning strategy:

LTS Stable Releases (Long-Term Support):

  • Debian 12 Bookworm (amd64, arm64)
  • Ubuntu 24.04 LTS Noble (amd64, arm64)
  • Tagged with version numbers (e.g., v0.2.0-lts)
  • Updated on stable release cycles
  • Recommended for production use

Testing/Sid/Rolling Releases (Bleeding Edge):

  • Debian Testing (amd64, arm64) - Static version + nightly
  • Debian Sid (amd64, arm64) - Static version + nightly
  • Ubuntu Devel (amd64, arm64) - Static version + nightly
  • Ubuntu Rolling (amd64, arm64) - Static version + nightly
  • Tagged with version + -nightly suffix (e.g., v0.2.0-nightly-20250115)
  • Built daily at 2 AM UTC with latest packages and Rust nightly
  • Recommended for testing and development

Pull images:

# LTS stable images
docker pull ghcr.io/tzervas/embeddenator-holographic-debian-stable-amd64:latest
docker pull ghcr.io/tzervas/embeddenator-holographic-ubuntu-stable-arm64:latest

# Nightly bleeding-edge images
docker pull ghcr.io/tzervas/embeddenator-holographic-debian-testing-amd64:nightly
docker pull ghcr.io/tzervas/embeddenator-holographic-ubuntu-rolling-arm64:nightly

# Specific dated nightly
docker pull ghcr.io/tzervas/embeddenator-holographic-debian-sid-amd64:v0.2.0-nightly-20250115

Available OS Configurations:

OS Version          LTS  Nightly  Architectures
Debian 12 Bookworm  ✅    ❌       amd64, arm64
Debian Testing      ❌    ✅       amd64, arm64
Debian Sid          ❌    ✅       amd64, arm64
Ubuntu 24.04 LTS    ✅    ❌       amd64, arm64
Ubuntu Devel        ❌    ✅       amd64, arm64
Ubuntu Rolling      ❌    ✅       amd64, arm64

Validation Baseline

Embeddenator guarantees:

  • ✅ 100% ordered text reconstruction: All text files byte-for-byte identical
  • ✅ Bit-perfect binary recovery: All binary files exactly match originals
  • ✅ Algebraic update correctness: VSA operations maintain mathematical properties
  • ✅ Multi-file superposition independence: Files can be extracted independently
  • ✅ Persistence cycle identity: Ingest → extract → ingest produces identical engrams

Success Metrics

Typical performance characteristics:

  • Memory: <400MB peak for 10,000 tokens
  • Speed: Reconstruction <100ms for 10k tokens
  • Compression: Engram size ~40-50% of unpacked rootfs
  • Scalability: Handles 1M+ tokens with hierarchical encoding

Architecture

Core Components

  1. SparseVec: Sparse ternary vector implementation

    • pos: Indices with +1 value
    • neg: Indices with -1 value
    • Efficient operations: bundle, bind, cosine similarity
    • Hardware-optimized: 39-40 trits per 64-bit register
  2. EmbrFS: Holographic filesystem layer

    • Chunked encoding (4KB default)
    • Manifest for file metadata
    • Codebook for chunk storage
  3. CLI: Command-line interface

    • Ingest: directory → engram
    • Extract: engram → directory
    • Query: similarity search

Architecture Decision Records (ADRs)

Comprehensive architectural documentation is available in docs/adr/:

  • ADR-001: Sparse Ternary VSA

    • Core VSA design and sparse ternary vectors
    • Balanced ternary mathematics and hardware optimization
    • 64-bit register encoding (39-40 trits per register)
  • ADR-002: Multi-Agent Workflow System

  • ADR-003: Self-Hosted Runner Architecture

  • ADR-004: Holographic OS Container Design

    • Configuration-driven builder for Debian/Ubuntu
    • Dual versioning strategy (LTS + nightly)
    • Package isolation capabilities
  • ADR-005: Hologram-Based Package Isolation

    • Factoralization of holographic containers
    • Balanced ternary encoding for compact representation
    • Package-level granular updates
    • Hardware optimization strategy for 64-bit CPUs
  • ADR-006: Dimensionality and Sparsity Scaling

    • Scaling holographic space to TB-scale datasets
    • Adaptive sparsity strategy (maintain constant computational cost)
    • Performance analysis and collision probability projections
    • Impact on 100% bit-perfect guarantee
    • Deep operation resilience for factoralization
  • ADR-007: Codebook Security and Reversible Encoding

    • VSA-as-a-lens cryptographic primitive
    • Quantum-resistant encoding mechanism
    • Mathematically trivial with key, impossible without
    • Bulk encryption with selective decryption
    • Integration with holographic indexing

See docs/adr/README.md for the complete ADR index.

File Format

Engram (.engram):

  • Binary serialized format (bincode)
  • Contains root SparseVec and codebook
  • Self-contained holographic state

Manifest (.json):

  • Human-readable file listing
  • Chunk mapping and metadata
  • Required for extraction
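As a toy illustration of what "self-contained binary state" means here, the sketch below round-trips length-prefixed index lists through a byte buffer using only std. This is NOT the crate's actual layout, which is defined by its bincode-serialized types; names and framing are hypothetical.

```rust
// Toy binary round trip: length-prefixed little-endian u64 index lists.
use std::io::{Cursor, Read, Write};

fn write_indices(w: &mut impl Write, idxs: &[u64]) -> std::io::Result<()> {
    w.write_all(&(idxs.len() as u64).to_le_bytes())?; // length prefix
    for &i in idxs {
        w.write_all(&i.to_le_bytes())?;
    }
    Ok(())
}

fn read_indices(r: &mut impl Read) -> std::io::Result<Vec<u64>> {
    let mut buf = [0u8; 8];
    r.read_exact(&mut buf)?;
    let n = u64::from_le_bytes(buf);
    (0..n)
        .map(|_| {
            r.read_exact(&mut buf)?;
            Ok(u64::from_le_bytes(buf))
        })
        .collect()
}

fn main() -> std::io::Result<()> {
    // e.g. a root vector's +1 and -1 index lists
    let (pos, neg) = (vec![3u64, 17, 9000], vec![42u64]);
    let mut bytes = Vec::new();
    write_indices(&mut bytes, &pos)?;
    write_indices(&mut bytes, &neg)?;

    let mut cur = Cursor::new(bytes);
    assert_eq!(read_indices(&mut cur)?, pos);
    assert_eq!(read_indices(&mut cur)?, neg);
    println!("round trip ok");
    Ok(())
}
```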

Development

API Documentation

Comprehensive API documentation is available:

# Generate and open documentation locally
cargo doc --open

# Or use the automated script
./generate_docs.sh

# View online (after publishing)
# https://docs.rs/embeddenator

The documentation includes:

  • Module-level overviews with examples
  • Function documentation with usage patterns
  • 9 runnable doc tests demonstrating API usage
  • VSA operation examples (bundle, bind, cosine)

Running Tests

# Recommended: everything Cargo considers testable (lib/bin/tests/examples/benches)
cargo test --workspace --all-targets

# Doc tests only
cargo test --doc

# Optimized build tests (useful before benchmarking)
cargo test --release --workspace --all-targets

# Feature-gated correctness/perf gates
cargo test --workspace --all-targets --features "bt-phase-2 proptest"

# Long-running/expensive tests are explicitly opt-in:
# - QA memory scaling (requires env var + ignored flag)
EMBEDDENATOR_RUN_QA_MEMORY=1 cargo test --features qa --test memory_scaled -- --ignored --nocapture
# - Multi-GB soak test (requires env var + ignored flag)
EMBEDDENATOR_RUN_SOAK=1 cargo test --release --features soak-memory --test soak_memory -- --ignored --nocapture

# Integration tests via orchestrator
python3 orchestrator.py --mode test --verbose

# Full test suite
python3 orchestrator.py --mode full --verbose

Notes:

  • Seeing many tests marked as "ignored" during cargo bench is expected: Cargo runs the unit test harness in libtest's --bench mode, which skips normal #[test] functions (it prints i for each). Use cargo test (commands above) to actually execute tests.
  • cargo test --workspace --all-targets will also compile/run Criterion benches in a fast "smoke" mode (they print Testing ... Success). This is intended to catch broken benches early.

CI/CD and Build Monitoring

The project uses separated CI/CD workflows for optimal performance and reliability:

# Test CI build locally with monitoring
./ci_build_monitor.sh linux/amd64 build 300

# Monitor for specific timeout (in seconds)
./ci_build_monitor.sh linux/amd64 full 900

CI Workflow Structure:

Three separate workflows eliminate duplication and provide clear responsibilities:

  1. ci-pre-checks.yml - Fast validation (fmt, clippy, unit tests, doc tests)
  2. ci-amd64.yml - Full AMD64 build and test (REQUIRED PRE-MERGE CHECK)
  3. ci-arm64.yml - ARM64 build and test (configured for self-hosted runners)

CI Features:

  • Separated workflows prevent duplicate runs
  • AMD64 workflow is a required status check - PRs cannot merge until it passes
  • Parallel builds using all available cores
  • Intelligent timeout management (15min tests, 10min builds, 30min total)
  • Build artifact upload on failure
  • Performance metrics reporting
  • Automatic parallelization with CARGO_BUILD_JOBS

Architecture Support:

Architecture     Status         Runner Type                       Trigger                    Notes
amd64 (x86_64)   ✅ Production   GitHub-hosted (ubuntu-latest)     Every PR (required check)  Stable, 5-7 min
arm64 (aarch64)  🚧 Ready        Self-hosted (pending deployment)  Manual only                Will enable on merge to main

ARM64 Deployment Roadmap:

  • ✅ Phase 1: Root cause analysis completed - GitHub doesn't provide standard ARM64 runners
  • ✅ Phase 2: Workflow configured for self-hosted runners with labels ["self-hosted", "linux", "ARM64"]
  • 🚧 Phase 3: Deploy self-hosted ARM64 infrastructure (in progress)
  • ⏳ Phase 4: Manual testing and validation
  • ⏳ Phase 5: Enable automatic trigger on merge to main only

Why Self-Hosted for ARM64?

  • GitHub Actions doesn't provide standard hosted ARM64 runners
  • Self-hosted provides native execution (no emulation overhead)
  • Cost-effective for frequent builds
  • Ready to deploy when infrastructure is available

See .github/workflows/README.md for complete CI/CD documentation and ARM64 setup guide.

Self-Hosted Runner Automation

Embeddenator includes a comprehensive Python-based automation system for managing GitHub Actions self-hosted runners with complete lifecycle management and multi-architecture support:

Features:

  • ✨ Automated registration with short-lived tokens
  • 🔄 Complete lifecycle management (register → run → deregister)
  • ⏱️ Configurable auto-deregistration after idle timeout
  • 🎯 Manual mode for persistent runners
  • 🚀 Multi-runner deployment support
  • 🏗️ Multi-architecture support (x64, ARM64, RISC-V)
  • 🔧 QEMU emulation for cross-architecture runners
  • 📊 Health monitoring and status reporting
  • 🧹 Automatic cleanup of Docker resources
  • ⚙️ Flexible configuration via .env file or CLI arguments

Supported Architectures:

  • x64 (AMD64) - Native x86_64 runners
  • ARM64 (aarch64) - ARM64 runners (native or emulated via QEMU)
  • RISC-V (riscv64) - RISC-V runners (native or emulated via QEMU)

Quick Start:

# 1. Copy and configure environment file
cp .env.example .env
# Edit .env and set GITHUB_REPOSITORY and GITHUB_TOKEN

# 2. Run in auto mode (registers, starts, monitors, auto-deregisters when idle)
python3 runner_manager.py run

# 3. Or use manual mode (keeps running until stopped)
RUNNER_MODE=manual python3 runner_manager.py run

Multi-Architecture Examples:

# Deploy ARM64 runners on x86_64 hardware (with emulation, auto-detect runtime)
RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run

# Deploy runners for all architectures
RUNNER_TARGET_ARCHITECTURES=x64,arm64,riscv64 RUNNER_COUNT=6 python3 runner_manager.py run

# Deploy with automatic QEMU installation (requires sudo)
RUNNER_EMULATION_AUTO_INSTALL=true RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run

# Use specific emulation method (docker, podman, or qemu)
RUNNER_EMULATION_METHOD=podman RUNNER_TARGET_ARCHITECTURES=arm64 python3 runner_manager.py run

# Use Docker for emulation
RUNNER_EMULATION_METHOD=docker RUNNER_TARGET_ARCHITECTURES=arm64,riscv64 python3 runner_manager.py run

Individual Commands:

# Register runner(s)
python3 runner_manager.py register

# Start runner service(s)
python3 runner_manager.py start

# Monitor and manage lifecycle
python3 runner_manager.py monitor

# Check status
python3 runner_manager.py status

# Stop and deregister
python3 runner_manager.py stop

Advanced Usage:

# Deploy multiple runners
python3 runner_manager.py run --runner-count 4

# Custom labels
python3 runner_manager.py register --labels self-hosted,linux,ARM64,large

# Auto-deregister after 10 minutes of inactivity
RUNNER_IDLE_TIMEOUT=600 python3 runner_manager.py run

Configuration Options:

Key environment variables (see .env.example for full list):

  • GITHUB_REPOSITORY - Repository to register runners for (required)
  • GITHUB_TOKEN - Personal access token with repo scope (required)
  • RUNNER_MODE - Deployment mode: auto (default) or manual
  • RUNNER_IDLE_TIMEOUT - Auto-deregister timeout in seconds (default: 300)
  • RUNNER_COUNT - Number of runners to deploy (default: 1)
  • RUNNER_LABELS - Comma-separated runner labels
  • RUNNER_EPHEMERAL - Enable ephemeral runners (deregister after one job)
  • RUNNER_TARGET_ARCHITECTURES - Target architectures: x64, arm64, riscv64 (comma-separated)
  • RUNNER_ENABLE_EMULATION - Enable QEMU emulation for cross-architecture (default: true)
  • RUNNER_EMULATION_METHOD - Emulation method: auto, qemu, docker, podman (default: auto)
  • RUNNER_EMULATION_AUTO_INSTALL - Auto-install QEMU if missing (default: false, requires sudo)

See .env.example for complete configuration documentation.

Deployment Modes:

  1. Auto Mode (default): Runners automatically deregister after being idle for a specified timeout

    • Perfect for cost optimization
    • Ideal for CI/CD pipelines with sporadic builds
    • Runners terminate when queue is empty
  2. Manual Mode: Runners keep running until manually stopped

    • Best for development environments
    • Useful for persistent infrastructure
    • Explicit control over runner lifecycle

See .github/workflows/README.md for complete CI/CD documentation and ARM64 setup guide.

Project Structure

embeddenator/
├── Cargo.toml                  # Rust dependencies
├── src/
│   └── main.rs                 # Complete implementation
├── tests/
│   ├── e2e_regression.rs       # 6 E2E tests (includes critical engram modification test)
│   ├── integration_cli.rs      # 7 integration tests
│   └── unit_tests.rs           # 11 unit tests
├── Dockerfile.tool             # Static binary packaging
├── Dockerfile.holographic      # Holographic OS container
├── orchestrator.py             # Unified build/test/deploy
├── runner_manager.py           # Self-hosted runner automation entry point (NEW)
├── runner_automation/          # Runner automation package (NEW)
│   ├── __init__.py             # Package initialization (v1.1.0)
│   ├── config.py               # Configuration management
│   ├── github_api.py           # GitHub API client
│   ├── installer.py            # Runner installation
│   ├── runner.py               # Individual runner lifecycle
│   ├── manager.py              # Multi-runner orchestration
│   ├── emulation.py            # QEMU emulation for cross-arch (NEW)
│   ├── cli.py                  # Command-line interface
│   └── README.md               # Package documentation
├── .env.example                # Runner configuration template (NEW)
├── ci_build_monitor.sh         # CI hang detection and monitoring
├── generate_docs.sh            # Documentation generation
├── .github/
│   └── workflows/
│       ├── ci-pre-checks.yml        # Pre-build validation (every PR)
│       ├── ci-amd64.yml             # AMD64 build (required for merge)
│       ├── ci-arm64.yml             # ARM64 build (self-hosted, pending)
│       ├── build-holographic-os.yml # OS container builds
│       ├── build-push-images.yml    # Multi-OS image pipeline
│       ├── nightly-builds.yml       # Nightly bleeding-edge builds
│       └── README.md                # Complete CI/CD documentation
├── input_ws/                   # Example input (gitignored)
├── workspace/                  # Build artifacts (gitignored)
└── README.md                   # This file

Contributing

We welcome contributions to Embeddenator! Here's how you can help:

Getting Started

  1. Fork the repository on GitHub
  2. Clone your fork locally:
    git clone https://github.com/YOUR_USERNAME/embeddenator.git
    cd embeddenator
    
  3. Create a feature branch:
    git checkout -b feature/my-new-feature
    

Development Workflow

  1. Make your changes with clear, focused commits
  2. Add tests for new functionality:
    • Unit tests in src/ modules
    • Integration tests in tests/integration_*.rs
    • End-to-end tests in tests/e2e_*.rs
  3. Run the full test suite:
    # Run all Rust tests
    cargo test
    
    # Run integration tests via orchestrator
    python3 orchestrator.py --mode test --verbose
    
    # Run full validation suite
    python3 orchestrator.py --mode full --verbose
    
  4. Check code quality:
    # Run Clippy linter (zero warnings required)
    cargo clippy -- -D warnings
    
    # Format code
    cargo fmt
    
    # Check Python syntax
    python3 -m py_compile *.py
    
  5. Test cross-platform (if applicable):
    # Build Docker images
    docker build -f Dockerfile.tool -t embeddenator-tool:test .
    
    # Test on different architectures
    python3 orchestrator.py --platform linux/arm64 --mode test
    

Pull Request Guidelines

  • Write clear commit messages describing what and why
  • Reference issues in commit messages (e.g., "Fixes #123")
  • Keep PRs focused - one feature or fix per PR
  • Update documentation if you change CLI options or add features
  • Ensure all tests pass before submitting
  • Maintain code coverage - aim for >80% test coverage

Code Style

  • Rust: Follow standard Rust conventions (use cargo fmt)
  • Python: Follow PEP 8 style guide
  • Comments: Document complex algorithms, especially VSA operations
  • Error handling: Use proper error types, avoid .unwrap() in library code

Areas for Contribution

We especially welcome contributions in these areas:

  • 🔬 Performance optimizations for VSA operations
  • 📊 Benchmarking tools and performance analysis
  • 🧪 Additional test cases covering edge cases
  • 📚 Documentation improvements and examples
  • 🐛 Bug fixes and error handling improvements
  • 🌐 Multi-platform support (Windows, macOS testing)
  • 🔧 New features (incremental updates, compression options, etc.)

Reporting Issues

When reporting bugs, please include:

  • Embeddenator version (embeddenator --version)
  • Operating system and architecture
  • Rust version (rustc --version)
  • Minimal reproduction steps
  • Expected vs. actual behavior
  • Relevant log output (use --verbose flag)

Questions and Discussions

  • Issues: Bug reports and feature requests
  • Discussions: Questions, ideas, and general discussion
  • Pull Requests: Code contributions with tests

Code of Conduct

  • Be respectful and inclusive
  • Provide constructive feedback
  • Focus on the technical merits
  • Help others learn and grow

Thank you for contributing to Embeddenator! 🎉

Advanced Usage

Custom Chunk Size

Modify chunk_size in EmbrFS::ingest_file for different trade-offs:

let chunk_size = 8192; // Larger chunks = better compression, slower reconstruction

Hierarchical Encoding

For very large datasets, implement multi-level engrams:

// Level 1: Individual files
// Level 2: Directory summaries
// Level 3: Root engram of all directories

Algebraic Operations

Combine multiple engrams:

let combined = engram1.root.bundle(&engram2.root);
// Now combined contains both datasets holographically

Troubleshooting

Out of Memory

Reduce chunk size or process files in batches:

# Process directories separately
for dir in input_ws/*/; do
  cargo run --release -- ingest -i "$dir" -e "engrams/$(basename $dir).engram"
done

Reconstruction Mismatches

Verify manifest and engram are from the same ingest:

# Check manifest metadata
jq '.total_chunks' workspace/manifest.json

# Re-ingest if needed
cargo run --release -- ingest -i ./input_ws -e root.engram -m manifest.json -v

Performance Tips

  1. Use release builds: cargo build --release is 10-100x faster
  2. Enable SIMD acceleration: For query-heavy workloads, build with --features simd and RUSTFLAGS="-C target-cpu=native"
    # Build with SIMD optimizations
    RUSTFLAGS="-C target-cpu=native" cargo build --release --features simd
    
    See docs/SIMD_OPTIMIZATION.md for details on 2-4x query speedup
  3. Batch processing: Ingest multiple directories separately for parallel processing
  4. SSD storage: Engram I/O benefits significantly from fast storage
  5. Memory: Ensure sufficient RAM for large codebooks (~100 bytes per chunk)

License

MIT License - see LICENSE file for details

References

Vector Symbolic Architectures (VSA)

  • Vector Symbolic Architectures: Kanerva, P. (2009)
  • Sparse Distributed Representations
  • Holographic Reduced Representations (HRR)

Ternary Computing and Hardware Optimization

  • Balanced Ternary - Wikipedia overview
  • Ternary Computing - Historical and mathematical foundations
  • Three-Valued Logic and Quantum Computing
  • Optimal encoding: 39-40 trits in 64-bit registers (39 for signed, 40 for unsigned)

Architecture Documentation

Use Cases and Applications

  • Specialized AI Assistant Models - Architecture for deploying coding and research assistant LLMs with embeddenator-enhanced retrieval, multi-model parallel execution, and document-driven development workflows

Support

Getting Help

Common Questions

Q: What file types are supported?
A: All file types - text, binary, executables, images, etc. Embeddenator is file-format agnostic.

Q: Is the reconstruction really bit-perfect?
A: Yes! All files are reconstructed exactly byte-for-byte. We have 23 tests verifying this.

Q: Can I combine multiple engrams?
A: Yes! Use VSA bundle operations to create holographic superpositions. See "Algebraic Operations" in the README.

Q: What's the maximum data size?
A: Theoretically unlimited with hierarchical encoding. Tested with datasets up to 1M+ tokens.

Q: How does this compare to compression?
A: Embeddenator focuses on holographic representation, not compression. Engram sizes are typically 40-50% of original data, but the key benefit is algebraic operations on encoded data.

Reporting Issues

When reporting bugs, please include:

  • Embeddenator version: embeddenator --version
  • Operating system and architecture
  • Rust version: rustc --version
  • Minimal reproduction steps
  • Expected vs. actual behavior
  • Relevant log output (use --verbose flag)

Security

If you discover a security vulnerability, please email security@embeddenator.dev (or create a private security advisory on GitHub) rather than opening a public issue.


Built with ❤️ using Rust, Docker, and holographic computing principles.
