| Crates.io | vsa-optim-rs |
| lib.rs | vsa-optim-rs |
| version | 0.1.1 |
| created_at | 2026-01-25 02:31:08.799357+00 |
| updated_at | 2026-01-25 03:02:47.974236+00 |
| description | Deterministic training optimization using VSA compression and closed-form gradient prediction |
| homepage | https://github.com/tzervas/vsa-optim-rs |
| repository | https://github.com/tzervas/vsa-optim-rs |
| max_upload_size | |
| id | 2067958 |
| size | 265,555 |
Deterministic training optimization using Vector Symbolic Architecture (VSA), ternary quantization, and closed-form gradient prediction.
A pure Rust implementation enabling efficient large model fine-tuning on consumer hardware through mathematically principled gradient compression and prediction.
Add the crate to your Cargo.toml:

```toml
[dependencies]
vsa-optim-rs = "0.1"
```
The `DeterministicPhaseTrainer` orchestrates training through mathematically rigorous phases with guaranteed reproducibility:
```rust
use vsa_optim_rs::{DeterministicPhaseTrainer, DeterministicPhaseConfig, DeterministicPhase};
use candle_core::Device;
use std::collections::HashMap;

// Define parameter shapes
let shapes = vec![
    ("layer1.weight".into(), vec![768, 768]),
    ("layer2.weight".into(), vec![768, 3072]),
];

// Configure deterministic training
let config = DeterministicPhaseConfig {
    warmup_steps: 10,      // Initial gradient collection
    full_steps: 5,         // Full computation per cycle
    predict_steps: 20,     // Predicted steps per cycle
    correct_every: 5,      // Correction frequency
    adaptive_phases: true, // Auto-adjust on loss increase
    ..Default::default()
};

let mut trainer = DeterministicPhaseTrainer::new(&shapes, config, &Device::Cpu)?;

// Training loop
for step in 0..100 {
    let info = trainer.begin_step()?;

    match info.phase {
        DeterministicPhase::Warmup | DeterministicPhase::Full | DeterministicPhase::Correct => {
            // Compute gradients via backpropagation
            let gradients = compute_gradients(&model, &batch);
            trainer.record_full_gradients(&gradients)?;
        }
        DeterministicPhase::Predict => {
            // Use deterministically predicted gradients (no backward pass)
            let gradients = trainer.get_predicted_gradients()?;
            apply_gradients(&mut model, &gradients);
        }
    }

    trainer.end_step(loss)?;
}

let stats = trainer.get_stats();
println!(
    "Speedup: {:.2}x ({} full, {} predicted)",
    stats.speedup, stats.full_steps, stats.predicted_steps
);
```
Compress gradients using hyperdimensional computing with bind/bundle/unbind operations:
```rust
use vsa_optim_rs::{VSAGradientCompressor, VSAConfig};

let config = VSAConfig::builder()
    .dimension(8192)        // Hypervector dimension
    .compression_ratio(0.1) // 10x compression target
    .seed(42)               // Reproducible projections
    .build();

let param_shapes = vec![
    ("weight".into(), vec![1024, 1024]),
];

let mut compressor = VSAGradientCompressor::new(&param_shapes, config, &device)?;

// Compress gradients
let compressed = compressor.compress(&gradients)?;
println!("Compression: {:.1}x", compressed.stats.compression_ratio);

// Decompress when needed
let restored = compressor.decompress(&compressed)?;
```
Memory-efficient accumulation using balanced ternary {-1, 0, +1}:
```rust
use vsa_optim_rs::{TernaryGradientAccumulator, TernaryConfig};

let config = TernaryConfig::builder()
    .accumulation_steps(8)
    .use_stochastic_rounding(true) // Unbiased quantization
    .build();

let mut accumulator = TernaryGradientAccumulator::new(&param_shapes, config, &device)?;

for micro_batch in micro_batches {
    let gradients = compute_gradients(&model, &micro_batch);
    accumulator.accumulate(&gradients)?; // ~93% memory savings
}

// Retrieve accumulated gradients for optimizer step
let accumulated = accumulator.get_accumulated()?;
optimizer.step(&accumulated)?;
accumulator.reset()?;
```
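Stochastic rounding is what keeps ternary accumulation unbiased: a value is rounded up to the next level with probability equal to its fractional part, so the quantizer's expectation equals the input. The sketch below illustrates the idea only; `quantize_ternary` and the inline PRNG are hypothetical helpers, not the crate's implementation.

```rust
// Illustrative sketch (not the crate's implementation): stochastic rounding
// of a value in [-1, 1] to balanced ternary {-1, 0, +1}. Rounding up with
// probability equal to the fractional part makes E[quantize(x)] == x.
fn quantize_ternary(x: f32, rand01: f32) -> i8 {
    let x = x.clamp(-1.0, 1.0);
    let lower = x.floor();            // -1.0, 0.0, or 1.0
    let frac = x - lower;             // distance to the next level up
    if rand01 < frac { (lower + 1.0) as i8 } else { lower as i8 }
}

fn main() {
    // Crude uniform samples from a fixed-seed LCG (deterministic, demo only).
    let mut state: u64 = 42;
    let mut uniform = || {
        state = state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        (state >> 40) as f32 / (1u64 << 24) as f32
    };

    // The mean of many quantized samples converges to the original value.
    let x = 0.3_f32;
    let n = 100_000;
    let mean = (0..n).map(|_| quantize_ternary(x, uniform()) as f32).sum::<f32>() / n as f32;
    println!("x = {x}, mean of ternary samples = {mean:.3}"); // ≈ 0.300
}
```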
The core innovation: predict gradients using weighted least squares model fitting with a closed-form solution (no iterative optimization):
Gradient model:

```text
g(t) = baseline + velocity × t + residual
```

where:

- baseline: Weighted mean of historical gradients
- velocity: Gradient change rate (fitted via normal equations)
- residual: Exponentially averaged prediction error for drift correction
Prediction proceeds in three stages:

1. Warmup Phase: Collect initial gradient samples to establish the prediction baseline.
2. Prediction Fitting: Solve the weighted normal equations for baseline `b` and velocity `v` using Cramer's rule (see the sketch after this list):

   ```text
   [ Σw    Σw·t  ] [ b ]   [ Σw·g   ]
   [ Σw·t  Σw·t² ] [ v ] = [ Σw·t·g ]
   ```

3. Residual Tracking: Maintain an exponentially decayed average of prediction errors to correct systematic drift without stochastic noise.
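The following is a minimal sketch of that closed-form fit for a single gradient element, assuming a small history of (time, gradient) samples with exponentially decayed weights. The function name and the scalar treatment are illustrative only, not the crate's API.

```rust
/// Illustrative sketch (not the crate's API): fit g(t) ≈ b + v·t by weighted
/// least squares over a short history window, solved in closed form with
/// Cramer's rule (no iterative optimization).
fn fit_baseline_velocity(samples: &[(f64, f64)], decay: f64) -> Option<(f64, f64)> {
    // samples: (t, g) pairs, oldest first; newer samples get higher weight.
    let n = samples.len();
    let (mut sw, mut swt, mut swt2, mut swg, mut swtg) = (0.0, 0.0, 0.0, 0.0, 0.0);
    for (i, &(t, g)) in samples.iter().enumerate() {
        let w = decay.powi((n - 1 - i) as i32); // exponential decay weighting
        sw += w;
        swt += w * t;
        swt2 += w * t * t;
        swg += w * g;
        swtg += w * t * g;
    }
    // Determinant of the 2x2 normal-equation matrix.
    let det = sw * swt2 - swt * swt;
    if det.abs() < 1e-12 {
        return None; // degenerate history (e.g. a single time step)
    }
    // Cramer's rule: replace each column of the matrix by the right-hand side.
    let baseline = (swg * swt2 - swt * swtg) / det;
    let velocity = (sw * swtg - swt * swg) / det;
    Some((baseline, velocity))
}

fn main() {
    // Example: one gradient element drifting linearly; predict it at t = 4.
    let history = [(0.0, 1.0), (1.0, 0.9), (2.0, 0.8), (3.0, 0.7)];
    if let Some((b, v)) = fit_baseline_velocity(&history, 0.95) {
        println!("baseline = {b:.3}, velocity = {v:.3}, g(4) ≈ {:.3}", b + v * 4.0);
    }
}
```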
```text
WARMUP ──► FULL ──► PREDICT ──► CORRECT ──► FULL ──► ...
(N steps)  (M)      (P steps)   (periodic)  (M)
             │                     │
             └─────────────────────┘
                (correction cycle)
```
| Phase | Description | Backward Pass |
|---|---|---|
| Warmup | Collect gradients to initialize predictor | ✓ |
| Full | Standard training with gradient recording | ✓ |
| Predict | Use predicted gradients | ✗ |
| Correct | Compute actual gradient, update residuals | ✓ |
```text
Compress:   Gradients  ──► Project to HD ──► Bind with keys ──► Bundle (majority) ──► Compressed
Decompress: Compressed ──► Unbind with keys ──► Inverse Project ──► Decompressed
```
Operations leverage the quasi-orthogonality of random vectors in high dimensions (Johnson-Lindenstrauss lemma) for information-preserving compression.
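A dependency-free sketch of why this works: binding with a bipolar key is self-inverse, bundling is a majority vote, and independent random hypervectors are nearly orthogonal, so each key-bound item can be recovered from the bundle with low cross-talk. Everything below (the helper names, the PRNG, the {-1, +1} representation) is illustrative and is not the crate's internal representation.

```rust
// Illustrative sketch (not the crate's API): bind/bundle/unbind on random
// bipolar {-1, +1} hypervectors, showing information-preserving superposition.

fn random_hv(dim: usize, seed: u64) -> Vec<i32> {
    // Tiny xorshift PRNG so the example needs no external crates.
    let mut state = seed.wrapping_mul(0x9E3779B97F4A7C15) | 1;
    (0..dim)
        .map(|_| {
            state ^= state << 13;
            state ^= state >> 7;
            state ^= state << 17;
            if (state >> 32) & 1 == 0 { 1 } else { -1 }
        })
        .collect()
}

fn bind(a: &[i32], b: &[i32]) -> Vec<i32> {
    // Elementwise product: self-inverse, so bind(bind(x, k), k) == x.
    a.iter().zip(b).map(|(x, y)| x * y).collect()
}

fn bundle(vs: &[Vec<i32>]) -> Vec<i32> {
    // Majority vote across vectors (sign of the elementwise sum).
    (0..vs[0].len())
        .map(|i| {
            let s: i32 = vs.iter().map(|v| v[i]).sum();
            if s >= 0 { 1 } else { -1 }
        })
        .collect()
}

fn similarity(a: &[i32], b: &[i32]) -> f64 {
    // Normalized dot product: ~0 for unrelated vectors, ~1 for identical ones.
    a.iter().zip(b).map(|(x, y)| (x * y) as f64).sum::<f64>() / a.len() as f64
}

fn main() {
    let dim = 8192;
    let (x, y) = (random_hv(dim, 1), random_hv(dim, 2));
    let (kx, ky) = (random_hv(dim, 3), random_hv(dim, 4));

    // Bundle two key-bound items into a single compressed hypervector.
    let memory = bundle(&[bind(&x, &kx), bind(&y, &ky)]);

    // Unbinding with the right key recovers a vector correlated with the
    // original; quasi-orthogonality keeps cross-talk from the other item small.
    println!("recovered x: {:.2}", similarity(&bind(&memory, &kx), &x)); // expected ≈ 0.5
    println!("cross-talk : {:.2}", similarity(&bind(&memory, &kx), &y)); // expected ≈ 0.0
}
```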
| Metric | Value | Notes |
|---|---|---|
| Gradient Storage | ~90% reduction | VSA compression |
| Backward Passes | ~80% reduction | Prediction phases |
| Accumulation Memory | ~93% reduction | Ternary quantization |
| Prediction Overhead | O(history_window × params) | Linear in tracked history |
| Determinism | 100% | Bit-exact reproducibility |
With the default configuration (warmup=10, full=5, predict=20, correct_every=5):

```text
100 steps = 10 warmup + (5 full + 20 predict) × cycles
          ≈ 10 warmup + ~25 full/correction + ~65 predicted
          ≈ 35 backward passes instead of 100
          ≈ 2.9x theoretical speedup
```

Actual speedup depends on the cost of a backward pass relative to a forward pass.
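The same accounting as a quick sanity check in code. It simply reuses the step counts quoted above; the exact split between full and correction steps is decided by the trainer's schedule, so the 25 is an approximation rather than a derived quantity.

```rust
// Back-of-the-envelope estimate: only warmup and full/correction steps need a
// backward pass; predicted steps reuse the fitted gradient model instead.
fn main() {
    let total_steps = 100.0_f64;
    let warmup = 10.0;
    let full_and_correct = 25.0; // approximate full + correction steps across cycles
    let backward_passes = warmup + full_and_correct;
    println!(
        "{backward_passes} backward passes for {total_steps} steps ≈ {:.1}x speedup",
        total_steps / backward_passes
    );
    // Prints: 35 backward passes for 100 steps ≈ 2.9x speedup
}
```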
```rust
DeterministicPhaseConfig {
    warmup_steps: 10,             // Steps to collect initial gradients
    full_steps: 5,                // Full gradient steps per cycle
    predict_steps: 20,            // Predicted steps per cycle
    correct_every: 5,             // Correction frequency during predict
    adaptive_phases: true,        // Auto-adjust on loss increase
    loss_increase_threshold: 0.1, // Threshold to trigger adaptation
    history_window: 8,            // Gradients to keep for model fitting
    prediction_horizon: 1,        // Steps ahead to predict
    history_decay: 0.95,          // Exponential decay for weighting
    residual_threshold: 0.1,      // When to apply residual correction
}
```
```rust
VSAConfig::builder()
    .dimension(8192)        // HD space dimension (↑ = better reconstruction)
    .compression_ratio(0.1) // Target compression factor
    .seed(42)               // RNG seed for reproducibility
    .build()
```
```rust
TernaryConfig::builder()
    .accumulation_steps(8)         // Micro-batches per optimizer step
    .use_stochastic_rounding(true) // Unbiased quantization to {-1, 0, +1}
    .build()
```
For YAML-driven LLM fine-tuning with automatic VSA acceleration, use the axolotl-rs integration:
```rust
use axolotl_rs::{VSAAccelerator, VSAAcceleratorConfig};

let config = VSAAcceleratorConfig::default(); // Or ::conservative(), ::aggressive()
let mut accel = VSAAccelerator::new(&trainable_params, config, &device)?;

for batch in dataloader {
    let info = accel.begin_step()?;

    if info.needs_backward {
        loss.backward();
        accel.record_gradients(&trainable_params)?;
    } else {
        let grads = accel.get_predicted_gradients()?;
        // Apply predicted gradients
    }

    accel.end_step(loss_value)?;
}

println!("{}", accel.get_stats()); // "VSA: 100 steps (35 full, 65 predicted), 2.86x speedup"
```
| Crate | Description |
|---|---|
| trit-vsa | Balanced ternary arithmetic with VSA operations |
| bitnet-quantize | BitNet b1.58 quantization for neural networks |
| axolotl-rs | YAML-driven LLM fine-tuning toolkit |
| qlora-rs | 4-bit QLoRA with double quantization |
| peft-rs | Parameter-efficient fine-tuning adapters |
MIT License. See LICENSE-MIT for details.
"Simplicity is the ultimate sophistication." — Leonardo da Vinci