| Crates.io | tauformer |
| lib.rs | tauformer |
| version | 0.4.0 |
| created_at | 2026-01-09 05:47:47.608253+00 |
| updated_at | 2026-01-13 07:40:26.174997+00 |
| description | A Transformer Architecture using arrowspace's taumode |
| homepage | https://github.com/tuned-org-uk/tauformer |
| repository | https://github.com/tuned-org-uk/tauformer |
| id | 2031634 |
| size | 508,769 |
A Transformer architecture using taumode attention for memory-efficient sequence modeling
Tauformer replaces standard inner-product attention with taumode distance scoring, compressing token representations into scalar synthetic indices derived from feature-space topological analysis. This enables constant-memory key storage while maintaining full attention expressiveness.
Note:
`TauGptModel::new_with_sparse_laplacian` requires a Laplacian file path, so the bench below benchmarks `GptModel` end-to-end and the TauMode kernel only. For a full benchmark, use `taugpt-kvcache-bench`.
$ cargo bench --features cpu
Standard transformer attention computes $O(T^2)$ pairwise scores via $QK^\top$, then applies softmax and a weighted sum with $V$. Tauformer innovates by:
arrowspace-rs paper).
This provides significant memory savings for long contexts while preserving the causal masking and softmax stability pipeline. In addition, building on arrowspace makes it possible to embed domain knowledge directly at the attention-head level, implementing a de facto persistent memory for token generation based on verified domain knowledge (see TAUATTENTION.md for details).
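As a rough illustration of the memory claim, the sketch below compares key-cache sizes. It rests on two assumptions not stated in the source: entries are stored as `f32`, and the taumode path caches one scalar per key token per head instead of a full head-dimension vector. The function name `key_cache_bytes` and the model dimensions in the usage note are hypothetical. Since the output is still computed from V, the value cache is presumably unchanged and is not counted here.

```rust
/// Bytes needed to cache keys for one sequence, assuming f32 entries.
///
/// `per_token_floats` is the number of floats stored per key token per head:
/// the head dimension F for standard attention, or 1 if only a scalar
/// taumode index is kept (an assumption, see the text above).
fn key_cache_bytes(
    layers: usize,
    heads: usize,
    tokens: usize,
    per_token_floats: usize,
) -> usize {
    layers * heads * tokens * per_token_floats * std::mem::size_of::<f32>()
}
```

For a hypothetical model with 12 layers, 12 heads, head dimension 64 and a 4096-token context, `key_cache_bytes(12, 12, 4096, 64)` gives 150,994,944 bytes (144 MiB), while `key_cache_bytes(12, 12, 4096, 1)` gives 2,359,296 bytes (2.25 MiB), a reduction by a factor equal to the head dimension.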
The core innovation replaces the scoring mechanism:
Standard attention:
scores = Q @ K.T # (B, H, Tq, Tk) matrix multiply
att = softmax(causal_mask(scores))
out = att @ V
taumode attention:
# L is the domain-specific Graph Laplacian
lambda_q = taumode(Q, L) # (B, H, Tq) scalars
lambda_k = taumode(K, L) # (B, H, Tk) scalars
scores = -|lambda_q[:,:,:,None] - lambda_k[:,:,None,:]| / temperature
att = softmax(causal_mask(scores))
out = att @ V
Where L is a feature-space Laplacian (F×F matrix, F = head dimension) built from a domain corpus using arrowspace.
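The taumode attention pseudocode above can be sketched in plain Rust for a single head and a single batch element. This is a minimal illustration, not the crate's implementation: the function name `tau_attention` is hypothetical, the scalar indices are assumed to be precomputed, and it assumes Tq = Tk (self-attention without a cache offset). The causal mask is applied by only visiting key positions `j <= i`.

```rust
/// Causal taumode attention for one head, given precomputed scalar indices.
///
/// `lambda_q[i]` and `lambda_k[j]` are the scalar indices of query i and key j;
/// `v[j]` is the value vector at position j. Returns one output vector per query.
fn tau_attention(
    lambda_q: &[f64],
    lambda_k: &[f64],
    v: &[Vec<f64>],
    temperature: f64,
) -> Vec<Vec<f64>> {
    let t = lambda_q.len();
    assert_eq!(lambda_k.len(), t, "this sketch assumes Tq == Tk");
    assert_eq!(v.len(), t, "one value vector per key position");
    let d = v.first().map_or(0, |row| row.len());

    let mut out = vec![vec![0.0; d]; t];
    for i in 0..t {
        // scores[j] = -|lambda_q[i] - lambda_k[j]| / temperature, for j <= i only.
        let scores: Vec<f64> = (0..=i)
            .map(|j| -(lambda_q[i] - lambda_k[j]).abs() / temperature)
            .collect();

        // Numerically stable softmax: subtract the row maximum before exp.
        let max = scores.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
        let exps: Vec<f64> = scores.iter().map(|s| (s - max).exp()).collect();
        let z: f64 = exps.iter().sum();

        // Weighted sum of the visible value vectors.
        for (j, e) in exps.iter().enumerate() {
            let w = e / z;
            for c in 0..d {
                out[i][c] += w * v[j][c];
            }
        }
    }
    out
}
```

Keys whose index is close to the query's index receive the largest weights, and a smaller `temperature` sharpens that preference.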
The taumode(x, L) function computes a bounded Rayleigh quotient:
E_raw = (x^T L x) / (x^T x + eps)
E_bounded = E_raw / (E_raw + tau)
lambda = E_bounded # synthetic spectral index
This measures the "spectral roughness" of vector x with respect to the feature manifold encoded in L.
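The three formula lines above translate directly into a small function. The sketch below is an assumption-laden illustration rather than the crate's kernel: it uses a dense `Vec<Vec<f64>>` for L, and the helper `path_laplacian` (the Laplacian of a simple path graph) is a hypothetical stand-in for a Laplacian built by arrowspace from a domain corpus.

```rust
/// Bounded Rayleigh quotient of `x` with respect to a dense F×F matrix `l`.
/// `tau` and `eps` are small positive constants chosen by the caller.
fn taumode(x: &[f64], l: &[Vec<f64>], tau: f64, eps: f64) -> f64 {
    let f = x.len();
    assert_eq!(l.len(), f, "L must be F×F");

    // Numerator: x^T L x.
    let mut num = 0.0;
    for i in 0..f {
        assert_eq!(l[i].len(), f, "L must be F×F");
        let row: f64 = l[i].iter().zip(x).map(|(lij, xj)| lij * xj).sum();
        num += x[i] * row;
    }

    // Denominator: x^T x + eps.
    let den: f64 = x.iter().map(|v| v * v).sum::<f64>() + eps;

    let e_raw = num / den;
    // For a positive semidefinite L, e_raw >= 0, so the result lies in [0, 1).
    e_raw / (e_raw + tau)
}

/// Laplacian of a path graph on `f` nodes (hypothetical stand-in for a
/// domain Laplacian): each edge (i, i+1) has unit weight.
fn path_laplacian(f: usize) -> Vec<Vec<f64>> {
    let mut l = vec![vec![0.0; f]; f];
    for i in 0..f.saturating_sub(1) {
        l[i][i] += 1.0;
        l[i + 1][i + 1] += 1.0;
        l[i][i + 1] -= 1.0;
        l[i + 1][i] -= 1.0;
    }
    l
}
```

With this toy Laplacian, a constant vector such as `[1, 1, 1, 1]` is perfectly smooth and maps to an index of 0, while an alternating vector such as `[1, -1, 1, -1]` is rougher and maps to a larger index (about 0.75 for `tau = 1`).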
See ADVANTAGES.md for more details.