| Crates.io | cubemoma |
| lib.rs | cubemoma |
| version | 0.1.2 |
| created_at | 2025-11-16 12:40:02.746582+00 |
| updated_at | 2025-11-22 18:10:39.318555+00 |
| description | A multi-word modular arithmetic library based on CubeCL |
| homepage | |
| repository | https://github.com/newsniper-org/cubemoma |
| max_upload_size | |
| id | 1935486 |
| size | 74,497 |
cubemoma is a Rust crate providing an efficient CPU-based implementation of Multi-word Modular Arithmetic (MoMA) for cryptographic kernels, inspired by the paper "Code Generation for Cryptographic Kernels using Multi-word Modular Arithmetic on GPU" by Naifeng Zhang and Franz Franchetti (arXiv:2501.07535v1, published January 13, 2025). This crate focuses on the CPU version, implementing finite field arithmetic over large primes (e.g., 448-bit) without relying on a code generator. It supports operations essential for Fully Homomorphic Encryption (FHE) and Zero-Knowledge Proofs (ZKPs), such as modular addition, multiplication, inversion, square roots, and Number Theoretic Transforms (NTT).
This project adapts the MoMA formalization from the paper, which decomposes large integer arithmetic into machine-word operations (u64 limbs). Unlike the original GPU-focused work, cubemoma provides a static, const-generic Rust implementation for CPU environments. Key contributions from the paper include:
cubemoma translates these ideas to CPU, using Barrett reduction for modular ops and optimizations like pre-computed twiddles for NTT. It does not include GPU/CubeCL support yet.
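To make the limb decomposition concrete, here is a minimal sketch of modular addition over little-endian u64 limbs. It is not taken from the cubemoma source: the names `add_limbs`, `sub_limbs`, and `add_mod_limbs` are hypothetical, and the sketch assumes both inputs are already reduced below the modulus.

```rust
/// Number of 64-bit limbs; 7 limbs give 448 bits, as in the README's example.
const N: usize = 7;

/// Adds two little-endian limb arrays word by word.
/// Returns the wrapped sum and the carry out of the top limb.
fn add_limbs(a: &[u64; N], b: &[u64; N]) -> ([u64; N], bool) {
    let mut out = [0u64; N];
    let mut carry = false;
    for i in 0..N {
        let (s1, c1) = a[i].overflowing_add(b[i]);
        let (s2, c2) = s1.overflowing_add(carry as u64);
        out[i] = s2;
        carry = c1 || c2;
    }
    (out, carry)
}

/// Subtracts `b` from `a` word by word.
/// Returns the wrapped difference and the borrow out of the top limb.
fn sub_limbs(a: &[u64; N], b: &[u64; N]) -> ([u64; N], bool) {
    let mut out = [0u64; N];
    let mut borrow = false;
    for i in 0..N {
        let (d1, b1) = a[i].overflowing_sub(b[i]);
        let (d2, b2) = d1.overflowing_sub(borrow as u64);
        out[i] = d2;
        borrow = b1 || b2;
    }
    (out, borrow)
}

/// Computes (a + b) mod p, assuming a < p and b < p (hypothetical helper).
fn add_mod_limbs(a: &[u64; N], b: &[u64; N], p: &[u64; N]) -> [u64; N] {
    let (sum, carry) = add_limbs(a, b);
    let (reduced, borrow) = sub_limbs(&sum, p);
    // Keep the reduced value if the sum overflowed 2^(64*N) or was >= p.
    if carry || !borrow { reduced } else { sum }
}
```

Each large-integer operation becomes a loop of machine-word operations with explicit carry or borrow propagation, followed by a single conditional subtraction of the modulus.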
For the full paper details, see the arXiv link.
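The Barrett reduction mentioned above can be illustrated on a single machine word. The sketch below is an assumption-laden toy, not the crate's multi-limb implementation: the `Barrett` struct and its methods are hypothetical, the modulus is restricted to values below 2^32 so that products fit in a u64, and the precomputed constant is floor(2^64 / m).

```rust
/// Single-word Barrett context (hypothetical; for illustration only).
struct Barrett {
    m: u64,  // modulus, 1 < m < 2^32
    mu: u64, // precomputed floor(2^64 / m)
}

impl Barrett {
    fn new(m: u64) -> Self {
        assert!(m > 1 && m < (1 << 32), "modulus must satisfy 1 < m < 2^32");
        let mu = ((1u128 << 64) / m as u128) as u64;
        Barrett { m, mu }
    }

    /// Reduces any x < 2^64 modulo m without a hardware division.
    fn reduce(&self, x: u64) -> u64 {
        // Quotient estimate: either floor(x / m) or one less.
        let q = ((x as u128 * self.mu as u128) >> 64) as u64;
        let mut r = x - q * self.m;
        // At most one correction step is needed because r < 2m.
        if r >= self.m {
            r -= self.m;
        }
        r
    }

    /// Computes a * b mod m for a, b < m.
    fn mul_mod(&self, a: u64, b: u64) -> u64 {
        debug_assert!(a < self.m && b < self.m);
        self.reduce(a * b) // a * b < 2^64 because m < 2^32
    }
}
```

The multi-word case follows the same pattern with limb arrays in place of u64 values: one precomputed constant per modulus, a multiply and shift to estimate the quotient, and a final conditional subtraction.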
- BigField<N>: modular addition, subtraction, multiplication (schoolbook + Barrett), inversion (Fermat), and square root (Tonelli-Shanks with Legendre symbol check).
- FpComplex<N>: operations like mul_mod, add_mod, sub_mod, square_mod, norm_mod, inv_mod.
- Const-generic over N (e.g., N=7 for 448-bit fields).
- Option return values for non-invertible/non-residue cases.

Add to your Cargo.toml:
[dependencies]
cubemoma = "0.1.2"
rand = "0.9"
[target.'cfg(target_arch = "wasm32")'.dependencies]
wasm-bindgen = "0.2"
Build with cargo build --release. For WASM: wasm-pack build --target web.
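// NttPrecompute and ntt are assumed to be exported at the crate root; adjust the path if they live in a submodule.
use cubemoma::{BigField, NttPrecompute, ntt};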
const N: usize = 7; // 448-bit
let p_limbs: [u64; N] = [0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFEFFFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF, 0xFFFFFFFFFFFFFFFF];
let modulus = BigField::<N>::new(p_limbs);
let mu = BigField::<N>::precompute_mu(&modulus);
let a = BigField::<N>::random(&mut rand::thread_rng());
let b = BigField::<N>::random(&mut rand::thread_rng());
let sum = a.add_mod(&b, &modulus);
let product = a.mul_mod(&b, &modulus, mu);
if let Some(inv_a) = a.inv_mod(&modulus, mu) {
    let identity = a.mul_mod(&inv_a, &modulus, mu); // Should be 1
}
if let Some(sqrt_a) = a.sqrt_mod(&modulus, mu) {
    let sq = sqrt_a.square_mod(&modulus, mu); // Should equal a
}
let omega = BigField::<N>::from_limb(7u64); // Primitive root
let precompute = NttPrecompute::<N>::new(&omega, 256, &modulus, mu);
let mut input: Vec<BigField<N>> = (0..256).map(|_| BigField::<N>::random(&mut rand::thread_rng())).collect();
ntt(&mut input, &precompute, &modulus, mu);
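To show what an NTT with pre-computed twiddles does, here is a toy version over a small prime. It is a sketch under stated assumptions, not the crate's code: the field is the integers mod 17 with 9 as a primitive 8th root of unity, and `precompute_twiddles` and `ntt_in_place` are hypothetical names chosen for illustration.

```rust
/// Toy prime modulus; 17 - 1 = 16, so power-of-two transform sizes up to 16 exist.
const P: u64 = 17;

fn mul_mod(a: u64, b: u64) -> u64 {
    a * b % P
}

/// Square-and-multiply exponentiation mod P.
fn pow_mod(mut base: u64, mut exp: u64) -> u64 {
    let mut acc = 1;
    base %= P;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base);
        }
        base = mul_mod(base, base);
        exp >>= 1;
    }
    acc
}

/// Precomputes omega^0 .. omega^(n/2 - 1) once, so butterflies only do table lookups.
fn precompute_twiddles(omega: u64, n: usize) -> Vec<u64> {
    let mut tw = Vec::with_capacity(n / 2);
    let mut w = 1;
    for _ in 0..n / 2 {
        tw.push(w);
        w = mul_mod(w, omega);
    }
    tw
}

/// In-place iterative radix-2 NTT (decimation in time).
/// `data.len()` must be a power of two >= 2, with entries already reduced mod P.
fn ntt_in_place(data: &mut [u64], twiddles: &[u64]) {
    let n = data.len();
    assert!(n >= 2 && n.is_power_of_two() && twiddles.len() == n / 2);
    // Bit-reversal permutation of the input.
    let bits = n.trailing_zeros();
    for i in 0..n {
        let j = i.reverse_bits() >> (usize::BITS - bits);
        if i < j {
            data.swap(i, j);
        }
    }
    // Butterfly stages with block lengths 2, 4, ..., n.
    let mut len = 2;
    while len <= n {
        let stride = n / len; // step through the twiddle table for this stage
        for start in (0..n).step_by(len) {
            for k in 0..len / 2 {
                let w = twiddles[k * stride];
                let u = data[start + k];
                let v = mul_mod(data[start + k + len / 2], w);
                data[start + k] = (u + v) % P;
                data[start + k + len / 2] = (u + P - v) % P;
            }
        }
        len *= 2;
    }
}
```

Running the same routine with twiddles built from the inverse root (here `pow_mod(9, P - 2)`) and scaling each entry by the inverse of n recovers the original input. A size-n transform requires a primitive n-th root of unity in the field, so n must divide p - 1.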
Run cargo bench for CPU benchmarks. Example results (AMD Ryzen 7 5800H, Rust nightly-2025-10-01):
WASM results (browser, post-optimization): mul_mod ~2 µs (from repeated runs), NTT ~2 ms, about 8-10x faster than the initial version.
Contributions welcome! Fork the repo, create a branch, and submit a PR. Focus on CPU optimizations (e.g., assembly intrinsics). Open an issue to report bugs or request features.
2-Clause BSD License. See LICENSE for details.
Based on the paper by Naifeng Zhang and Franz Franchetti. Special thanks to xAI for assistance in development.