| Crates.io | archmage |
| lib.rs | archmage |
| version | 0.2.1 |
| created_at | 2026-01-20 12:58:56.037926+00 |
| updated_at | 2026-01-23 21:08:50.635538+00 |
| description | Safely invoke your intrinsic power, using the tokens granted to you by the CPU. Cast primitive magics faster than any mage alive. |
| homepage | |
| repository | https://github.com/imazen/archmage |
| max_upload_size | |
| id | 2056506 |
| size | 1,685,204 |
Safely invoke your intrinsic power, using the tokens granted to you by the CPU.
archmage provides zero-cost capability tokens that prove CPU features are available at runtime, making raw SIMD intrinsics safe to call via the #[arcane] macro.
```toml
[dependencies]
archmage = "0.2"
safe_unaligned_simd = "0.2" # For safe memory operations
```
```rust
use archmage::{Desktop64, SimdToken, arcane};
use std::arch::x86_64::*;

#[arcane]
fn square(_token: Desktop64, data: &[f32; 8]) -> [f32; 8] {
    let v = safe_unaligned_simd::x86_64::_mm256_loadu_ps(data);
    let squared = _mm256_mul_ps(v, v);
    let mut out = [0.0f32; 8];
    safe_unaligned_simd::x86_64::_mm256_storeu_ps(&mut out, squared);
    out
}

fn main() {
    if let Some(token) = Desktop64::summon() {
        let result = square(token, &[1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
        println!("{:?}", result); // [1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0, 64.0]
    }
}
```
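SIMD intrinsics are unsafe for two reasons:

1. Feature availability: executing an instruction the CPU does not support is undefined behavior.
2. Memory safety: load and store intrinsics take raw pointers.

archmage solves #1 with capability tokens, zero-sized types that can only be created after runtime CPU detection succeeds: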
```rust
// summon() checks CPUID and returns Some only if features are available
if let Some(token) = Desktop64::summon() {
    // Token exists = CPU definitely has AVX2 + FMA
}
```
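To make the token idea concrete, here is a minimal hand-rolled sketch using only the standard library. `Avx2FmaProof` and its `summon` are hypothetical names chosen for illustration; this is not archmage's implementation, only the general pattern described above.

```rust
/// Hypothetical zero-sized proof that AVX2 and FMA were detected at runtime.
/// The private unit field means code outside the defining module cannot
/// construct it directly; the only way to obtain one is through `summon`.
#[derive(Clone, Copy, Debug)]
pub struct Avx2FmaProof(());

impl Avx2FmaProof {
    /// Returns `Some` only when runtime detection succeeds.
    /// On non-x86_64 targets the detection block is compiled out,
    /// so this always returns `None` there.
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma") {
                return Some(Avx2FmaProof(()));
            }
        }
        None
    }
}
```

Because the struct has no data, passing it around costs nothing at runtime: `std::mem::size_of::<Avx2FmaProof>()` is zero, and the proof lives entirely in the type system.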
The #[arcane] macro transforms your function to enable #[target_feature], which makes value-based intrinsics safe (Rust 1.85+):
```rust
#[arcane]
fn example(token: Desktop64, data: &[f32; 8]) -> [f32; 8] {
    let v = safe_unaligned_simd::x86_64::_mm256_loadu_ps(data); // Safe!
    let result = _mm256_mul_ps(v, v); // Safe! (value-based)
    // ...
}
```
For memory operations (#2), use the safe_unaligned_simd crate which provides reference-based alternatives.
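For comparison, the sketch below shows roughly what one would write by hand without such a macro: an inner function compiled with `#[target_feature]` and a safe outer function that accepts a token as proof. The names `Avx2Proof`, `square`, `square_avx2`, and `square_all` are hypothetical, the code uses the raw-pointer loads from `std::arch` rather than `safe_unaligned_simd`, and the actual expansion of `#[arcane]` may differ.

```rust
#[cfg(target_arch = "x86_64")]
mod x86 {
    use std::arch::x86_64::*;

    /// Hypothetical token: its existence proves AVX2 was detected.
    #[derive(Clone, Copy)]
    pub struct Avx2Proof(());

    impl Avx2Proof {
        pub fn summon() -> Option<Self> {
            is_x86_feature_detected!("avx2").then(|| Avx2Proof(()))
        }
    }

    /// Compiled with AVX2 enabled. Calling it is unsafe because the caller
    /// must guarantee that the CPU supports AVX2.
    #[target_feature(enable = "avx2")]
    unsafe fn square_avx2(data: &[f32; 8]) -> [f32; 8] {
        let mut out = [0.0f32; 8];
        // The unaligned load and store take raw pointers, which is the
        // second source of unsafety.
        unsafe {
            let v = _mm256_loadu_ps(data.as_ptr());
            _mm256_storeu_ps(out.as_mut_ptr(), _mm256_mul_ps(v, v));
        }
        out
    }

    /// Safe wrapper: the token argument is what makes the call sound.
    pub fn square(_token: Avx2Proof, data: &[f32; 8]) -> [f32; 8] {
        // SAFETY: an `Avx2Proof` can only be obtained after AVX2 detection.
        unsafe { square_avx2(data) }
    }
}

/// Squares eight floats, using AVX2 when a token can be summoned and a
/// scalar fallback otherwise.
pub fn square_all(data: &[f32; 8]) -> [f32; 8] {
    #[cfg(target_arch = "x86_64")]
    {
        if let Some(token) = x86::Avx2Proof::summon() {
            return x86::square(token, data);
        }
    }
    (*data).map(|x| x * x)
}
```

The single `unsafe` call in `square` is the boilerplate a macro of this kind can generate, so user code only ever sees the safe, token-taking signature.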
Start with Desktop64 for most applications:
| Token | Features | CPU Support |
|---|---|---|
| Desktop64 | AVX2 + FMA + BMI2 | Intel Haswell 2013+, AMD Zen 1 2017+ |
| X64V2Token | SSE4.2 + POPCNT | Intel Nehalem 2008+, AMD Bulldozer 2011+ |
| X64V3Token | AVX2 + FMA + BMI2 | Same as Desktop64 (alias) |
Individual feature tokens for fine-grained control:
| Token | Features |
|---|---|
| Avx2FmaToken | AVX2 + FMA |
| Avx2Token | AVX2 only |
| FmaToken | FMA only |
| AvxToken | AVX |
| Sse42Token | SSE4.2 |
| Sse41Token | SSE4.1 |
AVX-512 tokens require the avx512 feature:

```toml
[dependencies]
archmage = { version = "0.2", features = ["avx512"] }
```
| Token | Features | CPU Support |
|---|---|---|
| X64V4Token | AVX-512 F/BW/CD/DQ/VL | Intel Skylake-X 2017+, AMD Zen 4 2022+ |
| Avx512ModernToken | + VBMI2, VNNI, BF16, etc. | Intel Ice Lake 2019+, AMD Zen 4+ |
| Avx512Fp16Token | + FP16 | Intel Sapphire Rapids 2023+ |
Note: Intel 12th-14th gen consumer CPUs do NOT have AVX-512.
| Token | Features | CPU Support |
|---|---|---|
| Arm64 | NEON | All AArch64 (baseline) |
| NeonToken | NEON | Same as Arm64 (alias) |
| NeonAesToken | NEON + AES | ARM with crypto extensions |
| NeonSha3Token | NEON + SHA3 | ARMv8.2+ |
| ArmCryptoToken | AES + SHA2 + CRC | Most ARMv8 CPUs |
| ArmCrypto3Token | + SHA3 | ARMv8.4+ (M1/M2/M3, Graviton 2+) |
| Token | Features |
|---|---|
| Simd128Token | WASM SIMD |
Tokens form a hierarchy. Higher-level tokens can extract lower-level ones:
```rust
if let Some(v3) = X64V3Token::summon() {
    let v2: X64V2Token = v3.v2(); // v3 implies v2
    let avx2_fma: Avx2FmaToken = v3.avx2_fma();
    let avx2: Avx2Token = v3.avx2();
    let fma: FmaToken = v3.fma();
    let sse42: Sse42Token = v3.sse42();
}
```
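The reason a downgrade needs no second CPU check can be sketched with two hand-written tokens. `V2` and `V3` are hypothetical stand-ins, not archmage types, and the feature lists are taken from the token tables above.

```rust
/// Hypothetical token for the SSE4.2 + POPCNT level.
#[derive(Clone, Copy, Debug)]
pub struct V2(());

/// Hypothetical token for the AVX2 + FMA + BMI2 level.
#[derive(Clone, Copy, Debug)]
pub struct V3(());

impl V2 {
    pub fn summon() -> Option<Self> {
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("sse4.2") && is_x86_feature_detected!("popcnt") {
                return Some(V2(()));
            }
        }
        None
    }
}

impl V3 {
    pub fn summon() -> Option<Self> {
        // The higher level requires everything the lower level requires.
        V2::summon()?;
        #[cfg(target_arch = "x86_64")]
        {
            if is_x86_feature_detected!("avx2")
                && is_x86_feature_detected!("fma")
                && is_x86_feature_detected!("bmi2")
            {
                return Some(V3(()));
            }
        }
        None
    }

    /// Infallible downgrade: holding a `V3` already proves the `V2`
    /// features were detected, so no runtime check is repeated.
    pub fn v2(self) -> V2 {
        V2(())
    }
}
```

Since both types are zero-sized, the downgrade compiles to nothing; it only changes which functions the type checker lets you call.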
Use trait bounds for generic SIMD code:
```rust
use archmage::{HasX64V2, SimdToken, arcane};

// Accept any token with at least v2 features
#[arcane]
fn process<T: HasX64V2>(_token: T, data: &[u8]) {
    // SSE4.2 intrinsics available
}
```
Available traits:
| Trait | Meaning |
|---|---|
| SimdToken | Base trait for all tokens |
| HasX64V2 | Has SSE4.2 + POPCNT |
| HasX64V4 | Has AVX-512 (requires avx512 feature) |
| Has128BitSimd | Has 128-bit vectors |
| Has256BitSimd | Has 256-bit vectors |
| Has512BitSimd | Has 512-bit vectors |
| HasNeon | Has ARM NEON |
| HasNeonAes | Has NEON + AES |
| HasNeonSha3 | Has NEON + SHA3 |
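As a rough illustration of how such trait bounds can be wired up, the sketch below defines a base trait plus a marker trait and implements the marker for two token levels. `Token`, `HasV2Level`, `V2Token`, `V3Token`, and `count_set_bits` are all hypothetical names; the crate's own trait definitions may look different.

```rust
/// Hypothetical base trait: every token can attempt runtime detection.
pub trait Token: Copy + Sized {
    fn summon() -> Option<Self>;
}

/// Hypothetical marker trait for tokens that guarantee at least
/// SSE4.2 + POPCNT.
pub trait HasV2Level: Token {}

#[derive(Clone, Copy)]
pub struct V2Token(());
#[derive(Clone, Copy)]
pub struct V3Token(());

/// Shared detection helper; always false on non-x86_64 targets.
fn detected(need_v3: bool) -> bool {
    #[cfg(target_arch = "x86_64")]
    {
        let v2 = is_x86_feature_detected!("sse4.2") && is_x86_feature_detected!("popcnt");
        let extra = is_x86_feature_detected!("avx2") && is_x86_feature_detected!("fma");
        if v2 && (!need_v3 || extra) {
            return true;
        }
    }
    let _ = need_v3;
    false
}

impl Token for V2Token {
    fn summon() -> Option<Self> {
        detected(false).then(|| V2Token(()))
    }
}

impl Token for V3Token {
    fn summon() -> Option<Self> {
        detected(true).then(|| V3Token(()))
    }
}

impl HasV2Level for V2Token {}
impl HasV2Level for V3Token {} // the higher level is a superset of v2

/// Accepts any token at or above the v2 level. The body is scalar for
/// illustration; a real implementation would use intrinsics here.
pub fn count_set_bits<T: HasV2Level>(_token: T, data: &[u8]) -> u32 {
    data.iter().map(|b| b.count_ones()).sum()
}
```

A caller holding either token can invoke `count_set_bits`, while a token type that does not implement the marker trait is rejected at compile time.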
All tokens compile on all platforms. summon() returns None on unsupported architectures:
```rust
use archmage::{Desktop64, Arm64, SimdToken};

fn process(data: &mut [f32]) {
    if let Some(token) = Desktop64::summon() {
        process_avx2(token, data);
    } else if let Some(token) = Arm64::summon() {
        process_neon(token, data);
    } else {
        process_scalar(data);
    }
}
```
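The same dispatch shape can be written against the standard library alone, which may help when reasoning about what each branch has to provide. This is a sketch under assumptions: `double`, `double_avx2`, and `double_scalar` are hypothetical functions, only an x86_64 fast path is shown, and an AArch64 branch would be added in the same way.

```rust
/// Portable fallback that works on every target.
fn double_scalar(data: &mut [f32]) {
    for x in data.iter_mut() {
        *x *= 2.0;
    }
}

/// AVX2 path: processes eight floats per iteration, then hands the
/// leftover tail (fewer than eight elements) to the scalar fallback.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn double_avx2(data: &mut [f32]) {
    use std::arch::x86_64::*;
    let mut chunks = data.chunks_exact_mut(8);
    for chunk in &mut chunks {
        // Each chunk is exactly eight floats, so the 256-bit load and
        // store stay in bounds.
        unsafe {
            let v = _mm256_loadu_ps(chunk.as_ptr());
            _mm256_storeu_ps(chunk.as_mut_ptr(), _mm256_add_ps(v, v));
        }
    }
    double_scalar(chunks.into_remainder());
}

/// Picks the best available implementation at runtime.
pub fn double(data: &mut [f32]) {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was just confirmed at runtime.
            unsafe { double_avx2(data) };
            return;
        }
    }
    double_scalar(data);
}
```

With tokens, the `unsafe` call and its safety comment disappear from the dispatcher, because the summoned token carries the proof into the fast path.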
archmage provides token-gated SIMD types with ergonomic operators:
```rust
use archmage::{Desktop64, SimdToken, simd::f32x8};

if let Some(token) = Desktop64::summon() {
    let a = f32x8::splat(token, 2.0);
    let b = f32x8::from_array(token, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]);
    let c = a * b + a; // Operators work naturally
    let result = c.sqrt();
    println!("{:?}", result.to_array());
}
```
| Width | Float | Signed Int | Unsigned Int | Token Required |
|---|---|---|---|---|
| 128-bit | f32x4, f64x2 | i8x16, i16x8, i32x4, i64x2 | u8x16, u16x8, u32x4, u64x2 | Sse41Token |
| 256-bit | f32x8, f64x4 | i8x32, i16x16, i32x8, i64x4 | u8x32, u16x16, u32x8, u64x4 | Avx2FmaToken |
| 512-bit | f32x16, f64x8 | i8x64, i16x32, i32x16, i64x8 | u8x64, u16x32, u32x16, u64x8 | Avx512Token |
- Construction (requires token): splat, from_array, load, zero
- Extraction: to_array, as_array, store, raw
- Arithmetic: +, -, *, / and assignment variants
- Bitwise: &, |, ^ and assignment variants
- Math (float): sqrt, abs, floor, ceil, round, min, max, clamp, mul_add, mul_sub, recip, rsqrt
- Transcendentals (float): log2_lowp, log2_midp, exp2_lowp, exp2_midp, ln_lowp, ln_midp, exp_lowp, exp_midp, pow_lowp, pow_midp, cbrt_midp
- Comparison: simd_eq, simd_ne, simd_lt, simd_le, simd_gt, simd_ge
- Reduction: reduce_add, reduce_min, reduce_max
- Integer: shl::<N>, shr::<N>, shr_arithmetic::<N>
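To show how operator syntax on a vector type works in principle, here is a small scalar stand-in built on `std::ops`. `F32x8` below is a hypothetical array-backed type, not the crate's `f32x8`; it omits the token argument for brevity and performs no actual SIMD.

```rust
use std::ops::{Add, Mul};

/// Hypothetical stand-in for an eight-lane float vector, backed by a
/// plain array so it runs on any target.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct F32x8([f32; 8]);

impl F32x8 {
    /// Fills every lane with the same value.
    pub fn splat(v: f32) -> Self {
        F32x8([v; 8])
    }

    pub fn from_array(a: [f32; 8]) -> Self {
        F32x8(a)
    }

    pub fn to_array(self) -> [f32; 8] {
        self.0
    }

    /// Lane-wise square root.
    pub fn sqrt(self) -> Self {
        F32x8(self.0.map(f32::sqrt))
    }
}

impl Add for F32x8 {
    type Output = Self;
    /// Lane-wise addition.
    fn add(self, rhs: Self) -> Self {
        F32x8(std::array::from_fn(|i| self.0[i] + rhs.0[i]))
    }
}

impl Mul for F32x8 {
    type Output = Self;
    /// Lane-wise multiplication.
    fn mul(self, rhs: Self) -> Self {
        F32x8(std::array::from_fn(|i| self.0[i] * rhs.0[i]))
    }
}
```

With this stand-in, `F32x8::splat(2.0) * F32x8::from_array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]) + F32x8::splat(2.0)` yields lanes `[4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0, 18.0]`, mirroring the `a * b + a` expression above. A real SIMD type would replace the array loops with a single intrinsic per operator.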
| Feature | Description |
|---|---|
| std (default) | Standard library support |
| macros (default) | #[arcane] macro |
| avx512 | AVX-512 tokens |
| __composite | Transpose, dot product (unstable) |
| __wide | wide crate integration (unstable) |
Set ARCHMAGE_DISABLE=1 to force summon() to return None:
```sh
ARCHMAGE_DISABLE=1 cargo test
```
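A kill switch of this kind can be sketched as a check that runs before feature detection. The helper names `disabled_by`, `simd_disabled`, and `avx2_usable` are hypothetical, and treating only the exact value `1` as "disabled" is an assumption; how archmage itself parses the variable is not stated here.

```rust
use std::env;

/// Pure policy helper, so the rule can be tested without touching the
/// process environment. Assumption: only the exact value "1" disables SIMD.
pub fn disabled_by(value: Option<&str>) -> bool {
    value == Some("1")
}

/// Reads the environment variable and applies the policy above.
pub fn simd_disabled() -> bool {
    disabled_by(env::var("ARCHMAGE_DISABLE").ok().as_deref())
}

/// Detection gated by the kill switch: reports AVX2 as usable only if
/// the switch is off and the CPU has the feature.
pub fn avx2_usable() -> bool {
    if simd_disabled() {
        return false;
    }
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            return true;
        }
    }
    false
}
```

Forcing the scalar path this way lets a test suite exercise the fallback code even on machines where the fast path would normally be selected.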
MIT OR Apache-2.0
Developed with Claude (Anthropic). Review critical paths before production use.