| Crates.io | base64-turbo |
| lib.rs | base64-turbo |
| version | 0.1.3 |
| created_at | 2026-01-20 17:33:02.516693+00 |
| updated_at | 2026-01-23 11:49:29.620513+00 |
| description | Hardware-accelerated, formally verified Base64 engine. Features AVX2/AVX512 support, zero-allocation, and no_std compatibility. |
| homepage | https://github.com/hacer-bark/base64-turbo |
| repository | https://github.com/hacer-bark/base64-turbo |
| max_upload_size | |
| id | 2057082 |
| size | 148,478 |
AVX512-Accelerated, Zero-Allocation Base64 Engine for Rust.
base64-turbo is a production-grade library engineered for high-frequency trading (HFT), mission-critical servers, and embedded systems where CPU cycles and memory bandwidth are scarce resources.
Designed to align with modern hardware reality without sacrificing portability, this crate ensures optimal performance across the entire spectrum of computing devices:
no_std support makes it ideal for embedded firmware, operating system kernels, and bootloaders. Whether you are running on an embedded ARM microcontroller or a Zen 4 data center node, base64-turbo automatically selects the fastest, safest algorithm for your specific architecture.
We believe in radical transparency. Below is a fact-based comparison against the leading alternatives in both the Rust and C ecosystems.
## vs. the Rust Ecosystem (base64-simd)

base64-turbo outperforms the current Rust standard by approximately 2x in raw throughput. This performance delta is achieved through aggressive loop unrolling, a reduced instruction count per encoded byte, and hybrid encoding logic.

Benchmark Summary:
| Metric | base64-turbo (This Crate) | base64-simd | Speedup |
|---|---|---|---|
| Decode (Read) | ~21.1 GiB/s | ~10.0 GiB/s | +111% |
| Encode (Write) | ~12.5 GiB/s | ~10.5 GiB/s | +20% |
| Small Data (32B) | ~3.0 GiB/s | ~1.6 GiB/s | +87% |
| Latency (32B) | ~10 ns | ~18 ns | 1.8x lower |
Figure 1: Comparative benchmarks conducted on an AWS c7i.large instance (Intel Xeon Platinum 8488C).
## vs. the C Ecosystem (turbo-base64)

The C library turbo-base64 currently sets the "speed of light" for Base64 encoding. It achieves extreme velocities by using pure C, unchecked pointer arithmetic, and bypassing memory-safety checks.
base64-turbo (this crate) offers a strategic compromise: it delivers 40-50% of the C speed while retaining 100% of Rust's memory-safety guarantees and a permissive license.
| Feature | base64-turbo (This Crate) | turbo-base64 (C Library) |
|---|---|---|
| Throughput (AVX2) | ~12 GiB/s (Safe Slices) | ~29 GiB/s (Unchecked Pointers) |
| Memory Safety | ✅ Guaranteed (MIRI Audited) | ❌ Unsafe (Raw C Pointers) |
| Formal Verification | ✅ Kani Verified (Math Proofs) | ❌ None (No audits) |
| Reliability | ✅ 2 Billion+ Fuzz Iterations | ❌ Unknown / Not Stated |
| License | ✅ MIT (Permissive) | ❌ GPLv3 / Commercial |
Choose base64-turbo if you require the highest possible performance within verified Rust (fast enough to saturate RAM bandwidth) together with a permissive license and formally verified safety guarantees.

## Quick Start

The easiest way to use the library; handles allocation automatically.
```rust
use base64_turbo::STANDARD;

let data = b"huge_market_data_feed...";

// Automatically selects the fastest SIMD algorithm (AVX2, SSSE3, or AVX512) at runtime.
//
// Note: Multi-threaded processing (Rayon) is opt-in via the `parallel` feature
// to ensure deterministic latency in standard deployments.
let encoded = STANDARD.encode(data);
let decoded = STANDARD.decode(&encoded).unwrap();
```
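The runtime dispatch mentioned in the comment above is typically built on the standard library's CPU feature detection. Here is a minimal sketch of that pattern using only std; `pick_backend` is a hypothetical name for illustration, not part of base64-turbo's public API:

```rust
// Illustrative sketch of runtime SIMD dispatch; `pick_backend` is a
// hypothetical helper, not base64-turbo's actual internals.
#[cfg(target_arch = "x86_64")]
fn pick_backend() -> &'static str {
    if is_x86_feature_detected!("avx512bw") {
        "avx512"
    } else if is_x86_feature_detected!("avx2") {
        "avx2"
    } else if is_x86_feature_detected!("ssse3") {
        "ssse3"
    } else {
        "scalar"
    }
}

#[cfg(not(target_arch = "x86_64"))]
fn pick_backend() -> &'static str {
    "scalar" // non-x86 targets fall back to the portable path
}

fn main() {
    // The selected backend depends on the host CPU.
    println!("selected backend: {}", pick_backend());
}
```

Checking features at runtime (rather than at compile time) is what allows a single "fat" binary to run optimally on both old and new CPUs.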
For hot paths where malloc overhead is unacceptable.
```rust
use base64_turbo::STANDARD;

let input = b"order_id_123";
let mut buffer = [0u8; 1024]; // Stack allocated, kept hot in L1 cache

// No syscalls, no malloc, pure CPU cycles.
// Returns Result<usize, Error> indicating bytes written.
let len = STANDARD.encode_into(input, &mut buffer).unwrap();
assert_eq!(&buffer[..len], b"b3JkZXJfaWRfMTIz");
```
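When sizing a fixed buffer for this API, note that padded Base64 output length is fully deterministic: 4 bytes of output per 3 bytes of input, rounded up. A minimal sketch of the arithmetic (plain Rust, no crate dependencies; `encoded_len` is an illustrative helper, not a base64-turbo function):

```rust
// Padded Base64 output length for `n` input bytes: 4 * ceil(n / 3).
// Illustrative helper; base64-turbo may expose its own sizing API.
fn encoded_len(n: usize) -> usize {
    n.div_ceil(3) * 4
}

fn main() {
    assert_eq!(encoded_len(12), 16); // "order_id_123" (12 bytes) -> 16 output bytes
    assert_eq!(encoded_len(1), 4);   // 1 byte still pads to a full 4-byte group
    assert_eq!(encoded_len(0), 0);
    println!("ok");
}
```

Sizing the buffer exactly (or slightly over) keeps the hot-path array small enough to stay resident in L1 cache.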
By default, this crate is dependency-free and compiles on Stable Rust. Features are opt-in to allow users to balance compile times, binary size, and specific performance needs.
| Flag | Description | Default |
|---|---|---|
| std | Provides high-level encode and decode APIs returning heap-allocated String and Vec<u8>. Disable for embedded/bare-metal no_std environments. | Enabled |
| simd | Enables runtime CPU feature detection (AVX2/SSSE3). Automatically falls back to safe scalar logic if hardware support is missing. | Enabled |
| parallel | Enables multi-threaded processing for large payloads (> 512 KB) via Rayon. Note: this adds rayon as a dependency. | Disabled |
| avx512 | Compiles AVX512 intrinsics. Requires a supported CPU (e.g., Zen 4, Ice Lake) to execute the optimized path. | Disabled |
## Why are parallel and avx512 disabled by default?

We prioritize zero dependencies, deterministic latency, and strict formal verification in the default configuration.
- parallel (Rayon): enabling it pulls in rayon, which is a significant external dependency.
- avx512: requires a supported CPU (e.g., Zen 4, Ice Lake) to execute the optimized path.

## Architecture

This engine is not merely a loop over a lookup table; it is engineered to exploit the micro-architectural mechanics of modern x86 processors. By aligning software logic with hardware capabilities, it aims for maximum Instruction Level Parallelism (ILP).
On AVX2, byte shuffles (vpshufb) are restricted to 128-bit lanes, preventing data from crossing between the lower and upper halves of a register. The engine uses "double-load" intrinsics to bridge this gap, allowing full utilization of the 32-byte YMM registers without pipeline stalls.

## Safety and Verification

Achieving maximum throughput should not come at the cost of memory safety. While this crate leverages unsafe intrinsics for SIMD optimizations, the codebase is rigorously audited and formally verified to guarantee stability.
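The vpshufb lane restriction can be demonstrated directly with std::arch intrinsics. The sketch below (illustrative only, not crate code) shuffles a 32-byte register with index 0 in every position and shows that the high half reads byte 16 of the register, not byte 0:

```rust
// Demonstration (not base64-turbo internals) of the AVX2 lane restriction:
// vpshufb indexes bytes within each 128-bit lane independently.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn shuffle_lane_demo() -> [u8; 32] {
    use std::arch::x86_64::*;
    let src: [u8; 32] = core::array::from_fn(|i| i as u8); // bytes 0..=31
    let v = _mm256_loadu_si256(src.as_ptr() as *const __m256i);
    let idx = _mm256_setzero_si256(); // ask every output byte for "index 0"
    let out = _mm256_shuffle_epi8(v, idx);
    let mut res = [0u8; 32];
    _mm256_storeu_si256(res.as_mut_ptr() as *mut __m256i, out);
    res
}

fn main() {
    #[cfg(target_arch = "x86_64")]
    if is_x86_feature_detected!("avx2") {
        let r = unsafe { shuffle_lane_demo() };
        assert!(r[..16].iter().all(|&b| b == 0));  // low lane reads src[0]
        assert!(r[16..].iter().all(|&b| b == 16)); // high lane reads src[16], not src[0]
        println!("vpshufb never crosses the 128-bit lane boundary");
    }
}
```

This is exactly why a naive 256-bit port of a 128-bit shuffle kernel stalls: extra cross-lane permutes (or the "double-load" trick described above) are needed to feed both halves correctly.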
To ensure strict adherence to these standards, the GitHub CI pipeline is configured to block any release that fails to pass logical tests or MIRI verification.
The decoder is continuously fuzzed with cargo fuzz (2 billion+ iterations to date). This ensures resilience against edge cases, invalid inputs, and complex buffer boundary conditions.

## Binary Size

As part of our transparency policy, here are the sizes of the compiled library artifact (.rlib) under maximum optimization settings (lto = "fat", codegen-units = 1).
| Configuration | Size | Details |
|---|---|---|
| Default (std + simd) | ~512 KB | "Fat binary" containing AVX2, SSSE3, and scalar paths to support runtime CPU detection. |
| Scalar (std only) | ~82 KB | SIMD disabled. Optimized for legacy x86 or generic architectures. |
| Embedded (no_std) | ~64 KB | Pure scalar logic. Ideal for microcontrollers, WASM, or kernel drivers. |
> Note: These sizes represent the intermediate .rlib, which includes metadata and symbol tables. The actual machine code added to your final executable is significantly smaller due to linker dead-code elimination. Additionally, compiling with -C target-cpu=native allows the compiler to strip unused SIMD paths, further reducing the binary size.
This project references several external Base64 libraries. Below is a comparative list detailing their performance characteristics and implementation details.
- base64 (the standard crate): written entirely in Safe Rust; serves as the std baseline in this project's benchmarks.
- base64-simd: the fastest alternative to base64-turbo (this crate). Note: it utilizes unsafe logic, specifically leveraging core::simd (e.g., u8x32, u8x64), and has not undergone formal security audits.
- A crate built directly on the core::simd module: it currently fails to compile because the core::simd API has changed significantly since the crate was written, breaking backward compatibility. Even when functional, benchmarks indicate it is slower than base64-simd.
- The fast-base64 library: performance is lower than that of base64-simd.
- Several further crates whose throughput also falls below base64-simd.

> Safety Note: With the exception of the standard base64 crate (which uses only Safe Rust), none of these libraries offer verified guarantees against Undefined Behavior (UB).
MIT License. Copyright (c) 2026.