| Field | Value |
|---|---|
| Crates.io | vlen |
| lib.rs | vlen |
| version | 0.2.0 |
| created_at | 2025-07-18 19:13:43.1102+00 |
| updated_at | 2025-07-18 19:58:22.51055+00 |
| description | High-performance variable-length integer encoding with SIMD optimizations, embedded support, and enhanced functionality |
| homepage | |
| repository | https://github.com/harrychin/vlen |
| max_upload_size | |
| id | 1759584 |
| size | 146,957 |
vlen is an enhanced version of the original vu128 variable-length numeric encoding, adding SIMD optimizations, better performance, and additional functionality. Numeric types up to 128 bits are supported (integers and floating-point), with smaller values encoded using fewer bytes.
vlen's compression ratio equals or exceeds that of the widely used VLQ and LEB128 encodings, and vlen is significantly faster on modern pipelined architectures thanks to SIMD optimizations and algorithmic improvements. The library is designed to run efficiently on both high-performance systems and embedded targets.
Values in the range [0, 2^7) are encoded as a single byte with
the same bits as the original value.
Values in the range [2^7, 2^28) are encoded as a unary length prefix,
followed by (length*7) bits, in little-endian order, where length is the
total number of encoded bytes (2 to 4). This is conceptually similar to
LEB128, but the continuation bits are placed in the upper half of the
initial byte. This arrangement is also known as a "prefix varint".
```
MSB ------------------ LSB
      10101011110011011110  Input value (0xABCDE)
   0101010 1111001 1011110  Zero-padded to a multiple of 7 bits
01010101 11100110 ___11110  Grouped into octets, with 3 continuation bits
01010101 11100110 11011110  Continuation bits `110` added
    0x55     0xE6     0xDE  In hexadecimal
        [0xDE, 0xE6, 0x55]  Encoded output (order is little-endian)
```
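The single-byte and unary-prefix rules can be checked with a short Rust sketch. It is written from the description above rather than taken from vlen's source; `encode_small` and `decode_small` are hypothetical helpers that only handle values below 2^28.

```rust
/// Encodes a value below 2^28 using the single-byte form or the unary-prefix
/// form. Hypothetical helper written from the format description.
fn encode_small(value: u32) -> Vec<u8> {
    assert!(value < (1 << 28), "larger values use the binary length-prefix format");
    // Total encoded length: the smallest byte count whose payload (len * 7 bits) fits.
    let len: usize = if value < (1 << 7) {
        1
    } else if value < (1 << 14) {
        2
    } else if value < (1 << 21) {
        3
    } else {
        4
    };
    if len == 1 {
        // Values in [0, 2^7) are stored unchanged in one byte.
        return vec![value as u8];
    }
    // The first byte holds (len - 1) one-bits, a zero bit, then (8 - len) payload bits.
    let low_bits = 8 - len;
    let prefix = !(0xFFu8 >> (len - 1));
    let mut out = vec![0u8; len];
    out[0] = prefix | (value & ((1 << low_bits) - 1)) as u8;
    // The remaining payload bits follow in little-endian byte order.
    let rest = value >> low_bits;
    for (i, byte) in out[1..].iter_mut().enumerate() {
        *byte = (rest >> (8 * i)) as u8;
    }
    out
}

/// Decodes a single-byte or unary-prefix value; returns (value, bytes consumed).
fn decode_small(buf: &[u8]) -> (u32, usize) {
    let first = buf[0];
    // The count of leading one-bits is (len - 1); four ones mark the binary format.
    let ones = first.leading_ones() as usize;
    assert!(ones < 4, "binary length-prefix encodings are not handled here");
    let len = ones + 1;
    if len == 1 {
        return (u32::from(first), 1);
    }
    let low_bits = 8 - len;
    let mut value = u32::from(first) & ((1 << low_bits) - 1);
    for (i, &byte) in buf[1..len].iter().enumerate() {
        value |= u32::from(byte) << (low_bits + 8 * i);
    }
    (value, len)
}

fn main() {
    let encoded = encode_small(0xABCDE);
    println!("{:02X?}", encoded); // [DE, E6, 55]
    assert_eq!(decode_small(&encoded), (0xABCDE, 3));
}
```

Counting the leading one-bits of the first byte recovers the total length before any other byte is read, which is what makes this a prefix varint rather than a LEB128-style scan for a terminating byte.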
Values in the range [2^28, 2^128) are encoded as a binary length prefix,
followed by payload bytes, in little-endian order. To differentiate this
format from the format of smaller values, the top 4 bits of the first byte
are set. The length prefix value is the number of payload bytes minus one;
equivalently it is the total length of the encoded value minus two.
```
MSB ------------------------------------ LSB
      10010001101000101011001111000           Input value (0x12345678)
00010010 00110100 01010110 01111000           Zero-padded to a multiple of 8 bits
00010010 00110100 01010110 01111000 11110011  Prefix byte is `0xF0 | (4 - 1)`
    0x12     0x34     0x56     0x78     0xF3  In hexadecimal
              [0xF3, 0x78, 0x56, 0x34, 0x12]  Encoded output (order is little-endian)
```
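A matching sketch of the binary length-prefix format, again written from this description rather than from vlen's code (`encode_large` and `decode_large` are hypothetical names):

```rust
/// Encodes a value in [2^28, 2^128) using the binary length-prefix format.
/// Hypothetical helper written from the format description.
fn encode_large(value: u128) -> Vec<u8> {
    assert!(value >= (1 << 28), "smaller values use the shorter formats");
    // Significant payload bytes: between 4 and 16 for values in this range.
    let payload_len = (128 - value.leading_zeros() as usize + 7) / 8;
    let mut out = Vec::with_capacity(1 + payload_len);
    // Top four bits set; the low four bits hold payload_len - 1.
    out.push(0xF0 | (payload_len - 1) as u8);
    // Payload in little-endian order, with high-order zero bytes dropped.
    out.extend_from_slice(&value.to_le_bytes()[..payload_len]);
    out
}

/// Decodes a binary length-prefix value; returns (value, bytes consumed).
fn decode_large(buf: &[u8]) -> (u128, usize) {
    let prefix = buf[0];
    assert_eq!(prefix & 0xF0, 0xF0, "not a binary length-prefix encoding");
    let payload_len = usize::from(prefix & 0x0F) + 1;
    let mut bytes = [0u8; 16];
    bytes[..payload_len].copy_from_slice(&buf[1..=payload_len]);
    (u128::from_le_bytes(bytes), 1 + payload_len)
}

fn main() {
    let encoded = encode_large(0x1234_5678);
    println!("{:02X?}", encoded); // [F3, 78, 56, 34, 12]
    assert_eq!(decode_large(&encoded), (0x1234_5678, 5));
}
```

Since `u128::to_le_bytes` already yields little-endian order, the encoder only has to drop the high-order zero bytes and prepend the prefix byte. The largest value, `u128::MAX`, takes 16 payload bytes plus the prefix, 17 bytes in total.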
The crate provides the following Cargo features:

- `alloc`: Enables allocation-dependent functionality (default: disabled)
- `serde`: Enables serde integration for serialization/deserialization (default: disabled)
- `simd`: Enables SIMD optimizations for bulk encoding/decoding (default: disabled)
- `full`: Enables all features (`alloc`, `serde`, `simd`)

The crate can be used in `no_std` environments with the `alloc` feature.

Basic usage:

```rust
use vlen::{encode, decode, encoded_size};

// 17 bytes is enough for any encoded value up to 128 bits
let mut buf = [0u8; 17];
let value: u64 = 12345;

// Encode a value
let encoded_len = encode(&mut buf, value)?;

// Decode a value, along with the number of bytes it occupied
let (decoded_value, decoded_len) = decode::<u64>(&buf)?;
assert_eq!(decoded_value, value);

// Calculate the encoded size without encoding
let size = encoded_size(value)?;
```
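The 17-byte buffer in the example above is the worst case for a 128-bit value. The sketch below derives the encoded length from the format rules; `encoded_len` is a hypothetical helper for illustration and may not match the signature or error behaviour of vlen's `encoded_size`.

```rust
/// Number of bytes the format needs for `value`, derived from the rules above.
/// Hypothetical helper for illustration; it is not vlen's `encoded_size`.
fn encoded_len(value: u128) -> usize {
    match value {
        v if v < (1 << 7) => 1,
        v if v < (1 << 14) => 2,
        v if v < (1 << 21) => 3,
        v if v < (1 << 28) => 4,
        // One prefix byte plus the significant payload bytes.
        v => 1 + (128 - v.leading_zeros() as usize + 7) / 8,
    }
}

fn main() {
    for v in [0u128, 12_345, 0xABCDE, 0x1234_5678, u64::MAX as u128, u128::MAX] {
        println!("{v:#x} -> {} bytes", encoded_len(v));
    }
}
```

By these rules a `u64` never needs more than 9 bytes (prefix plus 8 payload bytes), and a `u128` never more than 17.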
With the serde feature enabled, you can use vlen encoding with serde-based serialization formats:
```rust
use serde::{Serialize, Deserialize};
use vlen::serde::{VlenU32, VlenI64, VlenF64};

#[derive(Serialize, Deserialize)]
struct MyStruct {
    id: VlenU32,
    timestamp: VlenI64,
    score: VlenF64,
}

let data = MyStruct {
    id: VlenU32(12345),
    timestamp: VlenI64(-1234567890),
    score: VlenF64(3.14159),
};

// Serialize to JSON (or any other serde format); serde_json is a separate dependency
let json = serde_json::to_string(&data).unwrap();
let deserialized: MyStruct = serde_json::from_str(&json).unwrap();

assert_eq!(data.id.0, deserialized.id.0);
assert_eq!(data.timestamp.0, deserialized.timestamp.0);
assert_eq!(data.score.0, deserialized.score.0);
```
With the simd feature enabled, you can use high-performance bulk encoding and decoding operations:
```rust
use vlen::{bulk_encode_u32_safe, bulk_decode_u32_safe};

let values = [1u32, 1000, 1000000, 1000000000];
// A u32 needs at most 5 bytes, so 4 values fit in 20 bytes
let mut buf = [0u8; 20];

// Bulk encode multiple values
let encoded_len = bulk_encode_u32_safe(&mut buf, &values)?;

// Bulk decode multiple values
let mut decoded_values = [0u32; 4];
let decoded_len = bulk_decode_u32_safe(&buf[..encoded_len], &mut decoded_values)?;
assert_eq!(values, decoded_values);
```
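Assuming the bulk functions produce the same bytes as encoding each value in turn, a scalar reference is useful for testing or for targets without SIMD. The sketch below is written from the format description; `encode_u32`, `decode_u32`, `bulk_encode`, and `bulk_decode` are hypothetical names, not vlen's API, and this decoder rejects binary-prefix payloads longer than 4 bytes even though the format allows such over-long encodings.

```rust
/// Scalar reference encoder for one u32 (hypothetical helper, not vlen's API).
fn encode_u32(value: u32, out: &mut Vec<u8>) {
    if value < (1 << 7) {
        // Single byte, stored unchanged.
        out.push(value as u8);
    } else if value < (1 << 28) {
        // Unary-prefix form: 2 to 4 bytes carrying len * 7 payload bits.
        let len: u32 = if value < (1 << 14) { 2 } else if value < (1 << 21) { 3 } else { 4 };
        let low_bits = 8 - len;
        let prefix = !(0xFFu8 >> (len - 1));
        out.push(prefix | (value & ((1 << low_bits) - 1)) as u8);
        let rest = value >> low_bits;
        for i in 0..len - 1 {
            out.push((rest >> (8 * i)) as u8);
        }
    } else {
        // Binary length-prefix form: a u32 at or above 2^28 always has 4 payload bytes.
        out.push(0xF0 | 3);
        out.extend_from_slice(&value.to_le_bytes());
    }
}

/// Scalar reference decoder for one u32; returns (value, bytes consumed).
/// Rejects binary-prefix payloads longer than 4 bytes (a simplification).
fn decode_u32(buf: &[u8]) -> Option<(u32, usize)> {
    let first = *buf.first()?;
    if (first & 0xF0) == 0xF0 {
        let payload_len = usize::from(first & 0x0F) + 1;
        if payload_len > 4 {
            return None;
        }
        let mut le = [0u8; 4];
        le[..payload_len].copy_from_slice(buf.get(1..1 + payload_len)?);
        return Some((u32::from_le_bytes(le), 1 + payload_len));
    }
    let len = first.leading_ones() as usize + 1;
    let bytes = buf.get(..len)?;
    if len == 1 {
        return Some((u32::from(first), 1));
    }
    let low_bits = 8 - len;
    let mut value = u32::from(first) & ((1 << low_bits) - 1);
    for (i, &byte) in bytes[1..].iter().enumerate() {
        value |= u32::from(byte) << (low_bits + 8 * i);
    }
    Some((value, len))
}

/// Encodes a slice of values back to back.
fn bulk_encode(values: &[u32]) -> Vec<u8> {
    // Worst case is 5 bytes per u32 (prefix byte plus 4 payload bytes).
    let mut out = Vec::with_capacity(values.len() * 5);
    for &v in values {
        encode_u32(v, &mut out);
    }
    out
}

/// Decodes `out.len()` values from `buf`; returns the number of bytes consumed.
fn bulk_decode(mut buf: &[u8], out: &mut [u32]) -> Option<usize> {
    let mut consumed = 0;
    for slot in out.iter_mut() {
        let (value, len) = decode_u32(buf)?;
        *slot = value;
        buf = &buf[len..];
        consumed += len;
    }
    Some(consumed)
}

fn main() {
    let values = [1u32, 1000, 1_000_000, 1_000_000_000];
    let encoded = bulk_encode(&values);
    // 1 + 2 + 3 + 5 bytes, well under the 4 * 5 = 20-byte worst case.
    assert_eq!(encoded.len(), 11);
    let mut decoded = [0u32; 4];
    assert_eq!(bulk_decode(&encoded, &mut decoded), Some(11));
    assert_eq!(decoded, values);
}
```

A reference like this supports differential testing: if the SIMD and scalar paths share one byte format, their outputs can be compared on random inputs.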
The SIMD implementation is selected automatically based on the target architecture.
The serde wrapper types provide easy access to their inner values through Deref and DerefMut:
```rust
use vlen::serde::VlenU32;

let mut val = VlenU32(42);
assert_eq!(*val, 42);

*val = 100;
assert_eq!(*val, 100);
assert_eq!(val.0, 100);
```
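This is the behaviour a newtype gets from implementing the two standard traits. A minimal sketch, assuming the wrappers are tuple structs around the primitive (as `VlenU32(42)` and `val.0` suggest); `Wrapped` is a hypothetical name, and vlen's actual implementations may differ, for example in their serde impls, which are omitted here.

```rust
use std::ops::{Deref, DerefMut};

/// Hypothetical stand-in for a wrapper such as `VlenU32`; not vlen's source.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Wrapped(u32);

impl Deref for Wrapped {
    type Target = u32;

    fn deref(&self) -> &u32 {
        &self.0
    }
}

impl DerefMut for Wrapped {
    fn deref_mut(&mut self) -> &mut u32 {
        &mut self.0
    }
}

fn main() {
    let mut val = Wrapped(42);
    assert_eq!(*val, 42); // read through Deref
    *val += 1; // modify through DerefMut
    assert_eq!(val.0, 43);
    // Methods of the inner u32 are reachable through auto-deref.
    assert_eq!(val.count_ones(), 43u32.count_ones());
}
```

Because `Deref` exposes the inner value, the wrapper only needs to differ from the plain integer where it matters: in how it serializes.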
The vlen format permits over-long encodings, which encode a value using
a byte sequence that is unnecessarily long. For example:

- a multi-byte unary-prefix encoding of a value in the range [0, 2^7);
- a binary length-prefix encoding of a value in the range [0, 2^28).

The `encode_*` functions in this crate will not generate such over-long
encodings, but the `decode_*` functions will accept them. This is intended
to allow vlen values to be placed in a buffer before the value to be
written is known. Applications that require a single canonical encoding for
any given value should perform appropriate checking in their own code.
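Over-long encodings make it possible to reserve a fixed-width slot, write the data that follows, and back-patch the slot once the value is known. A minimal sketch, assuming a hypothetical length-prefixed framing; `SLOT`, `write_fixed_u32`, and `read_binary_prefixed` are illustrative names, not part of vlen:

```rust
/// Width of a reserved slot: one prefix byte plus four payload bytes,
/// which is enough for any u32 in the binary length-prefix form.
const SLOT: usize = 5;

/// Writes `value` into a fixed-width slot using the binary length-prefix form,
/// even when a shorter encoding exists. Hypothetical helper, not vlen's API.
fn write_fixed_u32(slot: &mut [u8], value: u32) {
    slot[0] = 0xF0 | 3; // top nibble set, 4 payload bytes
    slot[1..SLOT].copy_from_slice(&value.to_le_bytes());
}

/// Decodes the binary length-prefix form, accepting over-long encodings.
fn read_binary_prefixed(buf: &[u8]) -> (u128, usize) {
    assert_eq!(buf[0] & 0xF0, 0xF0, "not a binary length-prefix encoding");
    let payload_len = usize::from(buf[0] & 0x0F) + 1;
    let mut bytes = [0u8; 16];
    bytes[..payload_len].copy_from_slice(&buf[1..=payload_len]);
    (u128::from_le_bytes(bytes), 1 + payload_len)
}

fn main() {
    let mut buf: Vec<u8> = Vec::new();
    // Reserve the length slot before the body (and hence its length) is known.
    let slot_start = buf.len();
    buf.extend_from_slice(&[0u8; SLOT]);
    // Append a body of initially unknown length.
    buf.extend_from_slice(b"hello, vlen");
    // Back-patch the slot with the actual body length.
    let body_len = (buf.len() - slot_start - SLOT) as u32;
    write_fixed_u32(&mut buf[slot_start..slot_start + SLOT], body_len);

    let (len, used) = read_binary_prefixed(&buf);
    assert_eq!((len, used), (11, SLOT));
    assert_eq!(&buf[used..used + len as usize], b"hello, vlen");
}
```

The stored length, 11, would canonically be the single byte `0x0B`; the 5-byte form is over-long but decodes to the same value. An application that requires canonical encodings could reject such input by comparing the bytes consumed with the minimal length for the decoded value.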
Signed integers and IEEE-754 floating-point values may be encoded with
vlen by mapping them to unsigned integers. It is recommended that the
mapping functions be chosen so that commonly used values map to integers
whose higher-order bits are zero, since leading zero bits are what allow a
shorter encoding.
This library includes helper functions that use the Protocol Buffers "ZigZag" encoding for signed integers and a reverse-endian (byte-swapped) layout for floating-point values.
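The two mappings can be sketched as follows. These are the standard ZigZag and byte-swap transforms; the function names are hypothetical, and vlen's own helpers may use different names or signatures.

```rust
/// ZigZag: interleaves signed values so small magnitudes map to small unsigned
/// values (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...).
fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Reverse-endian float mapping: byte-swapping the IEEE-754 bits moves the
/// low-order zero bytes of "round" values into the high-order positions.
fn f64_to_u64_reversed(x: f64) -> u64 {
    x.to_bits().swap_bytes()
}

/// Inverse of `f64_to_u64_reversed`.
fn u64_reversed_to_f64(u: u64) -> f64 {
    f64::from_bits(u.swap_bytes())
}

fn main() {
    for n in [0i64, -1, 1, -2, 2] {
        println!("{n} -> {}", zigzag_encode(n));
    }
    // 1.0 has the bit pattern 0x3FF0_0000_0000_0000, which needs 9 bytes as-is;
    // byte-reversed it becomes 0xF03F, which fits in 3 bytes.
    println!("{:#X}", f64_to_u64_reversed(1.0));
    assert_eq!(u64_reversed_to_f64(f64_to_u64_reversed(1.0)), 1.0);
}
```

ZigZag keeps small negative numbers small (so -1 encodes in one byte rather than as a huge unsigned value), and byte reversal moves the trailing zero bytes of values like 1.0 or 2.0 into the high-order positions, where the encoding can drop them.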
This project is licensed under the MIT License - see the LICENSE.txt file for details.
This crate is based on the original vu128 implementation by John Millikin, with significant performance improvements and enhancements by Harrison Chin.