| Crates.io | axonml-quant |
| lib.rs | axonml-quant |
| version | 0.2.4 |
| created_at | 2026-01-24 14:03:49.968123+00 |
| updated_at | 2026-01-25 22:36:07.799004+00 |
| description | Model quantization for the Axonml ML framework |
| homepage | |
| repository | https://github.com/automatanexus/axonml |
| max_upload_size | |
| id | 2066731 |
| size | 67,902 |
axonml-quant provides model quantization support for reducing model size and improving inference performance. It supports multiple quantization formats including 8-bit, 4-bit, and half-precision floating point, with calibration methods for determining optimal quantization parameters.
| Module | Description |
|---|---|
| types | Quantization type definitions, block structures (Q8Block, Q4Block, Q4_1Block), and QuantizedTensor |
| quantize | Functions for quantizing tensors to various formats with parallel processing |
| dequantize | Functions for converting quantized tensors back to floating point |
| calibration | Calibration data collection and methods for optimal quantization parameters |
| error | Error types and Result alias for quantization operations |
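The block types listed under `types` follow the usual block-quantization layout: a small group of values shares one scale factor. A minimal sketch of what a Q8Block-style structure looks like, assuming the 32-value block size from the format table below; the name and fields here are illustrative, not the crate's actual definition:

```rust
/// Illustrative only; see axonml_quant::types for the real definitions.
/// One scale is shared by 32 quantized values, which is where the
/// per-block metadata overhead in the compression ratios comes from.
pub struct Q8BlockSketch {
    pub scale: f32,       // per-block scaling factor
    pub values: [i8; 32], // 32 values quantized to 8 bits
}
```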
Add this to your Cargo.toml:

```toml
[dependencies]
axonml-quant = "0.2.4"
```
Quantize a tensor and convert it back:

```rust
use axonml_quant::{quantize_tensor, dequantize_tensor, QuantType};
use axonml_tensor::Tensor;

// Create a tensor
let tensor = Tensor::from_vec(vec![1.0, 2.0, 3.0, 4.0], &[4])?;

// Quantize to 8-bit
let quantized = quantize_tensor(&tensor, QuantType::Q8_0)?;

// Check the compression ratio
println!("Compression ratio: {:.2}x", quantized.compression_ratio());

// Dequantize back to f32
let restored = dequantize_tensor(&quantized)?;
```
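The round trip is lossy: with Q8_0-style quantization every block of values shares one scale, so `restored` only approximates `tensor`. A minimal sketch of the standard per-block math, assuming the common scale = max_abs / 127 scheme (not necessarily this crate's exact implementation):

```rust
// Sketch of standard Q8_0-style math for one block;
// axonml-quant's internals may differ.
fn q8_roundtrip(block: &[f32]) -> Vec<f32> {
    let max_abs = block.iter().fold(0.0f32, |m, &v| m.max(v.abs()));
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    block
        .iter()
        .map(|&v| {
            let q = (v / scale).round().clamp(-127.0, 127.0) as i8; // quantize
            f32::from(q) * scale                                    // dequantize
        })
        .collect()
}
```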
Quantize several named tensors at once:

```rust
use axonml_quant::{quantize_model, QuantType};

// Quantize multiple named tensors
let tensors = vec![
    ("weights", &weight_tensor),
    ("bias", &bias_tensor),
];
let quantized_model = quantize_model(&tensors, QuantType::Q4_0)?;
```
Collect calibration data to pick quantization parameters:

```rust
use axonml_quant::{calibrate, CalibrationData, CalibrationMethod, QuantType};

// Calibrate using the percentile method (99.9%)
let calib_data = calibrate(&sample_tensor, CalibrationMethod::Percentile(999))?;

// Get the optimal scale for symmetric quantization
let scale = calib_data.symmetric_scale(QuantType::Q8_0);

// Or use asymmetric quantization
let (scale, zero_point) = calib_data.asymmetric_scale(QuantType::Q8_0);
```
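Conceptually, symmetric quantization maps the range [-max_abs, max_abs] onto the signed integer range with a single scale, while asymmetric quantization maps [min, max] with a scale plus a zero point. A sketch of the standard formulas for an 8-bit target; the crate's calibrated versions presumably substitute the percentile-clipped range for the raw min/max:

```rust
// Standard 8-bit scale formulas; illustrative, not the crate's exact code.
fn symmetric_scale(min: f32, max: f32) -> f32 {
    // One scale covering [-max_abs, max_abs], mapped onto [-127, 127].
    min.abs().max(max.abs()) / 127.0
}

fn asymmetric_scale(min: f32, max: f32) -> (f32, i32) {
    // Scale covering [min, max] mapped onto [0, 255], plus the zero point
    // (the quantized integer that represents the real value 0.0).
    // Assumes max > min.
    let scale = (max - min) / 255.0;
    let zero_point = (-min / scale).round() as i32;
    (scale, zero_point)
}
```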
Measure quantization quality against the original tensor:

```rust
use axonml_quant::{compute_quantization_stats, QuantType};

let stats = compute_quantization_stats(&original, &dequantized, QuantType::Q8_0);
println!("RMSE: {:.6}", stats.rmse);
println!("Max Error: {:.6}", stats.max_error);
println!("Mean Error: {:.6}", stats.mean_error);
println!("Compression: {:.2}x", stats.compression_ratio);
```
| Type | Bits | Block Size | Compression | Use Case |
|---|---|---|---|---|
| Q8_0 | 8 | 32 | 4x | High accuracy, moderate compression |
| Q4_0 | 4 | 32 | 8x | Good balance of size and accuracy |
| Q4_1 | 4 | 32 | ~6x | Better accuracy with min/max tracking |
| Q5_0 | 5 | 32 | ~6x | Middle ground between Q4 and Q8 |
| F16 | 16 | 1 | 2x | Minimal accuracy loss |
| F32 | 32 | 1 | 1x | No compression (reference) |
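The nominal ratios follow from the bit widths: 32/8 = 4x for Q8_0, 32/4 = 8x for Q4_0, and 32/5 ≈ 6.4x for Q5_0, with per-block scale metadata lowering the realized ratio slightly. Q4_1 tracks a per-block minimum in addition to the scale, which is why it trades some compression (~6x rather than 8x) for better accuracy on data that is not centered around zero.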
Run the test suite:

```bash
cargo test -p axonml-quant
```
Licensed under either of:

- Apache License, Version 2.0
- MIT License

at your option.