Another file format for storing your models and "tensors", in a binary-encoded format designed for speed with zero-copy access.
You can add bintensors to your Cargo project with `cargo add`:

```bash
cargo add bintensors
```
You can install bintensors via the pip package manager:

```bash
pip install bintensors
```
To build from source, you need Rust:

```bash
# Install Rust
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Make sure it's up to date and using the stable channel
rustup update
```

Then clone the repository and install the Python bindings:

```bash
git clone https://github.com/GnosisFoundation/bintensors
cd bintensors/bindings/python
pip install setuptools_rust
# Install in editable mode
pip install -e .
```
Saving and loading tensors from PyTorch:

```python
import torch
from bintensors import safe_open
from bintensors.torch import save_file

tensors = {
    "weight1": torch.zeros((1024, 1024)),
    "weight2": torch.zeros((1024, 1024)),
}
save_file(tensors, "model.bt")

tensors = {}
with safe_open("model.bt", framework="pt", device="cpu") as f:
    for key in f.keys():
        tensors[key] = f.get_tensor(key)
```
Let's assume we want to handle the file in Rust:

```rust
use bintensors::BinTensors;
use memmap2::MmapOptions;
use std::fs::File;
use std::io::Write;

fn main() {
    let filename = "model.bt";

    // Write a small, hand-encoded bintensors file to disk.
    let serialized = b"\x18\x00\x00\x00\x00\x00\x00\x00\x00\x01\x08weight_1\x00\x02\x02\x02\x00\x04 \x00\x00\x00\x00";
    File::create(filename).unwrap().write_all(serialized).unwrap();

    // Memory-map the file and deserialize the header.
    let file = File::open(filename).unwrap();
    let buffer = unsafe { MmapOptions::new().map(&file).unwrap() };
    let tensors = BinTensors::deserialize(&buffer).unwrap();

    // Look up a tensor by name.
    let tensor = tensors.tensor("weight_1");

    std::fs::remove_file(filename).unwrap();
}
```
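Note that deserialization only needs to parse the header: the value returned by `tensor()` is a view into the memory-mapped buffer, which is what makes access zero-copy.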
This project initially started as an exploration of the safetensors file format, primarily to gain a deeper understanding of an ongoing parent project of mine on distributing models over a subnet. While the format itself is relatively intuitive and well implemented, it prompted some reflection on the use of `serde_json` for storing metadata.
Although the decision by the Hugging Face safetensors development team to use `serde_json` is understandable, for reasons such as file readability, I questioned the necessity of this approach. Given the complexity of modern models, which can contain thousands of layers, it seems inefficient to store metadata in a human-readable format. In many instances, such metadata might be more appropriately stored in a more compact, optimized format.
TL;DR: why not just use a more optimized serde format such as `bincode`?
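As a rough illustration of the difference, here is a minimal sketch serializing the same record both ways. The `TensorInfo` struct is hypothetical, for illustration only; the real header types live inside the bintensors crate. It uses the bincode 1.x API.

```rust
use serde::Serialize;

// Hypothetical metadata record, for illustration only.
#[derive(Serialize)]
struct TensorInfo {
    dtype: String,
    shape: Vec<u64>,
    data_offsets: (u64, u64),
}

fn main() {
    let info = TensorInfo {
        dtype: "F32".to_string(),
        shape: vec![1024, 1024],
        data_offsets: (0, 4_194_304),
    };

    // serde_json spells out field names and digits as text,
    // and decoding requires tokenizing that text...
    let json = serde_json::to_vec(&info).unwrap();
    // ...while bincode emits compact length-prefixed binary
    // that can be decoded without any parsing of text.
    let bin = bincode::serialize(&info).unwrap();

    println!("json: {} bytes, bincode: {} bytes", json.len(), bin.len());
}
```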
*Serde benchmark of safetensors, generated by `cargo bench`.*

*Serde benchmark of bintensors, generated by `cargo bench`.*
Incorporating the `bincode` library led to a significant performance boost in deserialization, nearly tripling its speed, an improvement that was somewhat expected. Benchmarking code can be found in `bintensors/bench/benchmark.rs`, where we conducted two separate tests per repository, comparing the serialization performance of the model tests in safetensors and bintensors within the Rust-only implementation. The results, shown in the figures above, highlight the substantial gains achieved.
To better understand the factors behind this improvement, we analyzed the call stack, comparing the performance characteristics of `serde_json` and `bincode`. To facilitate this, we generated a flame graph to visualize execution paths and identify potential bottlenecks in the `serde_json` deserializer. The findings are illustrated in the figures below.
This experiment was conducted on macOS, and while the results are likely consistent across platforms, I plan to extend the analysis to other operating systems for further validation.
*Flame graph of bintensors deserialization, generated by `flamegraph` & `inferno`.*

*Flame graph of safetensors deserialization, generated by `flamegraph` & `inferno`.*
*Visual representation of the bintensors (bt) file format.*
In this section, I suggest creating a small model and opening a hex dump of it, so we can visually pick the file apart while we go over the high-level layout of the bintensors file format. For a more in-depth understanding of the encoding, take a glance at the specs.
The file format is divided into three sections, sketched in code below:

- **Header size**: an 8-byte little-endian unsigned integer giving the length of the header, the same size prefix as safetensors, but that may be changed later in the future.
- **Header data**: the tensor metadata (names, dtypes, shapes, and data offsets), encoded with `bincode` rather than JSON.
- **Tensor data**: the raw bytes of the tensors themselves, laid out after the header.
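For illustration, a minimal sketch of slicing those three sections apart by hand, assuming (as in safetensors) a little-endian `u64` size prefix; in practice `BinTensors::deserialize` handles all of this for you:

```rust
use std::fs;

fn main() {
    let bytes = fs::read("model.bt").unwrap();

    // Section 1: the first 8 bytes are the header size, a little-endian u64.
    let header_len = u64::from_le_bytes(bytes[..8].try_into().unwrap()) as usize;
    println!("header: {header_len} bytes");

    // Section 2: `header_len` bytes of bincode-encoded metadata.
    // Section 3: the raw tensor data follows immediately after it.
    if let Some(header) = bytes.get(8..8 + header_len) {
        println!("header bytes: {:02x?}", header);
        println!("tensor data: {} bytes", bytes.len() - 8 - header_len);
    }
}
```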
Since this is a simple fork of safetensors, it holds many of the same properties that safetensors holds.
License: MIT