| Field | Value |
|---|---|
| crate | swh-digestmap |
| version | 0.1.4 |
| created_at | 2025-05-15 12:35:32.514398+00 |
| updated_at | 2025-08-25 15:47:17.982076+00 |
| description | A tool to quickly convert between content hashes (eg. SWHID <-> sha1) |
| repository | https://gitlab.softwareheritage.org/swh/devel/swh-digestmap |
| id | 1674908 |
| size | 103,019 |
A tool to create a map of Software Heritage content hashes, from SWHIDs to SHA1, and a Python binding to access this map.
Designed after the idea of a hash conversion service. The current implementation is tailored to swh-fuse's "HPC" variant and relies on `VFunc`.
Run tests with `cargo test --all-features`.
A Digestmap is stored as a folder containing 3 files:

- `sha1_git.bin`, the table of hashes known by the digestmap,
- `sha1.bin`, the table of corresponding `sha1` hashes,
- `sha1_git.vfunc`, a serialized static function that maps a `sha1_git` to its index in both tables.

Note: before being able to read the digestmap, the library will need to load the `vfunc` file in memory.
The two other files will be memory-mapped.
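The layout above implies a two-step lookup: the static function turns a `sha1_git` into an index, and that index selects the matching record in each table. The Rust sketch below models this in memory under stated assumptions: both tables are taken to be flat arrays of raw 20-byte digests with no header, and a `HashMap` stands in for the `sha1_git.vfunc` static function. `DigestMapSketch` is a hypothetical name; this is not the crate's API, and the `HashMap` is not `VFunc`.

```rust
use std::collections::HashMap;

/// Length of a SHA-1 / sha1_git digest in bytes.
const DIGEST_LEN: usize = 20;

/// Hypothetical in-memory model of a digestmap folder.
/// ASSUMPTION: `sha1_git` and `sha1` mirror `sha1_git.bin` and `sha1.bin`
/// as flat arrays of 20-byte records with no header.
/// ASSUMPTION: `index` is a HashMap standing in for `sha1_git.vfunc`.
struct DigestMapSketch {
    sha1_git: Vec<u8>,
    sha1: Vec<u8>,
    index: HashMap<[u8; DIGEST_LEN], usize>,
}

impl DigestMapSketch {
    /// Builds the three structures from (sha1_git, sha1) pairs,
    /// storing record `i` of both tables at the same position.
    fn from_pairs(pairs: &[([u8; DIGEST_LEN], [u8; DIGEST_LEN])]) -> Self {
        let mut sha1_git = Vec::with_capacity(pairs.len() * DIGEST_LEN);
        let mut sha1 = Vec::with_capacity(pairs.len() * DIGEST_LEN);
        let mut index = HashMap::with_capacity(pairs.len());
        for (i, (git, plain)) in pairs.iter().enumerate() {
            sha1_git.extend_from_slice(git);
            sha1.extend_from_slice(plain);
            index.insert(*git, i);
        }
        DigestMapSketch { sha1_git, sha1, index }
    }

    /// Returns the 20-byte record at position `i` of a flat table.
    fn record(table: &[u8], i: usize) -> Option<&[u8]> {
        table.get(i * DIGEST_LEN..(i + 1) * DIGEST_LEN)
    }

    /// Maps a sha1_git to its sha1, if the digestmap knows it.
    fn lookup(&self, key: &[u8; DIGEST_LEN]) -> Option<[u8; DIGEST_LEN]> {
        let i = self.index.get(key).copied()?;
        // Verify the stored sha1_git before trusting the index.
        if Self::record(&self.sha1_git, i)? != &key[..] {
            return None;
        }
        let mut out = [0u8; DIGEST_LEN];
        out.copy_from_slice(Self::record(&self.sha1, i)?);
        Some(out)
    }
}

fn main() {
    let git = [0x11u8; DIGEST_LEN];
    let plain = [0x22u8; DIGEST_LEN];
    let map = DigestMapSketch::from_pairs(&[(git, plain)]);
    assert_eq!(map.lookup(&git), Some(plain));
    assert_eq!(map.lookup(&[0u8; DIGEST_LEN]), None);
    println!("lookup sketch ok");
}
```

With a `HashMap`, the verification step is redundant, but it hints at a plausible reason for keeping `sha1_git.bin` at all: a static function built over a fixed key set generally returns some index even for keys outside that set, so comparing against the stored `sha1_git` lets a lookup reject unknown hashes.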
Reading the complete archive's map therefore requires a minimum of 128 GB of RAM, and 1 TB to work fully in memory.
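As a rough illustration of how the tables grow with the archive, the snippet below computes the combined size of the two memory-mapped tables for a given number of contents. It assumes each record is a raw 20-byte digest (so 40 bytes per content across both tables); `mapped_tables_bytes` is a hypothetical helper, the size of `sha1_git.vfunc` is not modelled, and it is not meant to derive the figures above.

```rust
/// ASSUMPTION: each record in `sha1_git.bin` and `sha1.bin` is a raw
/// 20-byte SHA-1 digest, stored without any header or padding.
const DIGEST_LEN: u64 = 20;

/// Hypothetical helper: combined size in bytes of the two memory-mapped
/// tables for `n` contents. `sha1_git.vfunc`, which must be loaded into
/// RAM, is deliberately not included.
fn mapped_tables_bytes(n: u64) -> u64 {
    2 * DIGEST_LEN * n
}

fn main() {
    // Illustrative count only; not the archive's actual number of contents.
    let n: u64 = 1_000_000_000;
    let bytes = mapped_tables_bytes(n);
    println!("{n} contents -> {} GB of memory-mapped tables", bytes / 1_000_000_000);
}
```

Because these two tables are memory-mapped, the operating system can page them in on demand, whereas the `vfunc` has to be fully loaded before any lookup.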
The default installation, `cargo install swh-digestmap`, builds and installs the `swh-digestmap-map` binary, which can look up mappings in an already built map.
To be able to build maps yourself, install with `cargo install swh-digestmap --features=build`, which will also build and install the `swh-digestmap-build` binary.
The program that creates maps has been isolated in the `build` feature because it is mostly intended for Software Heritage's internal use.
Building a digestmap requires working fully in memory, so please size your machine accordingly.
The program needs an ORC-exported dataset (only the `content` subfolder).
```shell
# Reference to a directory containing a Software Heritage export in ORC format.
# It must contain a subdirectory named `content`.
ORC_EXPORT_DIR=$HOME/swh-environment/swh-graph/swh/graph/example_dataset/orc
swh-digestmap-build --orc $ORC_EXPORT_DIR --dir-out digestmap_dir
```
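Since the build only reads the `content` subfolder and runs fully in memory, a quick pre-flight check can catch a wrong `--orc` path before a long run. The sketch below uses only the standard library; `content_files` is a hypothetical helper, not part of the crate. It verifies that `<ORC_EXPORT_DIR>/content` exists and lists the regular files directly inside it, without assuming anything about their names and without parsing ORC.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: returns the regular files found directly under
/// `<orc_export_dir>/content`, sorted, or an error if that folder is missing.
fn content_files(orc_export_dir: &Path) -> io::Result<Vec<PathBuf>> {
    let content = orc_export_dir.join("content");
    if !content.is_dir() {
        return Err(io::Error::new(
            io::ErrorKind::NotFound,
            format!("{} is not a directory", content.display()),
        ));
    }
    let mut files = Vec::new();
    for entry in fs::read_dir(&content)? {
        let path = entry?.path();
        // Only count regular files; subdirectories are ignored here.
        if path.is_file() {
            files.push(path);
        }
    }
    files.sort();
    Ok(files)
}

fn main() {
    // Pass the export directory as the first argument (defaults to ".").
    let dir = std::env::args().nth(1).unwrap_or_else(|| ".".to_string());
    match content_files(Path::new(&dir)) {
        Ok(files) => println!("found {} file(s) under content/", files.len()),
        Err(e) => eprintln!("invalid ORC export: {e}"),
    }
}
```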
We advise using the Rust or Python API directly, but for short tests this can also be done on the CLI as follows
(where `digestmap_dir` is the directory generated by the build command above):
```shell
swh-digestmap-map --swhid swh:1:cnt:0000000000000000000000000000000000000004 digestmap_dir
```
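For a content SWHID, the 40 hex digits after `swh:1:cnt:` are the object's `sha1_git`, which is the key the digestmap indexes. The sketch below extracts that 20-byte key from a SWHID such as the one in the command above. `swhid_to_sha1_git` and `to_hex` are hypothetical helpers, not part of the crate's Rust or Python API, and SWHID qualifiers (e.g. `;origin=...`) are simply rejected.

```rust
/// Hypothetical helper: extracts the sha1_git embedded in a core content
/// SWHID of the form `swh:1:cnt:<40 hex digits>`. Returns None for other
/// object types, wrong lengths, non-hex characters, or qualified SWHIDs.
fn swhid_to_sha1_git(swhid: &str) -> Option<[u8; 20]> {
    let hex = swhid.strip_prefix("swh:1:cnt:")?;
    // Require exactly 40 ASCII hex digits; this also rejects qualifiers
    // and a leading '+', which from_str_radix would otherwise accept.
    if hex.len() != 40 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let mut out = [0u8; 20];
    for (i, byte) in out.iter_mut().enumerate() {
        // Each pair of hex digits encodes one byte.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(out)
}

/// Hypothetical helper: lowercase hex encoding, e.g. for printing keys.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{b:02x}")).collect()
}

fn main() {
    let swhid = format!("swh:1:cnt:{}4", "0".repeat(39));
    match swhid_to_sha1_git(&swhid) {
        Some(key) => println!("sha1_git = {}", to_hex(&key)),
        None => println!("not a core content SWHID: {swhid}"),
    }
}
```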