| Crates.io | swh-digestmap |
| lib.rs | swh-digestmap |
| version | 0.1.4 |
| created_at | 2025-05-15 12:35:32.514398+00 |
| updated_at | 2025-08-25 15:47:17.982076+00 |
| description | A tool to quickly convert between content hashes (eg. SWHID <-> sha1) |
| homepage | |
| repository | https://gitlab.softwareheritage.org/swh/devel/swh-digestmap |
| max_upload_size | |
| id | 1674908 |
| size | 103,019 |
A tool to create a map of Software Heritage content hashes, from SWHIDs to SHA1, and a Python binding to access this map.
The design follows an earlier idea for a hash conversion service. The current implementation is tailored to swh-fuse's "HPC" variant and relies on VFunc.
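To make the two hash kinds concrete: a content SWHID (swh:1:cnt:...) embeds the sha1_git of the content, which is the Git blob hash (SHA-1 over a "blob <length>" header, a NUL byte, then the data), while sha1 is the plain SHA-1 of the data alone. Neither can be derived from the other without the content itself, which is why a precomputed map is useful. The sketch below uses only Python's standard hashlib and does not call swh-digestmap; the function names are illustrative.

```python
import hashlib


def sha1_git(data: bytes) -> str:
    """Git blob hash: SHA-1 of the header b"blob <len>" + NUL, followed by the content."""
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def sha1(data: bytes) -> str:
    """Plain SHA-1 of the content bytes."""
    return hashlib.sha1(data).hexdigest()


def content_swhid(data: bytes) -> str:
    """Content SWHID, built from the sha1_git of the data."""
    return "swh:1:cnt:" + sha1_git(data)


if __name__ == "__main__":
    data = b""  # the empty file
    print(content_swhid(data))
    print(sha1(data))
```

For the empty file this prints swh:1:cnt:e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 followed by da39a3ee5e6b4b0d3255bfef95601890afd80709, the pair of values a digestmap would associate.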
Run tests with cargo test --all-features.
A Digestmap is stored as a folder containing 3 files:
- sha1_git.bin, the table of hashes known by the digestmap,
- sha1.bin, the table of corresponding sha1 hashes,
- sha1_git.vfunc, a serialized static function that maps a sha1_git to its index in both tables.

Note: before it can read the digestmap,
the library needs to load the vfunc file into memory.
The two other files are memory-mapped.
Reading the complete archive's map therefore requires at least 128GB of RAM,
and 1TB to work fully in memory.
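The three-file layout can be read as: ask the static function for an index, then read the record at that index from each table. The sketch below illustrates that idea only and is not the crate's implementation. It assumes both tables are flat arrays of 20-byte digests (a SHA-1 is 20 bytes) and replaces the VFunc with a plain Python dict as a stand-in. It also assumes the sha1_git table is what confirms a hit, since a static function may return an arbitrary index for a key it was not built on. The helper names build_toy_map, lookup and demo are hypothetical.

```python
import hashlib
import mmap
import os
import tempfile

DIGEST_SIZE = 20  # a SHA-1 digest is 20 bytes


def build_toy_map(directory, pairs):
    """Write two flat tables of 20-byte records.

    Returns a dict that stands in for the VFunc (assumption: the real
    static function is a different, far more compact structure).
    """
    index = {}
    with open(os.path.join(directory, "sha1_git.bin"), "wb") as keys, \
            open(os.path.join(directory, "sha1.bin"), "wb") as values:
        for i, (key, value) in enumerate(pairs):
            keys.write(key)
            values.write(value)
            index[key] = i
    return index


def lookup(directory, index_of, sha1_git):
    """Return the sha1 stored for sha1_git, or None if the key is unknown."""
    with open(os.path.join(directory, "sha1_git.bin"), "rb") as kf, \
            open(os.path.join(directory, "sha1.bin"), "rb") as vf:
        keys = mmap.mmap(kf.fileno(), 0, access=mmap.ACCESS_READ)
        values = mmap.mmap(vf.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            start = index_of(sha1_git) * DIGEST_SIZE
            end = start + DIGEST_SIZE
            # The index may be meaningless for unknown keys, so compare
            # the stored key with the query before trusting the value.
            if end > len(keys) or keys[start:end] != sha1_git:
                return None
            return bytes(values[start:end])
        finally:
            keys.close()
            values.close()


def demo():
    contents = [b"hello\n", b"world\n"]
    pairs = [
        (hashlib.sha1(b"blob %d\0" % len(c) + c).digest(), hashlib.sha1(c).digest())
        for c in contents
    ]
    with tempfile.TemporaryDirectory() as directory:
        index = build_toy_map(directory, pairs)
        # Unknown keys get index 0, mimicking an arbitrary answer.
        index_of = lambda key: index.get(key, 0)
        found = lookup(directory, index_of, pairs[1][0])
        missing = lookup(directory, index_of, b"\x00" * DIGEST_SIZE)
    return found == pairs[1][1], missing


if __name__ == "__main__":
    print(demo())
```

Under these assumptions each table costs 20 bytes per known content, which is why the two tables can be left to the operating system's page cache through memory mapping while only the index function has to be resident.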
The default installation, cargo install swh-digestmap, builds and installs the swh-digestmap-map binary,
which looks up mappings in an already built map.
To be able to build maps yourself, install with cargo install swh-digestmap --features=build,
which will also build and install the swh-digestmap-build binary.
The program that creates a map is isolated behind the build feature
because it is mostly intended for Software Heritage's internal use.
Building a digestmap requires working fully in memory, so size your machine accordingly.
The program needs an ORC-exported dataset
(only the content subfolder).
# Reference to a directory containing a Software Heritage export in ORC format.
# It must contain a subdirectory named `content`.
ORC_EXPORT_DIR="$HOME/swh-environment/swh-graph/swh/graph/example_dataset/orc"
swh-digestmap-build --orc "$ORC_EXPORT_DIR" --dir-out digestmap_dir
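Since a build is memory-hungry, it can be worth confirming the shape of the export directory before launching it. A minimal sketch of such a pre-flight check, assuming only what is stated above (the directory must contain a subdirectory named content); the helper name has_content_subdir is hypothetical and not part of swh-digestmap.

```python
from pathlib import Path


def has_content_subdir(orc_export_dir: str) -> bool:
    """True if orc_export_dir exists and contains a `content` subdirectory."""
    return (Path(orc_export_dir) / "content").is_dir()


if __name__ == "__main__":
    import sys

    export_dir = sys.argv[1] if len(sys.argv) > 1 else "."
    if not has_content_subdir(export_dir):
        sys.exit(f"{export_dir}: no 'content' subdirectory found")
    print(f"{export_dir}: looks like a usable ORC export")
```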
We recommend using the Rust or Python API directly, but for quick tests a lookup can also be done on the CLI as follows
(where digestmap_dir is the directory generated by the build command above):
swh-digestmap-map --swhid swh:1:cnt:0000000000000000000000000000000000000004 digestmap_dir
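When scripting such quick tests, it can help to validate the SWHID before spawning the binary. The sketch below checks the content SWHID shape (swh:1:cnt: followed by 40 lowercase hexadecimal digits) and assembles the same command line as shown above. It does not use the crate's Python binding, whose API is not described here; parse_content_swhid and lookup_command are hypothetical helper names.

```python
import re

# Content SWHID: scheme version 1, object type "cnt", 40 lowercase hex digits.
_CONTENT_SWHID = re.compile(r"swh:1:cnt:([0-9a-f]{40})")


def parse_content_swhid(swhid: str) -> bytes:
    """Return the 20-byte sha1_git embedded in a content SWHID."""
    match = _CONTENT_SWHID.fullmatch(swhid)
    if match is None:
        raise ValueError(f"not a content SWHID: {swhid!r}")
    return bytes.fromhex(match.group(1))


def lookup_command(swhid: str, digestmap_dir: str) -> list:
    """Build the argv for the CLI lookup, after validating the SWHID."""
    parse_content_swhid(swhid)  # raises ValueError on malformed input
    return ["swh-digestmap-map", "--swhid", swhid, digestmap_dir]


if __name__ == "__main__":
    swhid = "swh:1:cnt:0000000000000000000000000000000000000004"
    print(parse_content_swhid(swhid).hex())
    print(" ".join(lookup_command(swhid, "digestmap_dir")))
```

The resulting list can be passed to subprocess.run(cmd, capture_output=True, text=True). The output format of swh-digestmap-map is not specified in this document, so the sketch leaves its parsing out.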