swh-digestmap

version: 0.1.4
created_at: 2025-05-15 12:35:32.514398+00
updated_at: 2025-08-25 15:47:17.982076+00
description: A tool to quickly convert between content hashes (e.g. SWHID <-> sha1)
repository: https://gitlab.softwareheritage.org/swh/devel/swh-digestmap
id: 1674908
size: 103,019
Software Heritage (swhmirror)


A tool to create a map of Software Heritage content hashes, from SWHIDs to SHA1, and a Python binding to access this map.

Designed after the idea of a hash conversion service. The current implementation is tailored for the "HPC" variant of swh-fuse and relies on VFunc.

Run tests with cargo test --all-features.

A Digestmap is stored as a folder containing 3 files:

  • sha1_git.bin, the table of hashes known by the digestmap,
  • sha1.bin, the table of corresponding sha1 hashes,
  • sha1_git.vfunc, a serialized static function that maps a sha1_git to its index in both tables.
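
The layout above can be illustrated with a small in-memory sketch. It is not the crate's API: ToyDigestmap, from_pairs and sha1_for are hypothetical names, a HashMap stands in for the serialized static function, and the 20-byte fixed-size record layout and the membership check against the sha1_git table are assumptions made for illustration.

```rust
use std::collections::HashMap;

/// Size in bytes of a SHA1 or sha1_git digest.
const DIGEST_LEN: usize = 20;

/// In-memory stand-in for a digestmap folder (hypothetical, not the crate's API).
struct ToyDigestmap {
    /// Plays the role of sha1_git.bin: fixed-size records, concatenated.
    sha1_git_table: Vec<u8>,
    /// Plays the role of sha1.bin: record i corresponds to record i above.
    sha1_table: Vec<u8>,
    /// Plays the role of sha1_git.vfunc (a HashMap replaces the static function).
    index_of: HashMap<[u8; DIGEST_LEN], usize>,
}

impl ToyDigestmap {
    /// Build both tables and the index from (sha1_git, sha1) pairs.
    fn from_pairs(pairs: &[([u8; DIGEST_LEN], [u8; DIGEST_LEN])]) -> Self {
        let mut map = ToyDigestmap {
            sha1_git_table: Vec::new(),
            sha1_table: Vec::new(),
            index_of: HashMap::new(),
        };
        for (i, (sha1_git, sha1)) in pairs.iter().enumerate() {
            map.sha1_git_table.extend_from_slice(sha1_git);
            map.sha1_table.extend_from_slice(sha1);
            map.index_of.insert(*sha1_git, i);
        }
        map
    }

    /// Map a sha1_git to its index, then read the sha1 stored at that index.
    fn sha1_for(&self, sha1_git: &[u8; DIGEST_LEN]) -> Option<[u8; DIGEST_LEN]> {
        let i = *self.index_of.get(sha1_git)?;
        let start = i * DIGEST_LEN;
        let end = start + DIGEST_LEN;
        // Assumed step: confirm that slot i really holds the queried key.
        if self.sha1_git_table.get(start..end)? != sha1_git.as_slice() {
            return None;
        }
        let mut sha1 = [0u8; DIGEST_LEN];
        sha1.copy_from_slice(self.sha1_table.get(start..end)?);
        Some(sha1)
    }
}

fn main() {
    let map = ToyDigestmap::from_pairs(&[
        ([0x11; DIGEST_LEN], [0xaa; DIGEST_LEN]),
        ([0x22; DIGEST_LEN], [0xbb; DIGEST_LEN]),
    ]);
    println!("{:?}", map.sha1_for(&[0x22; DIGEST_LEN]));
}
```

Because both tables share the same index, a lookup costs one evaluation of the index function plus two fixed-offset reads, which is what makes memory-mapping the tables practical.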

Note: before it can read a digestmap, the library must load the vfunc file into memory; the two other files are memory-mapped. Reading the complete archive's map therefore requires at least 128GB of RAM, and 1TB to work fully in memory.

Installation

The default installation, cargo install swh-digestmap, builds and installs the swh-digestmap-map binary, which can look up mappings in an already built map. To build maps yourself, install with cargo install swh-digestmap --features=build, which also builds and installs the swh-digestmap-build binary.

Build a digestmap

The program that creates a map has been isolated in the build feature, because it is mostly intended for Software Heritage's internal use. Building a digestmap requires working fully in memory, so please size your machine accordingly.

The program needs an ORC-exported dataset (only the content subfolder).

# Reference to a directory containing a Software Heritage export in ORC format.
# It must contain a subdirectory named `content`.
ORC_EXPORT_DIR=$HOME/swh-environment/swh-graph/swh/graph/example_dataset/orc
swh-digestmap-build --orc $ORC_EXPORT_DIR --dir-out digestmap_dir

Find a SHA1 from a SWHID

We advise using the Rust or Python API directly, but for short tests this can also be done on the CLI as follows (where digestmap_dir is the directory generated by the build command above):

swh-digestmap-map --swhid swh:1:cnt:0000000000000000000000000000000000000004 digestmap_dir
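
The SWHID of a content object carries the object's sha1_git as 40 lowercase hexadecimal digits after the swh:1:cnt: prefix. As a minimal sketch of how such a string can be turned into the 20-byte key used for a lookup, the hypothetical helper below (parse_content_swhid is not part of the crate's API) decodes it by hand:

```rust
/// Parse a content SWHID ("swh:1:cnt:" followed by 40 lowercase hex digits)
/// into the 20-byte sha1_git it encodes. Hypothetical helper for illustration.
fn parse_content_swhid(swhid: &str) -> Option<[u8; 20]> {
    let hex = swhid.strip_prefix("swh:1:cnt:")?;
    // Reject anything that is not exactly 40 lowercase hexadecimal digits.
    let is_hex = |b: u8| matches!(b, b'0'..=b'9' | b'a'..=b'f');
    if hex.len() != 40 || !hex.bytes().all(is_hex) {
        return None;
    }
    let mut digest = [0u8; 20];
    for (i, byte) in digest.iter_mut().enumerate() {
        // Each pair of hex digits encodes one byte of the digest.
        *byte = u8::from_str_radix(&hex[2 * i..2 * i + 2], 16).ok()?;
    }
    Some(digest)
}

fn main() {
    let swhid = "swh:1:cnt:0000000000000000000000000000000000000004";
    match parse_content_swhid(swhid) {
        Some(digest) => println!("sha1_git bytes: {:?}", digest),
        None => println!("not a valid content SWHID"),
    }
}
```

Only content SWHIDs are accepted here; other object types (dir, rev, rel, snp) and qualifiers are deliberately rejected, since the digestmap described above only covers contents.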
