Crates.io | swh-provenance |
lib.rs | swh-provenance |
version | 0.4.1 |
created_at | 2025-03-12 10:25:47.769824+00 |
updated_at | 2025-04-02 13:36:32.582254+00 |
description | gRPC service to efficiently find the first revisions/releases/origins to contain a given content/directory |
homepage | |
repository | https://gitlab.softwareheritage.org/swh/devel/swh-provenance |
max_upload_size | |
id | 1589620 |
size | 216,944 |
This project provides a provenance query service for the Software Heritage Archive. Provenance is the ability to ask, for a given object stored in the Archive: "where does it come from?"
This question generally does not have a simple and unambiguous answer. It can be, among others:
Answering this kind of question requires running complex queries on the Merkle DAG on which the Software Heritage Archive is built, mostly from the bottom to the top (i.e. from Content up to Origin objects).
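To make "bottom to the top" concrete, here is a minimal sketch of an upward traversal over a toy, in-memory Merkle DAG: starting from a content node, it follows "is pointed to by" edges with a breadth-first search and collects every Origin it reaches. The `Kind` enum, the integer node ids and the `predecessors` map are hypothetical stand-ins chosen for illustration; this is not the swh-graph API, and the toy graph omits snapshots, which sit between origins and revisions in the real Archive.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Node kinds in this toy DAG (hypothetical; real nodes are identified by SWHIDs).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Kind {
    Content,
    Directory,
    Revision,
    Origin,
}

/// Walk the DAG upwards (from a node to the nodes that point at it) with a BFS
/// and return, sorted, every Origin reachable that way.
fn origins_containing(
    start: u32,
    kinds: &HashMap<u32, Kind>,
    predecessors: &HashMap<u32, Vec<u32>>,
) -> Vec<u32> {
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    let mut origins = Vec::new();
    while let Some(node) = queue.pop_front() {
        if kinds.get(&node) == Some(&Kind::Origin) {
            origins.push(node);
        }
        // Nodes with no recorded predecessors simply contribute nothing.
        for &pred in predecessors.get(&node).into_iter().flatten() {
            if seen.insert(pred) {
                queue.push_back(pred);
            }
        }
    }
    origins.sort_unstable();
    origins
}

fn main() {
    // Content 1 is in directory 2, the root directory of revisions 3 and 4;
    // revision 3 was found in origin 10 and revision 4 in origin 11.
    let kinds = HashMap::from([
        (1, Kind::Content),
        (2, Kind::Directory),
        (3, Kind::Revision),
        (4, Kind::Revision),
        (10, Kind::Origin),
        (11, Kind::Origin),
    ]);
    let predecessors = HashMap::from([
        (1, vec![2]),
        (2, vec![3, 4]),
        (3, vec![10]),
        (4, vec![11]),
    ]);
    println!("{:?}", origins_containing(1, &kinds, &predecessors)); // [10, 11]
}
```

On the real Archive this naive walk can visit a huge part of the graph, which is why the service relies on a compressed graph and a precomputed index rather than on exhaustive traversals.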
The idea is to use both the compressed graph representation of the Archive (swh-graph) and a preprocessed provenance index to speed up some of the provenance queries.
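One way to picture the combination of the two data sources is a two-tier lookup: answer from the precomputed index when it has an entry, and otherwise fall back to a graph query. The sketch below assumes exactly that; the `Provenance` struct, its `index` map and the `graph_fallback` closure are hypothetical and do not describe how swh-provenance actually organizes its index.

```rust
use std::collections::HashMap;

/// Hypothetical two-tier lookup: precomputed index first, graph query second.
struct Provenance<F: Fn(&str) -> Option<String>> {
    /// Object id -> precomputed answer (assumed format, for illustration).
    index: HashMap<String, String>,
    /// Stands in for a slower query against the compressed graph.
    graph_fallback: F,
}

impl<F: Fn(&str) -> Option<String>> Provenance<F> {
    fn lookup(&self, object: &str) -> Option<String> {
        self.index
            .get(object)
            .cloned()
            .or_else(|| (self.graph_fallback)(object))
    }
}

fn main() {
    let p = Provenance {
        index: HashMap::from([("obj-1".to_string(), "indexed-answer".to_string())]),
        graph_fallback: |obj: &str| Some(format!("graph-answer-for-{obj}")),
    };
    println!("{:?}", p.lookup("obj-1")); // Some("indexed-answer")
    println!("{:?}", p.lookup("obj-2")); // Some("graph-answer-for-obj-2")
}
```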
The core feature of this tool is to provide a service that returns a reference to an object within the Software Heritage Archive in which the queried object can be found.
There are mainly two kinds of provenance queries:
For each input object, the definition of "best provenance answer" is simple and unambiguous: for now, the best answer is an origin containing the oldest revision (i.e. the revision with the oldest commit date) in which this object has been found.
Provenance can be looked up for the following object types:
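The selection rule for the best answer can be sketched as a minimum over candidate occurrences, keyed by commit date. This is an illustrative sketch, assuming the candidate occurrences (revision, commit date, origin) have already been gathered; the `Occurrence` type and `best_provenance` function are hypothetical, and the tie-break on the revision id is an assumption added only to make the result deterministic.

```rust
/// One occurrence of the queried object (hypothetical type): a revision
/// containing it, that revision's commit date in seconds since the Unix
/// epoch, and an origin in which the revision was found.
#[derive(Clone, Debug, PartialEq)]
struct Occurrence {
    revision: String,
    commit_date: i64,
    origin: String,
}

/// Pick the occurrence whose revision has the oldest commit date.
/// Ties are broken by revision id (an assumption) so the result is stable.
fn best_provenance(candidates: &[Occurrence]) -> Option<&Occurrence> {
    candidates.iter().min_by(|a, b| {
        a.commit_date
            .cmp(&b.commit_date)
            .then_with(|| a.revision.cmp(&b.revision))
    })
}

fn main() {
    let candidates = vec![
        Occurrence {
            revision: "rev-b".into(),
            commit_date: 1_600_000_000,
            origin: "https://example.org/fork".into(),
        },
        Occurrence {
            revision: "rev-a".into(),
            commit_date: 1_500_000_000,
            origin: "https://example.org/upstream".into(),
        },
    ];
    if let Some(best) = best_provenance(&candidates) {
        // prints "rev-a in https://example.org/upstream"
        println!("{} in {}", best.revision, best.origin);
    }
}
```

When the oldest revision was found in several origins, this sketch returns whichever occurrence comes first among equals; the text above only requires "an origin" containing that revision.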
Content
Directory
Revision
Release
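Objects in the Archive are usually referred to by SWHIDs of the form `swh:1:<type>:<40 hex digits>`, where the types for the objects listed above are `cnt`, `dir`, `rev` and `rel`. The sketch below parses a core SWHID and reports whether it names one of these queryable types; the `QueryableType` enum and `queryable_type` function are hypothetical helpers, and SWHID qualifiers (such as `;origin=...`) are deliberately not handled.

```rust
/// Object types for which provenance can be looked up (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum QueryableType {
    Content,
    Directory,
    Revision,
    Release,
}

/// Parse a core SWHID ("swh:1:<type>:<40 lowercase hex digits>") and return its
/// object type if provenance can be queried for it; return None otherwise.
fn queryable_type(swhid: &str) -> Option<QueryableType> {
    let mut parts = swhid.split(':');
    if parts.next()? != "swh" || parts.next()? != "1" {
        return None;
    }
    let ty = match parts.next()? {
        "cnt" => QueryableType::Content,
        "dir" => QueryableType::Directory,
        "rev" => QueryableType::Revision,
        "rel" => QueryableType::Release,
        // e.g. "snp" (snapshot) is a valid SWHID type, but not in the list above
        _ => return None,
    };
    let hash = parts.next()?;
    let is_hex = hash.len() == 40
        && hash.bytes().all(|b| b.is_ascii_digit() || (b'a'..=b'f').contains(&b));
    if !is_hex || parts.next().is_some() {
        return None;
    }
    Some(ty)
}

fn main() {
    let id = "swh:1:cnt:94a9ed024d3859793618152ea559a168bbcbb5e2";
    println!("{:?}", queryable_type(id)); // Some(Content)
}
```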
For each object:
This document describes the backend provenance service. It is not meant to be used directly, but rather through the Public API; please refer to the Public API documentation for details on how to use it for provenance queries.