First report: More than 80% of the crates link to their public VCS

Date: 2023.07.22

To contribute to an open source project, the first step is to find its public VCS (Version Control System), which nowadays is usually on GitHub. For Rust crates this means they should include a link to their VCS in their metadata (the repository field in Cargo.toml).

I started writing the Rust Digger on Sat Jun 17 11:08:42 2023 +0300, slightly more than a month ago. At first it went extremely slowly, as I had only just started learning Rust, but after a few weeks it began to take shape. I even found a few crates that linked either to http://github.com (instead of https) or to www.github.com (instead of github.com). Neither is a big issue, but I sent a few pull requests to fix them, and I think most of those PRs were accepted.
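Spotting these two problems in a repository URL can be automated. The following is a minimal Rust sketch using only the standard library; the UrlProblem enum and the url_problems function are hypothetical names for illustration, not the Rust Digger's actual code, and the sketch assumes lowercase schemes.

```rust
/// Problems this sketch looks for in a crate's repository URL (hypothetical).
#[derive(Debug, PartialEq)]
pub enum UrlProblem {
    /// The URL uses plain http instead of https.
    InsecureScheme,
    /// The URL points at www.github.com instead of github.com.
    WwwGithub,
}

/// Return the host part of an http(s) URL, or None for any other scheme.
/// Assumes a lowercase scheme; ports and user info are not handled.
fn host_of(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://")
        .or_else(|| url.strip_prefix("http://"))?;
    // The host ends at the first '/', or at the end of the string.
    Some(rest.split('/').next().unwrap_or(rest))
}

/// Hypothetical helper: list every problem found in a repository URL.
pub fn url_problems(url: &str) -> Vec<UrlProblem> {
    let mut problems = Vec::new();
    if url.starts_with("http://") {
        problems.push(UrlProblem::InsecureScheme);
    }
    if host_of(url).map_or(false, |h| h.eq_ignore_ascii_case("www.github.com")) {
        problems.push(UrlProblem::WwwGithub);
    }
    problems
}
```

For example, url_problems("http://www.github.com/foo/bar") reports both problems, while a plain https://github.com/... link reports none.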

This is the first report based on the data collected by the Rust Digger. Below you can find a copy of the stats page. A few highlights: we found 120,484 crates. Almost 80% of them have a link to their repository (public version control system). An additional 2.88% have a value in the homepage field, which in many cases is actually the link to their repository. The remaining 17.17% have neither a repository nor a homepage. There are also 313 crates (0.26% of the total) that use an http link to their git repository. This is not a big issue, as the VCS host will most likely redirect to the https version, but using the https link is still more correct.
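The categories above can be tallied with a simple loop. This is a minimal Rust sketch assuming a hypothetical CrateMeta struct that holds only the repository and homepage fields, and it treats a blank field as missing; it illustrates the counting, not the code the Rust Digger actually uses.

```rust
/// Hypothetical, simplified crate metadata: only the two fields used here.
pub struct CrateMeta {
    pub repository: Option<String>,
    pub homepage: Option<String>,
}

/// Counts matching the repository/homepage rows of the stats page.
#[derive(Debug, Default)]
pub struct RepoStats {
    pub total: usize,
    pub no_repository: usize,
    pub homepage_but_no_repository: usize,
    pub no_homepage_no_repository: usize,
    pub insecure_repository: usize,
}

/// Assumption of this sketch: a missing or whitespace-only field counts as "not set".
fn non_blank(field: &Option<String>) -> Option<&str> {
    field.as_deref().filter(|s| !s.trim().is_empty())
}

/// Tally the repository/homepage categories over all crates.
pub fn repo_stats(crates: &[CrateMeta]) -> RepoStats {
    let mut stats = RepoStats::default();
    for c in crates {
        stats.total += 1;
        match non_blank(&c.repository) {
            Some(repo) => {
                // Plain http links are counted separately as insecure.
                if repo.starts_with("http://") {
                    stats.insecure_repository += 1;
                }
            }
            None => {
                stats.no_repository += 1;
                if non_blank(&c.homepage).is_some() {
                    stats.homepage_but_no_repository += 1;
                } else {
                    stats.no_homepage_no_repository += 1;
                }
            }
        }
    }
    stats
}

/// Share of part in total as a percentage; 0.0 for an empty total.
pub fn percent(part: usize, total: usize) -> f64 {
    if total == 0 {
        0.0
    } else {
        part as f64 * 100.0 / total as f64
    }
}
```

By construction the two "no repository" sub-counts add up to no_repository, which is also true of the table: 3,480 + 20,693 = 24,173.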

Is that 80% high or low, you might ask? The stats on PyDigger show that only 63.49% of the Python packages have links to their VCS, and that covers only the 115,321 most recently uploaded packages. For older packages the ratio is probably lower. So 80% is much better than the figure for Python, but it still means that for 17-20% of the crates there is no easy way to find their repository.

Almost 75% of all the crates use GitHub for their repository, leaving about 5% for the rest of the services. GitLab has 2.68%. There are also 1,821 crates (1.51% of all the crates) whose repository we have not categorized yet. That still needs some manual work. Some of these have an http link to github.com.
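Categorizing a repository can be sketched as matching its URL against a list of known hosts. The host list below is a hypothetical subset of the services in the table (hosts I could not confirm are left out), and the classify function is an illustration rather than the Rust Digger's actual code. Because this sketch matches on the full https:// prefix, an http://github.com link falls through to "other".

```rust
/// Hypothetical host table: (URL prefix, label used on the stats page).
/// Only a subset of the services in the table is listed here.
const HOSTS: &[(&str, &str)] = &[
    ("https://github.com/", "GitHub"),
    ("https://gitlab.com/", "GitLab"),
    ("https://codeberg.org/", "Codeberg"),
    ("https://gitee.com/", "Gitee"),
    ("https://gitlab.freedesktop.org/", "Freedesktop (GitLab)"),
    ("https://gitlab.torproject.org/", "Torproject (GitLab)"),
    ("https://gitlab.wikimedia.org/", "Wikimedia (GitLab)"),
    ("https://gitlab.gnome.org/", "gnome (GitLab)"),
    ("https://git.sr.ht/", "sr.ht"),
    ("https://hg.sr.ht/", "sr.ht"),
];

/// Classify a repository URL by its prefix; anything unmatched is "other".
pub fn classify(repository: &str) -> &'static str {
    HOSTS
        .iter()
        .find(|(prefix, _)| repository.starts_with(*prefix))
        .map_or("other", |&(_, label)| label)
}
```

The trailing slash in each prefix keeps https://gitlab.com/ from also matching hosts such as gitlab.gnome.org.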

Name                            Crates    Percentage
Total                           120,484   100%
No repository                   24,173    20.06%
GitHub                          90,215    74.87%
GitLab                          3,230     2.68%
Codeberg                        264       0.21%
Gitee                           153       0.12%
Freedesktop (GitLab)            124       0.1%
Torproject (GitLab)             47        0.03%
Wikimedia (GitLab)              11        0%
e3t                             12        0%
sr.ht                           342       0.28%
openprivacy                     5         0%
gnome (GitLab)                  69        0.05%
cronce (GitLab)                 18        0.01%
other                           1,821     1.51%
no-repo                         24,173    20.06%
Has homepage but no repository  3,480     2.88%
No homepage and no repository   20,693    17.17%
Insecure repo URL (using http)  313       0.26%