Crates.io | criner-cli |
lib.rs | criner-cli |
version | 0.4.0 |
source | src |
created_at | 2020-03-20 00:43:27.712246 |
updated_at | 2024-10-17 05:49:40.948426 |
description | A command-line interface for the 'Criner' crates mining platform |
repository | https://github.com/the-lean-crate/criner |
id | 220596 |
size | 133,178 |
We want to improve build times by reducing download and extraction times. This makes the ecosystem more approachable to people or regions with slow internet, and is therefore very relevant for inclusiveness and for extending Rust's reach.
This is facilitated by three means:
I lived in China and learned to live with slow and flaky internet connections. Every byte that reaches my computer makes me shed a tear of joy.
This initiative was motivated by a nushell update which took forever and failed multiple times when trying to send me 3MB of images in a 4MB download. The fix was trivial, and I wondered how much more there was to gain by simple fixes like that.
The idea for The Criner Waste Report was born, which soon turned into a multi-step plan to tackle this problem.
Nowadays, nushell is perfectly lean, and I hope we will have more of these crates as the initiative progresses.
First of all, thanks so much for your willingness to help! Let's get started.
Head over to The Criner Waste Report and find your crate, or jump to your crate directly using https://the-lean-crate.github.io/waste/<your-crate>.
Check whether a lot of 'waste' is detected, then try and validate the suggested fix. If something is wrong or not working, click the Provide Feedback link at the bottom of your crate's page.
Add an include directive to the [package] section of your Cargo.toml, with values suggested by the page above, e.g. include = ["src/**/*", "LICENSE", "README.md", "!**/benches/*"].
Run cargo package --offline --allow-dirty --no-verify.
The resulting archive ends up in target/package; extract it using tar -xzf target/package/<crate-version>.crate to inspect its contents.
As the first part of The Lean Crate Initiative, this report provides the data needed to see whether this is a problem worth solving in the first place. As of 2020-03-18, initial numbers show that out of 147GB of uncompressed crates data, 59GB, or 40%, are most probably not required to build a crate.
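A figure like this can be reproduced for a single crate by summing the sizes of the files in an unpacked archive (for example the directory produced by the tar command in the steps above) and comparing them with the files that are actually needed; 59 of 147 is about 40%. The following is a minimal sketch with hypothetical helper names (collect_files, waste, percent). Deciding which files are needed is left to a caller-supplied predicate, since that judgment is the hard part the Waste Report takes care of.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every regular file below `root` with its size in bytes.
/// Paths are relative to `root` and use '/' as separator; symlinks are skipped.
fn collect_files(root: &Path) -> io::Result<Vec<(String, u64)>> {
    let mut files = Vec::new();
    let mut pending: Vec<PathBuf> = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let meta = entry.metadata()?; // does not follow symlinks
            let path = entry.path();
            if meta.is_dir() {
                pending.push(path);
            } else if meta.is_file() {
                let rel = path.strip_prefix(root).expect("entry lies below root");
                let rel: Vec<String> = rel
                    .components()
                    .map(|c| c.as_os_str().to_string_lossy().into_owned())
                    .collect();
                files.push((rel.join("/"), meta.len()));
            }
        }
    }
    files.sort();
    Ok(files)
}

/// Returns (total bytes, bytes in files the predicate considers unnecessary).
fn waste(files: &[(String, u64)], is_needed: impl Fn(&str) -> bool) -> (u64, u64) {
    let total: u64 = files.iter().map(|(_, size)| *size).sum();
    let wasted: u64 = files
        .iter()
        .filter(|(path, _)| !is_needed(path.as_str()))
        .map(|(_, size)| *size)
        .sum();
    (total, wasted)
}

/// Share of `part` in `whole`, in percent.
fn percent(part: u64, whole: u64) -> f64 {
    if whole == 0 {
        0.0
    } else {
        part as f64 * 100.0 / whole as f64
    }
}
```

Keeping the directory walk separate from the waste computation means the same `waste` function can be fed with file lists obtained any other way, for instance straight from the archive.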
The report operates on the following assumptions:
From these assumptions, some conclusions can be drawn. There is no need for…
Based on these assumptions and conclusions, The Criner Waste Report computes suggestions for new include or exclude directives that prevent unnecessary data from being put into the crate archive.
Due to the way Cargo handles these directives, include directives are deemed most powerful in the pursuit of keeping the number of patterns small, using negative patterns where needed. Thus, they will be recommended whenever feasible.
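Why include directives with negations work well can be illustrated with a simplified, gitignore-style matcher (Cargo documents its include and exclude patterns as gitignore-style). This sketch assumes that the last matching pattern decides, so a later !pattern removes files an earlier pattern added; that a pattern without a slash matches at any depth; and that ** spans zero or more directories. It matches file paths only, ignores directory-only patterns, and is not Cargo's actual implementation; all function names are hypothetical.

```rust
/// Matches one path segment against a glob segment supporting `*` and `?`.
fn segment_matches(pat: &[u8], text: &[u8]) -> bool {
    match (pat.first(), text.first()) {
        (None, None) => true,
        // `*` matches nothing, or swallows one character and tries again.
        (Some(b'*'), _) => {
            segment_matches(&pat[1..], text)
                || (!text.is_empty() && segment_matches(pat, &text[1..]))
        }
        (Some(b'?'), Some(_)) => segment_matches(&pat[1..], &text[1..]),
        (Some(p), Some(t)) if p == t => segment_matches(&pat[1..], &text[1..]),
        _ => false,
    }
}

/// Matches pattern segments against path segments; `**` spans zero or more segments.
fn segments_match(pat: &[&str], path: &[&str]) -> bool {
    match pat.split_first() {
        None => path.is_empty(),
        Some((&"**", rest)) => (0..=path.len()).any(|skip| segments_match(rest, &path[skip..])),
        Some((seg, rest)) => match path.split_first() {
            Some((first, tail)) => {
                segment_matches(seg.as_bytes(), first.as_bytes()) && segments_match(rest, tail)
            }
            None => false,
        },
    }
}

/// Gitignore-style match of one pattern against a '/'-separated, package-relative path.
fn pattern_matches(pattern: &str, path: &str) -> bool {
    // A slash anywhere anchors the pattern at the package root;
    // without one, the pattern may match at any depth.
    let anchored = pattern.contains('/');
    let mut pat: Vec<&str> = pattern.trim_start_matches('/').split('/').collect();
    if !anchored {
        pat.insert(0, "**");
    }
    let path: Vec<&str> = path.split('/').collect();
    segments_match(&pat, &path)
}

/// Decides whether `path` ends up in the package: the last matching pattern wins,
/// and a leading `!` turns a pattern into an exclusion.
fn is_included(include: &[&str], path: &str) -> bool {
    let mut included = false;
    for raw in include {
        let (negated, pat) = match raw.strip_prefix('!') {
            Some(rest) => (true, rest),
            None => (false, *raw),
        };
        if pattern_matches(pat, path) {
            included = !negated;
        }
    }
    included
}
```

With include = ["src/**/*", "LICENSE", "README.md", "!**/benches/*"], src/lib.rs is included, while src/benches/big.rs is first added by src/**/* and then dropped again by the final negation. A single broad pattern plus a few negations keeps the list short.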
This part of the initiative is still under heavy development, but available as an ugly alpha.
Please note that your feedback on whether these assumptions and conclusions are correct is much appreciated: everything can be changed to make The Criner Waste Report better in a collaborative, community-driven effort.
cargo diet Companion Program
The logic employed in 'The Criner Waste Report' is available at your fingertips using cargo diet, making it easy to start out with a perfectly lean crate even when publishing a crate for the first time.
Apologies, the term was proposed by the marketing department, who believed that 'The Criner Waste Report' would do better than 'The Criner Report of files you do not need to build a crate'.
The author does shame crates that are bigger than they probably have to be, and is happy to help get your crate off the index. Some files listed are certainly false positives due to limitations; read on in this FAQ to learn how to remove them.
Indeed, the Waste Report does its best to extract file names from build scripts, but it won't be able to resolve things like format!("C-lib-1.0.23-{}", suffix).
To resolve this, set your own include directive. The Criner Waste Report will help you find even better includes from that point on, but only as a suggestion, trusting that you set up your includes exactly the way they are needed.
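The limitation can be illustrated with a naive scanner: plain string literals in a build script can be compared against the files in the crate, but a format! string only yields its final value when the build script actually runs. The sketch below is a hypothetical illustration (string_literals and classify are made-up names), not the heuristic the Waste Report uses.

```rust
/// What a string literal found in a build script tells us about packaged files.
#[derive(Debug, PartialEq)]
enum Literal {
    /// A plain literal that may name a file.
    Path(String),
    /// A format string whose value is only known once the build script runs.
    Unresolvable(String),
}

/// Naively collects the contents of double-quoted string literals.
/// Raw strings and char literals such as '"' are not handled; escape sequences
/// are not decoded, only the character following a backslash is kept.
fn string_literals(source: &str) -> Vec<String> {
    let mut literals = Vec::new();
    let mut chars = source.chars();
    while let Some(c) = chars.next() {
        if c != '"' {
            continue;
        }
        let mut literal = String::new();
        let mut escaped = false;
        for c in chars.by_ref() {
            if escaped {
                literal.push(c);
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                break;
            } else {
                literal.push(c);
            }
        }
        literals.push(literal);
    }
    literals
}

/// Treats literals containing `{` as format strings that cannot be resolved statically.
fn classify(source: &str) -> Vec<Literal> {
    string_literals(source)
        .into_iter()
        .map(|lit| {
            if lit.contains('{') {
                Literal::Unresolvable(lit)
            } else {
                Literal::Path(lit)
            }
        })
        .collect()
}
```

Anything classified as unresolvable is exactly the case where an explicit include directive is the reliable way to keep the needed files.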
It detects files included via include_str!(…) and include_bytes!(…), but only in lib.rs, main.rs, or other binary targets.
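Detecting such files can be as simple as looking for the macro name followed by a string literal. The sketch below uses a hypothetical function name and only recognizes plain literal arguments; an argument built with concat!(env!("OUT_DIR"), …) is skipped, much like the format! case mentioned earlier.

```rust
/// Collects the literal path arguments of `include_str!` and `include_bytes!` calls.
/// Such paths are resolved relative to the file containing the macro call.
/// Arguments that are not a plain string literal, e.g. `concat!(…)`, are skipped.
fn included_files(source: &str) -> Vec<String> {
    let mut found = Vec::new();
    for mac in ["include_str!(", "include_bytes!("] {
        let mut rest = source;
        while let Some(pos) = rest.find(mac) {
            rest = &rest[pos + mac.len()..];
            let arg = rest.trim_start();
            if let Some(arg) = arg.strip_prefix('"') {
                if let Some(end) = arg.find('"') {
                    found.push(arg[..end].to_string());
                }
            }
        }
    }
    found
}
```

Results are grouped by macro: all include_str! paths come first, then all include_bytes! paths.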
Add the include = […] directive that it proposes, adjusted to your liking and needs. It will still suggest negated include patterns that exclude, for instance, tests and docs.
The Waste Report favors include directives: if one is present, it will not mark any file as wasted, but will make recommendations on how to save even more by excluding tests, docs and the like.
When exclude directives are present, it makes its recommendations mandatory and considers all files that are included despite those recommendations to be waste. The reason is that whitelists, i.e. include directives, are better supported by Cargo thanks to negations, so it assumes people have better control over the includes they write.
It's our way to hint at the possibility of making your crate smaller while acknowledging that your include directive is probably exactly what you had in mind when designing it.
However, we currently believe that certain kinds of files are not needed to build a crate, so your include directive may benefit from additional negation patterns that exclude these files. Common examples are tests, which are easily included by the typical src/**/*.rs include directive.
Potential savings do not count as 'Waste', but currently prevent the crate version from achieving the perfectly lean status.
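The rules from the last few answers can be summarized as a small decision function. This sketch assumes the caller already knows whether a file is probably needed; the names Directive, Verdict, verdict and perfectly_lean are hypothetical, and judging a crate without any directive like one with exclude directives is an assumption, not something stated above.

```rust
/// Which packaging directive a crate version's manifest uses.
#[derive(Clone, Copy, Debug)]
enum Directive {
    Include,
    Exclude,
    /// Neither directive; assumed here to be judged like `Exclude`.
    Neither,
}

/// How a single packaged file is judged.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Believed to be needed for building the crate.
    Needed,
    /// Probably unneeded, but an `include` list is trusted: not counted as waste.
    PotentialSaving,
    /// Probably unneeded and packaged anyway.
    Waste,
}

/// `probably_needed` has to come from elsewhere, e.g. heuristics like those in this FAQ.
fn verdict(directive: Directive, probably_needed: bool) -> Verdict {
    match (probably_needed, directive) {
        (true, _) => Verdict::Needed,
        (false, Directive::Include) => Verdict::PotentialSaving,
        (false, Directive::Exclude | Directive::Neither) => Verdict::Waste,
    }
}

/// "Perfectly lean" requires no waste and no remaining potential savings.
fn perfectly_lean(verdicts: &[Verdict]) -> bool {
    verdicts.iter().all(|v| *v == Verdict::Needed)
}
```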
An include pattern such as README.md will actually match every path matched by */README.md or **/README.md. This may include unwanted files, and we will not detect these.
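The behavior follows the gitignore convention that Cargo documents for these patterns: a pattern without a slash is compared against the file name at any depth, while a leading slash anchors it to the package root. The sketch below covers literal, wildcard-free patterns and files only; the function name is hypothetical.

```rust
/// Decides whether a literal (wildcard-free) pattern matches a package-relative path.
fn literal_pattern_matches(pattern: &str, path: &str) -> bool {
    if let Some(anchored) = pattern.strip_prefix('/') {
        // Leading slash: only the path relative to the package root counts.
        path == anchored
    } else if pattern.contains('/') {
        // A slash elsewhere in the pattern also anchors it to the root.
        path == pattern
    } else {
        // No slash: the pattern is compared against the file name at any depth.
        path.rsplit('/').next() == Some(pattern)
    }
}
```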
Circumvent this by using a leading slash, as in /README.md, which indicates that the file must be at the top level.
On a venerable 5-year-old quad-core MBP…
Both of the above only happen once, as from that point on all else is incremental, reducing the amount of unnecessary work to close to zero.
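The incremental behavior can be pictured as remembering which crate versions were already processed and skipping them on subsequent runs. Criner keeps its state in a database; the sketch below uses an in-memory HashSet instead, and its types (CrateVersion, Incremental) are hypothetical.

```rust
use std::collections::HashSet;

/// Hypothetical unit of work: one published crate version.
#[derive(Clone, PartialEq, Eq, Hash)]
struct CrateVersion {
    name: String,
    version: String,
}

/// Remembers processed crate versions so that reruns only touch new ones.
#[derive(Default)]
struct Incremental {
    done: HashSet<CrateVersion>,
}

impl Incremental {
    /// Processes every version not seen before and returns how many were processed.
    fn run(&mut self, all: &[CrateVersion], mut process: impl FnMut(&CrateVersion)) -> usize {
        let mut processed = 0;
        for cv in all {
            if self.done.contains(cv) {
                continue;
            }
            process(cv);
            self.done.insert(cv.clone());
            processed += 1;
        }
        processed
    }
}
```

The first run does all the work; after that, only versions published since the previous run are processed.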
Criner is a platform that makes incremental mining of crates.io easy and affordable for everyone. It is fast and can be configured to use all available bandwidth and CPU, while keeping the memory footprint low enough to comfortably run on small devices with less than 512MB of RAM.
Criner currently operates in three stages when executed with criner mine:
Cargo.toml in full up to 128kb in size.
As of 2020-03-18 it takes 10 minutes to process all 215k crate versions on a 5-year-old MacBook Pro with 4 physical cores.
cargo geiger.
Clone this repository and run cargo run --release -- mine to get started. Provided criner is allowed to finish, it will require about 46GB of disk space as of 2020-03-18.
Once a database has been generated with criner mine, run criner export to get another SQLite database with all data exploded into tables and fields, which can be queried using SQL. This process is non-incremental and takes about 5 minutes to complete on a single core, as threading is not implemented.
Some of the columns are of type JSON, whose properties can be used in queries via SQLite's json_*(…) family of functions.
Possible improvements concern export performance, as the export could probably be parallel and incremental, and not having to mine yourself to obtain an initial database state. For instance, Criner could upload its database, which is about 800MB gzipped, to an S3 bucket once a day.
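A parallel export could, for instance, split the stored records into chunks and transform each chunk on its own thread. This is a minimal sketch using std::thread::scope (stable since Rust 1.63); export_row is a hypothetical stand-in for the real per-record work. Writing the results into SQLite would still happen from a single thread, since SQLite allows only one writer at a time.

```rust
use std::thread;

/// Hypothetical stand-in for turning one stored record into an exported row.
fn export_row(record: &str) -> String {
    record.to_uppercase()
}

/// Transforms `records` on up to `threads` worker threads; output order matches input order.
fn export_parallel(records: &[String], threads: usize) -> Vec<String> {
    let threads = threads.max(1);
    // `chunks` panics on a size of 0, so keep it at least 1 (e.g. for empty input).
    let chunk_size = ((records.len() + threads - 1) / threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = records
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || chunk.iter().map(|r| export_row(r)).collect::<Vec<_>>()))
            .collect();
        // Joining in spawn order keeps the chunks, and thus the rows, in input order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("export worker panicked"))
            .collect()
    })
}
```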
As migrations are currently special purpose programs that may eat laundry for breakfast, they cannot be executed by accident.
RUST_LOG=info cargo run --features migration -- migrate
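Gating a risky subcommand behind a Cargo feature, as the migration feature in the command above does, means a default build cannot run it at all. One way to express such a guard is a #[cfg(feature = "migration")] attribute on a match arm. The sketch below is illustrative only and not Criner's actual command dispatch; run_subcommand is a hypothetical name.

```rust
/// Dispatches a subcommand by name. The `migrate` arm only exists when the crate
/// is compiled with `--features migration`; otherwise it falls through to the error arm.
fn run_subcommand(name: &str) -> Result<&'static str, String> {
    match name {
        "mine" => Ok("mining"),
        "export" => Ok("exporting"),
        #[cfg(feature = "migration")]
        "migrate" => Ok("migrating"),
        other => Err(format!("unknown subcommand: {other}")),
    }
}
```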