# User Stories

- As a Rust beginner, I would like my builds to be fast, so that I get a good first impression of Rust.
- As an open source contributor (starting on a new project), I want to have pre-built packages, so that it takes less time to build a new project.
- As a user of GitHub Actions, I want to use a global cache of pre-built packages, so that my builds don't take 10 minutes (this can be solved with a decent cache).

# Extended Pizza Analogy

This analogy came up in a meeting between David, Matt, and Zed (two developers and an insights person), and was fleshed out over time. If you're wondering why we're constantly talking about pizza, this is why.

- MHRA is like a pizza shop where pizzas are made
- there are many chefs working in the MHRA pizza shop, and there are many pizza shops doing similar work to MHRA
- a pizza (backend web server) has cheese (http library) and pizza base (database library)
- pizza base has dough (low-level database library)
- dough has flour (network library), salt (encryption library) and yeast (database query formatter)
- cheese has milk (http library) and salt (same encryption library)
- _but_
- there are lots of different ways to make pizza base
- there are lots of different ways to make flour
- there are lots of different ways to make yeast
- there are lots of different ways to make cheese
- there are lots of different ways to make milk
- there are lots of different ways to make salt
- it costs money to _pre-make different types of_ dough, cheese and pizza base, so we want to only _pre-make_ the ingredients for the pizzas that people want
- everything pre-built has a shelf life of 6 weeks (trust me on this one)
- _so_
- we want to pre-make only the most popular ingredients, to maximise the _amount of time saved by_ chefs making pizzas
- and then we can have quick pizza in Rust land for the masses, and no one needs to go hungry again

# cargo-quickbuild sketch design

## minimal version of the client:

- assume Cargo.lock is up to date
- explode immediately if it's not a debug build, if there are already release assets, or if there is a `.cargo/config` that we should be honouring
- parse the dependency tree using https://crates.io/crates/cargo-lock or similar
- for each root of the tree, serialise the subtree and compute a hash (see the hash sketch after the builder section)
- try to fetch a pre-built pizza base:
  - fetch `/cratename-HASH_OF_DEPENDENCY_TREE-rustc_version-arch` from the GitHub releases of the `cargo-quickbuild-releases` repo
  - on success, unpack it and report the cache hit to the stats server
    - stretch goal: keep a download cache and/or unpack into a common place and hardlink the files into `target/`
  - on failure, build from source and report the build time to the stats server
- if any cache miss happens, POST the full Cargo.lock somewhere

## minimal version of the analyser:

- hoover up Cargo.lock files from rust-repos
- for each Cargo.lock file:
  - parse the dependency tree
  - for each root:
    - calculate `cratename-HASH_OF_DEPENDENCY_TREE-rustc_version-arch` (the same computation as the client; see the sketch after the builder section)
    - estimate the size of the dependency tree (unit = crate count?)
    - `stats.count("cratename-HASH_OF_DEPENDENCY_TREE-rustc_version-arch", 1)`
    - `stats.count("cratename-HASH_OF_DEPENDENCY_TREE-rustc_version-arch-size", size)`
    - store the serialised dependency tree in a `cargo-quickbuild-trees` git repo if it doesn't already exist
- assume that compilation pain is proportional to download count
  - TODO: get timings of how long it takes to build a sample of packages
  - can we assume that build time is the same for all packages? (probably not)

We want to optimise:

- minimise the cost of storage (TODO: work out how to account for this)
  - assume that it is proportional to compilation time, or assume that it is insignificant for now
- maximise the time saved in total (globally, for all users)
  - = compilation (download) count × compilation time
- minimise the cost of compilation
  - = compilation time
- therefore: maximise time saved globally per unit of compilation cost (time)

From the download counts: what proportion of package downloads are commodities, and what proportion are niche and need to be built bespoke?

Focus on just a subset of projects? Just the ones that we have checked out locally.

Figure out a way to get a handle on time saved globally: how long does the average package take to compile, ignoring its dependencies (for the most popular 1000 packages)?

## minimal version of the service:

- receive a Cargo.lock and store it somewhere
- parse the dependency tree
- for each root:
  - calculate the hash etc.
  - store the count for that hash
  - if the count for that hash exceeds $THRESHOLD and a build isn't already started, trigger a build (sketched after the builder section)

## minimal version of the builder:

- when a trigger comes in to build a package:
  - fetch the `cratename-HASH_OF_DEPENDENCY_TREE` serialised tree from the `cargo-quickbuild-trees` git repo
  - unpack it into `Cargo.lock` and create a fake `src/main.rs`, like `cargo-chef` does
    - create a fake `Cargo.toml` as well
  - `cargo build --package=cratename`
  - make a tarball of `target/`
  - release it as `cratename-HASH_OF_DEPENDENCY_TREE-rustc_version-arch` on the `cargo-quickbuild-releases` repo (it should be fine to have a single commit in that repo and tag it with infinitely many git tags)
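All of the components need to agree on how a dependency subtree becomes a `cratename-HASH_OF_DEPENDENCY_TREE-rustc_version-arch` name, or the cache keys computed by the client will never match the ones released by the builder. Below is a minimal sketch of that shared computation, assuming the `cargo-lock` crate for parsing and `sha2` for hashing; `serialize_subtree` is a hypothetical placeholder, because the real serialisation format is one of the open design questions.

```rust
use cargo_lock::{Lockfile, Package};
use sha2::{Digest, Sha256};

/// Hypothetical placeholder: produce a canonical byte serialisation of the
/// dependency subtree rooted at `root`. The real format is still TBD; as a
/// stand-in this emits one "name version" line per package.
fn serialize_subtree(lockfile: &Lockfile, root: &Package) -> Vec<u8> {
    // A real implementation would walk only `root`'s transitive
    // dependencies, in a stable order.
    let mut out = format!("root: {} {}\n", root.name.as_str(), root.version).into_bytes();
    for pkg in &lockfile.packages {
        out.extend_from_slice(format!("{} {}\n", pkg.name.as_str(), pkg.version).as_bytes());
    }
    out
}

/// Compute the `cratename-HASH_OF_DEPENDENCY_TREE-rustc_version-arch` name
/// that the client fetches, the analyser counts, and the builder releases.
fn artifact_name(lockfile: &Lockfile, root: &Package, rustc: &str, arch: &str) -> String {
    let mut hasher = Sha256::new();
    hasher.update(serialize_subtree(lockfile, root));
    let hash: String = hasher.finalize().iter().map(|b| format!("{b:02x}")).collect();
    format!("{}-{}-{}-{}", root.name.as_str(), hash, rustc, arch)
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let lockfile = Lockfile::load("Cargo.lock")?;
    // For demonstration, treat every package as a root; the real client
    // would only hash the roots of the dependency tree.
    for root in &lockfile.packages {
        println!("{}", artifact_name(&lockfile, root, "1.60.0", "x86_64-unknown-linux-gnu"));
    }
    Ok(())
}
```

Whatever format we end up with, the serialisation has to be byte-for-byte deterministic (stable ordering, no absolute paths), otherwise the client and the builder will compute different names for the same tree.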
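The service's trigger logic is essentially a counter per artifact name. A minimal sketch, with an in-memory map standing in for real storage; `trigger_build` and the threshold value are hypothetical:

```rust
use std::collections::{HashMap, HashSet};

/// Placeholder value: how many sightings of a tree hash justify paying
/// for a build.
const THRESHOLD: u32 = 10;

#[derive(Default)]
struct Service {
    counts: HashMap<String, u32>,
    builds_started: HashSet<String>,
}

impl Service {
    /// Called once per root hash extracted from an uploaded Cargo.lock.
    fn record(&mut self, artifact_name: &str) {
        let count = self.counts.entry(artifact_name.to_string()).or_insert(0);
        *count += 1;
        // `insert` returns false if a build was already started for this name.
        if *count > THRESHOLD && self.builds_started.insert(artifact_name.to_string()) {
            trigger_build(artifact_name);
        }
    }
}

/// Hypothetical: kick off the builder, e.g. via a CI webhook.
fn trigger_build(artifact_name: &str) {
    println!("triggering build for {artifact_name}");
}

fn main() {
    let mut service = Service::default();
    // The (THRESHOLD + 1)'th sighting exceeds the threshold and triggers a build.
    for _ in 0..=THRESHOLD {
        service.record("tokio-abc123-1.60.0-x86_64-unknown-linux-gnu");
    }
}
```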
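Finally, a sketch of the builder's fake-package trick described above, in the style of `cargo-chef`: write the recorded Cargo.lock next to a stub `Cargo.toml` and `src/main.rs`, build just the target package, and tarball the results. The stub contents, the paths, and the external `tar` invocation are all assumptions.

```rust
use std::fs;
use std::process::Command;

/// Build `crate_name` inside a stub package so that only the dependency
/// tree (no real application code) gets compiled, then tarball target/
/// for release. Error handling and the stub contents are assumptions.
fn build_and_package(
    crate_name: &str,
    lockfile_contents: &str,
    artifact_name: &str,
) -> std::io::Result<()> {
    fs::create_dir_all("build/src")?;
    // The serialised tree fetched from cargo-quickbuild-trees becomes Cargo.lock.
    fs::write("build/Cargo.lock", lockfile_contents)?;
    // Fake Cargo.toml declaring the one dependency we want pre-built; the
    // `*` requirement gets pinned by the Cargo.lock we just wrote.
    fs::write(
        "build/Cargo.toml",
        format!(
            "[package]\nname = \"quickbuild-stub\"\nversion = \"0.0.0\"\n\n[dependencies]\n{crate_name} = \"*\"\n"
        ),
    )?;
    // Fake entry point, like cargo-chef does.
    fs::write("build/src/main.rs", "fn main() {}\n")?;

    let status = Command::new("cargo")
        .args(["build", "--package", crate_name])
        .current_dir("build")
        .status()?;
    assert!(status.success(), "cargo build failed");

    // Tarball target/, ready to upload as a release asset.
    let tarball = format!("{artifact_name}.tar.gz");
    Command::new("tar")
        .args(["-czf", tarball.as_str(), "target"])
        .current_dir("build")
        .status()?;
    Ok(())
}

fn main() -> std::io::Result<()> {
    // Hypothetical inputs; the real builder gets these from the trigger.
    build_and_package(
        "tokio",
        "# serialised Cargo.lock contents go here",
        "tokio-abc123-1.60.0-x86_64-unknown-linux-gnu",
    )
}
```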
# pivot triggers

The analyser is part of the validation stage. If we find lots of large common subtrees then we can continue with the project. If not, then ⏎ or 🚮.

- if 30% of all projects depending on tokio could fetch tokio's dependency tree from the same bundle, then we're winning
- similar with tide/actix?
- do we discriminate for/against projects that are using dependabot to keep their dependencies up to date?
- do we ignore inactive projects somehow?

# Possible pivots

There might be some value in saying "I see you're using $X. Would you like to buy $Y?"

# KPIs (Quantifiable things for later)

Time saved: we can't know this yet, because we can't build the whole world yet.