![Rust](https://github.com/Byron/github-star-counter/workflows/Rust/badge.svg) [![Crates.io](https://img.shields.io/crates/v/github-star-counter.svg)](https://crates.io/crates/github-star-counter)

This program was written just to try out async-await code in the current ecosystem. It features the following capabilities:

* make HTTPS requests
* make multiple requests at a time, one per page
* use async closures

The code was written synchronously first, and then moved to async with surprisingly few changes.

It was interesting to see how the [`async` constructs](https://github.com/Byron/github-star-counter/blob/e3746b9182a28a9e9a9e8dd55cdb660f6b1b97df/src/lib.rs#L90) allow precise control over parallelism, to the point where I was able to design interdependent futures that match the data dependencies. That way, things run concurrently whenever they can, which can be visualized neatly with a dependency graph.

The greatest difficulties were around getting HTTPS to work. Beyond that, understanding the implications of futures is clearly a learning process. Constructs with `async` tend to _look_ synchronous, but show their teeth with closures and ownership. Everything is solvable by simply owning everything, yet I think more borrowing will become possible once `async` lands on _stable_.

Something I absolutely agree with is the [statements in the async book](https://rust-lang.github.io/async-book/01_getting_started/02_why_async.html) indicating that not everything needs to be async. Personally, I would probably start `sync`, and wait for performance requirements to change before making the switch. However, I would avoid threads in the _future_, unless they truly are the simpler solution.

Something I look forward to is seeing fully async libraries emerge, for example to interact with `git`, which will probably perform better than existing libraries. _Using_ `async` libraries is already a breeze!
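To make the idea of interdependent futures that follow the data dependencies more concrete, here is a minimal sketch. It is not the code of this crate: the step names, `fetch`, `count_stars`, `join2`, `yield_once`, `block_on` and `run_demo` are all hypothetical, the network is replaced by a stub that suspends once, and a tiny hand-rolled, busy-polling executor stands in for a real runtime such as Tokio so that the example needs nothing but `std`.

```rust
use std::cell::RefCell;
use std::future::{poll_fn, Future};
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; sufficient for a busy-polling demo executor.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal single-threaded executor: poll the future in a loop until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
    }
}

/// Poll two futures side by side and complete once both have finished.
async fn join2<A: Future, B: Future>(a: A, b: B) -> (A::Output, B::Output) {
    let mut a = pin!(a);
    let mut b = pin!(b);
    let mut out_a: Option<A::Output> = None;
    let mut out_b: Option<B::Output> = None;
    poll_fn(move |cx| {
        if out_a.is_none() {
            if let Poll::Ready(v) = a.as_mut().poll(cx) {
                out_a = Some(v);
            }
        }
        if out_b.is_none() {
            if let Poll::Ready(v) = b.as_mut().poll(cx) {
                out_b = Some(v);
            }
        }
        if out_a.is_some() && out_b.is_some() {
            Poll::Ready((out_a.take().unwrap(), out_b.take().unwrap()))
        } else {
            Poll::Pending
        }
    })
    .await
}

/// Stand-in for network latency: report `Pending` once, then complete.
async fn yield_once() {
    let mut yielded = false;
    poll_fn(|cx| {
        if yielded {
            Poll::Ready(())
        } else {
            yielded = true;
            cx.waker().wake_by_ref();
            Poll::Pending
        }
    })
    .await
}

/// Hypothetical request: logs when it starts and when it finishes.
async fn fetch(step: &str, log: &RefCell<Vec<String>>) {
    log.borrow_mut().push(format!("start {step}"));
    yield_once().await;
    log.borrow_mut().push(format!("done {step}"));
}

/// Everything needs the user info first; after that, the org branch and
/// the main user's repo pages are independent and are polled together.
async fn count_stars(log: &RefCell<Vec<String>>) {
    fetch("user-info", log).await;
    let org_branch = async {
        fetch("orgs-info", log).await;
        fetch("org-repo-pages", log).await;
    };
    let user_branch = fetch("user-repo-pages", log);
    join2(org_branch, user_branch).await;
}

/// Run the sketch and return the recorded event log.
fn run_demo() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    block_on(count_stars(&log));
    log.into_inner()
}

fn main() {
    for event in run_demo() {
        println!("{event}");
    }
}
```

In the printed log, `start user-repo-pages` appears before `done orgs-info`: the main user's repositories do not wait for the organization branch, while both branches still wait for `user-info`. In real code one would likely reach for the join combinators of the `futures` crate and a proper executor instead of `join2` and `block_on`.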
When thinking about the parallelism of this simple application, it quickly becomes evident that one would want to control the number of in-flight futures. Just imagine the adverse effects of making too many concurrent connections to the same host, or the resource limits imposed by the operating system itself. One would want executors that are aware of what kind of future they are running, and that limit the number of concurrently running ones.

With `async`, Rust can change the game even more!

### Installation

```bash
cargo install github-star-counter
```

### Running and usage

```bash
count-github-stars Byron
```

```bash
count-github-stars --help
```

A more complete example, showing how massive the speedups can be. However, please keep in mind that this can also lead to contention, i.e. there are simply too many concurrent requests, which are much slower together than they would be individually.

```
2019-08-15 08:47:49,553 INFO [github_star_counter] Total bytes received in body: 11.5 MB
2019-08-15 08:47:49,553 INFO [github_star_counter] Total time spent in network requests: 366.84s
2019-08-15 08:47:49,553 INFO [github_star_counter] Wallclock time for future processing: 22.62s
2019-08-15 08:47:49,553 INFO [github_star_counter] Speedup due to networking concurrency: 16.22x
Total: 214379
Total for seanmonstar: 3818
Total for orgs: 210561

mozilla/pdf.js ★ 27611
mozilla/DeepSpeech ★ 10899
mozilla/BrowserQuest ★ 8249
mozilla/send ★ 8165
mozilla/togetherjs ★ 6393
mozilla/nunjucks ★ 6207
tokio-rs/tokio ★ 5598
linkerd/linkerd ★ 5042
hyperium/hyper ★ 5031
linkerd/linkerd2 ★ 4342
➜
```

### Development

```bash
git clone https://github.com/Byron/github-star-counter
cd github-star-counter
# Print all available targets
make
```

All other interactions can be done via `cargo`.

### Difficulties on the way...

Please note that at the time of writing, 2019-08-13, the ecosystem wasn't ready. Search the code for `TODO` to learn about workarounds/issues still present.
* `async || {}` _(without move)_ is not yet ready, so async closures need to be `move`. This comes with the additional limitation that references can't be passed as arguments; everything the closure sees must be owned.
* `reqwest` with await support is absolutely needed. The low-level hyper-based client we are using right now will start failing once GitHub gzips its payload. For now I pin a working hyper version, which hopefully keeps working with Tokio.
* Pinning git repositories is not as easy as I had hoped - I ended up creating my own forks which are set to the correct version. However, it should also work with the `foo = { git = "https://github.com/foo/foo", rev = "hash" }` syntax. Maybe that's just my ignorance though.
* I would be interested in something like `collect::<Result<Vec<_>, Error>>` for `Vec<Future<Output = Result<_, Error>>>`. `join_all` won't abort on the first error, but I think it should be possible to implement such functionality based on it.
* Defining a closure with `let mut closure: impl FnMut(User, usize) -> impl Future` doesn't seem to work. The closure return type must be a type parameter.

### Changelog

For the parallelism diagrams, a data point prefixed with `*` signals that multiple items are handled at the same time.

#### v1.1.0 - Support for 'tera' templates

Thanks to the generous contribution of @mre there now is support for rendering to custom tera templates. [Look here](https://endler.dev/about/) for an example.

#### v1.0.6 - Assurance of correctness

GitHub can silently adjust the page size, e.g. one asks for 1000 items per page and generates queries accordingly, but it will respond with only 100. Now we check and abort with a suggested page size if the given one was not correct. The current page size seems to be limited to 100.

#### v1.0.5 - Better performance metrics

#### v1.0.4 - Even better progress - less is more

Just show the aggregated result.

#### v1.0.3 - Better progress messages

Even though the header is received and parsed relatively quickly, the body is read afterwards, which takes additional time.
This will now be logged as well.

#### v1.0.2 - Even more parallel query of user's repositories

Parallelism looks like this:

```
user-info+---->orgs-info+---->*(user-of-orgs+---->*repo-info-page)
         |
         |
         +---->*repo-info-page
```

Now it's as parallel as it can be, based on the data dependency. This is really nice, actually!

#### v1.0.1 - More parallel query of user's repositories

Parallelism looks like this:

```
user-info+---->orgs-info+-+-->*(user-of-orgs+---->*repo-info-page)
         |                |                       ^
         |               wait                     |
         |
         +----------------+-----------------------^
```

We don't wait for fetching org user info, but still wait for orgs information before anything makes progress. Fetching repo information for the main user waits longer than needed.

#### v1.0.0 - Initial Release

Parallelism looks like this:

```
user-info+---->orgs-info+--->*(user-of-orgs-and-main-user+---->*repo-info-page)
```

### Reference

[This gist](https://gist.github.com/yyx990803/7745157) got me interested in writing a Rust version of it.