# Todo List

## Find an optimal value for `STRATEGY_RATIO_THRESHOLD`

When generating random names, if the number of files is on the same order of magnitude as the naming space, there will almost certainly be a lot of name collisions if we generate them individually (using `generate_on_demand` strategy), causing lots of retries.

Given `p` files and naming space size `q` (`p <= q`), the expected value of number of tries can be calculated as follows:

```rust
// partial sum of expected number of tries for each file
Exp(total_tries) = Sum(f = 0; f < p; f += 1){ Exp(tries[f]) }
```
where:
```rust
// infinite sum of (i * probability of success at i tries)
Exp(tries[f]) = Sum(i = 1; i -> Inf; i += 1){ i * P(success_at_try[i]) }
```
where:
```rust
// probability of colliding (i - 1) times * probability of not colliding
P(success_at_try[i]) = (f / q)^(i - 1) * (q - f) / q
```

Substituting and simplifying:

```rust
Exp(tries[f]) = Sum(i = 1; i -> Inf; i += 1){ i * (f / q)^(i - 1) * (q - f) / q }
  = Sum(i = 1; i -> Inf; i += 1){ i * (f / q)^i * (q - f) / f }
  = (q - f) / f * Sum(i = 1; i -> Inf; i += 1){ i * (f / q)^i }
  = (q - f) / f * (f * q / (q - f)^2)
  = q / (q - f)
```
```rust
Exp(total_tries) = Sum(f = 0; f < p; f += 1){ q / (q - f) }
```

If you select a concrete value for `q`, and plot `Exp(total_tries)` against `p`, you should be able to see that the expected number of tries starts out linear, but starts to increase hyperbolically as `p` approaches `q` due to the very high likelihood of collision.

For example:
 - [when `q = 1000` and `p = 950`](https://www.wolframalpha.com/input?i=sum+from+f%3D0+to+949%2C+1000%2F%281000-f%29), `Exp(total_tries) ≈ 2986`.
 - [when `q = 1000` and `p = 1000`](https://www.wolframalpha.com/input?i=sum+from+f%3D0+to+999%2C+1000%2F%281000-f%29), `Exp(total_tries) ≈ 7485`.

![Total tries v.s. Number of Files](res/total_tries_vs_no_files.png)

This is why rng-rename has an alternate strategy - `generate_then_match`. In this mode, we first generate all possible permutations given the charset, then randomly match them to files.

This is mode is non-economical when the number of possible permutations (naming space) is very large compared to the number of files, but it is increasingly appealing as the `generate_on_demand` strategy starts to cause an excessive amount of collisions, as the number of files approach the size of the naming space.

Therefore we use a hard-coded constant `STRATEGY_RATIO_THRESHOLD` to determine at which point we should switch from `generate_on_demand` to `generate_then_match`. This constant is located at `crate::compute::STRATEGY_RATIO_THRESHOLD`.

However, the optimal value for this constant is not trivially findable. Due to the difference in implementation of the two strategies, we cannot assume that a lower number of names generated (or its expected value) equates to overall better performance. Most likely, benchmarking is necessary to optimise the specific value of `STRATEGY_RATIO_THRESHOLD`.

If you would like to help with this task, you can force the strategy using the option `--force-generation-strategy`, then use a tool like `hyperfine` to measure the overall execution time.

## Refactor desired but blocked

 - `crate::compute::get_extension`
   - Awaiting implementation and stabilisation of `Path::file_suffix`
   - See https://github.com/rust-lang/rust/issues/86319#issuecomment-996152668
 - `crate::io_helper::do_rename`
   - Awaiting stabilisation of `Path::try_exists`
   - See https://github.com/rust-lang/rust/issues/83186