## Wipe-on-fork `OnceCell`, `LazyCell`, `Once`, `OnceLock`, `LazyLock` for Rust
There is a long-running conspiracy theory about who built the pyramids: Egyptians or aliens. Similarly, thousands of years
from now, we can expect futurelings to ask who invented the Internet: humans or aliens?
A historian of that time could cite the HTTP status code [418 I'm a teapot](https://en.wikipedia.org/wiki/Hyper_Text_Coffee_Pot_Control_Protocol), which was originally
an April Fools' prank, to prove that HTTP was invented by living creatures that drink tea, make teapots, and have
a sense of humor. Shane Brunswick, who created the [save418.com](https://save418.com/) website that was crucial in the effort to keep 418,
said "It's a reminder that the underlying processes of computers are still made by humans."
Similar quirks show up in other areas of computer science, and `fork()` is one of them. It is a way for one process to
create another process. At HotOS 2019, four highly reputable computer systems researchers (Andrew Baumann, Jonathan Appavoo,
Orran Krieger, and Timothy Roscoe) argued in their paper [A fork() in the road](https://www.microsoft.com/en-us/research/uploads/prod/2019/04/fork-hotos19.pdf)
that people should avoid `fork()`, even though it has been at the core of POSIX and operating system
design and is widely used. A parallel universe may not have `fork()`.
Since `fork()` is a low-level operating system primitive whose effects are almost invisible to applications,
it has always caused compatibility issues. For Rust, it has been a headache (see https://github.com/rust-lang/rust/issues/6930,
https://github.com/rust-lang/rust/issues/9373, https://github.com/rust-lang/rust/issues/9568, https://github.com/rust-lang/rust/issues/16799).
This calls for a replacement for `lazy_static!` that each child process initializes on its own rather than
inheriting it from the parent. This is closely related to a concept called "wipe-on-fork" in the [Linux madvise function](https://man7.org/linux/man-pages/man2/madvise.2.html),
which allows a program to advise the operating system to wipe a range of memory in the child after a fork:
> MADV_WIPEONFORK (since Linux 4.14)
>
> Present the child process with zero-filled memory in this
> range after a fork(2). This is useful in forking servers
> in order to ensure that sensitive per-process data (for
> example, PRNG seeds, cryptographic secrets, and so on) is
> not handed to child processes.
Therefore, we adopt this naming convention and create a number of data structures.
This repository re-implements (copy-pastes, but with some modifications) the following structures from Rust's `std` library:
| Rust `std` Library | This library |
|-----------------------|------------------------------------|
| `std::cell::OnceCell` | `wipe_on_fork::WipeOnForkOnceCell` |
| `std::cell::LazyCell` | `wipe_on_fork::WipeOnForkLazyCell` |
| `std::sync::Once` | `wipe_on_fork::WipeOnForkOnce` |
| `std::sync::OnceLock` | `wipe_on_fork::WipeOnForkOnceLock` |
| `std::sync::LazyLock` | `wipe_on_fork::WipeOnForkLazyLock` |
Most of the code, including the [documentation tests](https://doc.rust-lang.org/rustdoc/write-documentation/documentation-tests.html),
is copied from the Rust `std` library in [rust-lang/rust](https://github.com/rust-lang/rust). We did so rather than
using the existing primitives in a black-box manner (which would always be the preferred choice) because (1) some are still
pending stabilization, (2) some necessary types or functions are only accessible within the `std` crate, and (3) certain changes
of ours require more low-level manipulation.
The usage of the wipe-on-fork versions of `OnceCell`, `LazyCell`, `Once`, `OnceLock`, `LazyLock` resembles that of their keep-on-fork
counterparts. Note that these wipe-on-fork versions are not "better" or "more general-purpose" implementations.
Some applications would **_specifically_** require wipe-on-fork, while other applications would **_specifically_** require keep-on-fork.
This is why we include the prefix `WipeOnFork*`, as a reminder that the two families are related but behave fundamentally differently upon `fork()`.
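
To make the comparison concrete, here is a minimal sketch of the keep-on-fork counterpart, using only `std::sync::OnceLock`. The names `SEED`, `INIT_CALLS`, and `seed()` are hypothetical and exist only for this illustration. The sketch assumes, based on the description above, that a `WipeOnForkOnceLock` would be used in the same way, with the difference that a forked child would run the initializer again.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Hypothetical process-wide value, initialized lazily on first use.
static SEED: OnceLock<u64> = OnceLock::new();
// Counts how many times the initializer actually ran (illustration only).
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Returns the seed, running the initializer at most once per process.
///
/// With `std::sync::OnceLock` (keep-on-fork), a child created by `fork()`
/// inherits the already-initialized value and never runs the closure.
/// A wipe-on-fork cell is expected to run it again in the child instead.
pub fn seed() -> u64 {
    *SEED.get_or_init(|| {
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        0x5eed // placeholder for a real, per-process secret
    })
}

/// Number of times the initializer has run so far.
pub fn init_calls() -> usize {
    INIT_CALLS.load(Ordering::SeqCst)
}
```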
Note that `fork()` is not the only way to create child processes. Indeed, a more favorable, though less convenient, solution
is to `posix_spawn()` new processes. This has been used in [Dask](https://www.dask.org/), but not in [Ray](https://github.com/ray-project/ray) (see discussion in https://github.com/ray-project/ray/issues/13568).
The purpose of wipe-on-fork primitives is to keep code composable: any thread of a process, anywhere in the program, can
make a call to `fork()`, and the less destructive solution is, like [thread safety](https://en.wikipedia.org/wiki/Thread_safety),
to write code with **fork safety**.
### Fork detection
There are two approaches to detecting a fork in the background:
- check if the code is running under a different [process ID (PID)](https://en.wikipedia.org/wiki/Process_identifier), obtainable from `std::process::id()`
- register a fork handler through [pthread_atfork()](https://man7.org/linux/man-pages/man3/pthread_atfork.3.html)
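
The second approach can be sketched as follows. This is a Unix-only illustration under stated assumptions, not this library's implementation: the POSIX functions are declared by hand so that no external crate is needed, and `FORKS_SEEN`, `install_fork_handler`, and `forks_seen_in_child` are hypothetical names. The child handler registered through `pthread_atfork()` bumps a counter, so code running in a forked child can tell that a fork happened without looking at PIDs.

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Once;

// Hand-written declarations of the POSIX functions used below (Unix-only).
unsafe extern "C" {
    fn pthread_atfork(
        prepare: Option<unsafe extern "C" fn()>,
        parent: Option<unsafe extern "C" fn()>,
        child: Option<unsafe extern "C" fn()>,
    ) -> c_int;
    fn fork() -> c_int;
    fn waitpid(pid: c_int, status: *mut c_int, options: c_int) -> c_int;
    fn _exit(code: c_int) -> !;
}

// Hypothetical counter: number of forks between the installing process and us.
static FORKS_SEEN: AtomicU64 = AtomicU64::new(0);
static INSTALL: Once = Once::new();

// Runs in the child right after fork(); an atomic increment is all it does.
unsafe extern "C" fn on_fork_child() {
    FORKS_SEEN.fetch_add(1, Ordering::SeqCst);
}

/// Registers the child handler exactly once per process.
pub fn install_fork_handler() {
    INSTALL.call_once(|| {
        let rc = unsafe { pthread_atfork(None, None, Some(on_fork_child)) };
        assert_eq!(rc, 0, "pthread_atfork failed");
    });
}

pub fn forks_seen() -> u64 {
    FORKS_SEEN.load(Ordering::SeqCst)
}

/// Forks once and reports the counter value observed inside the child,
/// passed back to the parent through the child's exit code.
pub fn forks_seen_in_child() -> Option<u64> {
    install_fork_handler();
    let pid = unsafe { fork() };
    if pid == 0 {
        // Child: do nothing except report the counter and exit immediately.
        unsafe { _exit(forks_seen() as c_int) }
    }
    if pid < 0 {
        return None; // fork failed
    }
    let mut status: c_int = 0;
    if unsafe { waitpid(pid, &mut status, 0) } != pid {
        return None;
    }
    // Wait-status decoding as used on Linux and macOS: the low 7 bits are
    // zero on a normal exit, and the exit code sits in the next 8 bits.
    if status & 0x7f == 0 {
        Some(((status >> 8) & 0xff) as u64)
    } else {
        None
    }
}
```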
We eventually did not go with the PID approach because it has an inherent limitation. In Unix, there is no guarantee that
a PID is never reused. Consider the following scenario, which assumes that the operating system already has `pid_max - 3` processes (note: this is very unlikely, since `pid_max` can be as large as 2^22 on 64-bit Linux systems):
- The father has PID `a`
- The father forks and creates the son with PID `b`
- The son forks and creates the grandson with PID `c`
- The father exits, leaving `a` available to be reused by the operating system
- The grandson forks, and the great-grandson can be assigned PID `a`
- If the father created and initialized a `Once`, and this `Once` remained untouched by the son and the grandson, then when
the great-grandson first uses it, the PID check sees a matching PID and cannot tell whether this `Once` should be wiped or not.
Although this is extremely niche, as most consumer machines are unlikely to have enough memory to host `pid_max` processes, we choose to
go with a more resilient approach.
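
The PID-reuse scenario can be replayed without forking at all. The sketch below is a hypothetical PID-stamped cell (`PidStamped` is not a type from this library), and the PIDs 1000, 1001, and 1002 are arbitrary stand-ins for `a`, `b`, and `c`. It shows that a check based purely on PID equality wipes correctly for the son and the grandson but is fooled by the great-grandson.

```rust
/// Hypothetical cell that remembers the PID of the process that initialized it.
pub struct PidStamped<T> {
    owner_pid: u32,
    value: T,
}

impl<T> PidStamped<T> {
    /// Stamps the value with the PID of the current process.
    pub fn new(value: T) -> Self {
        Self::with_owner(std::process::id(), value)
    }

    /// Stamps the value with an explicit PID (used here to replay the scenario).
    pub fn with_owner(owner_pid: u32, value: T) -> Self {
        Self { owner_pid, value }
    }

    /// PID-based fork detection: the value is kept only if the PIDs match.
    pub fn get_if_owned_by(&self, current_pid: u32) -> Option<&T> {
        if self.owner_pid == current_pid {
            Some(&self.value)
        } else {
            None
        }
    }
}

/// Returns, for the son, the grandson, and the great-grandson, whether the
/// father's value is (wrongly or rightly) treated as still valid.
pub fn replay_pid_reuse() -> [bool; 3] {
    let (a, b, c) = (1000, 1001, 1002); // illustrative PIDs only
    let cell = PidStamped::with_owner(a, "father's secret");
    [
        cell.get_if_owned_by(b).is_some(), // son: different PID, wiped
        cell.get_if_owned_by(c).is_some(), // grandson: different PID, wiped
        cell.get_if_owned_by(a).is_some(), // great-grandson reuses `a`: not wiped
    ]
}
```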
We introduce a notion of **generations**. When a wipe-on-fork object is initialized for the very first time, it belongs
to the first generation (i.e., generation ID `0`). This generation ID is stored in a global variable.
```rust
pub struct GenerationCounter {
pub(crate) gen: Mutex