Crates.io | polonius-the-crab |
lib.rs | polonius-the-crab |
version | 0.4.2 |
source | src |
created_at | 2022-04-28 15:48:20.476025 |
updated_at | 2024-11-08 12:06:49.001928 |
description | Tools to feature more lenient Polonius-based borrow-checker patterns in stable Rust. |
homepage | |
repository | https://github.com/danielhenrymantilla/polonius-the-crab.rs |
max_upload_size | |
id | 576864 |
size | 80,361 |
Hamlet:
For yourself, sir, shall grow old as I am – if, like a crab, you could go backward.
Polonius:
Though this be madness, yet there is method in 't.
Polonius, eventually:
::polonius-the-crab
Tools to feature more lenient Polonius-based borrow-checker patterns in stable Rust.
See the following issues:
#54663 – Borrow checker extends borrow range in code with early return
#92985 – Filter adapter for LendingIterator requires Polonius (this one marks bonus
points for involving GATs and the pervasive LendingIterator
example).
All these examples boil down to the following canonical instance:
#![forbid(unsafe_code)]
use ::std::{
collections::HashMap,
};
/// Typical example of lack-of-Polonius limitation: get_or_insert pattern.
/// See https://nikomatsakis.github.io/rust-belt-rust-2019/#72
fn get_or_insert (
map: &'_ mut HashMap<u32, String>,
) -> &'_ String
{
if let Some(v) = map.get(&22) {
return v;
}
map.insert(22, String::from("hi"));
&map[&22]
}
# /*
error[E0502]: cannot borrow `*map` as mutable because it is also borrowed as immutable
--> src/lib.rs:53:5
|
14 | map: &mut HashMap<u32, String>,
| - let's call the lifetime of this reference `'1`
15 | ) -> &String {
16 | if let Some(v) = map.get(&22) {
| --- immutable borrow occurs here
17 | return v;
| - returning this value requires that `*map` be borrowed for `'1`
18 | }
19 | map.insert(22, String::from("hi"));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here
# */
Now, this pattern is known to be sound / a false positive from the current borrow checker, NLL.
The technical reason behind it is the named / in-function-signature lifetime involved in the borrow: contrary to a fully-in-body anonymous borrow, borrows that last for a "named" / outer-generic lifetime are deemed to last until the end of the function, across all possible codepaths (even those unreachable whence the borrow starts).
So "jUsT uSe UnSaFe" you may suggest. But this is tricky:
does your use-case really fit this canonical example?
even when we know "we can use unsafe
", actually using it is subtle and
error-prone. Since &mut
borrows are often involved in this situation,
one may accidentally end up transmuting a &
reference to a &mut
reference, which is always UB.
both of these issues lead to a certain completely legitimate allergy to
unsafe
_code, and the very reassuring
#![forbid(unsafe_code)]
-at-the-root-of-the-crate pattern.
unsafe
albeit cumbersome workarounds for lack-of-Polonius issuesif possible, reach for a dedicated API.
For instance, the get_or_insert()
example can be featured using the
.entry()
API:
#![forbid(unsafe_code)]
use ::std::{
collections::HashMap,
};
fn get_or_insert (
map: &'_ mut HashMap<u32, String>,
) -> &'_ String
{
map.entry(22).or_insert_with(|| String::from("hi"))
}
Sadly, the reality is that you won't always have such convenient APIs at your disposal.
otherwise, you can perform successive non-idiomatic lookups to avoid holding the borrow for too long:
#![forbid(unsafe_code)]
use ::std::{
collections::HashMap,
};
fn get_or_insert (
map: &'_ mut HashMap<u32, String>,
) -> &'_ String
{
// written like this to show the "transition path" from previous code
let should_insert =
if let Some(_discarded) = map.get(&22) {
false
} else {
true
}
;
// but `should_insert` can obviously be shortened down to `map.get(&22).is_none()`
// or, in this very instance, to `map.contains_key(&22).not()`.
if should_insert {
map.insert(22, String::from("hi"));
}
map.get(&22).unwrap() // or `&map[&22]`
}
finally, related to the "this only happens with concrete named lifetimes"
issue, a clever non-unsafe
albeit cumbersome way to circumvent the
limitation is to use CPS / callbacks / a scoped API:
#![forbid(unsafe_code)]
use ::std::{
collections::HashMap,
};
fn with_get_or_insert<R> (
map: &'_ mut HashMap<u32, String>,
yield_: impl FnOnce(
/* -> */ &'_ String
) -> R ) -> R
{
if let Some(v) = map.get(&22) {
yield_(v)
} else {
map.insert(22, String::from("hi"));
yield_(&map[&22])
}
}
While you should try these workarounds first and see how they apply to your
codebase, sometimes they're not applicable or way too cumbersome compared to
"a tiny bit of unsafe
".
In that case, as with all the cases of known-to-be-sound unsafe
patterns, the
ideal solution is to factor it out down to its own small and easy to review
crate or module, and then use the non-unsafe fn
API thereby exposed 👌.
::polonius-the-crab
So, back to that "safety encapsulation" idea:
let's find a canonical instance of this borrow checker issue that is known to be sound and accepted under Polonius;
and tweak it so that it can then be re-used as a general-purpose tool for most of these issues.
And if we stare at the borrow checker issues above, we can see there are two defining ingredients:
The issue is then that this second branch ought to get back access to the stuff borrowed in the first branch, but the current borrow checker denies it.
That's where we'll sprinkle some correctly-placed unsafe
to make the "borrow
checker look the other way" just for a moment, the right moment.
This thus gives us (in pseudo-code first):
fn polonius<'r, T> (
borrow: &'r mut T,
branch:
impl // generic type to apply to all possible scopes.
for<'any> // <- higher-order lifetime ensures the `&mut T` infected with it…
FnOnce(&'any mut T) // …can only escape the closure…
// vvvv … through its return type and its return type only.
-> Option< _<'any> > // <- The `Some` / `None` discriminant represents the branch info.
// ^^^^^^^
// some return type allowed to depend on `'any`.
// For instance, in the case of `get_or_insert`, this could
// have been `&'any String` (or `Option<&'any String>`).
// Bear with me for the moment and tolerate this pseudo-code.
,
) -> Result< // <- we "forward the branch", but with data attached to the fallback one (`Err(…)`).
_<'r>, // <- "plot twist": `'any` above was `'r` !
&'r mut T, // <- through Arcane Magic™ we get to transmute the `None` into an `Err(borrow)`
>
{
let tentative_borrow = &mut *borrow; // reborrow
if let Some(dependent) = branch(tentative_borrow) {
/* within this branch, the reborrow needs to last for `'r` */
return Ok(dependent);
}
/* but within this branch, the reborrow needs to have ended: only Polonius supports that kind of logic */
// give the borrow back
Err(borrow) // <- without Polonius this is denied
}
This function, ignoring that generic unspecified _<'…>
return type in
pseudo-code, does indeed represent a canonical example of the borrow checker
issue (without -Zpolonius
, it will reject the Err(borrow)
line saying that
borrow
needs to be borrowed for 'r
so that dependent
is, and that 'r
spans until any end of function (the borrow checker bug).
Whereas with -Zpolonius
it is accepted.
The correct use of unsafe
, here, to palliate the lack of -Zpolonius
, is to
change:
let tentative_borrow = &mut *borrow; // reborrow
into:
let tentative_borrow = unsafe { &mut *(borrow as *mut _) }; // reborrow
where unsafe { &mut *(thing as *mut _) }
is the canonical way to perform
lifetime(-of-the-borrow) extension: the lifetime of that &mut
borrow
is then no longer tied, in any way, to 'r
nor to *borrow
.
mem::transmute
. While that does
indeed work, it is a strictly more flexible API, which in the case of
unsafe
, means it's a strictly more dangerous API. With transmute
, for
instance, when the borrowee has lifetime parameters of its own, those may
be erased as well, whereas a downgrade-to-pointer-and-upgrade-back-to-ref
operation is guaranteed to "erase" only the outer lifetime of the borrow,
leaving the inner type untouched: definitely safer.The borrow checker no longer holds our hand, as far as overlapped usage of
borrow
and tentative_borrow
is concerned (which would be UB). It is now
up to us to ensure no runtime path can ever lead to such borrows
overlapping.
And indeed they don't, as the simple branch showcases:
in the Some
branch,
the dependent
is still borrowing tentative_borrow
, and thus, *borrow
. But
we do not use borrow
anymore in that branch, nor in the caller's body, as
long as dependent is used. Indeed, signature-wise, we do tell that that
dependent
return value, of type _<'r>
, is borrowing from *borrow
, due to
that repetition of the 'r
name.
in the None
branch,
there is no dependent
, and tentative_borrow
isn't used anymore, so it is
sound to refer to borrow
again.
In other words:
Though this be
unsafe
, yet there is soundness in 't.
As an extra precaution, this crate does even guard that usage of unsafe
through a cfg
-opt-out, so that when using -Zpolonius
, the unsafe
is
removed, and yet the body of the function, as well as its signature, compiles
fine (this is further enforced in CI through a special test
).
Option<T<'_>>
becomes PoloniusResult<T<'_>, U>
It turns out that we don't have to restrict the branch
to returning no data on
None
, and that we can use it as a "channel" through which to pass
non-borrowing data.
This leads to replacing Option< T<'any> >
with PoloniusResult< T<'any>, U >
U
cannot depend on 'any
since it can't name it
(generic parameter introduced before the 'any
quantification ever gets
introduced).FnOnceReturningAnOption
trick is replaced with a ForLt
patternFnOnceReturningAnOption
is the helper trait used in the Demo
snippet above)Indeed, a FnOnceReturningAnOption
-based signature would be problematic on the
caller's side, since:
it infers the higher-order-'any
-infected return type of the closure
through the actual closure instance being fed;
but a closure only gets to be higher-order when the API it is fed to explicitly requires it to
So this leads to a situation where both the caller and callee expect each other to disambiguate what the higher-order return value of the closure should be, leading to no higher-orderness to begin with and/or to type inference errors.
hrtb!
macro from https://docs.rs/higher-order-closure, or
the actual for<…>
-closures RFC such crate polyfills, would help in that
regard. But the usage then becomes, imho, way more convoluted than any of
the aforementioned workarounds, defeating the very purpose of this crate.So that Ret<'any>
is achieved in another manner.
Through "higher kinded types", that is, through "generic generics" /
"generics which are, themselves, generic":
//! In pseudo-code:
fn polonius<'r, T, Ret : <'_>> (
borrow: &'r mut T,
branch: impl FnOnce(&'_ mut T) -> PoloniusResult<Ret<'_>, ()>,
) -> PoloniusResult<
Ret<'r>,
(), &'r mut T,
>
This cannot directly be written in Rust, but you can define a trait representing
the <'_>
-ness of a type (ForLt
in this crate), and with it (R: ForLt
), use
R::Of<'lt>
as the "feed <'lt>
" operator:
// Real code!
use ::polonius_the_crab::{ForLt, PoloniusResult};
fn polonius<'input, T, Ret : ForLt> (
borrow: &'input mut T,
branch: impl for<'any> FnOnce(&'any mut T) -> PoloniusResult< Ret::Of<'any>, ()>,
) -> PoloniusResult<
Ret::Of<'input>,
(), &'input mut T,
>
# { unimplemented!(); }
We have reached the definition of the actual fn polonius
exposed by this very
crate!
Now, a ForLt
type is still cumbersome to use. If we go back to that
get_or_insert
example that was returning a &'_ String
, we'd need to express
this "generic type" representing <'lt> => &'lt String
, such as:
# use ::polonius_the_crab::ForLt;
#
/// Invalid code for our API:
/// It is not `StringRefNaïve` which is a type, but `StringRefNaïve<'smth>`
/// (notice the mandatory "early fed" generic lifetime parameter).
type StringRefNaïve<'any> = &'any String;
/// Correct code: make `StringRef` a fully-fledged stand-alone type!
type StringRef = ForLt!(<'any> = &'any String);
// Note: the macro supports lifetime elision, so as to be able to instead write:
type StringRef2 = ForLt!(&String); // Same type as `StringRef`!
get_or_insert
with no .entry()
nor double-lookupThis crate exposes a "raw" polonius()
function which has the unsafe
in
its body, and which is quite powerful at tackling these lack-of-polonius related
issues.
use ::polonius_the_crab::{polonius, ForLt, Placeholder, PoloniusResult};
#[forbid(unsafe_code)] // No unsafe code in this function: VICTORY!!
fn get_or_insert (
map: &'_ mut ::std::collections::HashMap<i32, String>,
) -> &'_ String
{
// Our `BorrowingOutput` type. In this case, `&String`:
type StringRef = ForLt!(&String);
match polonius::<_, _, StringRef>(map, |map| match map.get(&22) {
| Some(ret) => PoloniusResult::Borrowing(ret),
| None => PoloniusResult::Owned {
value: 42,
// We cannot name `map` in this branch since `ret` borrows it in the
// other (the very lack-of-polonius problem).
input_borrow: /* map */ Placeholder,
},
}) { // 🎩🪄 `polonius-the-crab` magic 🎩🪄
| PoloniusResult::Owned {
value,
// we got the borrow back in the `Placeholder`'s stead!
input_borrow: map,
} => {
assert_eq!(value, 42);
map.insert(22, String::from("…"));
&map[&22]
},
// and yet we did not lose our access to `ret` 🙌
| PoloniusResult::Borrowing(ret) => {
ret
},
}
}
We'll have to admit this is quite cumbersome to use! 😵💫
Hence why this crate also offers:
Mainly, the polonius!
entry point, within which you can use polonius_return!
to early return the dependent value, or exit_polonius!
to instead
"break" / leave the polonius!
block with a non-dependent value (notice how
the branch nature of this borrow checker limitation is kept in the very bones
of the API).
polonius!
macro requires that a 'polonius
-infected return type be
used —the HKT marker (for<'polonius>
), for those having followed the
implementation.This leads to the following get_or_insert
usage:
#![forbid(unsafe_code)]
use ::polonius_the_crab::prelude::*;
use ::std::collections::HashMap;
/// Typical example of lack-of-Polonius limitation: get_or_insert pattern.
/// See https://nikomatsakis.github.io/rust-belt-rust-2019/#72
fn get_or_insert(
mut map: &mut HashMap<u32, String>,
) -> &String {
// Who needs the entry API?
polonius!(|map| -> &'polonius String {
if let Some(v) = map.get(&22) {
polonius_return!(v);
}
});
map.insert(22, String::from("hi"));
&map[&22]
}