| Field | Value |
|---|---|
| Crates.io | dropping-thread-local |
| lib.rs | dropping-thread-local |
| version | 0.1.4 |
| created_at | 2025-07-07 06:49:30.678542+00 |
| updated_at | 2025-08-14 21:45:31.279237+00 |
| description | A dynamically allocated ThreadLocal that ensures destructors are run on thread exit |
| homepage | |
| repository | https://github.com/Techcable/dropping-thread-local.rs |
| max_upload_size | |
| id | 1740816 |
| size | 70,300 |
Dynamically allocated thread locals that properly run destructors when a thread is destroyed. This is in contrast to the `thread_local` crate, which has similar functionality, but only runs destructors when the `ThreadLocal` object is dropped.

This crate guarantees that one thread will never see the thread-local data of another, which can happen in the `thread_local` crate due to internal storage reuse.

This crate attempts to implement "true" thread locals, mirroring `std::thread_local!` as closely as possible.
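For reference, this is the standard-library behavior being mirrored (plain `std::thread_local!`, no types from this crate): the destructor runs when the owning thread exits.

```rust
use std::thread;

struct Logged(&'static str);

impl Drop for Logged {
    fn drop(&mut self) {
        // With std::thread_local!, this runs when the owning thread exits.
        println!("dropping value for {}", self.0);
    }
}

thread_local! {
    static LOCAL: Logged = Logged("worker");
}

fn main() {
    let handle = thread::spawn(|| {
        // Touch the thread local so it is initialized on this thread.
        LOCAL.with(|v| println!("initialized on {}", v.0));
        // "dropping value for worker" prints when this thread exits.
    });
    handle.join().unwrap();
}
```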
I would say the `thread_local` crate is good for functionality like reusing allocations, or for having local caches that can be sensibly reused once a thread dies.

This crate will attempt to run destructors as promptly as possible, but taking snapshots may interfere with this (see below). Panics in thread destructors will cause aborts, just like they do with `std::thread_local!`.
Right now, this crate has no unsafe code. This may change if it can bring a significant performance improvement.
The most complicated feature of this library is snapshots. They allow anyone who has access to a `DroppingThreadLocal` to iterate over all currently live values using the `DroppingThreadLocal::snapshot_iter` method. This returns a snapshot of the live values at the time the method is called, although if a thread dies during iteration, its value may not show up. See the method documentation for more details.
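A rough sketch of how a snapshot might be used; the exact item type yielded by `snapshot_iter` is an assumption here, so consult the API docs for the real signature:

```rust
// Hypothetical sketch: the item type yielded by `snapshot_iter` is assumed
// to deref to the stored value; this is not taken from the crate's docs.
use dropping_thread_local::DroppingThreadLocal;

fn dump_live_values(local: &DroppingThreadLocal<String>) {
    // Captures the values that are live at the moment of the call; a thread
    // that exits during iteration may simply not show up.
    for value in local.snapshot_iter() {
        println!("live value: {}", &*value);
    }
}
```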
Benchmarks show that lookup is between 10x and 30x slower than the `thread_local` crate, which is in turn about 2x slower than `std::thread_local!`. Keep in mind that using a `std::thread_local!` is a very fast operation: it takes about 0.5 nanoseconds on both my M1 Mac and an Intel i5 from 2017. For reference, calling `Arc::clone` takes about 11 ns on both machines. See `performance.md` in the repository root for benchmark results and more detailed performance notes.
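The numbers above come from the repository's own benchmarks; the following is only a rough `std::time::Instant` sketch (not the crate's harness) of how one might eyeball the cost of a `std::thread_local!` access. Serious measurements should use a proper benchmarking harness.

```rust
use std::cell::Cell;
use std::hint::black_box;
use std::time::Instant;

thread_local! {
    static COUNTER: Cell<u64> = Cell::new(0);
}

fn main() {
    const ITERS: u64 = 10_000_000;
    let start = Instant::now();
    for _ in 0..ITERS {
        // One thread-local lookup plus a cheap read-modify-write per iteration.
        COUNTER.with(|c| c.set(black_box(c.get()) + 1));
    }
    let elapsed = start.elapsed();
    println!(
        "~{:.2} ns per access (including loop overhead)",
        elapsed.as_nanos() as f64 / ITERS as f64
    );
}
```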
The implementation needs to acquire a global lock to initialize/deinitialize threads and create new locals. Accessing thread-local data is also protected by a per-thread lock. This lock should be uncontended, and `parking_lot::Mutex` should make this relatively fast. I have been careful to make sure that locks are not held while user code is being executed. This includes releasing locks before any destructors are executed.
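To illustrate the "never run user code under a lock" rule, here is an illustrative pattern only (not the crate's actual internals): values are moved out of the shared structure while the lock is held, and only dropped after the guard is released.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Illustrative only: a map of per-thread values guarded by a single lock.
// The crate's real storage is more involved than this.
fn clear_all<T>(storage: &Mutex<HashMap<usize, T>>) {
    // Take the values out while holding the lock...
    let drained: Vec<T> = {
        let mut guard = storage.lock().unwrap();
        guard.drain().map(|(_, value)| value).collect()
    }; // ...the guard is released at the end of this block...
    // ...so any Drop impls (user code) run without the lock held, avoiding
    // deadlocks if a destructor tries to touch the storage again.
    drop(drained);
}
```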
The type that is stored must be `Send + Sync + 'static`. The `Send` bound is necessary because the `DroppingThreadLocal` may be dropped from any thread. The `Sync` bound is necessary to support snapshots, and the `'static` bound is due to internal implementation choices (use of safe code). A `Mutex` can be used to work around the `Sync` limitation (I recommend `parking_lot::Mutex`, which is optimized for uncontended locks).
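For example (a compile-time sketch, not from the crate's docs): a type that is `Send` but not `Sync`, such as `Cell<u32>`, does not meet the bound on its own, but wrapping it in a `Mutex` does, because `Mutex<T>` is `Sync` whenever `T: Send`.

```rust
use std::cell::Cell;
use std::sync::Mutex;

// The same bound the crate places on stored values.
fn assert_storable<T: Send + Sync + 'static>() {}

fn main() {
    // assert_storable::<Cell<u32>>(); // would not compile: `Cell<u32>` is not `Sync`
    assert_storable::<Mutex<Cell<u32>>>(); // fine: `Mutex<T>: Sync` when `T: Send`
}
```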
You can attempt to use the `fragile` crate to work around the `Send` limitation, but this will cause panics if the value is dropped from another thread. A value can be dropped from another thread if a snapshot keeps it alive, or if the `DroppingThreadLocal` itself is dropped.
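A minimal sketch of that workaround, assuming the `fragile` crate's `Fragile::new`/`get` API; keep the panic-on-wrong-thread behavior described above in mind:

```rust
use std::rc::Rc;

use fragile::Fragile;

fn main() {
    // `Rc<String>` is neither `Send` nor `Sync` on its own.
    let wrapped = Fragile::new(Rc::new(String::from("thread-bound")));

    // Access is only allowed from the thread that created the value:
    // `get` panics if called from another thread, and dropping the wrapper
    // on another thread panics as well (as noted above).
    println!("{}", wrapped.get());
}
```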
- `per-thread-object` - I have not investigated this, but it appears very similar. By the maintainer of `io_uring`.
- `thread_local` - Discussed in the documentation above.
- `std::thread::LocalKey` - Part of the stdlib. It is the abstraction upon which this crate is based. The `#[thread_local]` attribute may be faster than a `LocalKey`.
Licensed under either the Apache 2.0 License or MIT License at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.