| Crates.io | stringleton |
| lib.rs | stringleton |
| version | 0.2.0 |
| created_at | 2025-03-27 08:26:50.564988+00 |
| updated_at | 2025-03-28 07:43:01.168798+00 |
| description | Extremely fast string interning library |
| homepage | https://docs.rs/stringleton/latest/stringleton |
| repository | https://github.com/simonask/stringleton |
| max_upload_size | |
| id | 1607734 |
| size | 34,017 |
Extremely efficient string interning solution for Rust crates.
String interning: The technique of representing all strings which are equal by a pointer or ID that is unique to the contents of those strings, such that an O(n) string equality check becomes an O(1) pointer equality check.
Interned strings in Stringleton are called "symbols", in the tradition of Ruby.
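To make the definition concrete, here is a minimal sketch of an interner that uses only the standard library. It is not Stringleton's implementation or API: the names `Interner` and `Sym` are hypothetical, and the toy leaks each distinct string once so that it can hand out `&'static str` references.

```rust
use std::collections::HashSet;

/// Hypothetical toy interner, for illustration only (not Stringleton's API).
struct Interner {
    // Each distinct string is leaked exactly once, so it lives for the
    // whole program and its address can serve as its identity.
    strings: HashSet<&'static str>,
}

/// An interned string. Equality compares addresses, not contents.
#[derive(Clone, Copy, Debug)]
struct Sym(&'static str);

impl PartialEq for Sym {
    fn eq(&self, other: &Self) -> bool {
        // O(1): compares the pointer and length, never the bytes.
        std::ptr::eq(self.0, other.0)
    }
}

impl Eq for Sym {}

impl Interner {
    fn new() -> Self {
        Interner { strings: HashSet::new() }
    }

    /// Return the canonical copy of `s`, inserting it on first use.
    fn intern(&mut self, s: &str) -> Sym {
        if let Some(&existing) = self.strings.get(s) {
            return Sym(existing);
        }
        let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
        self.strings.insert(leaked);
        Sym(leaked)
    }
}
```

Interning the same contents twice yields two `Sym` values that compare equal by address, while the O(n) work (hashing and comparing bytes) happens only inside `intern`.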
- … Symbol is a lock-free memory load. No reference counting or atomics involved.
- … sym!(...)) are "free" at the call-site. Multiple invocations with the same string value are eagerly reconciled on program startup using linker tricks.
- … &str, it is capable of displaying Symbol.
- … stringleton-dylib).
- no_std support: std synchronization primitives used in the symbol registry can be replaced with once_cell and spin. See below for caveats.
- serde support - symbols are serialized/deserialized as strings.

… smol_str or cowstr is a better fit for such use cases.

Add stringleton as a dependency of your project, and then you can do:
use stringleton::{sym, Symbol};

// Enable the `sym!()` macro in the current crate. This should go at the crate root.
stringleton::enable!();

fn main() {
    // Identifiers and string literals are both accepted.
    let foo = sym!(foo);
    let foo2 = sym!(foo);
    let bar = sym!(bar);
    let message = sym!("Hello, World!");
    let message2 = sym!("Hello, World!");

    assert_eq!(foo, foo2);
    assert_eq!(bar.as_str(), "bar");
    assert_eq!(message, message2);
    // Equal symbols share the same underlying string data.
    assert_eq!(message.as_str().as_ptr(), message2.as_str().as_ptr());
}
- … alloc. When disabled, critical-section and spin must both be enabled (see below for caveats).
- … String.
- … serde::Serialize and serde::Deserialize for symbols, which will be serialized/deserialized as plain strings.
- critical-section: When std is not enabled, this enables once_cell as a dependency with the critical-section feature enabled. Only relevant in no_std environments. See critical-section for more details.
- spin: When std is not enabled, this enables spin as a dependency, which is used to obtain global read/write locks on the symbol registry. Only relevant in no_std environments (and is a pessimization in other environments).

Stringleton tries to be as efficient as possible, but it may make different
tradeoffs than other string interning libraries. In particular, Stringleton is
optimized towards making the use of the sym!(...) macro practically free.
Consider this function:
fn get_symbol() -> Symbol {
    sym!("Hello, World!")
}
This compiles into a single load instruction. Using cargo disasm on x86-64
(Linux):
get_symbol:
8bf0 mov rax, qword ptr [rip + 0x52471]
8bf7 ret
This is "as fast as it gets", but the price is that all symbols in the program are deduplicated when the program starts. Any theoretically faster solution would need fairly deep cooperation from the compiler aimed at this specific use case.
Also, symbol literals are always a memory load. The compiler cannot perform
optimizations based on the contents of symbols, because it doesn't know how they
will be reconciled until link time. For example, while sym!(a) != sym!(a) is
always false, the compiler cannot eliminate code paths relying on that.
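The startup deduplication described above can be modelled with the standard library alone. The sketch below is an assumption-laden illustration, not how Stringleton works internally: the real crate collects call sites with linker tricks, whereas this model passes them in as an explicit slice, and `Site`, `private_copy`, `reconcile`, and `load` are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for one `sym!(...)` call site: a slot that starts
/// out pointing at the site's own private copy of the literal.
struct Site {
    text: &'static str,
}

/// Make a private, leaked copy so that equal strings start at distinct
/// addresses, as they might when defined in different crates.
fn private_copy(s: &str) -> &'static str {
    Box::leak(s.to_owned().into_boxed_str())
}

/// Startup pass: point every slot with equal contents at one canonical copy.
fn reconcile(sites: &mut [Site]) {
    let mut canonical: HashMap<&'static str, &'static str> = HashMap::new();
    for site in sites.iter_mut() {
        // The first site seen for a given string wins; later sites with the
        // same contents are redirected to it.
        site.text = *canonical.entry(site.text).or_insert(site.text);
    }
}

/// After reconciliation, reading a symbol is a plain load of the slot.
fn load(site: &Site) -> &'static str {
    site.text
}
```

In this model all hashing happens once in `reconcile`, and every later `load` is a single memory read. That mirrors the tradeoff in the text: the call site stays cheap, but the compiler cannot know at compile time which address a slot will end up holding.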
Stringleton relies on magical linker tricks (supported by linkme and ctor)
to minimize the cost of the sym!(...) macro at runtime. These tricks are
broadly compatible with dynamic libraries, but there are a few caveats:
- … dylib crate appears in the dependency graph, and it has stringleton as a dependency, things should "just work", due to Rust's linkage rules.
- … cdylib crate appears in the dependency graph, Cargo seems to be a little less clever, and the cdylib dependency may need to use the stringleton-dylib crate instead. Due to Rust's linkage rules, this will cause the "host" crate to also link dynamically with Stringleton, and everything will continue to work.
- … stringleton, because it would either cause duplicate symbol definitions, or worse, the host and client binaries would disagree about which Registry to use. To avoid this, the host binary can use stringleton-dylib explicitly instead of stringleton, which forces dynamic linkage of the symbol registry.
- … dlclose() and similar). Unloading a library that has any calls to the sym!(..) or static_sym!(..) macros is instant UB. Such a library can in principle use Symbol::new(), but probably not Symbol::new_static().

To summarize:
- … stringleton directly.
- … crate-type = ["dylib"]) are present, it is also fine to use stringleton directly - Cargo and rustc will figure out how to link things correctly.
- cdylib dependencies should use stringleton-dylib. The host can use stringleton.
- … stringleton-dylib instead of stringleton.

no_std caveats

Stringleton works in no_std environments, but it does fundamentally require
two things:
- … hashbrown hash map.

The latter can be supported by the spin and critical-section features:

- spin replaces std::sync::RwLock, and is almost always a worse choice when std is available.
- critical-section replaces std::sync::OnceLock with once_cell::sync::OnceCell, and enables the critical-section feature of once_cell. Using critical-section requires additional work, because you must manually link in a crate that provides the relevant synchronization primitive for the target platform.

Do not use these features unless you are familiar with the tradeoffs.
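To show why a global registry needs both a one-time initialization primitive and a read/write lock, here is a rough sketch using the std types named above (`OnceLock` and `RwLock`). It is an illustration under assumptions, not Stringleton's registry: the names `REGISTRY`, `registry`, and `intern` are hypothetical, and a std `HashSet` stands in for the hashbrown map.

```rust
use std::collections::HashSet;
use std::sync::{OnceLock, RwLock};

// Hypothetical global registry. `OnceLock` provides one-time lazy
// initialization, and `RwLock` allows many readers or one writer.
static REGISTRY: OnceLock<RwLock<HashSet<&'static str>>> = OnceLock::new();

fn registry() -> &'static RwLock<HashSet<&'static str>> {
    REGISTRY.get_or_init(|| RwLock::new(HashSet::new()))
}

/// Return the canonical copy of `s`, inserting it on first use.
fn intern(s: &str) -> &'static str {
    // Fast path: a shared read lock is enough when the string is known.
    {
        let set = registry().read().unwrap();
        if let Some(&found) = set.get(s) {
            return found;
        }
    }
    // Slow path: take the write lock and check again, because another
    // thread may have inserted the string in the meantime.
    let mut set = registry().write().unwrap();
    if let Some(&found) = set.get(s) {
        return found;
    }
    let leaked: &'static str = Box::leak(s.to_owned().into_boxed_str());
    set.insert(leaked);
    leaked
}
```

In a no_std build neither std type is available, which is the gap the spin and critical-section features are described as filling.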
Stringleton works in WASM binaries, but since the wasm32-unknown-unknown
target does not support static constructors, the sym!(..) macro will fall back
to a slightly slower implementation that uses atomics and a single branch. (Note
that WASM is normally single-threaded, so atomic operations have no overhead.)
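One way to picture a fallback that "uses atomics and a single branch" is a call site that resolves itself lazily on first use. The sketch below is a guess at the general shape only, not the crate's actual code: `LazySite` is a hypothetical type, and the interning step is passed in as a closure to keep the example self-contained.

```rust
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical lazily resolved call site (not Stringleton's actual type).
struct LazySite {
    literal: &'static str,
    // Null until the site has been resolved against the registry.
    cached: AtomicPtr<u8>,
}

impl LazySite {
    const fn new(literal: &'static str) -> Self {
        LazySite {
            literal,
            cached: AtomicPtr::new(std::ptr::null_mut()),
        }
    }

    /// Return the address of the canonical string for this site.
    /// `intern` stands in for a registry lookup and runs at most once
    /// per site in single-threaded use.
    fn get(&self, intern: impl FnOnce(&'static str) -> &'static str) -> *const u8 {
        let cached = self.cached.load(Ordering::Acquire);
        // The single branch: already resolved, or not yet?
        if !cached.is_null() {
            return cached;
        }
        let canonical = intern(self.literal).as_ptr() as *mut u8;
        self.cached.store(canonical, Ordering::Release);
        canonical
    }
}
```

Compared with a slot that is filled in before `main` runs, every read here pays for one atomic load and one predictable branch, which matches the "slightly slower" description.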
Please note that it is not possible to pass a Symbol across a WASM boundary,
because the host and the guest have different views of memory, and use separate
registries. However, it is possible to pass an opaque u64 representing the
symbol across such a boundary using Symbol::to_ffi() and
Symbol::try_from_ffi(). Getting the string representation of the symbol is
only possible on the side that owns the symbol.
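The opaque-handle idea can be sketched independently of the crate. The model below does not use Symbol::to_ffi() or Symbol::try_from_ffi() and makes no claim about what the u64 contains in Stringleton. `Owner`, `to_handle`, and `try_from_handle` are hypothetical names, and the handle here is simply an index into a table kept by the owning side.

```rust
/// Hypothetical owner-side table. Only this side can turn a handle back
/// into text; the other side sees nothing but a `u64`.
#[derive(Default)]
struct Owner {
    strings: Vec<String>,
}

impl Owner {
    /// Hand out an opaque handle for `s`. Equal strings get equal handles.
    /// (The linear search keeps the sketch short; a real table would hash.)
    fn to_handle(&mut self, s: &str) -> u64 {
        if let Some(index) = self.strings.iter().position(|x| x.as_str() == s) {
            return index as u64;
        }
        self.strings.push(s.to_owned());
        (self.strings.len() - 1) as u64
    }

    /// Validate a handle that came back across the boundary. Unknown
    /// values are rejected rather than trusted.
    fn try_from_handle(&self, raw: u64) -> Option<&str> {
        let index = usize::try_from(raw).ok()?;
        self.strings.get(index).map(String::as_str)
    }
}
```

The side that does not own the table can still store handles and compare them for equality, but it has to ask the owner for the string contents. The fallible conversion reflects that a value arriving from the other side cannot be assumed valid.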
The name is a portmanteau of "string" and "singleton".