Crates.io | interned-string |
lib.rs | interned-string |
version | 0.3.0 |
source | src |
created_at | 2024-07-04 15:02:21.097169 |
updated_at | 2024-07-06 10:55:04.043858 |
description | High-performance and concurrent string interning for general purpose use. |
homepage | |
repository | https://github.com/drash-course/interned-string |
max_upload_size | |
id | 1291758 |
size | 37,616 |
This crate exposes IString
, an
immutable and interned string type.
It's built for high performance and with multithreading in mind.
It provides O(1) Hash
and Eq
operations, perfect for your HashMap<IString, _>
.
You can intern any String
or &str
value by calling intern()
.
You can pass an IString
by reference to any function that accepts a &str
.
use interned_string::Intern;
fn main() {
let my_istring = "hello".intern();
foo(&my_istring);
}
fn foo(string: &str) {
println!("{string}");
}
If you enable the serde
feature, you can use IString
in place of String
in your DTOs.
[dependencies]
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
interned-string = { version = "0.1", features = ["serde"] }
use serde::Deserialize;
use interned_string::IString;
#[derive(Deserialize, Debug)]
struct ExampleDTO {
favorite_dish: IString
}
fn main() {
let serialized = "{\"favorite_dish\":\"pasta\"}";
let example: ExampleDTO = serde_json::from_str(&serialized).unwrap();
println!("{example:?}")
// ExampleDTO { favorite_dish: IString("pasta") }
}
Reading an IString
's contents is very fast, lock free and wait free (thanks to left_right
).
The IString
can be shared and read from any number of threads.
It scales linearly with the number of reading threads.
The tradeoff is that creating a new IString
is slower.
A radix tree (compact trie) needs to be traversed to deduplicate the new string,
a lock needs to be acquired, and the tree needs to be updated if the string wasn't interned yet.
While the tree walk can be done in parallel from multiple threads, the lock prevents linear
scaling for writes.
Replace or rewrite the radix tree to make it reuse the string storage, instead of storing a clone of the each interned string. Currently the crate uses 2x the interned string storage space because of this (1x in storage, 1x as a clone in the radix tree).
Free the memory of unused strings in the background / in a cold path. Currenly the memory is only free'd the next time a new string is inserted in storage, which may be unexpected.
Feel free to open a PR. Any contribution is greatly appreciated.
If you have a suggestion, please open an Issue.
Special thanks to Jon Gjengset (@jonhoo) for providing the inspiration for this crate, and for his
work on left-right
.