Crates.io | dir-cache |
lib.rs | dir-cache |
version | 0.1.0 |
source | src |
created_at | 2024-02-17 10:33:47.657861 |
updated_at | 2024-02-17 10:33:47.657861 |
description | Directory based kv-store |
homepage | https://github.com/MarcusGrass/dir-cache |
repository | https://github.com/MarcusGrass/dir-cache |
max_upload_size | |
id | 1143155 |
size | 92,176 |
dir-cache
- A very low-effort directory cache

A map-interface which propagates writes to disk in a somewhat browsable format.
Designed to be simple to use and understand, not particularly effective.
Performance is bad. It's okay if you write lazily to disk, but since (potentially depending on options) each map operation corresponds to at least one disk operation, the map is ill-suited to high-frequency operations.
Each value also nets a disk representation that is strictly larger than its raw content size, meaning the dir-cache is space-inefficient.
Lastly, file-system specifics may make caches with many keys perform poorly (I'm looking at you, NTFS).
Use it when you have some toy project that's pinging an external API and you want to easily cache responses to certain requests, to reduce load on your counterpart's servers, or similar.
- If you want an embedded KV-store in Rust, consider Sled.
- If you want an embedded KV-store not in Rust, consider RocksDB (github), RocksDB (docs.rs).
- If you want an embedded SQL-store, not in Rust, consider Sqlite (website), the sync Rust crate (Rusqlite), or the async Rust crate (Sqlx).
Now that the above is out of the way, we can get into why this crate exists.
In my time exploring public APIs using Rust, I have encountered the same problem many times:
I am exploring an API, taking apart the response and analysing it to inform my further code, but I don't want to make a new web request on each iteration, both for latency and out of respect for my counterpart.
What I have generally done in these cases is save the responses to disk and do offline analysis on them.
This works, but it's cumbersome: it means handling the same old std::fs::... errors, figuring out a fitting directory structure, and, worst of all, writing two separate parts of code, fetch and analyze.
I want to write this:
```rust
fn iterate_on_api_response_handling(dir_cache: &mut DirCache) {
    // This key is preferably not dynamic
    let req_key = Path::new("examplerequest");
    // If this has run before, no http request is sent
    let resp = dir_cache
        .get_or_insert_with(req_key, || {
            // `http::client::get` is a stand-in for any http client
            let resp = http::client::get("https://example.com")?;
            Ok(resp)
        })
        .unwrap();
    // Mess around with resp here
    println!("Got length {}", resp.len());
}
```
With the above, both the fetching and analyzing code can be kept in the same place.
The feature set is kept fairly minimal to support the above use case.
There are `get`, `get_or_insert_with`, `insert`, and `remove` methods on the `DirCache`.
The values are written to disk at `cache-location/{key}/`, which makes it easy to check out the saved file, which in my cases is most often json.
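The core idea, a map that mirrors each key to its own directory on disk, can be sketched with only `std`. This is a toy illustration, not dir-cache's actual implementation: `TinyDirCache`, the `data` file name, and the `&str` keys are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

// Hypothetical toy version of a directory-backed map: each key becomes a
// directory under `base`, with the value stored in a file inside it.
struct TinyDirCache {
    base: PathBuf,
}

impl TinyDirCache {
    fn new(base: impl Into<PathBuf>) -> Self {
        Self { base: base.into() }
    }

    // Write the value to `base/{key}/data`, creating directories as needed.
    fn insert(&self, key: &str, value: &[u8]) -> io::Result<()> {
        let dir = self.base.join(key);
        fs::create_dir_all(&dir)?;
        fs::write(dir.join("data"), value)
    }

    // Read the value back; `None` if the key was never inserted.
    fn get(&self, key: &str) -> io::Result<Option<Vec<u8>>> {
        match fs::read(self.base.join(key).join("data")) {
            Ok(v) => Ok(Some(v)),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }

    // Drop the key's directory entirely.
    fn remove(&self, key: &str) -> io::Result<()> {
        fs::remove_dir_all(self.base.join(key))
    }
}

fn main() -> io::Result<()> {
    let cache = TinyDirCache::new(std::env::temp_dir().join("tiny-dir-cache-demo"));
    cache.insert("examplerequest", b"response body")?;
    assert_eq!(cache.get("examplerequest")?, Some(b"response body".to_vec()));
    cache.remove("examplerequest")?;
    assert_eq!(cache.get("examplerequest")?, None);
    Ok(())
}
```

The upside of this layout is exactly the browsability mentioned above: the value file can be opened in any editor.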
Since values may become stale depending on how long the iterating takes, a max age can be set by duration, after which the value is treated as non-existent.
Meaning, running the same `get_or_insert_with` will fetch data the first time, return the cached data on each call until the max age has passed, and fetch new data after that.
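One way such a max-age check could work is by comparing the stored file's modification time against the configured duration. This is an assumption about the mechanism, and `is_fresh` is an illustrative helper, not part of the crate's API:

```rust
use std::fs;
use std::path::Path;
use std::time::Duration;

// Hypothetical sketch: treat a stored value as non-existent once its file is
// older than `max_age`, so the next `get_or_insert_with` fetches fresh data.
fn is_fresh(value_file: &Path, max_age: Duration) -> std::io::Result<bool> {
    let modified = fs::metadata(value_file)?.modified()?;
    // If the clock appears to have gone backwards, treat the age as zero.
    let age = modified.elapsed().unwrap_or(Duration::ZERO);
    Ok(age <= max_age)
}

fn main() -> std::io::Result<()> {
    let file = std::env::temp_dir().join("dir-cache-freshness-demo.txt");
    fs::write(&file, b"cached value")?;
    // A just-written file is well within a one-hour max age.
    assert!(is_fresh(&file, Duration::from_secs(3600))?);
    Ok(())
}
```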
Overwriting the same key can optionally shuffle the older value down one generation, leaving it on disk.
This is useful in some cases where the response changes over time and you wish to keep a history, although it's definitely the least useful feature.
I found some use for it when working with an incredibly sparse json dataset where responses were pretty huge; with the feature `lz4`, lz4-compression can be picked for old generations.
There is one caveat apart from performance that bears consideration.
Keys are `PathBuf`s and joined with the dir-cache base directory. This opens up a can of worms, the worst of which is accidentally joining with an absolute path (see the docs on `Path::join`), which could potentially lead to destructive results.
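The `Path::join` footgun is easy to demonstrate: joining a base with an absolute path silently discards the base, so a buggy or hostile dynamic key could point file operations entirely outside the cache directory.

```rust
use std::path::{Path, PathBuf};

fn main() {
    let base = Path::new("/tmp/dir-cache");

    // A relative key nests under the base directory, as expected.
    assert_eq!(
        base.join("examplerequest"),
        PathBuf::from("/tmp/dir-cache/examplerequest")
    );

    // An absolute key replaces the base entirely: a remove on this
    // effective path would target something outside the cache directory.
    assert_eq!(base.join("/etc/passwd"), PathBuf::from("/etc/passwd"));
}
```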
There are a few mitigations:
- `OsStr` length checks (mitigating unexpected effective paths).
- The `dir-cache-generation-{manifest.txt | n}` file-naming scheme (reducing the risk of accidental overwrites of important files).

This covers all the cases that I can think of, but of course doesn't cover the cases that I fail to think of.
If using this library, a recommendation is to not use dynamic keys.
Fuzzing is done on Linux only, so there is extra danger in using dynamic keys on other OSes, although it's not safe on Linux just because it's fuzzed.
The project is licensed under MPL-2.0.