| Crates.io | metrics-procession |
| lib.rs | metrics-procession |
| version | 0.2.0 |
| created_at | 2025-07-23 14:07:16.72846+00 |
| updated_at | 2025-07-24 04:29:34.848339+00 |
| description | A compact in-memory collector for metrics.rs |
| homepage | |
| repository | https://github.com/FreeMasen/metrics-procession |
| max_upload_size | |
| id | 1764822 |
| size | 9,735,022 |
This project is an in-memory metrics recorder aimed at reducing the total size of collected metrics while maintaining a millisecond-precision time series.
A time series in the rawest sense is an array of triples: the first element is a
representation of the time the event was emitted, the second is the event's type and
payload as a pair, and the last is a representation of the unique name and label set. If
we were to construct that value directly, we would end up with an object much larger than is
ideal: the `Instant` or `time::OffsetDateTime` types take at least 16 bytes of memory, and since
`metrics` deals with 64-bit types by default, the type-and-payload pair becomes an enum of
roughly 10 (8+2) bytes, plus another 64 bytes for the `metrics::Key` value. That is a total of
90 bytes per entry, which adds up quickly.
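The back-of-the-envelope accounting above can be checked directly (these are logical byte counts as described, ignoring any alignment padding a real struct would add):

```rust
fn main() {
    // Naive per-event storage, as described above:
    let time = 16;     // Instant / time::OffsetDateTime
    let event = 8 + 2; // 64-bit payload + enum discriminant
    let key = 64;      // metrics::Key
    let total = time + event + key;
    assert_eq!(total, 90);
    println!("{}", total); // prints "90"
}
```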
So, how small could we reasonably make this? First of all, we probably don't need to keep a full time
representation on each metric: if a chunk of events is associated with a reference time, then a 16-bit
integer can track the milliseconds since that reference time. That means we can
represent a `Chunk` as a pair of an `OffsetDateTime` and a `Vec` of triples containing the metric
type and value, the millisecond offset, and the unique key+label set. This reduces 16 bytes down to 2,
with the remaining bytes of the reference time amortized across the number of events in a given chunk.
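A minimal sketch of that chunk layout (the type and field names here are illustrative, not the crate's actual API) shows why the `u16` offset works and where its limit is — a single chunk can only cover about 65.5 seconds before a new reference time is needed:

```rust
// Illustrative event: a u16 millisecond offset, an interned key id,
// and the metric payload.
struct Event {
    millis_offset: u16, // ms since the chunk's reference time
    key_id: u16,        // interned key+label set id
    value: u64,         // metric payload
}

struct Chunk {
    reference_millis: u64, // e.g. unix time in ms when the chunk began
    events: Vec<Event>,
}

impl Chunk {
    // Record an event if it falls within the u16 offset window
    // (~65.5 seconds); otherwise the caller should start a new chunk.
    fn push(&mut self, now_millis: u64, key_id: u16, value: u64) -> bool {
        match now_millis
            .checked_sub(self.reference_millis)
            .and_then(|d| u16::try_from(d).ok())
        {
            Some(millis_offset) => {
                self.events.push(Event { millis_offset, key_id, value });
                true
            }
            None => false,
        }
    }
}

fn main() {
    let mut chunk = Chunk { reference_millis: 1_000_000, events: Vec::new() };
    assert!(chunk.push(1_001_500, 0, 42)); // 1500 ms after the reference
    assert_eq!(chunk.events[0].millis_offset, 1500);
    // 70 seconds later overflows the u16 window: time for a new chunk.
    assert!(!chunk.push(1_070_000, 0, 7));
    println!("ok"); // prints "ok"
}
```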
Next we want to reduce the size of the `Key` value. For that we can again use a `u16`
and amortize the cost of each `Key` across all chunks currently in the series. We use
a `BTreeMap<Key, u16>` to look up the id value for any given key while recording; this
mapping is owned by the series itself, not by any given chunk. This caps the number of unique
key+label sets at 65535, which is reasonable for most systems but may not be suitable for all
of them. It may be valuable to add a filtering metrics layer above the `Recorder` provided by this
crate to avoid loss of data.
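The interning scheme can be sketched like this (using `String` keys for simplicity; the real crate maps `metrics::Key` values, and the names here are illustrative):

```rust
use std::collections::BTreeMap;

// Illustrative key interner: maps each unique key+label set to a u16 id.
struct KeyInterner {
    ids: BTreeMap<String, u16>,
    // Reverse table so an id can be resolved back to its key when
    // iterating or serializing the series.
    keys: Vec<String>,
}

impl KeyInterner {
    fn new() -> Self {
        Self { ids: BTreeMap::new(), keys: Vec::new() }
    }

    // Returns the existing id for a key, or assigns the next one.
    // None signals the u16 table is full; without an upstream
    // filtering layer, further unique keys would be lost.
    fn intern(&mut self, key: &str) -> Option<u16> {
        if let Some(&id) = self.ids.get(key) {
            return Some(id);
        }
        let id = u16::try_from(self.keys.len()).ok()?;
        self.ids.insert(key.to_owned(), id);
        self.keys.push(key.to_owned());
        Some(id)
    }
}

fn main() {
    let mut interner = KeyInterner::new();
    let a = interner.intern("http_requests{method=GET}").unwrap();
    let b = interner.intern("http_requests{method=POST}").unwrap();
    assert_eq!(a, 0);
    assert_eq!(b, 1);
    // Re-interning returns the same id; the Key's full cost is paid once.
    assert_eq!(interner.intern("http_requests{method=GET}"), Some(a));
    println!("ok"); // prints "ok"
}
```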
So, we've now knocked another 10 bytes off the storage size of each raw event, meaning we have a total size of 2 + 10 + 2 = 14 bytes, which is a very large amount smaller than where we started.
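The reduced per-event accounting (again in logical bytes, ignoring alignment padding):

```rust
fn main() {
    // Per-event storage after both reductions described above:
    let millis_offset = 2; // u16 ms since the chunk's reference time
    let event = 8 + 2;     // 64-bit payload + enum discriminant
    let key_id = 2;        // u16 interned key id
    let total = millis_offset + event + key_id;
    assert_eq!(total, 14); // down from 90 bytes per raw event
    println!("{}", total); // prints "14"
}
```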
The `Procession` type provides multiple representations that capture the current
state of the recorder so it can be deserialized later. The type itself implements `Serialize` and
`Deserialize`, which include the map of keys to their ids and a `Vec<Chunk>`, each chunk holding
its reference time along with the `Vec<Event>` for that section of the series.
There are also 2 ways to iterate through the series: either by cloning each `Key`'s contents or by
borrowing them from the `Procession`. Both representations also implement `Serialize`, but only the
cloned version implements `Deserialize`, since the semantics of borrowed `Key` storage are a bit more
complicated.
As a warning, `Procession`'s implementation of `Deserialize` requires a borrowed string, meaning it cannot be used with an `impl Read` type (as used by `serde_json::from_reader`); read the input into a `String` first and deserialize it with `serde_json::from_str` instead.