meshanina

Crates.iomeshanina
lib.rsmeshanina
version0.4.2
sourcesrc
created_at2022-01-07 01:53:02.666656
updated_at2022-11-16 16:57:48.717799
descriptionContent-addressed, log-structured memory-mapped database
homepage
repositoryhttps://github.com/themeliolabs/meshanina
max_upload_size
id509236
size79,464
Eric Tung (nullchinchilla)

documentation

README

Meshanina --- specialized, WORM, content-addressed database

Meshanina is a rather strange key-value database, with three assumptions:

  • Once written, a key-value mapping will never be deleted
  • Some function H maps every value to every key: H(v) = k. That is, the same key will never be rebound to a different value.

Meshanina is designed for use as a content-addressed datastore, where keys are typically hashes of values and deletion is ill-defined. It is a purely log-structured, content-addressed database file where we interleave data blocks with 64-ary HAMT nodes. When we insert keys, we just insert gobs of data, then when we flush we make sure metadata is pushed out too.

In-memory, we keep track of an Arc-linked bunch of new nodes before they get flushed out. Everything is managed in a "purely functional" way.

On-disk format

  • 4 KiB: reserved region
    • starting with 10 bytes: meshanina2
    • then 16 bytes more of a random, unique, database-specific 128-bit divider
  • indefinite number of records:
    • (possibly padding to the some nice boundary)
    • 16 bytes: magic divider stored in the reserved region
    • 8 bytes: SipHash 1-3 checksum of the record contents
    • 4 bytes: what kind of record, little endian
      • 0x00000000: data
      • 0x00000001: HAMT interior node
      • 0x00000002: HAMT root node
    • 4 bytes: length of the record
    • n bytes: the content of the record
      • for HAMT nodes, this is:
        • 8 bytes: 64-bit little-endian bitmap
        • n*8 bytes: 64-bit pointers to absolute offsets
      • for data nodes, this is:
        • 32 bytes: key
        • n bytes: value

Recovery

On DB open, there is a recovery mechanism. We search backwards, from the end of the file, for instances of the magic divider, then try to decode a record at each instance. When we find the first validly encoded HAMT root, we stop. We then scan forwards, reinserting every valid data block after this root into the database.

Assuming that there are no "gaps" in correctly written blocks --- that is, if there's a record that's correctly written, every record before it must be so too --- this defends against arbitrary crashes and power interruptions. Essentially all Unix filesystems do guarantee that interrupted file appends cannot disturb existing data in the file.

Lookup and insertion

Starting from the latest HAMT root node, do the usual HAMT lookup/insertion, using the 256-bit key value 6 bits at a time.

Commit count: 135

cargo fmt