matdb

Crates.iomatdb
lib.rsmatdb
version0.1.0
created_at2025-07-19 02:10:56.132817+00
updated_at2025-07-19 02:10:56.132817+00
descriptionAn experimental embedded SQL-like DBMS
homepage
repositoryhttps://gitlab.com/vizdun/matdb
max_upload_size
id1759875
size231,761
(cresynthesis)

documentation

README

matdb

Crates.io Docs.rs CI

matdb is an experimental embedded SQL-like DBMS. The primary source of information on it is my matura thesis, this readme provides a technical overview.

All code is MIT-licensed, and is pure Rust, building is as simple as:

$ cargo build -r
$ ./target/release/matdb sample.db
matdb> CREATE TABLE ...;
matdb> INSERT INTO ...;
matdb> SELECT ...;

Features

  • Core SQL support: SELECT, UPDATE, DELETE, INSERT
SELECT a, y.b, z.c FROM x,y,z WHERE a=y.b AND y.b = z.c;
UPDATE x SET b = 3 WHERE a = 5;
DELETE FROM x WHERE a = 4;
INSERT INTO x (a, b) VALUES (3, 5);
  • Graph expressions - support for graph-style joins and navigation
SELECT f2.id, author->fiction AS f2(f2.author)->f2.title FROM fiction WHERE title = 'The Little Prince';
  • ACID compliance - fully transactional storage engine
  • System catalogue - introspection via internal tables
  • E-graph based query optimization - advanced rewrite and cost-based planning
  • Time-traveling queries - access to full historical data versions
DROP TABLE x;
SELECT a FROM x VERSION 10;

Code structure

  • src/lib.rs - API definition and certain private convenience functions used across the codebase
  • src/program.rs - main loop for queries
  • src/parser/ - parsing and lexing
    • src/ast.rs - defines the AST
  • src/plan/ - query optimization and compilation
  • src/executor/ - query execution
    • src/eval.rs - scalar expression evaluation
    • src/value.rs - scalar value definition
  • src/kv/ - underlying storage engine
    • src/encoding.rs - row level encoding
  • src/catalog.rs - system catalogue definitions

Testing

Right now, testing is quite scarce. tests/hello.rs defines a pulse check test, tests/delete.rs tries to test the correctness of the B+ Tree implementation.

When compiled in debug mode, matdb checks if the B+ Tree is malformed and whether memory was leaked (specifically, if there are inaccessible pages in the database file that haven't been marked as free) after each write operation (see src/kv/tx.rs and src/kv/write_op.rs). These tests can be run like this:

$ cargo test

Unfortunately, for a large part of the codebase, making meaningful tests is really difficult.

Architecture

Queries in matdb follow the typical pipeline of Parser → Compiler → Optimizer → Executor. The Optimizer is currently based on the egg e-graph crate.

Concurrency control is currently unimplemented, but due to the design of the storage engine, a basic implementation is trivial.

The DBMS tracks information about the database in internal tables:

  • matdb_tables - current page ids of the roots of table trees, includes indexes,

Schema:

CREATE READONLY TABLE matdb_tables (
    name TEXT PRIMARY KEY,
    tree_id INT NOT NULL
);

Example:

matdb> SELECT name, tree_id FROM matdb_tables;
matdb_tables.name,matdb_tables.tree_id
actors,7
matdb_revs,10
matdb_schemas,9
matdb_tables,4
  • matdb_schemas - tables schemas,

Schema:

CREATE READONLY TABLE matdb_schemas (
    name TEXT PRIMARY KEY,
    schema TEXT NOT NULL
);

Example:

matdb> SELECT name, schema FROM matdb_schemas;
matdb_schemas.name,matdb_schemas.schema
actors,"CREATE TABLE actors (id INT,name TEXT,gender TEXT,PRIMARY KEY(actors.id),REFERENCED BY());"
matdb_revs,"CREATE READONLY TABLE matdb_revs (tx_id INT NOT NULL,page_number INT NOT NULL,PRIMARY KEY(matdb_revs.tx_id),REFERENCED BY());"
matdb_schemas,"CREATE READONLY TABLE matdb_schemas (name TEXT NOT NULL,schema TEXT NOT NULL,PRIMARY KEY(matdb_schemas.name),REFERENCED BY());"
matdb_tables,"CREATE READONLY TABLE matdb_tables (name TEXT NOT NULL,tree_id INT NOT NULL,PRIMARY KEY(matdb_tables.name),REFERENCED BY());"
  • matdb_revs - tracking page ids of the footer pages of previous database revisions.

Example (this database has only two revisions):

matdb> SELECT tx_id, page_number FROM matdb_revs;
matdb_revs.tx_id,matdb_revs.page_number
1,3

Schema:

CREATE READONLY TABLE matdb_revs (
    tx_id INT PRIMARY KEY,
    page_number INT NOT NULL
);

Storage engine architecture

The storage engine is based on a single-file copy-on-write immutable B+ Tree architecture. There is no write-ahead log, nor a rollback log. Pages are always appended (or they overwrite free pages), once committed, they are never freed.

For recovery, the storage engine utilizes a "footer page." This page is always appended to the end of the database file, if while recovering, the engine doesn't find the footer page at the end of the file, it simply searches for it backwards through the file and truncates all pages after it.

Limitations

Unfortunately, right now matdb performs rather unfavourably compared to mainstream DBMSs like SQLite. This is due to several issues including:

  • the lack of a write-ahead log and the inherent overhead of a copy-on-write B+ Tree compared to write-ahead logging,
  • the overhead of keeping history,
  • matdb's inferior B+ Tree balancing algorithm compared to SQLite,
  • at times plain inefficient code, many things that are written as zero-copy in SQLite copy and heap allocate like crazy in matdb,
  • lack of optimized bulk insert.

License

This project is licensed under the MIT License.

Commit count: 0

cargo fmt