matdb

Crates.io	matdb
lib.rs	matdb
version	0.1.0
created_at	2025-07-19 02:10:56.132817+00
updated_at	2025-07-19 02:10:56.132817+00
description	An experimental embedded SQL-like DBMS
homepage
repository	https://gitlab.com/vizdun/matdb
max_upload_size
id	1759875
size	231,761

(cresynthesis)

documentation

README

matdb

matdb is an experimental embedded SQL-like DBMS. The primary source of information on it is my matura thesis, this readme provides a technical overview.

All code is MIT-licensed, and is pure Rust, building is as simple as:

$ cargo build -r
$ ./target/release/matdb sample.db
matdb> CREATE TABLE ...;
matdb> INSERT INTO ...;
matdb> SELECT ...;

Features

Core SQL support: SELECT, UPDATE, DELETE, INSERT

SELECT a, y.b, z.c FROM x,y,z WHERE a=y.b AND y.b = z.c;
UPDATE x SET b = 3 WHERE a = 5;
DELETE FROM x WHERE a = 4;
INSERT INTO x (a, b) VALUES (3, 5);

Graph expressions - support for graph-style joins and navigation

SELECT f2.id, author->fiction AS f2(f2.author)->f2.title FROM fiction WHERE title = 'The Little Prince';

ACID compliance - fully transactional storage engine
System catalogue - introspection via internal tables
E-graph based query optimization - advanced rewrite and cost-based planning
Time-traveling queries - access to full historical data versions

DROP TABLE x;
SELECT a FROM x VERSION 10;

Code structure

src/lib.rs - API definition and certain private convenience functions used across the codebase
src/program.rs - main loop for queries
src/parser/ - parsing and lexing
- src/ast.rs - defines the AST
src/plan/ - query optimization and compilation
src/executor/ - query execution
- src/eval.rs - scalar expression evaluation
- src/value.rs - scalar value definition
src/kv/ - underlying storage engine
- src/encoding.rs - row level encoding
src/catalog.rs - system catalogue definitions

Testing

Right now, testing is quite scarce. tests/hello.rs defines a pulse check test, tests/delete.rs tries to test the correctness of the B+ Tree implementation.

When compiled in debug mode, matdb checks if the B+ Tree is malformed and whether memory was leaked (specifically, if there are inaccessible pages in the database file that haven't been marked as free) after each write operation (see src/kv/tx.rs and src/kv/write_op.rs). These tests can be run like this:

$ cargo test

Unfortunately, for a large part of the codebase, making meaningful tests is really difficult.

Architecture

Queries in matdb follow the typical pipeline of Parser → Compiler → Optimizer → Executor. The Optimizer is currently based on the egg e-graph crate.

Concurrency control is currently unimplemented, but due to the design of the storage engine, a basic implementation is trivial.

The DBMS tracks information about the database in internal tables:

matdb_tables - current page ids of the roots of table trees, includes indexes,

Schema:

CREATE READONLY TABLE matdb_tables (
    name TEXT PRIMARY KEY,
    tree_id INT NOT NULL
);

Example:

matdb> SELECT name, tree_id FROM matdb_tables;
matdb_tables.name,matdb_tables.tree_id
actors,7
matdb_revs,10
matdb_schemas,9
matdb_tables,4

matdb_schemas - tables schemas,

Schema:

CREATE READONLY TABLE matdb_schemas (
    name TEXT PRIMARY KEY,
    schema TEXT NOT NULL
);

Example:

matdb> SELECT name, schema FROM matdb_schemas;
matdb_schemas.name,matdb_schemas.schema
actors,"CREATE TABLE actors (id INT,name TEXT,gender TEXT,PRIMARY KEY(actors.id),REFERENCED BY());"
matdb_revs,"CREATE READONLY TABLE matdb_revs (tx_id INT NOT NULL,page_number INT NOT NULL,PRIMARY KEY(matdb_revs.tx_id),REFERENCED BY());"
matdb_schemas,"CREATE READONLY TABLE matdb_schemas (name TEXT NOT NULL,schema TEXT NOT NULL,PRIMARY KEY(matdb_schemas.name),REFERENCED BY());"
matdb_tables,"CREATE READONLY TABLE matdb_tables (name TEXT NOT NULL,tree_id INT NOT NULL,PRIMARY KEY(matdb_tables.name),REFERENCED BY());"

matdb_revs - tracking page ids of the footer pages of previous database revisions.

Example (this database has only two revisions):

matdb> SELECT tx_id, page_number FROM matdb_revs;
matdb_revs.tx_id,matdb_revs.page_number
1,3

Schema:

CREATE READONLY TABLE matdb_revs (
    tx_id INT PRIMARY KEY,
    page_number INT NOT NULL
);

Storage engine architecture

The storage engine is based on a single-file copy-on-write immutable B+ Tree architecture. There is no write-ahead log, nor a rollback log. Pages are always appended (or they overwrite free pages), once committed, they are never freed.

For recovery, the storage engine utilizes a "footer page." This page is always appended to the end of the database file, if while recovering, the engine doesn't find the footer page at the end of the file, it simply searches for it backwards through the file and truncates all pages after it.

Limitations

Unfortunately, right now matdb performs rather unfavourably compared to mainstream DBMSs like SQLite. This is due to several issues including:

the lack of a write-ahead log and the inherent overhead of a copy-on-write B+ Tree compared to write-ahead logging,
the overhead of keeping history,
matdb's inferior B+ Tree balancing algorithm compared to SQLite,
at times plain inefficient code, many things that are written as zero-copy in SQLite copy and heap allocate like crazy in matdb,
lack of optimized bulk insert.

License

This project is licensed under the MIT License.

Commit count: 0

matdb

documentation

README

matdb

Features

Code structure

Testing

Architecture

Storage engine architecture

Limitations

License

cargo fmt