Crates.io | matdb |
lib.rs | matdb |
version | 0.1.0 |
created_at | 2025-07-19 02:10:56.132817+00 |
updated_at | 2025-07-19 02:10:56.132817+00 |
description | An experimental embedded SQL-like DBMS |
homepage | |
repository | https://gitlab.com/vizdun/matdb |
max_upload_size | |
id | 1759875 |
size | 231,761 |
matdb
is an experimental embedded SQL-like DBMS. The primary source of information on it is my matura thesis, this readme provides a technical overview.
All code is MIT-licensed, and is pure Rust, building is as simple as:
$ cargo build -r
$ ./target/release/matdb sample.db
matdb> CREATE TABLE ...;
matdb> INSERT INTO ...;
matdb> SELECT ...;
SELECT
, UPDATE
, DELETE
, INSERT
SELECT a, y.b, z.c FROM x,y,z WHERE a=y.b AND y.b = z.c;
UPDATE x SET b = 3 WHERE a = 5;
DELETE FROM x WHERE a = 4;
INSERT INTO x (a, b) VALUES (3, 5);
SELECT f2.id, author->fiction AS f2(f2.author)->f2.title FROM fiction WHERE title = 'The Little Prince';
DROP TABLE x;
SELECT a FROM x VERSION 10;
src/lib.rs
- API definition and certain private convenience functions used across the codebasesrc/program.rs
- main loop for queriessrc/parser/
- parsing and lexing
src/ast.rs
- defines the ASTsrc/plan/
- query optimization and compilationsrc/executor/
- query execution
src/eval.rs
- scalar expression evaluationsrc/value.rs
- scalar value definitionsrc/kv/
- underlying storage engine
src/encoding.rs
- row level encodingsrc/catalog.rs
- system catalogue definitionsRight now, testing is quite scarce. tests/hello.rs
defines a pulse check test, tests/delete.rs
tries to test the correctness of the B+ Tree implementation.
When compiled in debug mode, matdb
checks if the B+ Tree is malformed and whether memory was leaked (specifically, if there are inaccessible pages in the database file that haven't been marked as free) after each write operation (see src/kv/tx.rs
and src/kv/write_op.rs
). These tests can be run like this:
$ cargo test
Unfortunately, for a large part of the codebase, making meaningful tests is really difficult.
Queries in matdb
follow the typical pipeline of Parser → Compiler → Optimizer → Executor. The Optimizer is currently based on the egg
e-graph crate.
Concurrency control is currently unimplemented, but due to the design of the storage engine, a basic implementation is trivial.
The DBMS tracks information about the database in internal tables:
matdb_tables
- current page ids of the roots of table trees, includes indexes,Schema:
CREATE READONLY TABLE matdb_tables (
name TEXT PRIMARY KEY,
tree_id INT NOT NULL
);
Example:
matdb> SELECT name, tree_id FROM matdb_tables;
matdb_tables.name,matdb_tables.tree_id
actors,7
matdb_revs,10
matdb_schemas,9
matdb_tables,4
matdb_schemas
- tables schemas,Schema:
CREATE READONLY TABLE matdb_schemas (
name TEXT PRIMARY KEY,
schema TEXT NOT NULL
);
Example:
matdb> SELECT name, schema FROM matdb_schemas;
matdb_schemas.name,matdb_schemas.schema
actors,"CREATE TABLE actors (id INT,name TEXT,gender TEXT,PRIMARY KEY(actors.id),REFERENCED BY());"
matdb_revs,"CREATE READONLY TABLE matdb_revs (tx_id INT NOT NULL,page_number INT NOT NULL,PRIMARY KEY(matdb_revs.tx_id),REFERENCED BY());"
matdb_schemas,"CREATE READONLY TABLE matdb_schemas (name TEXT NOT NULL,schema TEXT NOT NULL,PRIMARY KEY(matdb_schemas.name),REFERENCED BY());"
matdb_tables,"CREATE READONLY TABLE matdb_tables (name TEXT NOT NULL,tree_id INT NOT NULL,PRIMARY KEY(matdb_tables.name),REFERENCED BY());"
matdb_revs
- tracking page ids of the footer pages of previous database revisions.Example (this database has only two revisions):
matdb> SELECT tx_id, page_number FROM matdb_revs;
matdb_revs.tx_id,matdb_revs.page_number
1,3
Schema:
CREATE READONLY TABLE matdb_revs (
tx_id INT PRIMARY KEY,
page_number INT NOT NULL
);
The storage engine is based on a single-file copy-on-write immutable B+ Tree architecture. There is no write-ahead log, nor a rollback log. Pages are always appended (or they overwrite free pages), once committed, they are never freed.
For recovery, the storage engine utilizes a "footer page." This page is always appended to the end of the database file, if while recovering, the engine doesn't find the footer page at the end of the file, it simply searches for it backwards through the file and truncates all pages after it.
Unfortunately, right now matdb
performs rather unfavourably compared to mainstream DBMSs like SQLite. This is due to several issues including:
matdb
's inferior B+ Tree balancing algorithm compared to SQLite,matdb
,This project is licensed under the MIT License.