| | |
|--------------|--------------------------------|
| Crates.io | ayb |
| lib.rs | ayb |
| version | 0.1.8 |
| source | src |
| created_at | 2023-04-17 03:02:49.79181 |
| updated_at | 2024-08-27 03:27:15.999512 |
| description | ayb makes it easy to create, host, and share embedded databases like SQLite and DuckDB |
| homepage | https://github.com/marcua/ayb |
| id | 841118 |
| size | 358,211 |
# ayb

`ayb` makes it easy to create databases, share them with collaborators, and query them from a web application or the command line. With `ayb`, all your (data)base can finally belong to you. Move SQL for great justice.

`ayb` is a database management system with easy-to-host instances that enable users to quickly register an account, create databases, share them with collaborators, and query them from a web application or the command line. An `ayb` server allows users to create SQLite databases (other databases to come), and then exposes those databases through an HTTP API.

To learn more about why `ayb` matters, how it works, or who it's for, read this introductory blog post.

**Alpha warning**: `ayb` is neither feature-complete nor production-ready. Functionality like authentication, permissions, collaboration, isolation, high availability, and transaction support is on the roadmap but not available today. I work on `ayb` as a hobbyist side project.
## Installing

`ayb` is written in Rust and is available as the `ayb` crate. Assuming you have installed Rust on your machine, installing `ayb` takes a single command:

```bash
cargo install ayb
```
## Running a server

An `ayb` server stores its metadata in SQLite or PostgreSQL, and stores the databases it's hosting on a local disk. An `ayb.toml` file tells the server what host/port to listen for connections on, how to connect to the database, and the data path for the hosted databases. You can generate a starter file with `ayb default_server_config`:

```bash
$ ayb default_server_config > ayb.toml

$ cat ayb.toml
host = "0.0.0.0"
port = 5433
database_url = "sqlite://ayb_data/ayb.sqlite"
# Or, for Postgres:
# database_url = "postgresql://postgres_user:test@localhost:5432/test_db"
data_path = "./ayb_data"

[authentication]
# A secret (and unique to your server) key that is used for account registration.
fernet_key = "<UNIQUE_KEY_GENERATED_BY_COMMAND>="
token_expiration_seconds = 3600

[email]
from = "Server Sender <server@example.org>"
reply_to = "Server Reply <replyto@example.org>"
smtp_host = "localhost"
smtp_port = 465
smtp_username = "login@example.org"
smtp_password = "the_password"
```
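The generated `fernet_key` is a standard Fernet key: 32 random bytes, url-safe base64-encoded (44 characters, ending in `=`). `ayb default_server_config` generates one for you, but if you ever want to produce an equivalent key by hand, a standard-library Python sketch:

```python
import base64
import os

# A Fernet key is 32 random bytes encoded as url-safe base64. This mirrors
# what cryptography.fernet.Fernet.generate_key() would produce, using only
# the standard library.
key = base64.urlsafe_b64encode(os.urandom(32)).decode("ascii")
print(key)  # paste this value into fernet_key in ayb.toml
```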
Running the server then requires one command:

```bash
$ ayb server
```
## Using the client

Once the server is running, you can register a user (in this case, `marcua`), create a database `marcua/test.sqlite`, and issue SQL as you like. Here's how to do that at the command line:

```bash
$ ayb client --url http://127.0.0.1:5433 register marcua you@example.com
Check your email to finish registering marcua

# You will receive an email at you@example.com instructing you to type the next command
$ ayb client confirm <TOKEN_FROM_EMAIL>
Successfully authenticated and saved token <API_TOKEN>

$ ayb client create_database marcua/test.sqlite
Successfully created marcua/test.sqlite

$ ayb client list marcua
 Database slug | Type
---------------+--------
 test.sqlite   | sqlite

$ ayb client query marcua/test.sqlite "CREATE TABLE favorite_databases(name varchar, score integer);"
Rows: 0

# If you don't pass a query to the query command, ayb launches an interactive query session
$ ayb client query marcua/test.sqlite
Launching an interactive session for marcua/test.sqlite
marcua/test.sqlite> INSERT INTO favorite_databases (name, score) VALUES ("PostgreSQL", 10);
Rows: 0
marcua/test.sqlite> INSERT INTO favorite_databases (name, score) VALUES ("SQLite", 9);
Rows: 0
marcua/test.sqlite> INSERT INTO favorite_databases (name, score) VALUES ("DuckDB", 9);
Rows: 0
marcua/test.sqlite> SELECT * FROM favorite_databases;
 name       | score
------------+-------
 PostgreSQL | 10
 SQLite     | 9
 DuckDB     | 9
Rows: 3
marcua/test.sqlite>

$ ayb client update_profile marcua --display_name 'Adam Marcus' --links 'http://marcua.net'
Successfully updated profile

$ ayb client profile marcua
 Display name | Description | Organization | Location | Links
--------------+-------------+--------------+----------+-------------------
 Adam Marcus  |             |              |          | http://marcua.net
```
Note that the command line also saved a configuration file for your convenience so you don't have to keep entering a server URL or API token. If you ever want to set these explicitly, the `--url`/`--token` command-line flags and `AYB_SERVER_URL`/`AYB_API_TOKEN` environment variables will override whatever is in the saved configuration. By default, the configuration file can be found in:

- Linux: `/home/alice/.config/ayb/ayb.json`
- macOS: `/Users/Alice/Library/Application Support/org.ayb.ayb/ayb.json`
- Windows: `C:\Users\Alice\AppData\Roaming\ayb\ayb\config\ayb.json`
## HTTP API

The command-line invocations above are a thin wrapper around `ayb`'s HTTP API. Here are the same commands as above, but with `curl`:

```bash
$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/register -H "entity-type: user" -H "entity: marcua" -H "email-address: your@example.com"
{}

$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/confirm -H "authentication-token: TOKEN_FROM_EMAIL"
{"entity":"marcua","token":"<API_TOKEN>"}

$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/marcua/test.sqlite/create -H "db-type: sqlite" -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>"
{"entity":"marcua","database":"test.sqlite","database_type":"sqlite"}

$ curl -w "\n" -X PATCH http://127.0.0.1:5433/v1/entity/marcua -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>" -d "{\"display_name\": \"Adam Marcus\"}"
{}

$ curl -w "\n" -X GET http://localhost:5433/v1/entity/marcua -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>"
{"slug":"marcua","databases":[{"slug":"test.sqlite","database_type":"sqlite"}],"profile":{"display_name":"Adam Marcus"}}

$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/marcua/test.sqlite/query -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>" -d 'CREATE TABLE favorite_databases(name varchar, score integer);'
{"fields":[],"rows":[]}

$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/marcua/test.sqlite/query -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>" -d "INSERT INTO favorite_databases (name, score) VALUES (\"PostgreSQL\", 10);"
{"fields":[],"rows":[]}

$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/marcua/test.sqlite/query -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>" -d "INSERT INTO favorite_databases (name, score) VALUES (\"SQLite\", 9);"
{"fields":[],"rows":[]}

$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/marcua/test.sqlite/query -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>" -d "INSERT INTO favorite_databases (name, score) VALUES (\"DuckDB\", 9);"
{"fields":[],"rows":[]}

$ curl -w "\n" -X POST http://127.0.0.1:5433/v1/marcua/test.sqlite/query -H "authorization: Bearer <API_TOKEN_FROM_PREVIOUS_COMMAND>" -d "SELECT * FROM favorite_databases;"
{"fields":["name","score"],"rows":[["PostgreSQL","10"],["SQLite","9"],["DuckDB","9"]]}
```
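Because the API is plain HTTP, any client library works. As a standard-library Python sketch, here is how the query request from the last `curl` example above can be constructed (the request is built but not sent, since it assumes a server running at the URL from the examples):

```python
import urllib.request

def build_query_request(base_url: str, token: str, database: str,
                        sql: str) -> urllib.request.Request:
    """Build a POST against ayb's /v1/<entity>/<database>/query endpoint.

    Pass the returned Request to urllib.request.urlopen() to execute it;
    the JSON response contains "fields" and "rows" keys.
    """
    return urllib.request.Request(
        url=f"{base_url}/v1/{database}/query",
        data=sql.encode("utf-8"),
        headers={"authorization": f"Bearer {token}"},
        method="POST",
    )

req = build_query_request(
    "http://127.0.0.1:5433", "<API_TOKEN>", "marcua/test.sqlite",
    "SELECT * FROM favorite_databases;",
)
print(req.get_method(), req.full_url)
```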
## Snapshot-based backups

You can configure `ayb` to periodically upload snapshots of each database to S3-compatible storage so that you can recover from the failure of the machine running `ayb` or revert to a previous copy of the data. Each snapshot is compressed (using zstd) and only uploaded if the database changed since the last snapshot. To enable snapshot-based backups, include a configuration block like the following in your `ayb.toml`:

```toml
[snapshots]
sqlite_method = "Vacuum"
access_key_id = "YOUR_S3_ACCESS_KEY_ID"
secret_access_key = "YOUR_S3_ACCESS_KEY_SECRET"
bucket = "bucket-to-upload-snapshots"
path_prefix = "some/optional/prefix"
endpoint_url = "https://url-endpoint-of-s3-compatible-provider.com" # Optional
region = "us-east-1" # Optional
force_path_style = false # Optional

[snapshots.automation]
interval = "10m"
max_snapshots = 3
```
Here is an explanation of the parameters:

- `sqlite_method`: The two SQLite backup methods are Vacuum and Backup. `ayb` only supports `Vacuum` for now.
- `access_key_id` / `secret_access_key`: The access key ID and secret used to upload and list snapshots in your S3-compatible storage provider.
- `bucket`: The name of the bucket to which to upload snapshots.
- `path_prefix`: (Can be blank) If you want to upload snapshots to a prefixed path inside `bucket` (e.g., `my-bucket/the-snapshots`), provide a prefix (e.g., `the-snapshots`).
- `endpoint_url`: (Optional if using AWS S3) Each S3-compatible storage provider will tell you its own endpoint to manage your buckets.
- `region`: (Optional if using AWS S3) Some S3-compatible storage providers will request a region in their network where your bucket will live.
- `force_path_style`: (Optional, legacy) If included and `true`, uses the legacy path-style method of referencing buckets. Used in `ayb`'s end-to-end tests and might be helpful beyond them, but start without it.
- `interval`: How frequently to take a snapshot of your data, in a human-readable format (e.g., every 30 minutes = `30m`, every hour = `1h`, every hour and 30 minutes = `1h30m`, with more examples here).
- `max_snapshots`: How many old snapshots to keep before pruning the oldest ones.

Once snapshots are enabled, you will see logs on the server with each periodic snapshot run. The following example shows how snapshots work, including how to list and restore them (using `interval = "3s"` and `max_snapshots = 2`):
```bash
$ ayb client create_database marcua/snapshots.sqlite
Successfully created marcua/snapshots.sqlite

$ ayb client query marcua/snapshots.sqlite "CREATE TABLE favorite_databases(name varchar, score integer);"
Rows: 0

$ ayb client query marcua/snapshots.sqlite "INSERT INTO favorite_databases (name, score) VALUES (\"PostgreSQL\", 10);"
Rows: 0

# Wait longer than 3 seconds before inserting the next row, so that a snapshot with just PostgreSQL exists.
$ ayb client query marcua/snapshots.sqlite "INSERT INTO favorite_databases (name, score) VALUES (\"SQLite\", 9);"
Rows: 0

$ ayb client query marcua/snapshots.sqlite "SELECT * FROM favorite_databases;"
 name       | score
------------+-------
 PostgreSQL | 10
 SQLite     | 9
Rows: 2

# Wait longer than 3 seconds before listing snapshots to ensure that a snapshot with SQLite exists as well.
$ ayb client list_snapshots marcua/snapshots.sqlite
 Name                                                             | Last modified
------------------------------------------------------------------+---------------------------
 f9e01a396fb7f91be988c26d43f9ffa667bd0fd05009b231aa61ea1073d34423 | 2024-08-18T15:05:04+00:00
 856e21f7cae8383426cd2e0599caf6e83962b051af4734ab5c53aff87ea0ff45 | 2024-08-18T15:04:40+00:00

# Restore the older snapshot, which didn't contain SQLite
$ ayb client restore_snapshot marcua/snapshots.sqlite 856e21f7cae8383426cd2e0599caf6e83962b051af4734ab5c53aff87ea0ff45
Restored marcua/snapshots.sqlite to snapshot 856e21f7cae8383426cd2e0599caf6e83962b051af4734ab5c53aff87ea0ff45

$ ayb client query marcua/snapshots.sqlite "SELECT * FROM favorite_databases;"
 name       | score
------------+-------
 PostgreSQL | 10
Rows: 1
```
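The `interval` strings used above (`3s`, `30m`, `1h30m`) follow a compact human-readable duration format. As a rough illustration of how such strings map to seconds (a hypothetical re-implementation, not `ayb`'s actual Rust parser):

```python
import re

UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}

def parse_interval(spec: str) -> int:
    """Convert a duration like '1h30m' or '3s' into a number of seconds."""
    total = 0
    for value, unit in re.findall(r"(\d+)([dhms])", spec):
        total += int(value) * UNIT_SECONDS[unit]
    return total

print(parse_interval("30m"))    # 1800
print(parse_interval("1h30m"))  # 5400
```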
Credits: the design of snapshot-based backups was influenced by that of rqlite. Thank you to the authors for their great design and documentation.
## Isolation

`ayb` allows multiple users to run queries against databases that are stored on the same machine. Isolation enables you to prevent one user from accessing another user's data, and allows you to restrict the resources any one user is able to utilize.

By default, `ayb` uses the `SQLITE_DBCONFIG_DEFENSIVE` flag and sets `SQLITE_LIMIT_ATTACHED` to `0` in order to prevent users from corrupting the database or attaching to other databases on the filesystem.
For further isolation, `ayb` uses nsjail to isolate each query's filesystem access and resources. When this form of isolation is enabled, `ayb` starts a new `nsjail`-managed process to execute the query against the database. We have not yet benchmarked the performance overhead of this approach.

To enable isolation, you must first build `nsjail`, which you can do through `scripts/build_nsjail.sh`. Note that `nsjail` depends on a few other packages. If you run into issues building it, it might be helpful to look at its Dockerfile to get a sense of those requirements.

Once you have a path to the `nsjail` binary, add the following to your `ayb.toml`:

```toml
[isolation]
nsjail_path = "path/to/nsjail"
```
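nsjail combines Linux namespaces, filesystem jails, and resource limits. As a much weaker but self-contained illustration of the underlying idea of running each query in a constrained child process, here is a hypothetical POSIX-only Python sketch that caps a child's CPU time (`ayb` itself delegates all of this to `nsjail`):

```python
import resource
import subprocess
import sys

def limit_child() -> None:
    # Runs in the child before exec: cap its CPU time at 1 second.
    resource.setrlimit(resource.RLIMIT_CPU, (1, 1))

# Run "the query" in a separate, resource-limited process.
result = subprocess.run(
    [sys.executable, "-c", "print('query ran in a limited child')"],
    preexec_fn=limit_child,
    capture_output=True,
    text=True,
)
print(result.stdout.strip())
```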
## Testing

`ayb` is largely tested through end-to-end tests that mimic as realistic an environment as possible. Individual modules may also provide more specific unit tests. To set up your environment for running end-to-end tests, type:

```bash
tests/set_up_e2e_env.sh
```

After your environment is set up, you can run the tests with:

```bash
cargo test --verbose
```
To mimic as realistic an environment as possible, the end-to-end tests mock out very little functionality. The `tests/set_up_e2e_env.sh` script, which has been used extensively on Ubuntu, does the following:

- Builds an `nsjail` binary to test `ayb`'s isolation functionality.

## What's `ayb` for?

The introductory blog post has a section describing each group that stands to benefit from `ayb`'s aim to make it easier to create a database, interact with it, and share it with relevant people/organizations. Students would benefit from encountering fewer operational impediments to writing their first SQL query or sharing their in-progress database with a mentor or teacher for help. Sharers like scientists and journalists would benefit from an easy way to post a dataset and share it with collaborators. Finally, anyone concerned about the sovereignty of their data would benefit from a world where it's so easy to spin up a database that more of their data can live in databases they control.
## Where does the name come from?

Thank you for asking. I hope the answer elicits some nostalgia! Shout out to Meelap Shah and Eugene Wu for convincing me not to call this project `stacks`, to Andrew Lange-Abramowitz for making the connection to the storied meme, and to Meredith Blumenstock for listening to me fret over it all.
## Roadmap

Here's a rough roadmap for the project, with items near the top of the list more likely to be completed first. The nitty-gritty list of prioritized issues can be found on this project board, with the most-likely-to-be-completed issues near the top of the to-do list.

### Make the `ayb` experience excellent

- Because the goal of `ayb` is to make it easier to create, share, and query databases, it's frustrating that running `ayb` requires you to pay the nontrivial cost of operationalizing PostgreSQL. While Postgres will be helpful for eventually coordinating between multiple `ayb` nodes, a single-node version should be able to store its metadata in SQLite with little setup cost.
- Because an `ayb` instance can have multiple tenants/databases, we want to use one of the many container/isolate/microVM projects to ensure that one tenant isn't able to access another tenant's data.
- Multiple `ayb` nodes should serve databases and requests. Whereas a single database will not span multiple machines, parallelism/distribution will happen across users and databases.
- `ayb`'s query API is a stateless request/response API, making it impossible to start a database transaction or issue multiple queries in a session. Exposing sessions in the API will allow multiple statements per session, and by extension, transactions.
- `ayb` already uses existing well-established file formats (e.g., SQLite). There should be endpoints to import existing databases into `ayb` in those formats or export the underlying files so you're not locked in.
- While `ayb` provides snapshot-based backups to protect against cataclysmic failures, the recovery process is manual. Streaming databases to replicas and switching to replicas on failure will make `ayb` more highly available.

### Bring `ayb` to more people and software

- Speaking to `ayb` over the PostgreSQL wire protocol will allow existing tools and libraries to connect to and query an `ayb` database.
- …an `ayb` database.
- …`ayb`-hosted databases.

(This section is inspired by the LiteFS project, and is just one of the many things to love about that project.)
## Contributing

`ayb` contributions work a little differently than most GitHub projects: this project has a roadmap, and features are added and tested in a certain order. I'm adding a little friction by requiring a discussion/design document for features before a pull request is submitted, to ensure that I can focus my attention on well-motivated, well-sequenced, and well-understood functionality.