DAL (Data Access Layer)

This crate provides read and write access to the main database (Postgres), which acts as the primary source of truth.

The current schema is managed by sqlx. Schema changes are stored in the migrations directory.
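For illustration, here is a minimal sketch of applying the embedded migrations to a Postgres database with sqlx; the connection string and pool settings are placeholders, not how the node actually wires up its database:

```rust
use sqlx::postgres::PgPoolOptions;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    // Placeholder connection string; the node configures its pool elsewhere.
    let pool = PgPoolOptions::new()
        .max_connections(5)
        .connect("postgres://postgres@localhost/zksync_local")
        .await?;

    // `sqlx::migrate!` embeds the contents of the `migrations` directory at
    // compile time; `run` applies any migrations not yet recorded in the DB.
    sqlx::migrate!("./migrations").run(&pool).await?;
    Ok(())
}
```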

Schema overview

This overview skips prover-related and Ethereum sender-related tables, which are specific to the main node.

L2 blocks and L1 batches

  • miniblocks. Stores L2 block headers. The naming is due to historical reasons.

  • miniblocks_consensus. Stores L2 block data related to the consensus algorithm used by the decentralized sequencer. Tied one-to-one to L2 blocks (the consensus side of the relation is optional).

  • l1_batches. Stores L1 batch headers.

  • commitments. Stores a part of L1 batch commitment data (event queue and bootloader memory commitments). In the future, other commitment-related columns will be moved here from l1_batches.

Transactions

  • transactions. Stores all transactions received by the node, both L2 and L1 ones. Transactions in this table are not necessarily included in an L2 block; i.e., the table is used as a persistent mempool as well.
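As a rough illustration of the "persistent mempool" role of this table, a query along the following lines could count transactions that are not yet included in any L2 block. The miniblock_number column name is an assumption made for this sketch, not a guaranteed part of the schema:

```rust
use sqlx::Row;

/// Illustrative only: counts transactions not yet assigned to an L2 block,
/// i.e., the "mempool" portion of the `transactions` table. The
/// `miniblock_number` column name is assumed, not guaranteed.
async fn count_pending_transactions(pool: &sqlx::PgPool) -> anyhow::Result<i64> {
    let row = sqlx::query(
        "SELECT COUNT(*) AS pending FROM transactions WHERE miniblock_number IS NULL",
    )
    .fetch_one(pool)
    .await?;
    Ok(row.get("pending"))
}
```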

VM storage

See the zksync_state crate for more context.

  • storage_logs. Stores all the VM storage write logs for all transactions, as well as non-transaction writes generated by the bootloader. This is the source of truth for the VM storage; all other VM storage implementations (see the zksync_state crate) are based on it (e.g., by adding persistent or in-memory caching). Used by multiple components, including the Metadata calculator, Commitment generator, and API server (both for reading one-off values like account balances and as part of the VM sandbox). A sketch of a point lookup against this table is shown after this list.

  • initial_writes. Stores initial writes information for each L1 batch, i.e., the enumeration index assigned for each key. Used when generating L1 batch metadata in Metadata calculator and Commitment generator components, and in the VM sandbox in API server for fee estimation.

  • protective_reads. Stores protective read information for each L1 batch, i.e., keys influencing VM execution for the batch that were not modified. Used when generating L1 batch metadata in Commitment generator.

  • factory_deps. Stores bytecodes of all deployed L2 contracts.

  • storage. Obsolete, going to be removed; must not be used in new code.
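As a sketch of how storage_logs serves as the source of truth for VM storage, a point lookup of a storage slot at a given L2 block could look roughly like this. The hashed_key, miniblock_number and operation_number column names are assumptions made for the sake of the example:

```rust
use sqlx::Row;

/// Illustrative only: reads the latest value written to a storage slot
/// (identified by its hashed key) at or before the given L2 block.
/// Column names are assumptions and may differ from the real schema.
async fn storage_value_at_l2_block(
    pool: &sqlx::PgPool,
    hashed_key: &[u8],
    l2_block: i64,
) -> anyhow::Result<Option<Vec<u8>>> {
    let row = sqlx::query(
        "SELECT value FROM storage_logs \
         WHERE hashed_key = $1 AND miniblock_number <= $2 \
         ORDER BY miniblock_number DESC, operation_number DESC \
         LIMIT 1",
    )
    .bind(hashed_key)
    .bind(l2_block)
    .fetch_optional(pool)
    .await?;
    Ok(row.map(|row| row.get("value")))
}
```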

Other VM artifacts

  • events. Stores all events (aka logs) emitted by smart contracts during VM execution.

  • l2_to_l1_logs. Stores L2-to-L1 logs emitted by smart contracts during VM execution.

  • call_traces. Stores call traces for transactions emitted during VM execution. (Unlike with L1 node implementations, in Era call traces are currently proactively generated for all transactions.)

  • tokens. Stores all ERC-20 tokens registered in the L1–L2 bridge.

  • transaction_traces. Obsolete, going to be removed; must not be used in new code.

Snapshot generation and recovery

See the snapshots_creator and snapshots_applier crates for an overview of application-level node snapshots.

  • snapshots. Stores metadata for all snapshots generated by snapshots_creator, such as the L1 batch of the snapshot.

  • snapshot_recovery. Stores metadata for the snapshot used during node recovery, if any. Currently, this table is expected to have no more than one row.

Logical invariants

In addition to foreign key constraints and other constraints manifested directly in the DB schema, the following invariants are expected to be upheld:

  • If a header is present in the miniblocks table, it is expected that the DB contains all artifacts associated with the L2 block execution, such as events, l2_to_l1_logs, call_traces, tokens etc. (See State keeper I/O logic for the exact definition of these artifacts.)
  • Likewise, if a header is present in the l1_batches table, all artifacts associated with the L1 batch execution are also expected in the DB, e.g. initial_writes and protective_reads. (See State keeper I/O logic for the exact definition of these artifacts.)
  • L2 blocks and L1 batches present in the DB form a continuous range of numbers. If the DB is recovered from a node snapshot, the first L2 block / L1 batch is the one immediately following the snapshot L2 block / L1 batch recorded in the snapshot_recovery table. Otherwise, L2 blocks / L1 batches must start from number 0 (aka genesis). A sketch of a continuity check is given after this list.
  • The address and key fields in the storage_logs table are not null for all blocks executed on the node (i.e., blocks whose header is present in miniblocks). On the other hand, address and key may be null for snapshot storage logs. These fields are needed by some components that post-process L1 batches, such as the Merkle tree and the commitment generator; both use (address, key) tuples to sort logs in a batch into canonical order. Since a snapshot is not post-processed in this way, it is acceptable to omit these fields for snapshot logs (and only for them).
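For illustration, the continuity invariant for L2 blocks could be checked along the following lines (the real consistency checks in the node may be organized differently):

```rust
use sqlx::Row;

/// Illustrative only: verifies that L2 block numbers in `miniblocks` form a
/// continuous range, i.e., `max - min + 1 == count`. The analogous check
/// can be written for `l1_batches`.
async fn l2_blocks_are_continuous(pool: &sqlx::PgPool) -> anyhow::Result<bool> {
    let row = sqlx::query(
        "SELECT COUNT(*) AS count, MIN(number) AS min, MAX(number) AS max FROM miniblocks",
    )
    .fetch_one(pool)
    .await?;
    let count: i64 = row.get("count");
    if count == 0 {
        return Ok(true); // An empty range is trivially continuous.
    }
    let (min, max): (i64, i64) = (row.get("min"), row.get("max"));
    Ok(max - min + 1 == count)
}
```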

Contributing to DAL

Some tips and tricks to make contributing to DAL easier:

  • If you want to add a new DB query, search the DAL code or the .sqlx directory for identical or equivalent queries. Reuse is almost always better than duplication.
  • It usually makes sense to instrument your queries using instrument tooling. See the instrument module docs for details.
  • It's best to cover added queries with unit tests to ensure they work and don't break in the future; see the sketch after this list. sqlx has compile-time schema checking, but it's not a panacea.
  • If there are doubts about query performance, run the query with an EXPLAIN / EXPLAIN ANALYZE prefix against a production-size database.
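As a sketch of the unit-test tip, a test can run against an isolated database with the crate's migrations applied. The crate also has its own test-pool helpers (e.g., a ConnectionPool test pool used in its existing tests); the generic sqlx variant below is shown purely for illustration:

```rust
/// Illustrative only: `#[sqlx::test]` spins up an isolated test database and
/// applies the migrations from the given directory before running the test.
#[sqlx::test(migrations = "./migrations")]
async fn new_query_returns_nothing_on_empty_db(pool: sqlx::PgPool) -> sqlx::Result<()> {
    // On an empty database, no sealed L2 block should be present.
    let row = sqlx::query("SELECT number FROM miniblocks ORDER BY number DESC LIMIT 1")
        .fetch_optional(&pool)
        .await?;
    assert!(row.is_none());
    Ok(())
}
```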

Backward compatibility

All DB schema changes are expected to be backward-compatible. That is, old code must be able to function with the new schema. As an example, dropping / renaming columns is not allowed. Instead, a 2-phase migration should be used:

  1. The column should be marked as obsolete, with its mentions replaced in all queries. If the column should be renamed, a new column should be created and data (if any) should be copied from the old column (see also: Programmatic migrations).
  2. After a significant delay (order of months), the old column may be removed in a separate migration.

Programmatic migrations

We cannot afford a non-trivial amount of downtime caused by a data migration. That is, if a migration may cause such downtime (e.g., it copies a non-trivial amount of data), it must be organized as a programmatic migration and run in the node background (perhaps splitting the work into chunks with a delay between them, so that the migration doesn't hog all DB resources).
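A minimal sketch of such a background migration, assuming a hypothetical copy from an old_name column to a new_name column in a table called example_table; all names, the chunk size and the delay are purely illustrative:

```rust
use std::time::Duration;

/// Illustrative only: copies data from a hypothetical `old_name` column to a
/// `new_name` column in bounded chunks, sleeping between chunks so that the
/// migration does not monopolize DB resources.
async fn copy_column_in_background(pool: &sqlx::PgPool) -> anyhow::Result<()> {
    const CHUNK_SIZE: i64 = 10_000;
    const DELAY: Duration = Duration::from_secs(1);

    loop {
        let updated = sqlx::query(
            "UPDATE example_table SET new_name = old_name \
             WHERE id IN ( \
                 SELECT id FROM example_table \
                 WHERE new_name IS NULL AND old_name IS NOT NULL \
                 LIMIT $1 \
             )",
        )
        .bind(CHUNK_SIZE)
        .execute(pool)
        .await?
        .rows_affected();

        if updated == 0 {
            break; // Nothing left to migrate.
        }
        tokio::time::sleep(DELAY).await;
    }
    Ok(())
}
```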
