# Beelay - A new sync protocol for Automerge

This document describes a new sync protocol - tentatively named "Beelay" - designed for efficiently synchronizing collections of Automerge documents. 

## Introduction and motivation

An Automerge document can be thought of as like git for JSON. Each document is a data structure which can be materialized as a JSON object, but modifications to the document are stored within the document as a commit DAG rather like Git. This means that Automerge documents can be edited concurrently and merged together automatically later, providing a substrate for collaboration without central peers.

One common requirement for collaborative software is real-time communication between concurrently editing peers. For this purpose Automerge includes a sync protocol which enables peers to send deltas rather than sending the entire document on every change - for things like per-keystroke editing this is essential. This sync protocol works but production use has exposed a few limitations: firstly, that running sync servers is expensive and compromises security and secondly, that applications frequently want to synchronize many documents.

### Sync Servers

Due to the structure of networks on the internet, and due to requirements for data availability, it is often necessary to have a central server which acts as a relay and store-and-forward peer for sync messages. Unlike peers running on behalf of a single user these peers must synchronize many documents at once. The existing sync protocol requires that the entire document be in memory in order to run the sync protocol which limits the scale these servers can affordably reach.

In addition, one of the appealing things about Automerge is that you don't need to trust a central server. Introducing a sync server somewhat compromises this feature - you must trust the server operator not to look at your data and to keep it secure.

### Collections of Documents

An Automerge document is a "unit of sharing". You can't share subcomponents of a document with other people - or view the history of just one part of the document. This is a useful property because it makes the behavior of a document predictable in the face of concurrent changes but it is also a limitation because many useful applications involve sharing subsets of a users data with different people.

To work around this limitation applications often create many Automerge documents and link them together via a location independent url. This introduces challenges for the sync protocol. We have users who have thousands of documents - synchronizing just the small number of documents which have changed in one session is very inefficient with the current scheme - which must load every document into memory and create a separate sync session for each one.

### Requirements

These two sets of problems are technically unrelated, what brings them together for our purposes is that they can both be solved by a new sync protocol. Specifically what we require is a sync protocol which

* Does not impose O(n) memory requirements on sync servers where n is the number of documents being synchronized    
* Allows for sync servers to operate over encrypted data to reduce the trust users have to place in sync servers
* Provides a mechanism for efficiently determining what documents in a collection of documents have changed

## Design

Beelay is an RPC based protocol which operates between two peers. We make no assumptions about the ordering of messages on the channel connecting these two peers. In typical interactions one peer will be a "sync server" and one will be a user agent within some application, although the protocol doesn't require this topology.

To begin synchronizing a document with a remote peer the local Beelay peer first requests that the remote peer create a "snapshot" representing the current state of the given document and every document transitively reachable (via links) from it. The remote peer creates this snapshot and returns an identifier the local peer can use to perform [RIBLT-sync] with the snapshot. Once this sync is complete the local peer knows which documents in the collection are out of sync.

At this point the local peer runs [ sedimentree sync ]( ./sedimentree.md ) for each out of sync document. Once this is complete the local peer knows that it is at least up to date with the state of the collection at the time of the snapshot. Finally the local peer can [listen] to any changes to documents in the snapshot which have occurred since the snapshot was created. Thus the local peer can stay in sync with live updates.

It must be possible to run this synchronization protocol over encrypted data which means that we cannot directly examine the contents of the documents being synchronzed in order to extract the links between them. Instead, peers which have the clear text of the documents synchronize the links between documents to a "reachability index" on the server. This index is a very simple CRDT which is also synchronized via [sedimentree sync].

### RIBLT Sync

RIBLT sync refers to "Rateless Invertible Bloom Lookup Tables" as presented in [Practical Rateless Set Reconciliation](https://arxiv.org/abs/2402.02668). This scheme allows for a set of items to be reconciled between two peers with a bandwidth overhead which is proportional to the size of the set difference and with a very low number of round trips.

### Sedimentree Sync

Sedimentree sync is a scheme we have designed for synchronizing commit DAGs such as those which make up an Automerge document. The important features of sedimentree sync are that it allows for compressing runs of operations in the commit DAG _and ommitting their change hashes_ from the compressed runs. This is crucial as it allows us to keep a very granular commit history without using enormous amounts of storage and/or bandwidth. See [sedimentree.md] for more details. 

## Messages

Beelay messages are encoded in a binary format which we describe here. With the following additional notation:

```
byte = %x00-FF
        ; any octet
bytes = 1*byte
        ; any sequence of octets
uleb128 = 1*8( %x00-7F ) / ( %x80-FF bytes )
        ; unsigned LEB128 encoding
leb128 = 1*8( %x00-7F ) / ( %x80-FF bytes )
        ; signed LEB128 encoding
```    

With the exception of notifications sent in response to listen requests, every message is either a request or a response. 

```
message = message_type ((request_id (request / response)) /  notification)
message_type = %d00 ; request
    / %d01 ; response
    / %d02 ; notification
request_id = 16(byte)

request = create_snapshot_request
    / snapshot_symbols_request
    / fetch_sedimentree_request
    / fetch_blob_part_request
    / upload_commits_request
    / upload_blob_request
    / listen
response = create_snapshot
    / snapshot_symbols_response
    / fetch_sedimentree_response
    / fetch_blob_part_response
    / upload_commits_response
    / upload_blob_response
    / listen
```

### Create Snapshot

The create snapshot request contains just the document ID of the root document to create a snapshot from.

```
create_snapshot_request = %d04 ; request type
    uleb128 ; length of root document ID
    bytes ; root document ID
```

The response to a create snapshot request contains the snapshot ID and the first "coded symbols" from the RIBLT sync for the snapshot in question. The receiver should attempt to peel these symbols and if they find they still need more, then use the `snapshot_symbols` request to request more symbols.

```
create_snapshot_response = %d04 ; response type
    16(byte) ; snapshot ID
    uleb128 ; number of coded symbols
    *coded_symbol

coded_symbol = 16(byte) ; first part of symbol which is a document ID when peeled
    32(byte) ; second part of symbol, which is the hash of the heads of the 
             ; document when peeled
```


### Snapshot Symbols

Used to request additional RIBLT coded symbols for an existing snapshot.

```
snapshot_symbols_request = %d05 ; request type
    16(byte) ; snapshot ID

snapshot_symbols_response = %d05 ; response type
    uleb128 ; number of coded symbols
    *coded_symbol
```

### Fetch Sedimentree

Request to fetch the Sedimentree (both content and reachability index) for a specific document.

```
fetch_sedimentree_request = %d01 ; request type
    uleb128 ; length of document ID
    bytes ; document ID

fetch_sedimentree_response = %d02 ; response type
    sedimentree_data

sedimentree_data = %d00 ; not found
    / (%d01 ; found
       content_summary
       index_summary)

content_summary = uleb128 ; number of bundles
    *sedimentree_bundle

index_summary = uleb128 ; number of bundles  
    *sedimentree_bundle

sedimentree_bundle = bytes ; bundle data
```

### Fetch Blob Part

Request to fetch a portion of a blob by its hash.

```
fetch_blob_part_request = %d02 ; request type
    32(byte) ; blob hash
    uleb128 ; offset
    uleb128 ; length

fetch_blob_part_response = %d03 ; response type
    uleb128 ; length of blob part
    bytes ; blob part data
```

### Upload Commits

Request to upload commits for a document.

```
upload_commits_request = %d00 ; request type
    uleb128 ; length of document ID
    bytes ; document ID
    commit_category ; single byte indicating commit category
    uleb128 ; number of upload items
    *upload_item

upload_commits_response = %d01 ; response type
    ; empty response indicating success

upload_item = blob_ref tree_part

blob_ref = %d00 32(byte) ; blob hash
    / (%d01 ; inline blob
       uleb128 ; blob length
       bytes) ; blob data

tree_part = %d00 ; stratum
    (%d00 / (%d01 32(byte))) ; optional start commit hash
    32(byte) ; end commit hash
    uleb128 ; number of checkpoints
    *32(byte) ; checkpoint hashes
    / (%d01 ; commit
       32(byte) ; commit hash
       uleb128 ; number of parents
       *32(byte)) ; parent hashes
```

### Upload Blob

Request to upload a complete blob.

```
upload_blob_request = %d03 ; request type
    uleb128 ; blob length
    bytes ; blob data

upload_blob_response = %d01 ; response type
    ; empty response indicating success
```

### Listen

Request to listen for updates to a snapshot. Any changes to the documents reachable from the snapshot which have been discovered by the server since the snapshot was created will be delivered as "notification" messages, which are the only kind of message sent without an explicit request.

```
listen_request = %d06 ; request type
    16(byte) ; snapshot ID

listen_response = %d06 ; response type
    ; empty response indicating successful subscription

notification = peer_id
    doc_id
    upload_item

peer_id = uleb128 ; length of peer ID
    bytes ; peer ID bytes

doc_id = uleb128 ; length of document ID
    bytes ; document ID bytes
```

### Notifications

Notifications are sent to a peer when a change is discovered to a document which is transitively reachable from a snapshot the peer issued a `listen` request to.


```
notification = from_peer
    document_id
    upload_item

from_peer = uleb128 ; length of peer ID bytes
    bytes ; peer ID

document_id = uleb128 ; length of document ID bytes
    bytes ; document ID

upload_item = blob_ref tree_part

blob_ref = %d00 32(byte) ; blob hash reference
    / (%d01 ; inline blob
       uleb128 ; blob length
       bytes) ; blob data

tree_part = %d00 ; stratum
    (%d00 / (%d01 32(byte))) ; optional start commit hash
    32(byte) ; end commit hash
    uleb128 ; number of checkpoints
    *32(byte) ; checkpoint hashes
    / (%d01 ; commit
       32(byte) ; commit hash
       uleb128 ; number of parents
       *32(byte)) ; parent hashes
```


### Error Response

Any request can result in an error response:

```
error_response = %d00 ; error response type
    uleb128 ; length of error message
    bytes ; UTF-8 encoded error message
```