# quilt-rs Design Overview

## I. Introduction

The primary purpose of quilt-rs is to provide a simple and efficient Rust API for managing Quilt packages and registries. In particular, we need objects that represent both on-disk data structures and the operations that can be performed on them. Importantly, we ultimately want the ability to transparently support multiple storage backends (local filesystem, S3, mock, etc.) and registry types (i.e., flat vs. versioned).

This document captures the current "as implemented" architecture and identifies the key areas that may need to be refactored to meet the above requirements.

## II. Key Concepts

### II.A. Packages

Quilt provides a universal data abstraction layer for managing "data packages": immutable, self-describing data containers whose cryptographically secure checksum acts as a unique identifier. Crucially, each Package is a logical collection that abstracts away the physical location of the data, allowing users to interact with the data consistently, regardless of where and how it is stored.

### II.B. Manifests

Manifests are the primary data structure used to define a Quilt package. They contain:

- standard `info` about the package: version, commit message
- optional package-level user-specified `meta` data
- one `Entry` for each object "contained" in the package

### II.C. Entries

Each Entry in a manifest represents a single object in the package. It contains:

- name: the logical name of the object
- place: the physical URI of the object
- hash: the hash of the object (as a `MultiHash`)
- size: the size of the object in bytes
- info: optional system-generated metadata
- meta: optional user-specified metadata

### II.D. Registries

Registries are special folders within a storage system that contain Manifests and associate them with the Namespaces used to identify packages. They may also contain configuration information for systems that work with packages.

### II.E. Domains

A Domain is a location (e.g., an S3 bucket or local folder) that contains both a Registry and a Store for the actual data objects.

### II.F. Stores

Each Store may be Versioned or Flat. Versioned Stores assign a `versionId` to each revision of an object, while Flat Stores simply overwrite the previous revision. Currently, the system assumes S3 stores are versioned, while local stores are flat. When using a flat store, the Registry caches each known version of the object to avoid overwrites.

### II.G. Revisions

A Revision is a specific version of a Package. It is identified by a hash of the package contents, and may be tagged with a human-readable name. The tag for the most recent Revision is called `latest`.

Unlike software packages, Quilt packages can be extremely large (thousands of files, terabytes of data). Therefore, the Quilt API must make it easy to download and modify only the parts of a package that are needed. This complicates the semantics of updating Revisions stored across different systems, as Manifests do not currently keep track of their entire Revision history.

### II.H. Lineages

To address the Revision history problem, `quilt_rs` added a new concept called Lineages. The Lineage file for a Domain tracks which Packages have been downloaded and installed, and from where. It also tracks which Revisions of each Package have been "checked out" for users to edit.

## III. Current Architecture

### III.A. Library Module

The `lib.rs` file contains the following modules and exported types:

1. `paths` (*private*): handles path scaffolding and the directory environment
2. `quilt` (**public**): legacy module, primarily focused on managing the local cache
    1. `InstalledPackage`: represents a package that has been installed in the local cache. It keeps a reference to the `lineage`, `namespace`, `paths`, `remote`, and `storage` to help manage the package.
    2. `LocalDomain`: represents the local cache for a concrete `storage` and `remote`.
    3. `Manifest`: represents the on-disk manifest for a Quilt package.
    4. `RemoteManifest`: references a remote manifest and keeps track of the objects in it.
    5. `S3PackageUri`: represents a Quilt+ URI, which is a URL with a `quilt+` scheme. This is used to identify a package or registry in a Quilt-compatible storage system.
3. `quilt4`: the newer module, primarily focused on managing Parquet manifests
    1. `manifest::Manifest4`: represents the high-level manifest object
    2. `row4::Row4`: provides methods to decode/encode quilt3's JSONL format
    3. `table::Table`: a wrapper for arrow-rs's Table, the native Manifest format for quilt4
    4. `uri::UriParser`: parses a URI. To be replaced with `url::Url` in the future
    5. `uri::UriQuilt`: represents a Quilt URI, which is a URL with a `quilt` scheme. This is used to uniquely identify a package, registry, or path.
4. `s3_utils` (**public**): contains utilities for working with S3

It also defines the `Error` type and two high-level functions:

- `install_temporarily`: installs a package into a temporary folder
- `installed_packages`: returns a list of all currently installed packages

### III.B. Main Module

`quilt_rs` provides a simple command-line interface for interacting with Quilt packages. The functionality for this is in the `cli` module, which supports the following commands:

- `Browse`
- `Install`
- `List`
- `Package`
- `Uninstall`
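
To make the data model from sections II.B, II.C, and II.F more concrete, the following is a minimal, standard-library-only sketch of how a manifest and its entries might be modeled. It is not the actual `quilt_rs` code: the names `StoreKind`, `EntrySketch`, `ManifestSketch`, and their methods are hypothetical, `hash` is a plain `String` standing in for `MultiHash`, and `info`/`meta` are simplified to optional strings rather than structured metadata.

```rust
use std::collections::BTreeMap;

/// Whether a Store keeps every revision of an object (II.F).
/// Hypothetical type, not part of quilt_rs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StoreKind {
    /// Each revision of an object gets its own `versionId` (assumed for S3).
    Versioned,
    /// A new revision overwrites the previous one (assumed for local folders).
    Flat,
}

/// One object in a package, mirroring the Entry fields listed in II.C.
#[derive(Debug, Clone, PartialEq)]
pub struct EntrySketch {
    pub name: String,         // logical name of the object
    pub place: String,        // physical URI of the object
    pub hash: String,         // stand-in for a `MultiHash`
    pub size: u64,            // size in bytes
    pub info: Option<String>, // optional system-generated metadata (simplified)
    pub meta: Option<String>, // optional user-specified metadata (simplified)
}

impl EntrySketch {
    /// Builds an entry without optional metadata.
    pub fn new(name: &str, place: &str, hash: &str, size: u64) -> Self {
        EntrySketch {
            name: name.to_string(),
            place: place.to_string(),
            hash: hash.to_string(),
            size,
            info: None,
            meta: None,
        }
    }
}

/// A manifest as described in II.B: package info, optional meta, and entries.
#[derive(Debug, Clone, PartialEq)]
pub struct ManifestSketch {
    pub version: String,      // `info`: manifest version
    pub message: String,      // `info`: commit message
    pub meta: Option<String>, // optional package-level metadata (simplified)
    pub entries: BTreeMap<String, EntrySketch>, // keyed by logical name
}

impl ManifestSketch {
    /// Creates an empty manifest with the given version and commit message.
    pub fn new(version: &str, message: &str) -> Self {
        ManifestSketch {
            version: version.to_string(),
            message: message.to_string(),
            meta: None,
            entries: BTreeMap::new(),
        }
    }

    /// Adds an entry keyed by its logical name, returning any entry it replaced.
    pub fn add_entry(&mut self, entry: EntrySketch) -> Option<EntrySketch> {
        self.entries.insert(entry.name.clone(), entry)
    }

    /// Returns the entries whose logical name starts with `prefix`, in name order.
    pub fn entries_under(&self, prefix: &str) -> Vec<&EntrySketch> {
        self.entries
            .values()
            .filter(|e| e.name.starts_with(prefix))
            .collect()
    }

    /// Sums the sizes (in bytes) of all entries.
    pub fn total_size(&self) -> u64 {
        self.entries.values().map(|e| e.size).sum()
    }
}
```

Keying the entries by logical name is one possible design choice: it keeps names unique within a manifest and lets a caller select a subset by prefix with `entries_under`, which is the kind of partial access that II.G asks for when packages are too large to download in full.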