API requirements =========== Simultaneous access --------------------------- Pippin is roughly a database. Like databases, it should support simultaneous access from multiple parties or parts of a program. It already does in a way: it is designed such that multiple instances accessing the same files should not clash (though they may not synchronise until the next load). But it should support in-memory parallel usage. One option would be to put each access through a queing system, but this makes each request slower. Another would be to allow read access to multiple parties and copy-on-write with some synchronisation system. This makes each element insertion/deletion slow and requires frequent synchronisation. Another would be to allow each user to make a copy of the current state and only allow modifications through a copy. Committing modifications may require a merge. The user should be able to check for external modifications while holding a copy. Partitions ----------- There should be an interface representing a single partition's data in memory; call this `Partition`. A `Partition` should be able to represent multiple states of its data; this is needed for commit replay and commit creation, as well as for history browsing. In particular, a `Partition` must represent at least two states (which may be equal): the current state, and the last saved state. Interfaces for creating a `Partition`: 1. Create empty 2. Load all snapshots and commit logs provided by some interface 3. Ditto, but with restrictions (e.g. only latest state) Interfaces for modifying a `Partition`: 1. Load more snapshots/logs provided initially, optionally with restrictions 2. Modify the current state, by: * inserting an element * replacing an element * deleting an element 3. Create a new commit from the current state 4. Writting all changes to a commit log (automatically choosing whether or not to additionally create a new snapshot) 5. Write a new snapshot Interfaces for reading data from a `Partition`: 1. List element identifiers 2. Iterate over elements, perhaps with filters 3. Retrieve a specified element Note that this is incomplete: some mechanism is required in order to (a) provide snapshot and commit log data streams for loading, (b) provide a data stream for writing a new snapshot, (c) provide a data stream for writing a commit log as well as removing obsolete commit logs. ### Partition file discovery There should be some interface for discovering partition snapshots and log files given a path to a snapshot file, either limited to the specified snapshot file plus its commit logs, or resolving all snapshots and commit logs for the same partition. Creation: 1. Snapshot only, via path 2. 1 + find corresponding commit logs 3. Extrapolate to all files for the partition Interface: this should implement some trait used by `Partition`, allowing retrieval of the latest snapshot, all snapshots in historical order, commit logs for each snapshot (maybe via a sub-interface), creation of new snapshot files (more accurately writable streams), and creation of new log files. Repositories -------------- An interface for representing a multiple-partition repository in memory is required; call this `Repository`. A repository could be created: 1. Empty, with a name and some classifier functions (possibly empty) 2. Loading partitioning data given some data source It could then provide: 1. A list of partitions available and partitions loaded and could be modified by: 1. Loading all partitions at the latest state 2. Loading or unloading a specific partition 3. Per element operations, which optionally load on demand: * iterate over elements according to some filters * insertion of an element * deletion of an element * replacement of an element * retrieval of an element by id 4. Adding a classifier function 5. Reprioritising classifier functions 6. Repartitioning some or all data 7. Committing all changes ### Repository discovery There should be some interface for discovering partition files given a repository's location (directory or any one snapshot). It would implement a trait provided alongside `Repository`, returning a list of partitions found, each as a partition file discovery object. It would also be used to store new partitions. Note: this leaves interpretation of partitioning to `Repository`, not the discovery tool. This is better for the case where the discovery tool is not used. Maintenance operations ------------------------------- Compact data stores: rewrite some stored partitions, possibly combining some patches / discarding some states in line with revised history requirements. Push/pull/merge: push local modifications to a remote copy, pull remote modifications, merge changes (only automatic ones or start manual merge), etc. Fix: if checksum errors are found, try to recover (e.g. check whether remote copies are also corrupted, try to localise the corruption, possibly ask the user, replay a series of patches and compare to a snapshot). Encryption -------------- I don't know what might be needed here, or maybe combined elsewhere...