Crates.io | cooklang-sync-client |
lib.rs | cooklang-sync-client |
version | 0.1.6 |
source | src |
created_at | 2024-06-19 11:56:52.119887 |
updated_at | 2024-08-09 07:39:02.971655 |
description | A client library for cooklang-sync |
homepage | |
repository | https://github.com/cooklang/cooklang-sync |
max_upload_size | |
id | 1276796 |
size | 116,670 |
REMOTE SCHEMA
Namespace ID (NSID)
Relative path within the namespace
Journal ID (JID): monotonically increasing within a namespace
Q:
where to store chunks? S3 is too expensive for such small files; maybe a cheap distributed key/value DB?
jid: integer
path: text // relative to current dir
format: text|binary
modified: unix timestamp
size: integer
is_symlink: bool
checksum: varchar
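As a sketch, that row could map onto a Rust struct like the one below. The field names follow the schema above; JournalRecord and FileFormat are assumed names, not types from the crate.

// One journal row, mirroring the schema above (a sketch, not the crate's actual type)
#[derive(Clone)]
struct JournalRecord {
    jid: i64,            // monotonically increasing within a namespace
    path: String,        // relative to the current dir
    format: FileFormat,  // text | binary
    modified: i64,       // unix timestamp
    size: i64,
    is_symlink: bool,
    checksum: String,
}

#[derive(Clone)]
enum FileFormat {
    Text,
    Binary,
}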
client needs to update a file from meta server (MS)
client needs to upload a file to server
program just starts
file was removed locally
file was moved locally
file was renamed
one line in a file was edited
one line in a file was added
one line in a file was removed
if the latest remote jid is bigger, download from remote; if metadata or size differs, upload to remote and, after commit, store into the local db
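A minimal sketch of that rule, assuming the JournalRecord sketch above; download, upload, and store_local are hypothetical stubs standing in for the real network and DB calls.

// Hypothetical stubs for the real network and DB calls
fn download(r: &JournalRecord) {
    println!("pulling {}", r.path);
}

fn upload(r: &JournalRecord) -> JournalRecord {
    println!("pushing {}", r.path);
    // pretend the server committed and assigned the next jid
    JournalRecord { jid: r.jid + 1, ..r.clone() }
}

fn store_local(r: &JournalRecord) {
    println!("storing jid {} into the local db", r.jid);
}

// The sync decision for a single file, per the rule above
fn sync_one(local: &JournalRecord, remote: &JournalRecord) {
    if remote.jid > local.jid {
        // remote journal is ahead: pull from remote
        download(remote);
    } else if local.modified != remote.modified || local.size != remote.size {
        // metadata or size differs: push, then persist the committed row
        let committed = upload(local);
        store_local(&committed);
    }
}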
Q:
do I need a hierarchy of services, or should they all be independent?
how should sharing work?
how to thread it? multiple modules and multiple files
do I need to sync file metadata as well?
We have separate threads for sniffing the file system, hashing, commit, store_batch, list, retrieve_batch, and reconstruct, allowing us to pipeline-parallelize this process across many files. We use compression and rsync to minimize the size of store_batch/retrieve_batch requests.
commit("breakfast/Mexican Style Burrito.cook", "h1,h2,h3");
Q:
problem if chunking by line? => seek won't work; need to store block sizes to do the seek effectively.
where to store chunks for a not-yet-assembled file
how to understand that a new file was created remotely
how to understand that a file was deleted
how to understand that
Q:
do I need to copy unchanged jids, or just update the changed ones? => it makes sense to update all
what happens on delete or move?
The role of the Chunker is to deal with persistence of hashes and files. It operates on text files, and chunks are not fixed-size: each chunk is one line of the file.
fn hashify(file_path: String) -> io::Result<Vec<String>>
fn save(file_path: String, chunks: Vec<String>) -> io::Result<()>
fn read_chunk(chunk: String) -> io::Result<String> // should raise an error if the cache doesn't have content for a specific chunk hash
fn save_chunk(chunk: String, content: String) -> io::Result<()>
fn compare_sets(left: Vec<String>, right: Vec<String>) -> bool
fn check_chunk(chunk: String) -> io::Result<bool>
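A sketch of how those signatures could hang together, with an in-memory HashMap standing in for the real chunk store; the Chunker struct and its cache field are assumptions, only the signatures come from the notes above.

use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;

struct Chunker {
    cache: HashMap<String, String>, // chunk hash -> chunk content
}

impl Chunker {
    // Split a text file into line chunks, cache them, and return their hashes.
    fn hashify(&mut self, file_path: String) -> io::Result<Vec<String>> {
        let text = fs::read_to_string(&file_path)?;
        Ok(text
            .lines()
            .map(|line| {
                let mut h = DefaultHasher::new(); // a cryptographic hash would be safer
                line.hash(&mut h);
                let hash = format!("{:016x}", h.finish());
                self.cache.insert(hash.clone(), line.to_string());
                hash
            })
            .collect())
    }

    // Reassemble a file from chunk hashes and write it to disk.
    fn save(&self, file_path: String, chunks: Vec<String>) -> io::Result<()> {
        let lines: io::Result<Vec<String>> =
            chunks.into_iter().map(|c| self.read_chunk(c)).collect();
        fs::write(file_path, lines?.join("\n"))
    }

    // Errors when the cache has no content for the given chunk hash.
    fn read_chunk(&self, chunk: String) -> io::Result<String> {
        self.cache.get(&chunk).cloned().ok_or_else(|| {
            io::Error::new(io::ErrorKind::NotFound, format!("no content for chunk {chunk}"))
        })
    }

    fn save_chunk(&mut self, chunk: String, content: String) -> io::Result<()> {
        self.cache.insert(chunk, content);
        Ok(())
    }

    // True when both chunk-hash lists are identical.
    fn compare_sets(left: Vec<String>, right: Vec<String>) -> bool {
        left == right
    }

    fn check_chunk(&self, chunk: String) -> io::Result<bool> {
        Ok(self.cache.contains_key(&chunk))
    }
}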
Q:
strings will be short, 80-100 symbols. what should be used as the hashing function? what size should the hash be? I'd say square root of 10. You can test it!
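One way to size the hash is the birthday bound: with n chunks and a b-bit hash, the collision probability is roughly n^2 / 2^(b+1). A quick check, with illustrative numbers (not from the notes):

// Rough birthday-bound collision estimate: p ≈ n^2 / 2^(b+1)
fn collision_estimate(n: f64, bits: i32) -> f64 {
    n * n / 2f64.powi(bits + 1)
}

fn main() {
    // e.g. ten million line chunks against 64-bit and 128-bit hashes
    println!("64-bit:  {:e}", collision_estimate(1e7, 64));  // ~2.7e-6
    println!("128-bit: {:e}", collision_estimate(1e7, 128)); // ~1.5e-25
}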
empty files should be distinguishable from deleted ones
bundling of uploads/downloads
read-only
namespaces
proper error handling
report an error on unexpected cache behaviour
don't throw an unknown error on every non-200 response
remove clone
limit max file size
configuration struct
pull changes first or reindex locally first? research possible conflict scenarios
extract shared data structures to core
garbage collection on DB
test test test
metrics for monitoring (cache saturation, miss rate)
protect from DDoS: https://github.com/rousan/multer-rs/blob/master/examples/prevent_dos_attack.rs
auto-update client