# Draft

This is the initial draft for **Rubro**.

## The Problem

Backing up a massive number of files can take a long time, and it can also increase data-storage costs, especially on cloud providers. Syncing software needs to know which files changed, which involves walking the entire directory structure and reading metadata from the remote server (such as S3) to determine what to upload, incurring additional costs. On top of that, the metadata itself can sometimes take substantial space, especially when dealing with a large number of small files.

My problem is this: I have almost half a million files to back up. It takes a very long time to back everything up to Nextcloud, which has S3 configured as an external storage (which later moves data to Glacier).

## The solution

Given a scenario similar to the following:

```txt
development/public/java/pub_jproj1
development/public/java/pub_jproj2
development/public/java/pub_jproj3
development/public/rust/pub_rproj1
development/public/rust/pub_rproj2
development/public/rust/pub_rproj3
development/public/go/pub_gproj1
development/public/go/pub_gproj2
development/public/go/pub_gproj3
development/public/scala/pub_sproj1
development/public/scala/pub_sproj2
development/public/scala/pub_sproj3
development/public/kotlin/pub_kproj1
development/public/kotlin/pub_kproj2
development/public/kotlin/pub_kproj3
development/personal/java/per_jproj1
development/personal/java/per_jproj2
development/personal/java/per_jproj3
development/personal/kotlin/per_kproj1
development/personal/kotlin/per_projgp1/per_kproj1
development/personal/kotlin/per_projgp1/per_kproj2
development/personal/kotlin/per_projgp1/per_kproj3
development/personal/kotlin/per_projgp1/per_kproj4
development/work/
development/oss_contrib/oss_project1/...
development/oss_contrib/oss_project2/...
development/oss_contrib/oss_project2/subproject{1,3}/...
development/oss_contrib/oss_project3/...
```

We need to group those directories based on some criteria, then mount zip files using [fuse-zip](https://bitbucket.org/agalanin/fuse-zip) and copy the files into the mounted directories. After this, we unmount everything and sync the zip files using Nextcloud[^1]. The major goal of **rubro** (this project) is to achieve both a balanced distribution of directories and a distribution that respects their modification frequency.

[^1]: The idea is to be cloud-agnostic: **rubro** will rely on user-defined scripts for anything that goes beyond its purpose, which is file and directory grouping.

### Possible feature

#### Keep but regroup

A feature that may help a lot is to avoid removing a recently modified directory from a group of infrequently modified directories. And how would one know which directory is the newest? The plan is not to provide any automated solution at first: either let another tool do this job (based on modification time) or build a tool later on. **Keep but regroup** will not mount the previous zip file that contained that directory; instead, it will create a new group, or move the directory to a group that makes sense, while keeping the previous version.

## What Rubro will not be

Rubro does not aim to provide a solution for straightforward file access; it is meant to be **use-and-forget-until-the-catastrophe-strikes**. When the catastrophe strikes, the effort of getting the files out of the zip archives should be the least of your worries.
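The "group those directories based on some criteria" step could be sketched as a largest-first greedy bin-packing over directory sizes, so that each zip archive ends up roughly the same size. This is a minimal illustrative sketch, not rubro's actual algorithm: the `balance_groups` helper, the size map, and the fixed group count are all assumptions, and a real implementation would also weigh modification frequency as described above.

```python
def balance_groups(dir_sizes: dict[str, int], n_groups: int) -> list[list[str]]:
    """Greedily assign directories to n_groups so group sizes stay balanced.

    dir_sizes maps a directory path to its total size in bytes.
    Largest-first greedy: sort directories by size descending, then place
    each one into the group with the smallest running total.
    """
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for path, size in sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True):
        i = totals.index(min(totals))  # lightest group so far
        groups[i].append(path)
        totals[i] += size
    return groups


# Example with made-up sizes: two groups end up at 8 and 7 units.
example = {"dev/java/p1": 5, "dev/rust/p1": 4, "dev/go/p1": 3,
           "dev/scala/p1": 2, "dev/kotlin/p1": 1}
print(balance_groups(example, 2))
```

Each resulting group would then correspond to one zip file to mount, fill, and sync.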