# Draft

This is the initial draft for **Rubro**.

## The Problem

Backing up a massive number of files can take a long time, and it can also increase data-storage costs, especially on cloud providers. Syncing software needs to know which files changed, which involves walking the entire directory structure and reading metadata from the remote server (such as S3) to determine what to upload, incurring additional costs. On top of that, the metadata itself can sometimes take substantial space, especially when dealing with a large number of small files.

My problem is this: I have almost half a million files to back up. It takes a very long time to back everything up to Nextcloud, which has S3 configured as an external storage (which later moves data to Glacier).

## The solution

Given a scenario similar to the following:

```txt
development/public/java/pub_jproj1
development/public/java/pub_jproj2
development/public/java/pub_jproj3
development/public/rust/pub_rproj1
development/public/rust/pub_rproj2
development/public/rust/pub_rproj3
development/public/go/pub_gproj1
development/public/go/pub_gproj2
development/public/go/pub_gproj3
development/public/scala/pub_sproj1
development/public/scala/pub_sproj2
development/public/scala/pub_sproj3
development/public/kotlin/pub_kproj1
development/public/kotlin/pub_kproj2
development/public/kotlin/pub_kproj3
development/personal/java/per_jproj1
development/personal/java/per_jproj2
development/personal/java/per_jproj3
development/personal/kotlin/per_kproj1
development/personal/kotlin/per_projgp1/per_kproj1
development/personal/kotlin/per_projgp1/per_kproj2
development/personal/kotlin/per_projgp1/per_kproj3
development/personal/kotlin/per_projgp1/per_kproj4
development/work/
development/oss_contrib/oss_project1/...
development/oss_contrib/oss_project2/...
development/oss_contrib/oss_project2/subproject{1,3}/...
development/oss_contrib/oss_project3/...
```

We need to group those directories based on some criteria, then mount zip files using [fuse-zip](https://bitbucket.org/agalanin/fuse-zip) and copy the files into the mounted directories. After this, we unmount everything and sync the zip files using Nextcloud[^1]. The major goal of **rubro** (this project) is to achieve both a balanced distribution of directories and a distribution that respects their modification frequency.

[^1]: The idea is to be cloud-agnostic: **rubro** will rely on user-defined scripts for anything that goes beyond its purpose, which is file and directory grouping.

### Possible feature

#### Keep but regroup

A feature that may help a lot is to avoid removing a recently modified directory from a group of infrequently modified directories. And how would one know which directory is the newest? The plan is not to provide any automated solution at first: either let another tool do this job (based on modification time) or build a tool later on. **Keep but regroup** will not mount the previous zip file that contained that directory; instead, it will create a new group, or move the directory to a group that makes sense, while keeping the previous version.

## What Rubro will not be

Rubro does not aim to provide a solution for straightforward file access; it is meant to be **use-and-forget-until-the-catastrophe-strikes**. When the catastrophe strikes, the effort of getting the files out of the zip archives should be the least of your worries.
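The "group those directories based on some criteria" step could be sketched as a largest-first greedy bin-packing over directory sizes, so that each zip archive ends up roughly the same size. This is a minimal illustrative sketch, not rubro's actual algorithm: the `balance_groups` helper, the size map, and the fixed group count are all assumptions, and a real implementation would also weigh modification frequency as described above.

```python
def balance_groups(dir_sizes: dict[str, int], n_groups: int) -> list[list[str]]:
    """Greedily assign directories to n_groups so group sizes stay balanced.

    dir_sizes maps a directory path to its total size in bytes.
    Largest-first greedy: sort directories by size descending, then place
    each one into the group with the smallest running total.
    """
    groups: list[list[str]] = [[] for _ in range(n_groups)]
    totals = [0] * n_groups
    for path, size in sorted(dir_sizes.items(), key=lambda kv: kv[1], reverse=True):
        i = totals.index(min(totals))  # lightest group so far
        groups[i].append(path)
        totals[i] += size
    return groups


# Example with made-up sizes: two groups end up at 8 and 7 units.
example = {"dev/java/p1": 5, "dev/rust/p1": 4, "dev/go/p1": 3,
           "dev/scala/p1": 2, "dev/kotlin/p1": 1}
print(balance_groups(example, 2))
```

Each resulting group would then correspond to one zip file to mount, fill, and sync.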