# Tsumugu exclusion/inclusion logic and rules

Currently tsumugu follows a simple algorithm to determine whether a path should be completely excluded, partially excluded, or included:

0. When a regex is parsed, a `rev_inner` regex is also generated by replacing variables (`${UBUNTU_LTS}`, etc.) with `(?<distro_ver>.+)` (that is, match anything). `rev_inner` is used like this:

    ```rust
    pub fn is_others_match(&self, text: &str) -> bool {
        !self.inner.is_match(text) && self.rev_inner.is_match(text)
    }
    ```

1. First, the user's exclusions and inclusions are preprocessed. **Each exclusion that is a prefix of any inclusion** is put into `list_only_regexes`; every other exclusion is put into `instant_stop_regexes`. All inclusions go into `include_regexes`.
2. While worker threads are handling listing requests:
    1. Check with `instant_stop_regexes` and `include_regexes`:

        ```rust
        for regex in &self.instant_stop_regexes {
            if regex.is_match(text) {
                return Comparison::Stop;
            }
        }
        for regex in &self.include_regexes {
            if regex.is_match(text) {
                return Comparison::Ok;
            }
        }
        ```

    2. Then the path is checked against the `rev_inner` regex via `is_others_match()`, and is also completely excluded if it matches (a fast shortcut).

       This is used for cases like Fedora, which has many versions (currently from 1 to 40). Listing version folders that are not in `${FEDORA_CURRENT}` wastes time and bandwidth. With this trick, those unmatched versions are skipped.
    3. Finally, if the path matches `list_only_regexes`, files under this directory will be ignored (unless they are matched by `include_regexes`), but subdirectories will still be listed. Paths that are not matched by any regexes will be included as usual.
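
The preprocessing in step 1 can be sketched as follows. This is a minimal illustration, not tsumugu's actual code: the `Classified` struct and `classify` function are hypothetical names, patterns are kept as plain strings, and "prefix" is assumed to mean a string prefix of the pattern source.

```rust
/// Hypothetical container for the three lists described in step 1.
#[derive(Debug, Default, PartialEq)]
pub struct Classified {
    pub instant_stop: Vec<String>,
    pub list_only: Vec<String>,
    pub include: Vec<String>,
}

/// Sort exclusion patterns into "list only" and "instant stop".
/// Assumption: "prefix" means a plain string prefix of the pattern source.
pub fn classify(exclusions: &[&str], inclusions: &[&str]) -> Classified {
    let mut out = Classified {
        include: inclusions.iter().map(|s| s.to_string()).collect(),
        ..Default::default()
    };
    for &ex in exclusions {
        // If some inclusion starts with this exclusion, stopping here would
        // make the included subtree unreachable, so only skip its files.
        if inclusions.iter().any(|inc| inc.starts_with(ex)) {
            out.list_only.push(ex.to_string());
        } else {
            out.instant_stop.push(ex.to_string());
        }
    }
    out
}
```

With exclusions `^/releases/` and `^/debug/` and the single inclusion `^/releases/40/`, this sketch puts `^/releases/` into the list-only group (its subdirectories must still be walked to reach `40/`) and `^/debug/` into the instant-stop group.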

This process may still list some paths that turn out to be unnecessary. However, the logic works well for filtering OS versions.
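
To make the version-filtering shortcut more concrete, here is a rough sketch of how the `rev_inner` pattern source from step 0 could be derived. `to_rev_inner_source` is a hypothetical helper written for illustration; it assumes variables always have the form `${NAME}` and does not reflect how tsumugu parses them internally.

```rust
/// Replace every `${NAME}` placeholder with a catch-all named group.
/// `to_rev_inner_source` is a hypothetical helper, not tsumugu's actual API.
/// Note: a pattern with two placeholders would yield two groups with the
/// same name; this sketch does not handle that case.
pub fn to_rev_inner_source(pattern: &str) -> String {
    const CATCH_ALL: &str = "(?<distro_ver>.+)";
    let mut out = String::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("${") {
        match rest[start..].find('}') {
            Some(len) => {
                // Copy the text before the placeholder, then the catch-all.
                out.push_str(&rest[..start]);
                out.push_str(CATCH_ALL);
                rest = &rest[start + len + 1..];
            }
            // Unterminated placeholder: keep the remainder verbatim.
            None => break,
        }
    }
    out.push_str(rest);
    out
}
```

For example, `^/fedora/releases/${FEDORA_CURRENT}/` becomes `^/fedora/releases/(?<distro_ver>.+)/`. Assuming `${FEDORA_CURRENT}` expands to only the current versions, a path such as `/fedora/releases/12/` fails the original regex but matches this catch-all form, which is exactly the condition `is_others_match()` tests.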

Also note that the relative path used for comparison is currently generated as follows:

```rust
pub fn relative_to_str(relative: &[String], filename: Option<&str>) -> String {
    let mut r = relative.join("/");
    if r.starts_with('/') {
        warn!("unexpected / at the beginning of relative ({r})");
    } else {
        r.insert(0, '/');
    }
    if r.len() != 1 {
        if r.ends_with('/') {
            warn!("unexpected / at the end of relative ({r})")
        } else {
            r.push('/')
        }
    }

    // here r already has / at the end
    match filename {
        None => r,
        Some(filename) => {
            assert!(!filename.starts_with('/') && !filename.ends_with('/'));
            format!("{}{}", r, filename)
        }
    }
}
```

As a result:

1. All relative paths used for comparison start with "/".
2. Directory paths end with "/", and file paths don't.

Examples:

1. `http://example.com/file` => `/file`
2. `http://example.com/dir` => `/dir/`
3. `http://example.com/dir/file` => `/dir/file`
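
The examples above can be reproduced with a condensed copy of `relative_to_str`. This version is only a sketch for well-formed input (the warnings for stray slashes are left out), and it assumes the upstream is `http://example.com/`, so that `relative` holds the directory components below it.

```rust
/// Condensed copy of `relative_to_str` for well-formed input
/// (the warnings for stray slashes are left out).
pub fn relative_to_str(relative: &[String], filename: Option<&str>) -> String {
    // Always start with "/", then the directory components.
    let mut r = format!("/{}", relative.join("/"));
    // Anything other than the root gets a trailing "/".
    if r.len() != 1 {
        r.push('/');
    }
    match filename {
        // A directory: keep the trailing "/".
        None => r,
        // A file: append its name, so the result has no trailing "/".
        Some(name) => format!("{}{}", r, name),
    }
}
```

Under that assumption, `relative_to_str(&[], Some("file"))` gives `/file`, `relative_to_str(&["dir".to_string()], None)` gives `/dir/`, and `relative_to_str(&["dir".to_string()], Some("file"))` gives `/dir/file`.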

Note that, for compatibility reasons, the following trick is applied: a user regex that starts with `^` but not with `^/` is rewritten, `^` -> `^/` (this might break some very rare regexes).
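
A minimal sketch of this rewrite, operating on the pattern source as a plain string, might look like the following. `anchor_to_root` is a hypothetical name chosen for illustration.

```rust
/// Hypothetical helper mirroring the `^` -> `^/` rewrite described above.
pub fn anchor_to_root(pattern: &str) -> String {
    match pattern.strip_prefix('^') {
        // Starts with `^` but not `^/`: insert the leading slash.
        Some(rest) if !rest.starts_with('/') => format!("^/{}", rest),
        // Already `^/...`, or not anchored at all: leave untouched.
        _ => pattern.to_string(),
    }
}
```

Here `^something/` becomes `^/something/`, while `^/something/` and `/something$` are left as they are. One pattern this kind of rewrite would change in meaning is `^(a|/b)`, which turns into `^/(a|/b)` and no longer matches `/b`.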

So you could **write `/something$` to exclude ALL files and directories named `something`**, instead of using two regexes (`^something$` and `/something$`, to match `something` at the root and everywhere else, respectively).

Also, `upstream` itself is NOT part of the path being compared. So if your upstream is set to `https://some.example.com/dir/`, you need to exclude `^something/`, not `^dir/something/`, to exclude `https://some.example.com/dir/something/`.
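
As an illustration only, the relationship between a full URL and its comparison path could be modelled like this. `comparison_path` is a hypothetical helper that works on URL strings directly; tsumugu itself builds the path from components via `relative_to_str`.

```rust
/// Hypothetical helper: derive the comparison path of a URL under `upstream`.
/// Returns `None` when the URL does not live under the upstream.
pub fn comparison_path(upstream: &str, url: &str) -> Option<String> {
    // Treat `https://host/dir` and `https://host/dir/` alike.
    let base = upstream.trim_end_matches('/');
    let rest = url.strip_prefix(base)?;
    match rest {
        // The upstream itself maps to the root path.
        "" => Some("/".to_string()),
        // Everything below the upstream keeps its leading slash.
        r if r.starts_with('/') => Some(r.to_string()),
        // Lookalikes such as `.../dir-other/` are not under the upstream.
        _ => None,
    }
}
```

With upstream `https://some.example.com/dir/`, the URL `https://some.example.com/dir/something/` maps to `/something/`. The exclusion `^something/` (rewritten to `^/something/`) matches that path, whereas `^dir/something/` would not, because `dir` never appears in it.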

If in doubt, test with [tsumugu list](./parser.md#debugging).