![docs.rs](https://img.shields.io/docsrs/annatto) [![codecov](https://codecov.io/gh/korpling/annatto/graph/badge.svg?token=51VXZ3IJPA)](https://codecov.io/gh/korpling/annatto) # Annatto This software aims to test and convert data within the [RUEG](https://hu.berlin/rueg) research group at Humboldt-Universität zu Berlin. Tests aim at continuously evaluating the state of the [RUEG corpus data](https://zenodo.org/record/3236068) to early identify issues regarding compatibility, consistency, and integrity to facilitate data handling with regard to annotation, releases and integration. For efficiency annatto relies on the [graphANNIS representation](https://korpling.github.io/graphANNIS/docs/v2.2/data-model.html) and already provides a basic set of data handling modules. ## Installing and running annatto Annatto is a command line program, which is available pre-compiled for Linux, Windows and macOS. Download and extract the [latest release file](https://github.com/korpling/annatto/releases/latest) for your platform. The main usage of annatto is through the command line interface. Run ```bash annatto --help ``` to get more help on the sub-commands. The most important command is `annatto run `, which runs all the modules as defined in the given [workflow] file. ## Modules Annatto comes with a number of modules, which have different types: **Importer** modules allow importing files from different formats. More than one importer can be used in a workflow, but then the corpus data needs to be merged using one of the merger manipulators. When running a workflow, the importers are executed first and in parallel. **Graph operation** modules change the imported corpus data. They are executed one after another (non-parallel) and in the order they have been defined in the workflow. **Exporter** modules export the data into different formats. More than one exporter can be used in a workflow. When running a workflow, the exporters are executed last and in parallel. To list all available formats (importer, exporter) and graph operations run ```bash annatto list ``` To show information about modules for the given format or graph operation use ```bash annatto info ``` The documentation for the modules are also included [here](https://github.com/korpling/annatto/blob/v0.18.0/docs/README.md). ## Creating a workflow file Annatto workflow files list which importers, graph operations and exporters to execute. We use an [TOML file](https://toml.io/) with the ending `.toml` to configure the workflow. TOML files can be as simple as key-value pairs, like `config-key = "config-value"`. But they allow representing more complex structures, such as lists. The [TOML website](https://toml.io/) has a great "Quick Tour" section which explains the basics concepts of TOML with examples. ### Import An import step starts with the header `[[import]]`, and a configuration value for the key `path` where to read the corpus from and the key `format` which declares in which format the corpus is encoded. The file path is relative to the workflow file. Importers also have an additional configuration header, that follows the `[[import]]` section and is marked with the `[import.config]` header. ```toml [[import]] path = "textgrid/exampleCorpus/" format = "textgrid" [import.config] tier_groups = { tok = [ "pos", "lemma", "Inf-Struct" ] } skip_timeline_generation = true skip_audio = true skip_time_annotations = true audio_extension = "wav" ``` You can have more than one importer, and you can simply list all the different importers at the beginning of the workflow file. An importer always needs to have a configuration header, even if it does not set any specific configuration option. ```toml [[import]] path = "a/mycorpus/" format = "format-a" [import.config] [[import]] path = "b/mycorpus/" format = "format-b" [import.config] [[import]] path = "c/mycorpus/" format = "format-c" [import.config] # ... ``` ### Graph operations Graph operations use the header `[[graph_op]]` and the key `action` to describe which action to execute. Since there are no files to import/export, they don't have a `path` configuration. ```toml [[graph_op]] action = "check" [graph_op.config] # Empty list of tests tests = [] ``` ### Export Exporters work similar to importers, but use the keyword `[[export]]` instead. ```toml [[export]] path = "output/exampleCorpus" format = "graphml" [export.config] add_vis = "# no vis" guess_vis = true ``` ### Full example You cannot mix import, graph operations and export headers. You have to first list all the import steps, then the graph operations and then the export steps. ```toml [[import]] path = "conll/ExampleCorpus" format = "conllu" config = {} [[graph_op]] action = "check" [graph_op.config] report = "list" [[graph_op.config.tests]] query = "tok" expected = [ 1, inf ] description = "There is at least one token." [[graph_op.config.tests]] query = "node ->dep node" expected = [ 1, inf ] description = "There is at least one dependency relation." [[export]] path = "grapml/" format = "graphml" [export.config] add_vis = "# no vis" guess_vis = true ``` ## Developing annatto You need to install Rust to compile the project. We recommend installing the following Cargo subcommands for developing annis-web: - [cargo-release](https://crates.io/crates/cargo-release) for creating releases - [cargo-about](https://crates.io/crates/cargo-about) for re-generating the third party license file - [cargo-llvm-cov](https://crates.io/crates/cargo-llvm-cov) for determining the code coverage - [cargo-insta](https://crates.io/crates/cargo-insta) allows reviewing the test snapshot files - [cargo-dist](https://crates.io/crates/cargo-dist) for configuring the GitHub actions that create the release binaries. ### Execute tests You can run the tests with the default `cargo test` command. To calculate the code coverage, you can use `cargo-llvm-cov`: ```bash cargo llvm-cov --open --all-features --ignore-filename-regex 'tests?\.rs' ``` ### Performing a release You need to have [`cargo-release`](https://crates.io/crates/cargo-release) installed to perform a release. Execute the follwing `cargo` command once to install it. ```bash cargo install cargo-release cargo-about ``` To perform a release, switch to the main branch and execute: ```bash cargo release [LEVEL] --execute ``` The [level](https://github.com/crate-ci/cargo-release/blob/HEAD/docs/reference.md#bump-level) should be `patch`, `minor` or `major` depending on the changes made in the release. Running the release command will also trigger a CI workflow to create release binaries on GitHub. ## Funding Die Forschungsergebnisse dieser Veröffentlichung wurden gefördert durch die Deutsche Forschungsgemeinschaft (DFG) – SFB 1412, 416591334 sowie FOR 2537, 313607803, GZ LU 856/16-1. This research was funded by the German Research Foundation (DFG, Deutsche Forschungsgemeinschaft) – SFB 1412, 416591334 and FOR 2537, 313607803, GZ LU 856/16-1.