[](https://github.com/tolkit/telomeric-identifier)
[](https://crates.io/crates/tidk)
[](https://bioconda.github.io/recipes/tidk/README.html)
[![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.10091385.svg)](https://doi.org/10.5281/zenodo.10091385)
# A Telomere Identification toolKit (`tidk`)
`tidk` is a toolkit to identify and visualise telomeric repeats for the Darwin Tree of Life genomes. `tidk` works especially well on chromosome-level genome assemblies, but can also work on PacBio HiFi reads (see the telomeric repeat database for many examples). The tool has a few modules, which may be useful to anyone investigating telomeric repeat sequences in a genome.
1. `explore` - tries to find the telomeric repeat unit in the genome.
2. `find` and `search` are essentially the same: both identify a repeat sequence in windows across the genome. `find` uses a built-in table of telomeric repeats, whereas in `search` you supply your own.
3. `plot` does what it says on the tin, and plots the CSV output of `find` or `search` as an SVG.
4. `build` builds the telomeric repeat database and saves it on your local machine for use in `tidk find`.
## Install
The easiest way to install is through conda:
```bash
conda install -c bioconda tidk
```
Otherwise, as with other Rust projects, you will have to compile it yourself. Download Rust, clone this repo, `cd` into it, and then run the following to install it into your `$PATH` as `tidk`:
`cargo install --path=.`
## Usage
Below is some usage guidance. From 0.2.3 onwards there have been breaking changes to the CLI. They are pointed out below, and in the release changelog.
### Build
Before using `tidk find`, you will need to fetch the telomeric repeat database using `tidk build`. `tidk build` is available from version 0.2.6 onwards.
### Explore
`tidk explore` will attempt to find the simple telomeric repeat unit in the genome provided. It will report this repeat in its canonical form (e.g. TTAGG -> AACCT). Unlike previous versions, only a simple TSV is printed to STDOUT. Use the `--distance` parameter to restrict the search to a proportion of each chromosome, measured from its ends. The default is 1% of the length of the chromosome at either end, but feel free to change this. In particular with raw reads (PacBio), I'd recommend setting the distance flag to 0.5 (`--distance 0.5` or `--distance=0.5`), to process the full length of each read.
For example:
`tidk explore --minimum 5 --maximum 12 fastas/iyBomHort1_1.20210303.curated_primary.fa > out.tsv` searches the *Bombus hortorum* genome for repeats of each length from 5 to 12 in turn.
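To make the "canonical form" mentioned above concrete, here is a minimal Python sketch. It assumes that the canonical form is the lexicographically smallest string among all rotations of the repeat and of its reverse complement. That assumption is consistent with the TTAGG -> AACCT example, but it is not taken from the `tidk` source, and the function names are hypothetical.

```python
# Sketch only: assumes "canonical" means the lexicographically smallest
# rotation across both strands. Function names are illustrative, not tidk's.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def rotations(seq: str) -> list[str]:
    """Return every cyclic rotation of seq."""
    return [seq[i:] + seq[:i] for i in range(len(seq))]


def canonical_repeat(seq: str) -> str:
    """Pick the smallest rotation of seq or of its reverse complement."""
    seq = seq.upper()
    return min(rotations(seq) + rotations(reverse_complement(seq)))


if __name__ == "__main__":
    print(canonical_repeat("TTAGG"))  # AACCT
```

Under this definition, a repeat unit read from either strand, and starting at any offset within the unit, collapses to the same string, which is what makes a single reported form useful.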
```
Use a range of kmer sizes to find potential telomeric repeats.
One of either length, or minimum and maximum must be specified.

Usage: tidk explore [OPTIONS] <FASTA>

Arguments:
  <FASTA>  The input fasta file

Options:
  -l, --length [<LENGTH>]        Length of substring
  -m, --minimum [<MINIMUM>]      Minimum length of substring [default: 5]
  -x, --maximum [<MAXIMUM>]      Maximum length of substring [default: 12]
  -t, --threshold [<THRESHOLD>]  Positions of repeats are only reported if they occur sequentially in a greater number than the threshold [default: 100]
      --distance [<DISTANCE>]    The distance from the end of the chromosome as a proportion of chromosome length. Must range from 0-0.5. [default: 0.01]
  -v, --verbose                  Print verbose output.
      --log                      Output a log file.
  -h, --help                     Print help
  -V, --version                  Print version
```
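The effect of `--distance` can be illustrated with a short Python sketch. It assumes the proportion is applied to each sequence's length and measured inwards from both ends, as the help text describes. The function name and the half-open coordinates are illustrative choices, not `tidk` internals.

```python
# Sketch only: how a proportional distance could map to searched regions.
# terminal_regions is a hypothetical helper, not part of tidk.


def terminal_regions(chrom_length: int, distance: float = 0.01):
    """Return half-open (start, end) regions at the left and right ends."""
    if not 0.0 <= distance <= 0.5:
        raise ValueError("distance must be between 0 and 0.5")
    span = int(chrom_length * distance)
    return (0, span), (chrom_length - span, chrom_length)


if __name__ == "__main__":
    # Default: 1% from each end of a 1 Mb chromosome.
    print(terminal_regions(1_000_000))
    # 0.5 covers the whole sequence, which is why it suits raw reads.
    print(terminal_regions(20_000, distance=0.5))
```

With `distance=0.5` the two regions meet in the middle, so every base of the read is searched.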
### Find
`tidk find` takes an input clade, looks up the known or putative telomeric repeat (or repeats) for that clade, and searches the genome for them. It now uses a custom curated telomeric repeat database. As more telomeric repeats are found and added, the dictionary of sequences used will grow.
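As a rough illustration of counting a repeat in windows across a sequence, here is a Python sketch. It is an assumption about the general technique, not the `tidk` implementation: it counts non-overlapping matches of the repeat and of its reverse complement in fixed-size windows, and it ignores matches that span a window boundary. All names are hypothetical.

```python
# Sketch only: windowed counting of a repeat on both strands.
# Not tidk's algorithm; matches crossing window boundaries are missed.

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_repeat_in_windows(sequence: str, repeat: str, window: int = 10_000):
    """Return (window_start, forward_count, reverse_count) per window."""
    sequence = sequence.upper()
    forward = repeat.upper()
    reverse = reverse_complement(forward)
    rows = []
    for start in range(0, len(sequence), window):
        chunk = sequence[start:start + window]
        # str.count counts non-overlapping occurrences.
        rows.append((start, chunk.count(forward), chunk.count(reverse)))
    return rows


if __name__ == "__main__":
    # Toy sequence: repeat at the start, filler, reverse complement at the end.
    toy = "TTAGG" * 4 + "ACGTACGTAC" * 2 + "CCTAA" * 4
    for row in count_repeat_in_windows(toy, "TTAGG", window=20):
        print(*row, sep="\t")
```

In a chromosome-level assembly, high counts in the first and last windows and low counts in between are the pattern one would expect from telomeres at both ends.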
```
Supply the name of a clade your organsim belongs to, and this submodule will find all telomeric repeat matches for that clade.
Usage: tidk find [OPTIONS] [FASTA]
Arguments:
[FASTA] The input fasta file
Options:
-w, --window [<WINDOW>] Window size to calculate telomeric repeat counts in [default: 10000]
-c, --clade <CLADE> The clade of organism to identify telomeres in [possible values: Accipitriformes, Actiniaria, Anura, Apiales, Aplousobranchia, Asterales, Buxales, Caprimulgiformes, Carangiformes, Carcharhiniformes, Cardiida, Carnivora, Caryophyllales, Cheilostomatida, Chiroptera, Chlamydomonadales, Coleoptera, Crassiclitellata, Cypriniformes, Eucoccidiorida, Fabales, Fagales, Forcipulatida, Hemiptera, Heteronemertea, Hirudinida, Hymenoptera, Hypnales, Labriformes, Lamiales, Lepidoptera, Malpighiales, Myrtales, Odonata, Orthoptera, Pectinida, Perciformes, Phlebobranchia, Phyllodocida, Plecoptera, Pleuronectiformes, Poales, Rodentia, Rosales, Salmoniformes, Sapindales, Solanales, Symphypleona, Syngnathiformes, Trichoptera, Trochida, Venerida]
-o, --output