kamino-cli

Crates.io: kamino-cli
lib.rs: kamino-cli
version: 0.5.0
created_at: 2026-01-14 18:45:15.41119+00
updated_at: 2026-01-20 15:36:59.593284+00
description: Build phylogenomic datasets in seconds.
repository: https://github.com/rderelle/kamino
id: 2043442
size: 141,462
Romain Derelle (rderelle)




From the Spanish word for path.

Builds an amino-acid alignment in a reference-free, alignment-free manner from a set of proteomes.
Not ‘better’ than traditional marker-based pipelines, but simpler and faster to run.

Typical usages range from between-species to within-phylum phylogenetic analyses (bacteria, archaea and eukaryotes).



under the hood

kamino performs the following successive steps:

  • lists proteome files from the input directory (-i or -I)
  • recodes proteins with a 6-letter recoding scheme (-r)
  • simplifies proteomes by discarding out-branching k-mers
  • builds a global assembly graph and identifies variant groups as described here (-d)
  • converts variant group paths back to amino acids using a sliding window
  • masks long polymorphism runs within variant groups (-m)
  • filters variant groups by missing data and middle-length thresholds (-f and -l)
  • extracts middle positions and incorporates 'constant' positions (-c)
  • outputs the final amino acid alignment (-o)
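To make the recoding step more concrete, here is a minimal Python sketch of a 6-letter recoding followed by k-mer extraction. It assumes the widely used Dayhoff-6 amino-acid groups; the scheme kamino actually applies (and the choices offered by -r) may differ, and the function names `recode` and `kmers` are hypothetical, not part of kamino.

```python
# ASSUMPTION: Dayhoff-6 groups, used here for illustration only.
# kamino's own recoding scheme(s) may use different groups.
DAYHOFF6 = {
    "A": "AGPST",
    "C": "C",
    "D": "DENQ",
    "F": "FWY",
    "H": "HKR",
    "I": "ILMV",
}

# Map each of the 20 amino acids to the letter representing its group.
RECODE = {aa: code for code, group in DAYHOFF6.items() for aa in group}


def recode(protein: str, unknown: str = "X") -> str:
    """Replace every amino acid by its group letter (hypothetical helper)."""
    return "".join(RECODE.get(aa, unknown) for aa in protein.upper())


def kmers(seq: str, k: int) -> list:
    """Return all overlapping k-mers of a sequence, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


# Two peptides that differ only by conservative substitutions
# collapse to the same recoded string, and hence the same k-mers.
print(recode("MKTLLV"))            # IHAIII
print(recode("LRSIMV"))            # IHAIII
print(kmers(recode("MKTLLV"), 4))  # ['IHAI', 'HAII', 'AIII']
```

The point of the sketch is that recoding merges similar amino acids, so k-mers can still match between proteomes that have diverged at the amino-acid level.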

installation

You can either compile the code locally using rustc, or install a precompiled binary from Bioconda:

conda install bioconda::kamino

running kamino

Input consists of proteome files in FASTA format (gzipped or not), with one file per sample. Files can be placed in a single directory (specified with the -i argument), or their paths can be provided in a tab-delimited file using -I.
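Before a run, it can be useful to confirm that the input directory holds one readable FASTA file per sample. The Python sketch below is not part of kamino: it counts the protein records in each file and detects gzip compression from the file's magic bytes, so it makes no assumption about file extensions. The helper names are hypothetical.

```python
import gzip
from pathlib import Path


def open_fasta(path):
    """Open a FASTA file as text, detecting gzip by its two magic bytes."""
    with open(path, "rb") as fh:
        is_gzip = fh.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "rt")


def count_proteins(path):
    """Count FASTA records, i.e. lines starting with '>'."""
    with open_fasta(path) as fh:
        return sum(1 for line in fh if line.startswith(">"))


def summarize(input_dir):
    """Map each file name in the directory to its number of protein records."""
    files = sorted(p for p in Path(input_dir).iterdir() if p.is_file())
    return {p.name: count_proteins(p) for p in files}
```

Calling `summarize("<input_dir>")` returns a dictionary such as `{"sampleA.fasta": 4012, ...}` (illustrative values). A file with zero records, or with far fewer than the others, is worth inspecting before it is passed to kamino.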

A basic run using four threads can be performed with either of the following commands:

kamino -i <input_dir> -t 4
kamino -I <tabular_file> -t 4

examples

All analyses were performed on a MacBook "M4 Pro" using v0.4.0 and 4 threads (other parameters set to default unless specified):

dataset                 taxonomic diversity   runtime (min)   memory (GB)   alignment size (aa)
50 Mycobacterium        within-genus          0.1             2             19,283
400 Mycobacterium       within-genus          0.9             8             13,753
50 Polyporales (fungi)  within-order          0.5             8             21,808
46 Drosophila           within-genus          0.7             7             194,021
55 Mammalia             within-class          1.6             14            291,437
55 Mammalia -k 13       within-class          1.9             8             191,962

FAQ

  • When not to use kamino?

    • low-diversity datasets (i.e., within-species), for which genome-based approaches will be more powerful
    • very large datasets (e.g., thousands of bacterial proteomes or hundreds of vertebrate proteomes)
    • very divergent datasets (e.g., the animal kingdom)
    • datasets with a distant outgroup represented by only a few isolates, which may end up with disproportionately more missing data
    • list to be completed ...
  • Is the output reproducible?

Yes, kamino is fully deterministic, so it will produce the exact same alignment for a given version, set of parameters and input proteomes.

  • How to get more phylogenetic positions?

Increase the k-mer size (-k), increase the maximum depth of the graph traversal (-d), or lower the minimum proportion of isolates with an amino acid per position (-f, the missing-data threshold listed in the steps above) if that is acceptable for downstream analyses.
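The effect of the minimum-proportion threshold can be illustrated with a toy alignment. The Python sketch below is a simplified, per-column view written for this README, not kamino's implementation. The function name `keep_positions` and the missing-data symbols `-` and `X` are assumptions.

```python
def keep_positions(alignment, min_fraction, missing="-X"):
    """Return the indices of columns in which at least `min_fraction`
    of the sequences carry an amino acid (not a missing-data symbol).

    ASSUMPTION: '-' and 'X' mark missing data; kamino may differ.
    """
    n_seqs = len(alignment)
    kept = []
    for j in range(len(alignment[0])):
        present = sum(1 for seq in alignment if seq[j] not in missing)
        if present / n_seqs >= min_fraction:
            kept.append(j)
    return kept


# Toy alignment: four isolates, four positions.
aln = ["MKT-", "MK--", "M-TA", "MKTA"]

print(keep_positions(aln, 1.0))   # [0]
print(keep_positions(aln, 0.75))  # [0, 1, 2]
print(keep_positions(aln, 0.5))   # [0, 1, 2, 3]
```

Lowering the threshold keeps more positions, at the cost of more missing data per position in the final alignment.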


This codebase is provided under the MIT License. Some parts of the code were drafted using AI assistance.
