kamino-cli

Crates.io	kamino-cli
lib.rs	kamino-cli
version	0.5.0
created_at	2026-01-14 18:45:15.41119+00
updated_at	2026-01-20 15:36:59.593284+00
description	Build phylogenomic datasets in seconds.
repository	https://github.com/rderelle/kamino
id	2043442
size	141,462

Romain Derelle (rderelle)

From the Spanish word for path.

Builds an amino-acid alignment in a reference-free, alignment-free manner from a set of proteomes.
Not ‘better’ than traditional marker-based pipelines, but simpler and faster to run.

Typical usages range from between-species to within-phylum phylogenetic analyses (bacteria, archaea and eukaryotes).

under the hood

kamino performs the following successive steps:

lists proteome files from the input directory (-i or -I)
recodes proteins with a 6-letter recoding scheme (-r)
simplifies proteomes by discarding out-branching k-mers
builds a global assembly graph and identifies variant groups as described here (-d)
converts variant group paths back to amino acids using a sliding window
masks long polymorphism runs within variant groups (-m)
filters variant groups by missing data and middle-length thresholds (-f and -l)
extracts middle positions and incorporates 'constant' positions (-c)
outputs the final amino acid alignment (-o)
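
The recoding and k-mer steps above can be illustrated with a minimal sketch. It assumes a Dayhoff-style grouping of the 20 amino acids into 6 classes; the scheme actually selected by -r may differ, and the function names `recode` and `kmers` are hypothetical, not part of kamino.

```python
from __future__ import annotations

# Dayhoff-style 6-group recoding. This grouping is an assumption made for
# illustration; kamino's own recoding scheme (-r) may use different classes.
DAYHOFF6 = {
    "A": "0", "G": "0", "P": "0", "S": "0", "T": "0",
    "C": "1",
    "D": "2", "E": "2", "N": "2", "Q": "2",
    "F": "3", "W": "3", "Y": "3",
    "H": "4", "K": "4", "R": "4",
    "I": "5", "L": "5", "M": "5", "V": "5",
}


def recode(protein: str) -> str:
    """Map each amino acid to its class symbol; unknown residues become 'X'."""
    return "".join(DAYHOFF6.get(aa, "X") for aa in protein.upper())


def kmers(seq: str, k: int) -> list[str]:
    """Return all overlapping k-mers of seq, skipping any that contain 'X'."""
    return [
        seq[i:i + k]
        for i in range(len(seq) - k + 1)
        if "X" not in seq[i:i + k]
    ]
```

Under this assumed grouping, `recode("MKVL")` gives `"5455"`: substitutions within a class (for example I, L, M and V) collapse to the same symbol, which is what lets k-mers match across moderately divergent proteomes.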

installation

You can either compile the code locally with the Rust toolchain or install a precompiled binary from Bioconda:

conda install bioconda::kamino

running kamino

Input consists of proteome files in FASTA format (gzipped or not), with one file per sample. Files can be placed in a single directory (specified with the -i argument), or their paths can be provided in a tab-delimited file using -I.
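
Before a run, it can help to check that the input directory contains one readable FASTA file per sample. The sketch below is one way to do that; the list of file extensions is an assumption (the text above only says FASTA, gzipped or not), and both helper names are hypothetical.

```python
import gzip
from pathlib import Path

# Assumed extensions, for illustration only.
FASTA_SUFFIXES = (".fa", ".faa", ".fasta", ".fa.gz", ".faa.gz", ".fasta.gz")


def list_proteomes(input_dir):
    """Return the sorted proteome files found in input_dir (one per sample)."""
    return sorted(
        p for p in Path(input_dir).iterdir()
        if p.is_file() and p.name.lower().endswith(FASTA_SUFFIXES)
    )


def count_proteins(path):
    """Count FASTA records (lines starting with '>') in a plain or gzipped file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A file reporting zero proteins, or a sample missing from the listing, is worth investigating before launching the analysis.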

A basic run using four threads can be performed with either of the following commands:

kamino -i <input_dir> -t 4
kamino -I <tabular_file> -t 4
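
When kamino is launched from a script, the same two commands can be assembled programmatically. This is a minimal sketch using only the -i, -I and -t flags shown above; the helper name `kamino_command` and the directory name `proteomes` are hypothetical, and treating -i and -I as mutually exclusive is an assumption.

```python
import shlex


def kamino_command(threads=4, input_dir=None, tabular_file=None):
    """Build the argument list for a basic kamino run.

    Exactly one of input_dir (-i) or tabular_file (-I) must be given
    (assumed here to be mutually exclusive).
    """
    if (input_dir is None) == (tabular_file is None):
        raise ValueError("provide exactly one of input_dir or tabular_file")
    if input_dir is not None:
        source = ["-i", str(input_dir)]
    else:
        source = ["-I", str(tabular_file)]
    return ["kamino", *source, "-t", str(threads)]


# Show the command line that would be run for a hypothetical directory.
print(shlex.join(kamino_command(input_dir="proteomes")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` once kamino is installed and on the PATH.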

examples

All analyses were performed on a MacBook "M4 Pro" using v0.4.0 and 4 threads (other parameters set to default unless specified):

dataset	taxonomic diversity	runtime (min)	memory (GB)	alignment size (aa)
50 Mycobacterium	within-genus	0.1	2	19,283
400 Mycobacterium	within-genus	0.9	8	13,753
50 Polyporales (fungi)	within-order	0.5	8	21,808
46 Drosophila	within-genus	0.7	7	194,021
55 Mammalia	within-class	1.6	14	291,437
55 Mammalia `-k 13`	within-class	1.9	8	191,962

FAQ

When not to use kamino?

- low-diversity datasets (i.e., within-species), for which genome-based approaches will be more powerful
- very large datasets (e.g., thousands of bacterial proteomes or hundreds of vertebrate proteomes)
- very divergent datasets (e.g., animal kingdom)
- distant outgroups composed of a few isolates: these might have disproportionately more missing data
- list to be completed ...

Is the output reproducible?

Yes. kamino is fully deterministic, so it produces exactly the same alignment for a given version, set of parameters, and input proteomes.
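
One way to confirm this on your own data is to run kamino twice with identical settings and compare checksums of the two output alignments. The sketch below assumes the two alignments were written to separate files; the helper names are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_alignment(path_a, path_b):
    """True if the two files are byte-for-byte identical (by SHA-256)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Recording the digest alongside the kamino version and parameters gives a compact way to document which alignment a downstream tree was built from.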

How to get more phylogenetic positions?

Increase the k-mer size (-k), increase the maximum depth of the graph traversal (-d), or lower the minimum proportion of isolates with an amino acid per position (-m), if that is acceptable for downstream analyses.
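
To see why lowering the minimum proportion of isolates per position yields more positions, consider this minimal sketch of a per-column occupancy filter. It illustrates the general idea only and is not kamino's implementation; the set of missing-data symbols and the function name `filter_columns` are assumptions.

```python
# Symbols treated as missing data (an assumption for this illustration).
MISSING = {"-", "X", "?"}


def filter_columns(alignment, min_prop):
    """Keep columns where at least min_prop of sequences carry an amino acid.

    alignment maps sample names to equal-length aligned sequences
    and is assumed to be non-empty.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    n_seq = len(seqs)
    keep = [
        j for j in range(len(seqs[0]))
        if sum(s[j] not in MISSING for s in seqs) / n_seq >= min_prop
    ]
    return {name: "".join(alignment[name][j] for j in keep) for name in names}
```

With four samples `{"a": "MK-L", "b": "M--L", "c": "MKVL", "d": "M-VL"}`, a threshold of 0.75 keeps only the two fully occupied columns, whereas 0.5 keeps all four, at the cost of more missing data in the result.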

This codebase is provided under the MIT License. Some parts of the code were drafted using AI assistance.
