| Crates.io | miniphy |
| lib.rs | miniphy |
| version | 2.0.0-alpha.8 |
| created_at | 2025-02-11 12:16:42.89784+00 |
| updated_at | 2025-02-11 15:34:17.397296+00 |
| description | Create an ordered FASTA TAR file |
| repository | https://github.com/karel-brinda/miniphy2 |
| id | 1551357 |
| size | 11,553,694 |
MiniPhy2 is the second version of the MiniPhy workflow for phylogenetic compression of large bacterial genome collections. This version has been entirely rewritten in Rust and minimizes on-disk operations; it is therefore much better suited to very large collections. The resulting compression performance should be nearly identical to that of the original MiniPhy.
Prerequisites:
Installation from git:
git clone https://github.com/karel-brinda/miniphy2
cd miniphy2
make
./miniphy2 -h
#./target/release/miniphy2 -h
Downloading automatically built binaries: Go to https://github.com/karel-brinda/miniphy2/actions?query=CI and find the corresponding artifact.
General Syntax
miniphy2 [command] [options] [arguments]
compress command
Purpose: Compresses a single batch in a provided order (e.g., from AttoTree)
$ ./miniphy2 compress --help
Compress
Usage: miniphy2 compress [OPTIONS] <INPUT>...
Arguments:
<INPUT>... Files to include in the tar archive
Options:
-l, --list The provided files are lists of files
-f, --force Rewrite the output file if it already exists
-u, --uncompressed No TAR compression (otherwise compressed by xz -9 -T1 in memory)
-o, --output <FILE> Output file, - for stdout [default: -]
-p, --prefix <STR> Path prefix for files in the TAR file (e.g, batch1_ or batch1/) [default: ./]
-h, --help Print help
The input for compression consists of genome batches (at most approximately 10k genomes each), obtained, for instance, through MiniPhy 1. The following steps compress a single batch.
Generate a file containing genome paths:
find /batch/directory -name '*.fa' > input.txt
The resulting input.txt is the list of genome file locations.
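Before building the tree, it can be worth checking that the list looks sane. The following is a minimal sketch, not part of MiniPhy2: it creates a toy batch directory (the file names g1.fa and g2.fa are hypothetical), generates the list with the same find command, counts the genomes, and counts listed paths that are missing, empty, or unreadable.

```shell
# Self-contained demo: build a toy batch directory (hypothetical file names).
batch_dir=$(mktemp -d)
printf '>g1\nACGT\n' > "$batch_dir/g1.fa"
printf '>g2\nACGA\n' > "$batch_dir/g2.fa"

# Same listing step as above, applied to the toy directory.
find "$batch_dir" -name '*.fa' > "$batch_dir/input.txt"

# Count the listed genomes (batches are meant to hold roughly 10k at most).
n_genomes=$(wc -l < "$batch_dir/input.txt" | tr -d ' ')

# Count listed paths that are not readable, non-empty files.
n_bad=0
while IFS= read -r path; do
    if [ ! -s "$path" ] || [ ! -r "$path" ]; then
        n_bad=$((n_bad + 1))
    fi
done < "$batch_dir/input.txt"

echo "genomes: $n_genomes, unusable paths: $n_bad"
```

For a real batch, replace the toy directory with /batch/directory and point the loop at input.txt.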
Infer a tree with AttoTree using the default parameters, then extract the leaf names in tree order and turn them back into file paths:
attotree -L input.txt -o tree.nw
cat tree.nw | grep -o '[^,:()]*:' | sed 's/:$//' | grep -Ev ^$ \
| awk -v d="/batch/directory" '{print d "/" $0 ".fa"}' \
> phylogenetic_order.txt
Output: phylogenetic_order.txt, an ordered sequence of genome file paths
for MiniPhy2 processing.
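To see what the extraction pipeline does, here is a small worked example on a toy Newick string. The tree and the leaf names a, b, and c are made up for illustration; the sketch assumes, as the pipeline above does, that leaf names are the genome file names without the .fa extension and that internal nodes carry no labels (a labeled internal node would be picked up as well).

```shell
# Toy Newick tree with three leaves (hypothetical names a, b, c).
tree='((a:0.1,b:0.2):0.05,c:0.3);'

# Keep every label that is followed by a branch length, strip the colon,
# and drop the empty label produced by the unlabeled internal node.
leaves=$(printf '%s\n' "$tree" \
    | grep -o '[^,:()]*:' | sed 's/:$//' | grep -Ev '^$')

# Turn the leaf names back into file paths, preserving the tree order.
paths=$(printf '%s\n' "$leaves" \
    | awk -v d="/batch/directory" '{print d "/" $0 ".fa"}')

printf '%s\n' "$paths"
# prints:
# /batch/directory/a.fa
# /batch/directory/b.fa
# /batch/directory/c.fa
```

The leaves come out in the left-to-right order in which they appear in the Newick string, which is the order later passed to the compress command.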
./miniphy2 compress -p 'batchX/' -lfo compressed_genomes.tar.xz phylogenetic_order.txt
The options instruct MiniPhy2 to read the genome paths from
phylogenetic_order.txt (-l), compress the genomes in that order using
xz -9 -T1, and save the result to compressed_genomes.tar.xz (-o),
overwriting the file if it already exists (-f). Additionally, -p prepends
batchX/ to each file name in the output archive, so everything ends up in
a directory with this name.
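One way to confirm the member order and the prefix is to list the archive with tar. The sketch below is self-contained and therefore uses a stand-in archive built with plain tar, not one produced by MiniPhy2; the member names batchX/b.fa and batchX/a.fa are hypothetical, and the exact names MiniPhy2 stores are not verified here.

```shell
# Stand-in archive built with plain tar (NOT produced by MiniPhy2), only to
# make this sketch runnable; member names are hypothetical.
work=$(mktemp -d)
mkdir "$work/batchX"
printf '>b\nACGT\n' > "$work/batchX/b.fa"
printf '>a\nACGA\n' > "$work/batchX/a.fa"

# tar stores members in the order they are given on the command line.
tar -cf "$work/demo.tar" -C "$work" batchX/b.fa batchX/a.fa

# List the members in their stored order.
members=$(tar -tf "$work/demo.tar")
printf '%s\n' "$members"
# prints:
# batchX/b.fa
# batchX/a.fa
```

For the real output, `tar -tf compressed_genomes.tar.xz | head` should show the first members in the same way, assuming a tar implementation that detects xz compression when reading (recent GNU tar and bsdtar do).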
Please use GitHub issues.
See Releases.