| Crates.io | chromsize |
| lib.rs | chromsize |
| version | 0.0.32 |
| created_at | 2024-08-15 17:25:31.285474+00 |
| updated_at | 2025-09-22 09:39:03.905382+00 |
| description | just get your chrom sizes |
| homepage | https://github.com/alejandrogzi/chromsize |
| repository | https://github.com/alejandrogzi/chromsize |
| max_upload_size | |
| id | 1339053 |
| size | 48,720 |
annoyed to have to create an index and cut it?
have to look for that old script every time?
got you. just get your chrom sizes. very fast.
but first, how is this better than any other option? yeah, just check the image below.
googled 'get chromosome sizes from fasta', grabbed every command/tool I found and benchmarked them all. surprisingly, you can lose 14 seconds of your life just waiting for those chrom sizes to be calculated. crazy.
What's new in v0.0.32?
- now accepts .2bit files as input too!
- the --fasta argument is now --input (or -i) [to account for .2bit files]
Usage: chromsize --input <FASTA/FASTA.GZ/2BIT> --output <OUTPUT> [-t <THREADS>]
Arguments:
-i, --input <FASTA/FASTA.GZ/2BIT>: input file (FASTA, gzipped FASTA or .2bit)
-o, --output <OUTPUT>: path to chrom.sizes
Options:
-t, --threads <THREADS>: number of threads [default: your max ncpus]
-a, --accession-only: only keep the accession id part of the header (stop at the first blank)
--help: print help
--version: print version
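to make the --accession-only behaviour concrete, here is a minimal Python sketch of the idea. `accession_only` is a hypothetical helper written for illustration, not part of chromsize, and it assumes "blank" means the first whitespace character in the header.

```python
def accession_only(header: str) -> str:
    """Keep only the part of a FASTA header before the first blank."""
    # drop the leading '>' if present, then cut at the first whitespace
    name = header[1:] if header.startswith(">") else header
    parts = name.split(maxsplit=1)
    return parts[0] if parts else ""


# a made-up header in the usual "accession description" layout
print(accession_only(">NC_000001.11 Homo sapiens chromosome 1"))
```

with a header like the one above, only the accession part (everything before the first blank) would end up as the name in chrom.sizes.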
to install rust and use chromsize on your system follow these steps:
get installer: curl https://sh.rustup.rs -sSf | sh on unix, or go here for other options
run cargo install chromsize (make sure ~/.cargo/bin is in your $PATH before running it)
use chromsize with the required arguments
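once the file is written you will probably want to load it somewhere. a minimal Python sketch, assuming the output follows the usual chrom.sizes layout (one `name<TAB>size` pair per line); `read_chrom_sizes` is a hypothetical helper and the demo file is made up.

```python
import tempfile
from pathlib import Path


def read_chrom_sizes(path):
    """Parse a two-column chrom.sizes file into a dict of name -> length."""
    sizes = {}
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip empty lines
            # assumption: columns are tab-separated, name first, size second
            name, length = line.split("\t")[:2]
            sizes[name] = int(length)
    return sizes


# demo with a made-up file; a real one would come from `chromsize -i ... -o ...`
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "chrom.sizes"
    demo_path.write_text("chr1\t123\nchr2\t456\n")
    demo_sizes = read_chrom_sizes(demo_path)

print(demo_sizes)
```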
use chromsize;
use std::path::PathBuf;

fn main() {
    // INFO: the input can be .2bit too
    let input = PathBuf::from("/path/to/fasta.fa");
    let output = PathBuf::from("/path/to/chrom.sizes");

    // one (name, length) pair per sequence
    let sizes: Vec<(String, u64)> = chromsize::chromsize(&input);
    chromsize::write(sizes, &output);
}
build the Python port to install it as a package:
git clone https://github.com/alejandrogzi/chromsize.git && cd chromsize/py-chromsize
hatch shell
maturin develop --release
use it as a binary wrapper [for .fa and .2bit]:
import chromsize as cs
input = "/path/to/fasta.fa" # INFO: can be .2bit too
output = "/path/to/chrom.sizes"
cs.write_chromsizes(input, output)
or just get them directly
import chromsize as cs
input = "/path/to/fasta.fa" # INFO: can be .2bit too
sizes = cs.get_chromsizes(input)
>>> print(sizes)
[
('chr1', 123),
('chr2', 456),
...
]
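since the result is a plain list of (name, size) tuples, the usual Python tools apply. a small sketch with made-up values in the same shape as the list above (nothing here calls chromsize itself):

```python
# made-up values in the same shape as the list shown above
sizes = [("chr1", 123), ("chr2", 456)]

by_name = dict(sizes)                           # name -> length lookup
total = sum(length for _, length in sizes)      # total assembly length
longest = max(sizes, key=lambda item: item[1])  # longest sequence

print(by_name["chr2"], total, longest)
```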
to build chromsize from this repo, do:
git clone https://github.com/alejandrogzi/chromsize.git && cd chromsize
cargo run --release -- -i <FASTA/FASTA.GZ/2BIT> -o <OUTPUT>
to build the development container image:
git clone https://github.com/alejandrogzi/chromsize.git && cd chromsize
start docker (or run systemctl start docker)
docker image build --tag chromsize .
docker run --rm -v "[dir_where_your_fa_is]:/dir" chromsize -i /dir/<INPUT> -o /dir/<OUTPUT>
to use chromsize through Conda just:
conda install chromsize -c bioconda or conda create -n chromsize -c bioconda chromsize
Nextflow (not available yet)
do not believe me? run the benchmark on your own:
edit the ASSEMBLIES const so it lists the .fa files you've downloaded
cargo run --release --bin chromsize-benchmark -- -d /dir/where/my/fastas/are -a show-output ignore-failure
here is all the info and metadata from my experiment:
| Tool | Command | Reference | Discussion |
|---|---|---|---|
| seqkit | seqkit fx2tab --length --name --header-line {assembly} > chrom.sizes | 1 | 2 |
| chromsize | target/release/chromsize -f {assembly} -o chrom.sizes | 3 | |
| pyfaidx | faidx {assembly} -i chromsizes > chrom.sizes | 4 | 5 |
| samtools | samtools faidx {assembly} && wait \| cut -f1,2 {assembly}.fai > chrom.sizes | 6 | 5 |
| faSize | faSize -detailed -tab {assembly} > chrom.sizes | 7 | |
| awk1 | awk '/^>/ {if (seqlen){print seqlen}; print ;seqlen=0;next; } { seqlen += length($0)}END{print seqlen}' {assembly} > chrom.sizes | 8 | 9 |
| awk2 | awk '/^>/{if (l!="") print l; print; l=0; next}{l+=length($0)}END{print l}' {assembly} > chrom.sizes | 8 | 9 |
| bioawk1 | bioawk -c fastx '{print ">" $name ORS length($seq)}' {assembly} > chrom.sizes | 10 | 9 |
| awk3 | cat {assembly} \| awk '$0 ~ ">" {if (NR > 1) {print c;} c=0;printf substr($0,2,100) "\t"; } $0 !~ ">" {c+=length($0);} END { print c; }' > chrom.sizes | 8 | 11 |
| bioawk2 | bioawk -c fastx '{ print $name, length($seq) }' < {assembly} > chrom.sizes | 10 | 2 |
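the awk one-liners above all do the same thing: start a counter at every `>` header and add up the length of each sequence line until the next header. for reference, the same idea as a minimal Python sketch. this is only a plain single-threaded baseline for uncompressed FASTA, not how chromsize is implemented, and `fasta_sizes` is a hypothetical name.

```python
import io


def fasta_sizes(handle):
    """Return [(name, length), ...] for every record in a FASTA stream."""
    sizes = []
    name, length = None, 0
    for line in handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # a new header closes the previous record, if there was one
            if name is not None:
                sizes.append((name, length))
            name, length = line[1:], 0
        elif name is not None:
            length += len(line)
    # the last record has no header after it, so flush it here
    if name is not None:
        sizes.append((name, length))
    return sizes


# tiny made-up FASTA, just to show the shape of the result
demo_fasta = io.StringIO(">chr1\nACGT\nAC\n>chr2\nGGG\n")
demo_result = fasta_sizes(demo_fasta)
print(demo_result)
```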
| Species | Assembly | Size (Gb) | chromsize | seqKit | awk1 | awk2 | awk3 | bioawk1 | bioawk2 | faSize | pyfaidx | samtools |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| S. cerevisiae | R64 | 0.01 | 0.004 | 0.016 (X 4.0) | 0.043 (X 10.7) | 0.043 (X 10.7) | 0.05 (X 12.5) | 0.03 (X 7.5) | 0.03 (X 7.5) | 0.054 (X 13.5) | 0.101 (X 25.2) | 0.064 (X 16.0) |
| C. elegans | ce11 | 0.10 | 0.02 | 0.103 (X 5.1) | 0.409 (X 20.4) | 0.408 (X 20.4) | 0.492 (X 24.6) | 0.274 (X 13.7) | 0.274 (X 13.7) | 0.426 (X 21.3) | 0.225 (X 11.2) | 0.472 (X 23.6) |
| D. melanogaster | dm6 | 0.14 | 0.028 | 0.147 (X 5.2) | 0.581 (X 20.7) | 0.583 (X 20.8) | 0.714 (X 25.5) | 0.426 (X 15.2) | 0.418 (X 14.9) | 0.633 (X 22.6) | 0.337 (X 12.0) | 0.667 (X 23.8) |
| D. rerio | danRer11 | 1.37 | 0.22 | 0.742 (X 3.4) | 6.815 (X 31.0) | 6.803 (X 30.9) | 8.216 (X 37.3) | 3.946 (X 17.9) | 3.95 (X 18.0) | 7.202 (X 32.7) | 3.029 (X 13.8) | 7.633 (X 34.7) |
| C. familiaris | canFam4 | 2.48 | 0.311 | 1.209 (X 3.9) | 10.158 (X 32.7) | 10.124 (X 32.6) | 12.206 (X 39.2) | 6.55 (X 21.1) | 6.518 (X 21.0) | 10.671 (X 34.3) | 4.741 (X 15.2) | 11.394 (X 36.6) |
| H. sapiens | GRCh38 | 3.10 | 0.43 | 1.696 (X 3.9) | 12.393 (X 28.8) | 12.432 (X 28.9) | 13.681 (X 31.8) | 7.414 (X 17.2) | 7.284 (X 16.9) | 13.102 (X 30.5) | 6.37 (X 14.8) | 14.074 (X 32.7) |
| B. bombina | aBomBom1 | 9.80 | 1.554 | 8.501 (X 5.5) | 41.676 (X 26.8) | 41.696 (X 26.8) | 49.064 (X 31.6) | 24.202 (X 15.6) | 24.374 (X 15.7) | 43.856 (X 28.2) | 19.755 (X 12.7) | 45.387 (X 29.2) |
| A. mexicanum | AmbMex60DD | 28.20 | 3.327 | 14.375 (X 4.3) | 118.923 (X 35.7) | 118.422 (X 35.6) | 137.781 (X 41.4) | 57.626 (X 17.3) | 57.591 (X 17.3) | 121.257 (X 36.4) | 54.82 (X 16.5) | 128.374 (X 38.6) |
| P. annectens | PAN1.0 | 40.10 | 4.606 | 18.664 (X 4.1) | 167.85 (X 36.4) | 165.701 (X 36.0) | 196.833 (X 42.7) | 91.747 (X 19.9) | 91.924 (X 20.0) | 170.475 (X 37.0) | 77.707 (X 16.9) | 181.562 (X 39.4) |
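the "(X n)" values in the table read as each tool's time divided by the chromsize time on the same assembly. a quick Python check on the H. sapiens row, using the rounded times exactly as printed (so for the smallest assemblies the last digit may not match the table, which was presumably computed from unrounded timings):

```python
# times copied from the H. sapiens (GRCh38) row of the table above
chromsize_time = 0.43
others = {"seqkit": 1.696, "pyfaidx": 6.37, "samtools": 14.074}

# fold difference = other tool's time / chromsize time
folds = {tool: round(t / chromsize_time, 1) for tool, t in others.items()}
print(folds)
```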
| Tool | Cores | Time |
|---|---|---|
| seqkit | 16 | 18.993 s ± 0.132 s |
| chromsize | default (max_cpus: 16) | 7.631 s ± 0.010 s |
| seqkit | default (4) | 18.525 s ± 0.520 s |
| chromsize | 4 | 8.035 s ± 0.077 s |
| seqkit | 2 | 18.535 s ± 0.376 s |
| chromsize | 2 | 8.284 s ± 0.030 s |