colonizer

Crates.io	colonizer
lib.rs	colonizer
version	0.1.0
created_at	2025-09-13 20:12:12.606533+00
updated_at	2025-09-13 20:12:12.606533+00
description	Catalogue of Life (ChecklistBank) client + CLI: search usages, browse tree, vernacular names, and an Inspire mode for crate-name ideas (with Wikipedia summaries).
homepage	https://github.com/oolonek/colonizer
repository	https://github.com/oolonek/colonizer
id	1838093
size	109,972

Pierre-Marie Allard (oolonek)

documentation

https://docs.rs/colonizer

README

Colonizer

Rust crate and CLI to work with the Catalogue of Life (CoL) via the ChecklistBank API.

Features

Fetch the latest official CoL release dataset key automatically.
Resolve CoL usage ID for an exact scientific name.
List usages at a given rank with automatic pagination.
List roots and children in the taxonomic tree.
Show a taxon’s classification chain.
Suggest usages for partial queries.
Retrieve vernacular (common) names.
Download full CoL packages from the static download server.
Inspire mode: pick a random vernacular name for a given language (optionally one-word only), hyphenize it for crate names, and include a short Wikipedia summary plus Wikipedia/Wikidata links when available.

Install

From crates.io (installs the CLI binary): cargo install colonizer
As a library dependency in your own project: cargo add colonizer
Build from source: cargo build --release

CLI Usage

colonizer latest — print latest CoL dataset key.
colonizer id "Homo sapiens" — print CoL ID for the name.
colonizer list-rank GENUS --max 100 — list genera (ID, label, rank).
colonizer roots — list tree roots (e.g., domains).
colonizer children Homo --rank GENUS — list children of a taxon by name (or --by-id 636X2).
colonizer classify "Homo sapiens" --rank SPECIES — show classification chain.
colonizer suggest "homo sa" --limit 10 — quick suggestions.
colonizer vernacular "Homo sapiens" --rank SPECIES — common names.
colonizer inspire --lang fra [--one-word] — print a random hyphenized vernacular, taxonID, scientificName, and a CoL link. Adds a short Wikipedia summary and links when available.
--dataset <key> — operate on a specific dataset instead of the latest CoL release.
--json — return JSON instead of TSV.

Inspire (random vernacular names)

Text output example:
- colonizer inspire --lang fra
  - scalaire 6FXL3 Epitonium clathrus https://www.catalogueoflife.org/data/taxon/6FXL3
  - [fr wiki] Epitonium clathrus, le scalaire, est une espèce ...
  - https://fr.wikipedia.org/wiki/Epitonium_clathrus
  - https://www.wikidata.org/wiki/Q1995213
JSON output example:
- colonizer --json inspire --lang eng --one-word
  - { "lang": "eng", "vernacularName": "bogue", "vernacularHyphenized": "bogue", "taxonID": "MHY3", "scientificName": "Boops boops", "link": "https://www.catalogueoflife.org/data/taxon/MHY3", "oneWord": true, "wikipediaLang": "en", "wikipediaSummary": "Boops boops, commonly called the boce, ...", "wikipediaUrl": "https://en.wikipedia.org/wiki/Boops_boops", "wikidataUrl": "https://www.wikidata.org/wiki/Q950498" }
Options:
- --lang <iso-639-2>: vernacular language (e.g., fra, eng, spa). Defaults to fra.
- --one-word: only return vernaculars that are a single token (no spaces). Useful for short crate names; retries internally if needed.
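The hyphenizing and one-word filtering described above can be sketched as follows. This is only an illustration, not the crate's actual code: `hyphenize` and `is_one_word` are hypothetical helper names, and the exact normalization rules (lowercasing, collapsing punctuation and spaces into single hyphens) are assumptions.

```rust
/// Hypothetical helper: turn a vernacular name into a hyphenated slug.
/// Assumed rules: lowercase everything, keep alphanumeric characters,
/// and turn every run of other characters into a single hyphen,
/// with no hyphen at the start or the end.
fn hyphenize(name: &str) -> String {
    let mut out = String::new();
    let mut pending_hyphen = false;
    for ch in name.trim().chars() {
        if ch.is_alphanumeric() {
            // Emit at most one hyphen between two alphanumeric runs.
            if pending_hyphen && !out.is_empty() {
                out.push('-');
            }
            pending_hyphen = false;
            out.extend(ch.to_lowercase());
        } else {
            pending_hyphen = true;
        }
    }
    out
}

/// Hypothetical check mirroring `--one-word`: a single token, no whitespace.
fn is_one_word(name: &str) -> bool {
    let trimmed = name.trim();
    !trimmed.is_empty() && !trimmed.chars().any(char::is_whitespace)
}
```

Under these assumptions, `hyphenize("Grand Cormoran")` gives `grand-cormoran`. Accented letters are kept as-is in this sketch; because crates.io only accepts ASCII names, a real crate name would need further handling.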
Notes:
- The random pick is API-based; it does not download any TSV locally.
- Some datasets include vernaculars whose taxonID is not a valid ChecklistBank usage id. The command validates IDs and retries when needed.
- Wikipedia summary is best-effort: it tries your requested language (mapped from the 3-letter code) and falls back to English. If no page/summary is found, the wiki lines are omitted.
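The language fallback in the last note could look roughly like the sketch below. The mapping table, the function names, and the `lookup` callback are all hypothetical and stand in for whatever the crate does internally; the callback is where a real Wikipedia request would go.

```rust
/// Hypothetical mapping from a 3-letter language code to a Wikipedia
/// language code. Only a few entries are listed for illustration.
fn wikipedia_lang(iso3: &str) -> Option<&'static str> {
    match iso3 {
        "eng" => Some("en"),
        "fra" => Some("fr"),
        "spa" => Some("es"),
        "deu" => Some("de"),
        _ => None,
    }
}

/// Languages to try, in order: the requested language first, then English.
fn wikipedia_candidates(iso3: &str) -> Vec<&'static str> {
    let mut langs = Vec::new();
    if let Some(lang) = wikipedia_lang(iso3) {
        langs.push(lang);
    }
    if !langs.contains(&"en") {
        langs.push("en");
    }
    langs
}

/// Return the first summary found, together with the language that produced it.
/// `lookup` is a placeholder for a real Wikipedia request; `None` means
/// that no page or summary exists in that language.
fn first_summary<F>(iso3: &str, mut lookup: F) -> Option<(&'static str, String)>
where
    F: FnMut(&str) -> Option<String>,
{
    wikipedia_candidates(iso3)
        .into_iter()
        .find_map(|lang| lookup(lang).map(|summary| (lang, summary)))
}
```

When `first_summary` returns `None`, the caller would simply omit the wiki lines, as described in the note.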

Downloading the full CoL

Easiest and fastest: use the static downloads hosted by ChecklistBank.
- Latest monthly release:
  - DWCA: curl -L -o col_latest_dwca.zip https://download.checklistbank.org/col/latest_dwca.zip
  - CoLDP: curl -L -o col_latest_coldp.zip https://download.checklistbank.org/col/latest_coldp.zip
  - Text tree: curl -L -o col_latest_txtree.zip https://download.checklistbank.org/col/latest_txtree.zip
- Specific monthly release, identified by its release date (YYYY-MM-DD):
  - curl -L -o col_2025-08-20_dwca.zip https://download.checklistbank.org/col/monthly/2025-08-20_dwca.zip
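The URL pattern used by the curl commands above can be captured in a small helper. This is a minimal sketch: `download_url` is a hypothetical function, not part of the crate's documented API, and it only reproduces the two URL shapes shown in this section.

```rust
/// Base of the static download server, as used in the curl examples.
const DOWNLOAD_BASE: &str = "https://download.checklistbank.org/col";

/// Hypothetical helper: build the static download URL for a format
/// ("dwca", "coldp" or "txtree"). With `date` set (YYYY-MM-DD) it points
/// at that monthly release, otherwise at the latest one.
fn download_url(format: &str, date: Option<&str>) -> String {
    match date {
        Some(d) => format!("{}/monthly/{}_{}.zip", DOWNLOAD_BASE, d, format),
        None => format!("{}/latest_{}.zip", DOWNLOAD_BASE, format),
    }
}
```

For example, `download_url("dwca", Some("2025-08-20"))` yields the same URL as the dated curl command.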
From the CLI:
- Latest DWCA: colonizer download dwca --latest
- Release matching your selected dataset (default): colonizer download coldp
- Specific date: colonizer download txtree --date 2025-08-20
When to use the API export:
- If you need a custom subset (e.g., a single clade selected by its root taxon), want to include or exclude synonyms, or need a format beyond the static defaults, request an export through the API: POST to /dataset/{key}/export, then GET /export/{id} to fetch the file. The CLI may add this as an advanced option.
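The two export endpoints mentioned above can be assembled as shown below. This sketch only builds the URLs. The helper names are hypothetical, the numeric type of the dataset key and the string type of the export id are assumptions, and the POST body (export options) is deliberately left out because it is not described here.

```rust
/// API base taken from the Notes section of this README.
const API_BASE: &str = "https://api.checklistbank.org";

/// Hypothetical helper: URL to POST an export request for a dataset.
/// Assumption: dataset keys are plain integers.
fn export_request_url(dataset_key: u32) -> String {
    format!("{}/dataset/{}/export", API_BASE, dataset_key)
}

/// Hypothetical helper: URL to GET once the export id is known.
/// Assumption: the export id is an opaque string returned by the POST.
fn export_fetch_url(export_id: &str) -> String {
    format!("{}/export/{}", API_BASE, export_id)
}
```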

Library Usage

use colonizer::ColClient;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let col = ColClient::from_latest()?;
    let id = col.id_for_name("Homo sapiens", None)?;
    println!("id: {:?}", id);

    let genera = col.list_by_rank("GENUS", Some(100))?;
    println!("{} genera fetched", genera.len());
    Ok(())
}

Notes

Listing every taxon at a given rank can return an extremely large result set (e.g., SPECIES). Use --max to cap results.
The crate targets https://api.checklistbank.org, which powers the Catalogue of Life.
When using name-based targets for commands like children, classify, and vernacular, add --rank to disambiguate homonyms (e.g., GENUS, SPECIES). Use --by-id to pass a usage ID directly.
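To illustrate how automatic pagination and a --max style cap interact, here is a generic offset/limit loop. It is a sketch under assumptions, not the crate's implementation: `fetch_all` and the `fetch_page(offset, limit)` callback are hypothetical, and the callback stands in for one API request.

```rust
/// Hypothetical paginator: call `fetch_page(offset, limit)` repeatedly
/// until a short page signals the end or `max` items have been collected.
fn fetch_all<T, E, F>(
    page_size: usize,
    max: Option<usize>,
    mut fetch_page: F,
) -> Result<Vec<T>, E>
where
    F: FnMut(usize, usize) -> Result<Vec<T>, E>,
{
    let mut items = Vec::new();
    let mut offset = 0;
    loop {
        // How many more items the cap still allows (None = unlimited).
        let remaining = max.map(|m| m.saturating_sub(items.len()));
        if remaining == Some(0) {
            break;
        }
        // Never ask for more than the cap leaves room for.
        let limit = remaining.map_or(page_size, |r| r.min(page_size));
        let page = fetch_page(offset, limit)?;
        let got = page.len();
        items.extend(page);
        offset += got;
        // An empty or short page means there is nothing left to fetch.
        if got == 0 || got < limit {
            break;
        }
    }
    Ok(items)
}
```

Shrinking the last request to the remaining budget avoids fetching items that would be discarded anyway, which matters for ranks with very many taxa.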
