colonizer

Crates.io: colonizer
lib.rs: colonizer
version: 0.1.0
created_at: 2025-09-13 20:12:12.606533+00
updated_at: 2025-09-13 20:12:12.606533+00
description: Catalogue of Life (ChecklistBank) client + CLI: search usages, browse tree, vernacular names, and an Inspire mode for crate-name ideas (with Wikipedia summaries).
homepage: https://github.com/oolonek/colonizer
repository: https://github.com/oolonek/colonizer
max_upload_size
id: 1838093
size: 109,972
Pierre-Marie Allard (oolonek)

documentation: https://docs.rs/colonizer

README

Colonizer

Rust crate and CLI to work with the Catalogue of Life (CoL) via the ChecklistBank API.

Features

  • Fetch the latest official CoL release dataset key automatically.
  • Resolve the CoL usage ID for an exact scientific name.
  • List usages at a given rank with automatic pagination.
  • List roots and children in the taxonomic tree.
  • Show a taxon’s classification chain.
  • Suggest usages for partial queries.
  • Retrieve vernacular (common) names.
  • Download full CoL packages from the static download server.
  • Inspire mode: pick a random vernacular name for a given language (optionally one-word only), hyphenize it for crate names, and include a short Wikipedia summary plus Wikipedia/Wikidata links when available.
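
The "hyphenize" and "one-word" steps of Inspire mode can be pictured with a small sketch. The functions `hyphenize` and `is_one_word` below are hypothetical and are not taken from the crate; the rules they apply (lowercase, keep alphanumerics, collapse everything else into single hyphens) are an assumption about what a crate-name-friendly form looks like. Accented letters are kept as-is, so a real implementation would likely also need to transliterate them.

```rust
/// Hypothetical helper (not the crate's API): turn a vernacular name into a
/// hyphenated, lowercase slug. Assumed rules: keep alphanumeric characters,
/// collapse any run of other characters into a single hyphen.
fn hyphenize(name: &str) -> String {
    let mut out = String::new();
    let mut pending_hyphen = false;
    for ch in name.trim().chars() {
        if ch.is_alphanumeric() {
            // Emit one hyphen for the separator run we just skipped,
            // but never at the very start of the slug.
            if pending_hyphen && !out.is_empty() {
                out.push('-');
            }
            pending_hyphen = false;
            out.extend(ch.to_lowercase());
        } else {
            pending_hyphen = true;
        }
    }
    out
}

/// Hypothetical check for the one-word filter: a single token, no whitespace.
fn is_one_word(name: &str) -> bool {
    let trimmed = name.trim();
    !trimmed.is_empty() && !trimmed.chars().any(char::is_whitespace)
}

fn main() {
    for name in ["Great Tit", "bogue", "  Common  Sea-Bass "] {
        println!("{:?} -> {} (one word: {})", name, hyphenize(name), is_one_word(name));
    }
}
```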

Install

  • From crates.io (binary + library): cargo install colonizer
  • Build from source: cargo build --release

CLI Usage

  • colonizer latest — print latest CoL dataset key.
  • colonizer id "Homo sapiens" — print CoL ID for the name.
  • colonizer list-rank GENUS --max 100 — list genera (ID, label, rank).
  • colonizer roots — list tree roots (e.g., domains).
  • colonizer children Homo --rank GENUS — list children of a taxon by name (or --by-id 636X2).
  • colonizer classify "Homo sapiens" --rank SPECIES — show classification chain.
  • colonizer suggest "homo sa" --limit 10 — quick suggestions.
  • colonizer vernacular "Homo sapiens" --rank SPECIES — common names.
  • colonizer inspire --lang fra [--one-word] — print a random hyphenized vernacular, taxonID, scientificName, and a CoL link. Adds a short Wikipedia summary and links when available.
  • --dataset <key> — operate on a specific dataset instead of the latest CoL release.
  • --json — return JSON instead of TSV.
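
For scripts that consume the default TSV output, a parser along the following lines may be enough. It is a sketch that assumes the columns are tab-separated and appear in the order given above for list-rank (ID, label, rank); whether the CLI prints a header row is not stated here, so the sketch does not handle one. `UsageRow`, `parse_usage_line`, and the sample rows are hypothetical placeholders, not real CoL records.

```rust
/// Hypothetical row type for one line of TSV output (ID, label, rank).
#[derive(Debug, PartialEq)]
struct UsageRow {
    id: String,
    label: String,
    rank: String,
}

/// Parse one tab-separated line; returns None if fewer than three columns
/// are present or the ID column is empty.
fn parse_usage_line(line: &str) -> Option<UsageRow> {
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    let mut cols = line.split('\t');
    let id = cols.next()?.trim();
    let label = cols.next()?.trim();
    let rank = cols.next()?.trim();
    if id.is_empty() {
        return None;
    }
    Some(UsageRow {
        id: id.to_string(),
        label: label.to_string(),
        rank: rank.to_string(),
    })
}

fn main() {
    // Placeholder rows, made up for illustration only.
    let sample = "X1\tExamplea\tgenus\nX2\tExampleb\tgenus\nmalformed line\n";
    let rows: Vec<UsageRow> = sample.lines().filter_map(parse_usage_line).collect();
    for row in &rows {
        println!("{} | {} | {}", row.id, row.label, row.rank);
    }
}
```

When structured fields matter, the --json flag is probably the more robust choice, since it avoids guessing at column order.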

Inspire (random vernacular names)

  • Text output example:
    • colonizer inspire --lang fra
      • scalaire 6FXL3 Epitonium clathrus https://www.catalogueoflife.org/data/taxon/6FXL3
      • [fr wiki] Epitonium clathrus, le scalaire, est une espèce ...
      • https://fr.wikipedia.org/wiki/Epitonium_clathrus
      • https://www.wikidata.org/wiki/Q1995213
  • JSON output example:
    • colonizer --json inspire --lang eng --one-word
      • { "lang": "eng", "vernacularName": "bogue", "vernacularHyphenized": "bogue", "taxonID": "MHY3", "scientificName": "Boops boops", "link": "https://www.catalogueoflife.org/data/taxon/MHY3", "oneWord": true, "wikipediaLang": "en", "wikipediaSummary": "Boops boops, commonly called the boce, ...", "wikipediaUrl": "https://en.wikipedia.org/wiki/Boops_boops", "wikidataUrl": "https://www.wikidata.org/wiki/Q950498" }
  • Options:
    • --lang <iso-639-2>: vernacular language (e.g., fra, eng, spa). Defaults to fra.
    • --one-word: only return vernaculars that are a single token (no spaces). Useful for short crate names; retries internally if needed.
  • Notes:
    • The random pick is API-based; it does not download any TSV locally.
    • Some datasets include vernaculars whose taxonID is not a valid ChecklistBank usage id. The command validates IDs and retries when needed.
    • Wikipedia summary is best-effort: it tries your requested language (mapped from the 3-letter code) and falls back to English. If no page/summary is found, the wiki lines are omitted.
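
The language fallback described in the last note can be sketched as follows. Everything here is hypothetical: the mapping table is a small hand-written subset, `first_summary` takes the lookup as a closure so no HTTP client is needed, and building the page URL from the scientific name is an assumption based on the example URLs above rather than a statement about how the crate resolves pages.

```rust
/// Assumed mapping from 3-letter language codes to Wikipedia subdomains.
/// Only a few codes are listed; a real table would be much longer.
fn wiki_lang(code3: &str) -> Option<&'static str> {
    match code3 {
        "fra" => Some("fr"),
        "eng" => Some("en"),
        "spa" => Some("es"),
        "deu" => Some("de"),
        _ => None,
    }
}

/// Languages to try, in order: the requested one (if mapped), then English.
fn wiki_candidates(code3: &str) -> Vec<&'static str> {
    let mut langs = Vec::new();
    if let Some(lang) = wiki_lang(code3) {
        langs.push(lang);
    }
    if !langs.contains(&"en") {
        langs.push("en");
    }
    langs
}

/// Page URL in the shape shown in the examples (spaces become underscores).
fn wiki_url(lang: &str, title: &str) -> String {
    format!("https://{}.wikipedia.org/wiki/{}", lang, title.trim().replace(' ', "_"))
}

/// Best-effort lookup: return the first language that yields a summary,
/// or None so the caller can omit the wiki lines entirely.
fn first_summary<F>(code3: &str, title: &str, fetch: F) -> Option<(&'static str, String)>
where
    F: Fn(&str, &str) -> Option<String>,
{
    wiki_candidates(code3)
        .into_iter()
        .find_map(|lang| fetch(lang, title).map(|summary| (lang, summary)))
}

/// Stand-in for a network call: pretends only an English page exists.
fn stub_fetch(lang: &str, _title: &str) -> Option<String> {
    if lang == "en" {
        Some("(placeholder summary)".to_string())
    } else {
        None
    }
}

fn main() {
    match first_summary("fra", "Boops boops", stub_fetch) {
        Some((lang, summary)) => {
            println!("[{} wiki] {}", lang, summary);
            println!("{}", wiki_url(lang, "Boops boops"));
        }
        None => println!("no summary found; wiki lines omitted"),
    }
}
```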

Downloading the full CoL

  • Easiest and fastest: use the static downloads hosted by ChecklistBank.

    • Latest monthly release:
      • DWCA: curl -L -o col_latest_dwca.zip https://download.checklistbank.org/col/latest_dwca.zip
      • CoLDP: curl -L -o col_latest_coldp.zip https://download.checklistbank.org/col/latest_coldp.zip
      • Text tree: curl -L -o col_latest_txtree.zip https://download.checklistbank.org/col/latest_txtree.zip
    • Specific monthly release, identified by its release date (YYYY-MM-DD):
      • curl -L -o col_2025-08-20_dwca.zip https://download.checklistbank.org/col/monthly/2025-08-20_dwca.zip
  • From the CLI:

    • Latest DWCA: colonizer download dwca --latest
    • Match your selected dataset’s month (default): colonizer download coldp
    • Specific date: colonizer download txtree --date 2025-08-20
  • When to use the API export:

    • If you need a custom subset (e.g., a single clade root), control over whether synonyms are included, or a format beyond the static defaults, request an export through the API: POST to /dataset/{key}/export, then GET /export/{id} to fetch the file. The CLI may add this as an advanced option.
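
The static download URLs listed above follow a regular pattern, which the sketch below captures. The `Format` enum and `download_url` function are hypothetical names, not part of the crate. The dated pattern is shown above only for DWCA, so applying it to CoLDP and text tree is an assumption, and the date string is not validated.

```rust
/// Hypothetical enum for the three static package formats listed above.
#[derive(Clone, Copy, Debug)]
enum Format {
    Dwca,
    Coldp,
    Txtree,
}

impl Format {
    /// File-name suffix as it appears in the download URLs.
    fn suffix(self) -> &'static str {
        match self {
            Format::Dwca => "dwca",
            Format::Coldp => "coldp",
            Format::Txtree => "txtree",
        }
    }
}

/// Build a static download URL. `date` is a release date in YYYY-MM-DD form;
/// None selects the latest release. The date is passed through unchecked.
fn download_url(format: Format, date: Option<&str>) -> String {
    let base = "https://download.checklistbank.org/col";
    match date {
        Some(d) => format!("{}/monthly/{}_{}.zip", base, d, format.suffix()),
        None => format!("{}/latest_{}.zip", base, format.suffix()),
    }
}

fn main() {
    println!("{}", download_url(Format::Dwca, None));
    println!("{}", download_url(Format::Coldp, None));
    println!("{}", download_url(Format::Txtree, Some("2025-08-20")));
}
```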

Library Usage

use colonizer::ColClient;

fn main() -> Result<(), Box<dyn std::error::Error>> {
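    // Create a client for the latest official CoL release.
    let col = ColClient::from_latest()?;
    // Resolve the usage ID for an exact scientific name.
    let id = col.id_for_name("Homo sapiens", None)?;
    println!("id: {:?}", id);

    // List genera, capped at 100 results.
    let genera = col.list_by_rank("GENUS", Some(100))?;
    println!("{} genera fetched", genera.len());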
    Ok(())
}
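
To make "automatic pagination" with a cap more concrete, here is a minimal sketch of an offset/limit loop. It does not reproduce the crate's internals: `fetch_all` and `mock_page` are hypothetical, the offset/limit paging scheme is an assumption, and the mock stands in for the HTTP request so the example runs without network access.

```rust
/// Collect items page by page until the source is exhausted or `max` is hit.
/// `fetch_page(offset, limit)` is expected to return at most `limit` items.
fn fetch_all<F>(max: Option<usize>, page_size: usize, mut fetch_page: F) -> Vec<String>
where
    F: FnMut(usize, usize) -> Vec<String>,
{
    let mut out = Vec::new();
    let mut offset = 0;
    loop {
        // How many more items we are still allowed to collect.
        let remaining = match max {
            Some(m) if out.len() >= m => break,
            Some(m) => m - out.len(),
            None => usize::MAX,
        };
        let limit = page_size.min(remaining);
        let page = fetch_page(offset, limit);
        let got = page.len();
        out.extend(page);
        offset += got;
        // A short (or empty) page means there is nothing left to fetch.
        if got < limit || got == 0 {
            break;
        }
    }
    // Guard against a source that returned more than it was asked for.
    if let Some(m) = max {
        out.truncate(m);
    }
    out
}

/// Mock data source with 250 made-up items, standing in for an API call.
fn mock_page(offset: usize, limit: usize) -> Vec<String> {
    let end = 250.min(offset + limit);
    (offset..end).map(|i| format!("item-{}", i)).collect()
}

fn main() {
    let capped = fetch_all(Some(100), 40, mock_page);
    let everything = fetch_all(None, 100, mock_page);
    println!("capped: {}, uncapped: {}", capped.len(), everything.len());
}
```

Shrinking the last request to the remaining budget means a capped run never asks for more rows than it will keep.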

Notes

  • Listing by rank can return an extremely large result set (e.g., every usage at rank SPECIES). Use --max to cap results.

  • The crate targets https://api.checklistbank.org, which powers the Catalogue of Life.

  • When using name-based targets for commands like children, classify, and vernacular, add --rank to disambiguate homonyms (e.g., GENUS, SPECIES). Use --by-id to pass a usage ID directly.
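
The effect of a rank filter on name-based lookups can be illustrated with a small sketch. `Candidate`, `Resolution`, and `resolve` are hypothetical and do not mirror the crate's types, and the candidate records are invented placeholders rather than real CoL usages.

```rust
/// Hypothetical candidate record returned by a name search.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    id: &'static str,
    name: &'static str,
    rank: &'static str,
}

/// Outcome of resolving a name to a single usage.
#[derive(Debug, PartialEq)]
enum Resolution {
    Unique(String),
    Ambiguous(usize),
    NotFound,
}

/// Keep exact name matches, optionally narrowed by rank (case-insensitive).
fn resolve(candidates: &[Candidate], name: &str, rank: Option<&str>) -> Resolution {
    let hits: Vec<&Candidate> = candidates
        .iter()
        .filter(|c| c.name == name)
        .filter(|c| rank.map_or(true, |r| c.rank.eq_ignore_ascii_case(r)))
        .collect();
    match hits.as_slice() {
        [] => Resolution::NotFound,
        [one] => Resolution::Unique(one.id.to_string()),
        many => Resolution::Ambiguous(many.len()),
    }
}

/// Made-up records: the same name used at two different ranks.
fn sample_candidates() -> Vec<Candidate> {
    vec![
        Candidate { id: "ID-A", name: "Examplea", rank: "GENUS" },
        Candidate { id: "ID-B", name: "Examplea", rank: "SUBGENUS" },
    ]
}

fn main() {
    let candidates = sample_candidates();
    println!("{:?}", resolve(&candidates, "Examplea", None));
    println!("{:?}", resolve(&candidates, "Examplea", Some("genus")));
}
```

Without a rank the lookup is ambiguous; with one it narrows to a single ID. When two usages share both name and rank, passing the usage ID directly with --by-id remains the unambiguous route.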
