popgetter-cli

version: 0.2.2
created_at: 2025-01-10 21:31:48.992895+00
updated_at: 2025-04-22 14:52:31.950577+00
description: CLI for popgetter
id: 1511768
size: 209,534
publishers: github:urban-analytics-technology-platform:publishers


popgetter-cli

Library and associated command-line application for exploring and fetching popgetter data.

Quickstart

  • Install Rust
  • Install CLI:
    cargo install popgetter-cli
    
  • Run the CLI with e.g.:
    popgetter --help
    

Examples

List countries with the countries subcommand

Get a list of available countries:

popgetter countries

Searching metadata with the metrics subcommand

Summaries and specific metadata fields

Get a summary of all data:

popgetter metrics --summary

Get a summary of data for a given country:

popgetter metrics --summary --country "united states"

Get the list of metadata fields:

popgetter metrics --display-metadata-columns

Get a list of geometry levels for a given country:

popgetter metrics --country "united states" \
  --unique geometry_level
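The same --country and --unique flags can be combined in a small shell loop to compare geometry levels across several countries. This is a minimal sketch: list_geometry_levels is a hypothetical helper, not part of popgetter, and it assumes the popgetter binary is on the PATH.

```shell
# Hypothetical helper (not part of popgetter): print the unique geometry
# levels for each country passed as an argument, using only the
# --country and --unique flags shown above.
list_geometry_levels() {
  local country
  for country in "$@"; do
    echo "== ${country} =="
    popgetter metrics --country "${country}" --unique geometry_level
  done
}

# Usage (requires popgetter on the PATH):
# list_geometry_levels "united states" "northern ireland"
```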

Searching metrics

An example search using a regex as the search text, combined with a given country and geometry level:

popgetter metrics \
  --text " car[^a-z] | cars " \
  --country "northern ireland" \
  --geometry-level sdz21
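To see what the search text above matches, the pattern can be tried out with grep -E on a few sample phrases. This sketch assumes that popgetter interprets --text like a POSIX extended regular expression for this simple pattern, and that matching is case-sensitive; the sample phrases are invented for illustration and are not popgetter metadata.

```shell
# Search text from the example above, passed through unchanged (spaces included).
pattern=' car[^a-z] | cars '

# Invented sample phrases, one per line (not real popgetter metadata).
samples='Number of cars in household
Households with 1 car, 2 persons
Careers advice
Care homes'

# grep -E treats the pattern as two alternatives:
#   " car[^a-z] " : a space, "car", one character that is not a-z, then a space
#   " cars "      : the word "cars" with a space on each side
# Prints the first two phrases only.
printf '%s\n' "$samples" | grep -E -- "$pattern"
```

Because the spaces are part of the pattern, a phrase such as "no car available" does not match: the first alternative needs a second space immediately after the non-letter character that follows "car".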

Downloading data

An example download of a single metric by ID, written to a GeoJSON file:

popgetter data \
  --id 38757cf9 \
  --output-file popgetter.geojson \
  --output-format geojson \
  --dev

where the --dev flag enables output with the CRS transformed to EPSG:4326, since all data is provided here in EPSG:4326.

Downloading data with recipes

Recipe files provide an alternative to command-line flags. For example, the data described by the recipe file test_recipe.json can be downloaded with:

popgetter recipe test_recipe.json \
  --output-format csv --output-file popgetter.csv

LLM integration (experimental)

It is also possible to search metadata and generate data requests with the help of LLMs.

The steps below are required for this experimental functionality, which is implemented in the popgetter-llm crate.

  • Install with the llm feature:
    cargo install popgetter-cli --features llm
  • Set up two Azure LLM endpoints for:
    • Text embeddings (text-embedding-3-small)
    • Text generation (gpt-4o)
  • Assign the API key for the two endpoints to the following environment variable, e.g.:
    export AZURE_OPEN_AI_KEY="REPLACE_WITH_API_KEY"

Note: currently only Azure endpoints are supported.
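Because the llm subcommands rely on the AZURE_OPEN_AI_KEY environment variable, a small shell guard can fail early when the key is missing. This is a sketch; require_azure_key is a hypothetical helper and not part of popgetter.

```shell
# Hypothetical helper (not part of popgetter): return a non-zero status,
# with a message on stderr, if AZURE_OPEN_AI_KEY is unset or empty.
require_azure_key() {
  if [ -z "${AZURE_OPEN_AI_KEY:-}" ]; then
    echo "error: AZURE_OPEN_AI_KEY is not set" >&2
    return 1
  fi
}

# Usage:
# require_azure_key && popgetter llm init
```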

  • Install and run Docker

  • Initialize the Qdrant database:

    cd ../popgetter-llm/
    docker compose up
    
  • Construct the database with embeddings derived from the metadata using the popgetter CLI:

    popgetter llm init

This process takes several hours to run and constructs the Qdrant database for all of the metadata (around 3 GB in total).

  • With the database populated, search queries can be performed using the embeddings to:

    • Return search results based on embedding similarity
    • Generate data request specifications directly from the query
  • For search results based on embedding similarity, e.g.:

popgetter llm query \
  "cars and household size" \
  --limit 10 \
  --output-format SearchResults \
  --country "United States"
  • With --output-format SearchResultsToRecipe, the metric IDs from the search results are included in a recipe:
popgetter llm query \
  "cars and household size" \
  --limit 10 \
  --output-format SearchResultsToRecipe \
  --country "United States"
  • With --output-format DataRequestSpec, the data request specification is generated directly from the search results through a second prompt:
RUST_LOG=info popgetter llm query \
  "cars and household size" \
  --limit 10 \
  --output-format DataRequestSpec \
  --country "United States"

Note: This output format is highly experimental and may produce incorrect data request specifications.
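The SearchResultsToRecipe output can, in principle, be passed to the recipe subcommand to download the matching data. The sketch below assumes that the recipe is printed to standard output as JSON in the same format that popgetter recipe reads; this is not confirmed by the steps above, and query_to_csv is a hypothetical helper, not part of popgetter.

```shell
# Hypothetical helper (not part of popgetter).
# ASSUMPTION: with --output-format SearchResultsToRecipe, the recipe is
# printed to stdout in the JSON format accepted by `popgetter recipe`.
query_to_csv() {
  local query="$1" country="$2" recipe="$3" output="$4"
  # Step 1: save the recipe built from the embedding search results.
  popgetter llm query "$query" \
    --limit 10 \
    --output-format SearchResultsToRecipe \
    --country "$country" > "$recipe" || return 1
  # Step 2: download the data described by the recipe as CSV.
  popgetter recipe "$recipe" \
    --output-format csv --output-file "$output"
}

# Usage:
# query_to_csv "cars and household size" "United States" cars_recipe.json popgetter.csv
```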
