| Crates.io | opendatasync |
| lib.rs | opendatasync |
| version | 1.1.1 |
| created_at | 2025-12-22 20:20:56.350437+00 |
| updated_at | 2025-12-22 20:20:56.350437+00 |
| description | Pull down data from Wikidata, OpenStreetMap, and Wikimedia Commons. |
| homepage | |
| repository | |
| max_upload_size | |
| id | 2000213 |
| size | 17,049,657 |
A command-line utility for synchronizing open data from OpenStreetMap (Overpass API), Wikidata, and Wikimedia Commons. Designed for static site generation workflows that need to fetch and cache structured data from multiple open data sources.
Repository: codeberg.org/houston-open-source-society/opendatasync
This tool fetches data from open data APIs and outputs structured JSON/CSV files for consumption by static site generators or other tools. It's particularly useful for building websites that display information about places, organizations, or media from authoritative open data sources.
This tool is part of the Houston Open Source Society (HOSS) build pipeline. See the HOSS website architecture documentation for details on how opendatasync integrates with the static site generation workflow.
cargo build --release
The binary will be available at target/release/opendatasync.
After cloning the repository, install the git hooks to enable version validation:
git config core.hooksPath .githooks
This installs a pre-push hook that validates Cargo.toml version matches git tags before pushing, preventing version synchronization errors.
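The actual hook ships in .githooks/pre-push; as a rough sketch of the idea (illustrative only, assuming vX.Y.Z tag naming, not the verbatim hook):

#!/bin/sh
# Sketch of a pre-push version check (the real hook is .githooks/pre-push).
# Compares the version in Cargo.toml against the most recent vX.Y.Z tag.
cargo_version=$(grep -m1 '^version' Cargo.toml | cut -d'"' -f2)
tag_version=$(git describe --tags --abbrev=0 2>/dev/null | sed 's/^v//')
if [ "$cargo_version" != "$tag_version" ]; then
    echo "version mismatch: Cargo.toml=$cargo_version, tag=v$tag_version" >&2
    exit 1
fi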
The overall data flow is summarized in this Mermaid flowchart:
flowchart TB
subgraph Input
A[Batch TOML Config]
B[QID/OSM ID Files]
end
subgraph opendatasync
D[Parse Configuration]
E[Fetch Overpass Data]
F[Fetch Wikidata Entities]
G[Traverse Wikidata Properties]
H[Fetch Wikimedia Commons]
end
subgraph Output
I[overpass.json]
J[wikidata-wbgetentities.json]
K[wikicommons-categorymembers.json]
L[wikicommons-imageinfo.json]
end
A --> D
B --> D
D --> E
D --> F
D --> H
F --> G
G -->|"OSM properties (P402, P10689, P11693)"| E
G -->|"Commons category (P373)"| H
E -->|"OSM way/node/relation data"| I
F -->|"Entity labels, descriptions"| J
G -->|"Related entities"| J
H -->|"Category members"| K
H -->|"Image metadata & URLs"| L
The utility provides four main commands: wikidata, wikicommons, overpass, and batch.
These options can be used with any subcommand:
--verbose - Enable verbose logging
--output-dir <DIR> - Output directory for saving results to files
--cache-dir <DIR> - Cache directory for storing API responses (default: ./cache)

Fetch Wikidata entities with optional property resolution and traversal.
opendatasync wikidata wbgetentities --qid Q30 --qid Q1342
Options:
-q, --qid <QID> - Wikidata QID(s) to fetch (can be specified multiple times)
--ids-from-file <FILE> - Read QIDs from a file (one per line, # for comments; sample below)
-f, --format <FORMAT> - Output format: csv or json (default: csv)
--resolve-headers <MODE> - Resolve property IDs to names in headers: none, all, select (default: all)
--select-header <PID> - Specific header properties to resolve (required with --resolve-headers=select)
--resolve-data <MODE> - Resolve data property values to names: none, all, select (default: all)
--select-data <PID> - Specific data properties to resolve (required with --resolve-data=select)
--keep-qids - Keep QID columns alongside resolved names (default: true)
--keep-filename - Keep filename columns for Commons media (default: false)
--only-values - In JSON output, prune non-value fields (default: true)
--preserve-qualifiers-for <PID> - When --only-values is true, preserve qualifiers for these specific properties (e.g., P6375, P969). Properties in this list will always use the object format {"value": ..., "qualifiers": {...}} for consistent parsing, even when qualifiers are empty
--traverse-properties <PID> - Recursively follow and fetch entities referenced by these properties
--traverse-osm - Shortcut to traverse OSM properties (P402, P10689, P11693) (default: true)
--traverse-commons - Traverse Commons category (P373) and fetch members (default: true)
--traverse-commons-depth <DEPTH> - Depth: category (members only) or page (members + imageinfo) (default: page)

Examples:
# Fetch entity with full resolution (default)
opendatasync wikidata wbgetentities --qid Q30
# Fetch multiple entities from file
opendatasync wikidata wbgetentities --ids-from-file wikidata_qids.txt --format json
# Selective property resolution
opendatasync wikidata wbgetentities --qid Q84 \
--resolve-headers select --select-header P31 --select-header P17 \
--resolve-data select --select-data Q6256 --select-data Q515
# Traverse specific Wikidata properties
opendatasync wikidata wbgetentities --qid Q30 \
--traverse-properties P527 --traverse-properties P361
# Preserve qualifiers for specific properties in JSON output
opendatasync wikidata wbgetentities --qid Q30 \
--format json --only-values \
--preserve-qualifiers-for P6375 --preserve-qualifiers-for P969
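For reference, the --ids-from-file input is one QID per line, with # starting a comment; a hypothetical wikidata_qids.txt (QIDs borrowed from the examples in this README):

# QIDs to fetch, one per line
Q30
Q1342
Q16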
Fetch claims (statements) for a specific Wikidata entity and property.
opendatasync wikidata wbgetclaims --entity Q30 --property P36
Options:
-e, --entity <QID> - Entity QID
-p, --property <PID> - Property ID
-f, --format <FORMAT> - Output format: csv or json (default: csv)

Fetch members of Wikimedia Commons categories.
opendatasync wikicommons categorymembers --category "Category:Houston, Texas"
Options:
-c, --category <NAME> - Category name(s) (can be specified multiple times)
--ids-from-file <FILE> - Read categories from file (one per line; sample below)
--traverse-pageid - Fetch detailed imageinfo for each page (default: false)
--recurse-subcategory-pattern <REGEX> - Recursively fetch subcategories matching pattern
-f, --format <FORMAT> - Output format: csv or json (default: csv)

Examples:
# Fetch category members
opendatasync wikicommons categorymembers --category "Category:Houston, Texas"
# Recursively fetch subcategories
opendatasync wikicommons categorymembers \
--category "Bayland Community Center, Houston" \
--recurse-subcategory-pattern "Houston Open Source Society" \
--traverse-pageid
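The --ids-from-file input for this subcommand is one category per line; a hypothetical categories file (names taken from the batch example later in this README):

Category:Houston, Texas
Category:Pittsburgh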
Fetch detailed image information for specific page IDs.
opendatasync wikicommons imageinfo --pageid 12345 --pageid 67890
Options:
-p, --pageid <ID> - Page ID(s) (can be specified multiple times)
--ids-from-file <FILE> - Read page IDs from file (one per line)
-f, --format <FORMAT> - Output format: csv or json (default: csv)

Fetch OpenStreetMap elements by ID via Overpass API.
opendatasync overpass query --way 56065095 --node 123456789
Options:
--node <ID> - OSM node ID(s)
--way <ID> - OSM way ID(s)
--relation <ID> - OSM relation ID(s)
--ids-from-file <FILE> - Read OSM IDs from file (format: node/123, way/456, relation/789; sample below)
--bbox - Bounding box filter: south,west,north,east
--timeout <SECONDS> - Query timeout (default: 25)
-f, --format <FORMAT> - Output format: csv or json (default: json)

Examples:
# Fetch mixed element types
opendatasync overpass query --way 56065095 --node 123456789 --relation 456789123
# Fetch with bounding box
opendatasync overpass query --way 111222333 --bbox 40.0,-80.0,40.5,-79.5
# Load from file
opendatasync overpass query --ids-from-file osm_ids.txt
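A hypothetical osm_ids.txt using the node/way/relation prefix format noted above (IDs borrowed from the examples):

node/123456789
way/56065095
relation/456789123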
Process multiple data sources from a single batch file.
opendatasync batch --batch-file batch.toml
Batch File Format:
The batch file is in TOML format and can contain multiple [[wikidata]], [[wikicommons]], and [[overpass]] entries.
Example batch.toml:
# Wikidata entities
[[wikidata]]
qids = ["Q30", "Q1342", "Q16"]
resolve_headers = "all"
resolve_data = "all"
traverse_osm = true
only_values = true
preserve_qualifiers_for = ["P6375", "P969"]
[[wikidata]]
ids_from_file = "./wikidata_qids.txt"
# Wikimedia Commons
[[wikicommons]]
[wikicommons.categorymembers]
categories = ["Category:Houston, Texas", "Category:Pittsburgh"]
traverse_pageid = true
[[wikicommons]]
[wikicommons.categorymembers]
categories = ["Bayland Community Center, Houston"]
recurse_subcategory_pattern = "Houston Open Source Society"
traverse_pageid = true
# OpenStreetMap data
[[overpass]]
nodes = [123456789, 987654321]
ways = [56065095, 12345678]
relations = [456789123]
timeout = 25
[[overpass]]
ids_from_file = "./osm_ids.txt"
See example_batch.toml for a complete example.
Running batch mode:
# Process batch file with caching
opendatasync --cache-dir ./cache --output-dir ./data batch --batch-file batch.toml
# With verbose logging
opendatasync --verbose --cache-dir ./cache --output-dir ./data batch --batch-file batch.toml
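After a run, a quick sanity check of the fetched data is easy with standard tools; a hypothetical inspection with ls and jq, assuming --output-dir ./data as above:

# List the generated files, then peek at the Overpass output
ls ./data
jq '.' ./data/overpass.json | head -n 20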
Batch mode outputs separate JSON files for each data source type:
- wikidata-wbgetentities.json
- wikidata-wbgetclaims.json
- wikicommons-categorymembers.json
- wikicommons-imageinfo.json
- overpass.json

This tool is also available as a Podman image for use in CI/CD pipelines and containerized environments:
# Pull the latest image
podman pull codeberg.org/houston-open-source-society/opendatasync:latest
# Run with batch file
podman run --rm -v $(pwd):/data \
codeberg.org/houston-open-source-society/opendatasync:latest \
--cache-dir /data/cache --output-dir /data/output \
batch --batch-file /data/batch.toml
# Run single command
podman run --rm \
codeberg.org/houston-open-source-society/opendatasync:latest \
wikidata wbgetentities --qid Q30 --format json
The Podman image is automatically built and published via Forgejo Actions on every tagged release.
This tool is used in the HOSS website build pipeline to fetch open data about venues, organizations, and media. The workflow is:
1. The gather.sh script scans markdown files for Wikidata QIDs and OSM IDs.
2. opendatasync processes the batch file and fetches data from the APIs.
3. The static site generator reads the resulting JSON via load_data(), making it available to templates.

For detailed information about the complete data pipeline and integration examples, see the HOSS Architecture Documentation.
clap - Command-line argument parsing with derive macros
thiserror - Ergonomic error handling
reqwest / reqwest-middleware / reqwest-tracing - HTTP client with middleware support
serde / serde_json - JSON serialization
csv - CSV output formatting
toml - Batch configuration file parsing
tokio - Async runtime
tracing / tracing-subscriber - Structured logging

This project uses Rust 2024 edition. To build:
cargo build --release
The release profile is optimized for size with LTO, single codegen unit, and symbol stripping enabled.
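That description corresponds to Cargo.toml settings along these lines (a sketch, not the verbatim manifest; the exact opt-level is an assumption):

[profile.release]
lto = true         # link-time optimization
codegen-units = 1  # single codegen unit
strip = true       # strip symbols from the binary
opt-level = "z"    # assumed size-focused optimization level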
This project maintains version synchronization across three systems:

- The version field in Cargo.toml
- Git tags (v-prefixed, e.g. v0.2.0)
- Published container image tags
To create a new release:
Update the version in Cargo.toml:
[package]
version = "0.2.0"
Commit the version change:
git add Cargo.toml
git commit -m "chore: bump version to 0.2.0"
Create a matching git tag:
git tag v0.2.0
Push both commit and tag:
git push && git push --tags
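The steps above could be scripted; a hypothetical helper (call it release.sh, not part of the repository), assuming Cargo.toml has already been edited to the new version:

#!/bin/sh
# Hypothetical release helper: commit the bump, tag it, push both.
set -e
VERSION=${1:?usage: release.sh X.Y.Z}
git add Cargo.toml
git commit -m "chore: bump version to $VERSION"
git tag "v$VERSION"
git push && git push --tags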
When you push a tag:
1. The pre-push hook validates that the Cargo.toml version matches the tag.
2. The CI workflow (.forgejo/workflows/ci.yml) is triggered and:
   - verifies that opendatasync --version shows the correct version
   - publishes codeberg.org/houston-open-source-society/opendatasync:0.2.0 (version-specific)
   - publishes codeberg.org/houston-open-source-society/opendatasync:latest (latest release)

The project uses a layered defense approach:

- Local pre-push hook (.githooks/pre-push): Catches version mismatches before pushing tags
- CI validation: Re-checks the Cargo.toml version against the tag before artifacts are built

If validation fails at any stage, the release process stops and no artifacts are published.
After a successful release, you'll have:
- Git tag: v0.2.0
- Binary reporting: opendatasync 0.2.0
- Container image: codeberg.org/houston-open-source-society/opendatasync:0.2.0
- Container image: codeberg.org/houston-open-source-society/opendatasync:latest
- A new release available to the HOSS website, which uses opendatasync and rrule-calc in its build pipeline

Copyright © 2025 Evan Carroll
This software is licensed under the Anti-Capitalist Software License (v1.4).
See LICENSE.md for full license text, or visit https://anticapitalist.software/