| Crates.io | midas_fetcher |
| lib.rs | midas_fetcher |
| version | 0.1.4 |
| created_at | 2025-07-06 09:05:41.881646+00 |
| updated_at | 2025-07-10 05:10:40.112744+00 |
| description | High-performance concurrent downloader for UK Met Office MIDAS Open weather data with intelligent caching and resumable downloads |
| homepage | https://github.com/rjl-climate/midas_fetcher |
| repository | https://github.com/rjl-climate/midas_fetcher |
| max_upload_size | |
| id | 1740006 |
| size | 1,275,130 |
High-performance concurrent downloader for UK Met Office MIDAS Open weather data
A command-line tool and Rust library designed to efficiently download large volumes of historical weather data from the UK Met Office MIDAS Open Archive. Built for climate researchers and data scientists who need reliable, fast, and resumable downloads while respecting CEDA's infrastructure.
NOTE: Two companion apps build on this tool:
Midas Processor: A Rust app that converts the MIDAS dataset downloaded by this tool into a .parquet file for efficient downstream processing.
Midas Analyser: A Python toolkit for analysing a MIDAS dataset.
MIDAS Open is a comprehensive collection of meteorological observation datasets released annually by the UK Met Office under the Open Government Licence. The dataset is hosted by the Centre for Environmental Data Analysis (CEDA) and contains:
The data are structured in paths like:
ukmo-midas-open/data/<dataset>/<release-version>/<historic-county>/<site>/<qc-version>/files
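The path layout above can be split into its named parts with a few lines of Rust. This is a minimal sketch for illustration only: the `MidasPath` struct and `parse_midas_path` function are hypothetical names, not part of the midas_fetcher API, and the sketch assumes exactly one file name follows the QC version segment.

```rust
/// Components of a MIDAS Open archive path (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub struct MidasPath {
    pub dataset: String,
    pub release_version: String,
    pub county: String,
    pub site: String,
    pub qc_version: String,
    pub file_name: String,
}

/// Splits a path of the documented form into its components.
/// Returns `None` if the prefix is missing, a segment is empty,
/// or the number of segments does not match the layout.
pub fn parse_midas_path(path: &str) -> Option<MidasPath> {
    let rest = path.strip_prefix("ukmo-midas-open/data/")?;
    let parts: Vec<&str> = rest.split('/').collect();
    if parts.iter().any(|p| p.is_empty()) {
        return None;
    }
    match parts.as_slice() {
        [dataset, release, county, site, qc, file] => Some(MidasPath {
            dataset: dataset.to_string(),
            release_version: release.to_string(),
            county: county.to_string(),
            site: site.to_string(),
            qc_version: qc.to_string(),
            file_name: file.to_string(),
        }),
        _ => None,
    }
}
```

Returning `Option` rather than panicking keeps the sketch usable on manifests that may contain paths outside this layout.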
CEDA currently provides no specialized tools for bulk downloading MIDAS Open data. Climate researchers and data scientists face significant challenges:
MIDAS Fetcher solves these problems through intelligent automation and sophisticated technical architecture:
You need a free CEDA account to download MIDAS Open data:
- .env files with restricted permissions (Unix: 600)
Rust Toolchain (1.80+ with 2024 edition support):
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
source ~/.cargo/env
rustup default stable
git clone https://github.com/rjl-climate/midas_fetcher.git
cd midas_fetcher
cargo build --release
The binary will be available at target/release/midas_fetcher.
Alternative: Add to PATH
cargo install --path .
Future: Pre-built binaries will be available on GitHub releases
midas_fetcher auth setup
# Follow interactive prompts to securely store CEDA credentials
# Verify authentication works
midas_fetcher auth verify
# Download latest file manifest from CEDA
midas_fetcher manifest update
# Check manifest information
midas_fetcher manifest info
# Interactive dataset selection
midas_fetcher download
# Download specific dataset (includes all file types: data, capability, metadata, and station logs)
midas_fetcher download --dataset uk-daily-temperature-obs
# Download with filters
midas_fetcher download --dataset uk-daily-temperature-obs --county devon --limit 100
# Dry run to see what would be downloaded
midas_fetcher download --dataset uk-daily-temperature-obs --dry-run
# Verify cache integrity by checking file hashes against manifest
midas_fetcher cache verify
# Verify specific dataset only
midas_fetcher cache verify --dataset uk-daily-temperature-obs
# Check cache information
midas_fetcher cache info
MIDAS Fetcher uses a unified configuration system that automatically creates sensible defaults while allowing customization for specific needs.
The configuration file is automatically created on first run at:
macOS/Linux:
~/.config/midas-fetcher/config.toml
Windows:
%APPDATA%\midas-fetcher\config.toml
These are the main settings you might want to adjust:
| Setting | Default | Purpose | Safety Level |
|---|---|---|---|
| rate_limit_rps | 15 | CEDA server request rate | ⚠️ Critical |
| worker_count | 8 | Download concurrency | ⚠️ Performance |
| cache_root | Auto | Custom cache location | ✅ Safe |
| request_timeout_secs | 60 | Download timeout | ✅ Safe |
| connect_timeout_secs | 30 | Connection timeout | ✅ Safe |
[client]
rate_limit_rps = 15 # Total requests per second across ALL workers
⚠️ IMPORTANT: This controls how fast you hit CEDA's servers. The 15 RPS default is shared across all workers and is respectful of CEDA's infrastructure. Don't increase it unless you have explicit permission from CEDA. Overly aggressive settings can result in IP blocking.
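One way to picture a rate limit that is shared across all workers is a single schedule of evenly spaced request slots behind a mutex. The sketch below is an assumption about how such a limiter could look, not the crate's actual implementation; `SharedRateLimiter` and `reserve` are hypothetical names.

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

/// Hands out evenly spaced request slots shared by every worker
/// (hypothetical type, not the midas_fetcher API).
pub struct SharedRateLimiter {
    interval: Duration,
    next_slot: Mutex<Instant>,
}

impl SharedRateLimiter {
    /// `rate_limit_rps` is the total budget for all workers combined.
    pub fn new(rate_limit_rps: u32, start: Instant) -> Self {
        assert!(rate_limit_rps > 0, "rate_limit_rps must be positive");
        Self {
            interval: Duration::from_secs(1) / rate_limit_rps,
            next_slot: Mutex::new(start),
        }
    }

    /// Reserves the next free slot and returns how long the caller
    /// should wait, measured from `now`, before sending its request.
    pub fn reserve(&self, now: Instant) -> Duration {
        let mut next = self.next_slot.lock().unwrap();
        // Never schedule in the past: an idle limiter starts again at `now`.
        let slot = (*next).max(now);
        *next = slot + self.interval;
        slot.saturating_duration_since(now)
    }
}
```

Because every worker reserves from the same `next_slot`, adding workers changes who gets the next slot but not how many slots exist per second.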
[coordinator]
worker_count = 8 # Number of concurrent download workers
⚠️ Performance Impact: More workers = faster downloads, but diminishing returns beyond 8-12 workers. The rate limit (15 RPS) is shared across all workers, so adding workers won't exceed the server politeness limits.
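The diminishing returns can be illustrated with a deliberately simple model: each worker finishes one request every `avg_request_secs`, and the shared limiter caps the total. Both the model and the `estimated_rps` function are assumptions made for illustration; real throughput also depends on file sizes and network conditions.

```rust
/// Estimated sustained request rate under a simple model: workers run
/// back-to-back requests and the shared limiter caps the combined rate.
/// (Hypothetical helper, not part of the midas_fetcher API.)
pub fn estimated_rps(worker_count: u32, avg_request_secs: f64, rate_limit_rps: f64) -> f64 {
    // Rate the workers could reach with no limiter in place.
    let unthrottled = worker_count as f64 / avg_request_secs;
    // The limiter is shared, so the total can never exceed it.
    unthrottled.min(rate_limit_rps)
}
```

Assuming an average request time of 0.5 seconds, 4 workers reach 8 requests per second, while 8 and 12 workers both sit at the 15 RPS cap.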
[cache]
cache_root = "/custom/path/to/cache" # Uncomment and modify to use custom location
By default, cache uses the same unified directory as the config file:
macOS/Linux: ~/.config/midas-fetcher/cache/
Windows: %APPDATA%\midas-fetcher\cache\
[client]
request_timeout_secs = 60 # How long to wait for downloads
connect_timeout_secs = 30 # How long to wait for connections
Increase these if you have a slow connection or are downloading large files.
⚠️ WARNING: The configuration file contains many advanced settings for HTTP connections, retry logic, queue management, and progress reporting. Do not modify these unless you understand their implications. Incorrect settings can cause:
- Download failures
- Server overload (potentially resulting in IP blocks)
- Performance degradation
- Cache corruption
Advanced settings you should NOT modify without expertise:
- HTTP connection settings (http2, tcp_nodelay, pool_*)
- Retry logic (max_retries, retry_*)
- Queue management (work_timeout_secs)
# See where your config file is located
midas_fetcher cache info
# Edit the configuration file
nano ~/.config/midas-fetcher/config.toml
# or on macOS:
open ~/.config/midas-fetcher/config.toml
# Remove the config file to regenerate defaults
rm ~/.config/midas-fetcher/config.toml
# Next run will recreate with default settings
midas_fetcher auth status
Settings are applied in this order (later overrides earlier):
- Configuration file (~/.config/midas-fetcher/config.toml)
- Command-line arguments (--workers, --cache-dir, etc.)
[client]
rate_limit_rps = 5 # More conservative for shared networks
[coordinator]
worker_count = 4 # Fewer workers for limited bandwidth
[client]
request_timeout_secs = 120 # Longer timeout for slow connections
[client]
rate_limit_rps = 15 # Default (don't increase without CEDA permission)
[coordinator]
worker_count = 12 # More workers for fast connections
[coordinator.worker]
download_timeout_secs = 300 # Shorter timeout for fast networks
💡 Tip: Test any configuration changes with --limit 10 first to ensure they work before doing large downloads.
# Basic download (includes all file types: data, capability, metadata, and station logs)
midas_fetcher download --dataset <dataset-name>
# With filtering (--quality-0 selects QC version 0; the default is version 1)
midas_fetcher download \
--dataset uk-daily-temperature-obs \
--county devon \
--quality-0 \
--limit 1000
# Performance tuning
midas_fetcher download \
--dataset uk-daily-temperature-obs \
--workers 8 \
--force # Restart incomplete downloads
Each dataset download includes all available file types:
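- Quality-controlled data files: qcv-1/county/station/
- Station capability files: capability/county/station/
- Station metadata: station-metadata/
- Station change logs: station-log-files/
cache/uk-daily-temperature-obs/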
├── qcv-1/ # Quality-controlled data files
│ ├── devon/
│ │ └── 01381_twist/
│ └── ...
├── capability/ # Station capability files
│ ├── devon/
│ │ └── 01381_twist/
│ └── ...
├── station-metadata/ # Master station metadata
│ └── uk-daily-temperature-obs_station-metadata.csv
├── station-log-files/ # Individual station change logs
│ ├── station_log_01381_twist_2020.txt
│ └── ...
└── change_log.txt # Dataset-level change log
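The layout above maps directly onto path joins. The following sketch builds the directory for one station's files; `FileKind` and `station_dir` are hypothetical names used only for illustration, and the `qcv-<n>` naming for other QC versions is an assumption extrapolated from the `qcv-1` directory shown in the tree.

```rust
use std::path::{Path, PathBuf};

/// Which top-level subdirectory a station's files live in
/// (hypothetical type, not the midas_fetcher API).
pub enum FileKind {
    /// Observation data under `qcv-<n>/`.
    Data { qc_version: u8 },
    /// Capability files under `capability/`.
    Capability,
}

/// Builds `<cache_root>/<dataset>/<kind>/<county>/<station>`.
pub fn station_dir(
    cache_root: &Path,
    dataset: &str,
    kind: &FileKind,
    county: &str,
    station: &str,
) -> PathBuf {
    let top = match kind {
        FileKind::Data { qc_version } => format!("qcv-{qc_version}"),
        FileKind::Capability => "capability".to_string(),
    };
    cache_root.join(dataset).join(top).join(county).join(station)
}
```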
midas_fetcher auth setup # Interactive credential setup
midas_fetcher auth verify # Test authentication
midas_fetcher auth status # Show current status
midas_fetcher auth clear # Remove stored credentials
midas_fetcher manifest update # Download latest manifest
midas_fetcher manifest check # Check for updates
midas_fetcher manifest info # Show manifest statistics
midas_fetcher manifest list # List available datasets
midas_fetcher manifest list --datasets-only # Just dataset names
midas_fetcher cache verify # Verify cache integrity by checking file hashes
midas_fetcher cache verify --dataset <name> # Verify specific dataset only
midas_fetcher cache info # Cache statistics, location, and file counts
midas_fetcher cache clean # Remove temporary and failed files
--verbose # Detailed progress information
--quiet # Suppress non-essential output
--config FILE # Use custom configuration file
--cache-dir DIR # Use custom cache directory
MIDAS Fetcher uses a concurrent architecture designed for efficiency, reliability, and respectful server interaction:
The fundamental challenge is coordinating multiple workers accessing shared filesystem state. MIDAS Fetcher treats the filesystem as a distributed system requiring explicit coordination:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │
└──────┬──────┘ └──────┬──────┘ └──────┬──────┘
│ │ │
└─────────────┬────┴─────────────────┘
│
┌──────▼──────┐
│ Work Queue │
│ + Reservoir │
└─────────────┘
The cache system ensures data integrity through multiple layers:
- dataset/quality/county/station structure matches CEDA
The work-stealing queue prevents worker starvation and enables linear scaling:
// Simplified algorithm:
loop {
if let Some(work) = queue.steal_work() {
if cache.try_reserve(work.hash) {
download_and_save(work).await;
cache.mark_completed(work.hash);
}
// If reservation fails, immediately try next file
} else {
sleep_briefly().await;
}
}
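The reservation step in the loop above can be tried out with plain threads. The sketch below is a runnable approximation under stated assumptions: it uses a single mutex-protected queue as a stand-in for the work-stealing queue, a `HashSet` of hashes as a stand-in for the cache's reservation table, and skips the download itself. `run_workers` is a hypothetical name, not part of the crate.

```rust
use std::collections::{HashSet, VecDeque};
use std::sync::{Arc, Mutex};
use std::thread;

/// Runs `worker_count` threads over `files` (identified by hash strings)
/// and returns the hashes each worker ended up processing.
pub fn run_workers(files: Vec<String>, worker_count: usize) -> Vec<Vec<String>> {
    let queue = Arc::new(Mutex::new(VecDeque::from(files)));
    let reserved = Arc::new(Mutex::new(HashSet::new()));

    let handles: Vec<_> = (0..worker_count)
        .map(|_| {
            let queue = Arc::clone(&queue);
            let reserved = Arc::clone(&reserved);
            thread::spawn(move || {
                let mut done = Vec::new();
                loop {
                    // Take the next item; the lock is released when this statement ends.
                    let next = queue.lock().unwrap().pop_front();
                    let Some(hash) = next else { break };
                    // `insert` returns false if another worker already reserved this hash,
                    // in which case this worker moves straight on to the next file.
                    if reserved.lock().unwrap().insert(hash.clone()) {
                        // A real worker would download and save the file here.
                        done.push(hash);
                    }
                }
                done
            })
        })
        .collect();

    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

Even when the same hash is queued more than once, only one worker wins the reservation, so each file is processed exactly once.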
Benefits:
The HTTP client implements multiple layers of protection for CEDA's infrastructure:
# For fast connections and powerful machines
midas_fetcher download --workers 12 --dataset uk-daily-temperature-obs
# For shared or limited connections
midas_fetcher download --workers 4 --dataset uk-daily-temperature-obs
# For testing or development
midas_fetcher download --workers 2 --limit 10 --dataset uk-daily-temperature-obs
Contributions are welcome! This tool aims to serve the UK climate research community and can benefit from diverse perspectives and use cases.
git clone https://github.com/rjl-climate/midas_fetcher.git
cd midas_fetcher
# Run all tests
cargo test --all
# Check code quality
cargo clippy --all -- -D warnings
cargo fmt --all
# Test CLI functionality
cargo run -- --help
cargo run -- auth setup
Please use GitHub Issues with:
This tool exists thanks to the Centre for Environmental Data Analysis (CEDA) and the UK Met Office for:
Built with excellent crates from the Rust ecosystem:
This tool is actively developed to meet real research needs. If you:
Please open an issue or discussion! The goal is maximum utility for the climate research community.
This project is licensed under:
You may choose either license for your use.
Core functionality: Concurrent downloads with work-stealing queue
CEDA authentication: Secure credential management with session handling
Cache management: Atomic operations with MD5 verification
CLI interface: Complete command-line tool with progress monitoring
Status: Beta
Maintainer: Richard Lyon richlyon@fastmail.com
First Release: 2025
Latest Update: July 2025 (v0.1.3)