| Crates.io | ushcn |
| lib.rs | ushcn |
| version | 0.2.5 |
| created_at | 2024-07-17 18:15:11.443987+00 |
| updated_at | 2025-06-29 13:38:33.293306+00 |
| description | US Historical Climatology Network data downloader |
| homepage | |
| repository | https://github.com/rjl-climate/US-Historical-Climate-Network-downloader |
| max_upload_size | |
| id | 1306416 |
| size | 272,791 |
NOAA maintains datasets of daily and monthly climate data for the US from 1875 to present. Data includes maximum and minimum temperatures, precipitation, and other climate variables from 1,200+ weather stations with complete geographic coordinates.
This tool downloads and processes two complementary datasets: daily station observations and monthly summaries (the latter in three quality levels).
The data is distributed as fixed-width text files that require processing for analysis:

```
USC00011084192601TMAX-9999 -9999 -9999 -9999 -9999 -9999 ...
USC00011084192602TMIN 33 6 22 6 67 6 0 6 11 6 17 ...
USC00011084192602PRCP 0 6 381 6 0 6 0 6 0 6 0 ...
...
```
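To illustrate what that processing involves, here is a minimal Python sketch of parsing one record of NOAA's documented `.dly` fixed-width layout (11-character station id, 4-digit year, 2-digit month, 4-character element code, then up to 31 day slots of a 5-character value plus 3 flag characters, with `-9999` marking missing values). This mirrors the published file format, not this crate's internal Rust code:

```python
# Minimal sketch of parsing one line of NOAA's .dly fixed-width daily format.
# This follows the documented layout, not this crate's internal implementation.
def parse_dly_line(line: str) -> dict:
    record = {
        "id": line[0:11],          # station identifier, e.g. USC00011084
        "year": int(line[11:15]),
        "month": int(line[15:17]),
        "element": line[17:21],    # TMAX, TMIN, PRCP, ...
        "values": [],
    }
    # Each of the (up to) 31 day slots is a 5-char value plus 3 flag chars.
    for day in range(31):
        start = 21 + day * 8
        raw = line[start:start + 5]
        if not raw.strip():
            break  # short or truncated line
        value = int(raw)
        # -9999 marks a missing observation; temperatures are tenths of a degree C
        record["values"].append(None if value == -9999 else value / 10.0)
    return record

# Example: a TMIN record with a single day of data (3.3 degrees C on day 1)
rec = parse_dly_line("USC00011084" + "1926" + "02" + "TMIN" + "   33" + "  6")
```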
This repository provides a Rust binary that downloads the daily and monthly datasets, processes them, injects lat/lon coordinate data, and saves them as optimized Apache Parquet files:
```
# Daily data (long format with all measurements)
            id        date  element  value     lat      lon
0  USC00324418  1898-06-14     TMAX   12.3  34.428 -86.2467
1  USC00324418  1898-06-14     TMIN    6.7  34.428 -86.2467
2  USC00324418  1898-06-14     PRCP    1.2  34.428 -86.2467
3  USC00324418  1898-06-15     TMAX   14.4  34.428 -86.2467
...

# Monthly data (separate files by quality level)
# ushcn-monthly-raw-2025-06-27.parquet    (original data)
# ushcn-monthly-tob-2025-06-27.parquet    (time-of-observation adjusted)
# ushcn-monthly-fls52-2025-06-27.parquet  (fully corrected)
```
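A common use of having all three quality levels is quantifying the adjustment applied to each record. A sketch of that comparison using small synthetic frames standing in for the raw and fully corrected files (the `id`/`date`/`element`/`value` column names are assumed to mirror the daily schema):

```python
import pandas as pd

# Synthetic stand-ins for the raw and fully corrected monthly files;
# column names are an assumption, mirroring the daily file's schema.
raw = pd.DataFrame({
    "id": ["USC00324418", "USC00324418"],
    "date": ["1898-06", "1898-07"],
    "element": ["TMAX", "TMAX"],
    "value": [25.1, 27.3],
})
fls52 = raw.assign(value=[25.4, 27.0])  # pretend homogenization shifted values

# Join on the shared keys and compute the adjustment applied to each record
merged = raw.merge(fls52, on=["id", "date", "element"], suffixes=("_raw", "_fls52"))
merged["adjustment"] = merged["value_fls52"] - merged["value_raw"]
print(merged[["id", "date", "adjustment"]])
```

With the real files, the same merge-and-subtract pattern shows where and by how much the corrections change the record.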
The binary downloads and processes all data types automatically:
```
# Download to temporary directory (default)
> ushcn
Downloading and processing US Historical Climate Network data...
Downloading USHCN stations data...
USHCN Stations: /Users/richardlyon/ushcn-stations-2025-06-27.parquet
Downloading GHCN stations data...
GHCN Stations: /Users/richardlyon/ghcnd-stations-2025-06-27.parquet
Processing daily data...
✓ Created daily parquet file with 1,268,938 readings
Daily: Created 1 daily file: /Users/richardlyon/ushcn-daily-2025-06-27.parquet
Processing monthly data...
✓ Created RAW monthly parquet file with 443,135 readings
✓ Created TOB monthly parquet file with 443,117 readings
✓ Created FLS52 monthly parquet file with 491,028 readings
Monthly: Created 3 monthly dataset files: ushcn-monthly-raw-2025-06-27.parquet, ushcn-monthly-tob-2025-06-27.parquet, ushcn-monthly-fls52-2025-06-27.parquet

# Use persistent cache (faster on subsequent runs)
> ushcn --cache
```
The tool generates multiple parquet files optimized for analysis with complete coordinate data:
- `ushcn-daily-{date}.parquet` - Long format with one row per measurement (~37M rows with 100% lat/lon coverage)
- `ushcn-monthly-{dataset}-{date}.parquet` - Separate files for raw, time-adjusted, and fully corrected data (~5M rows each with 100% lat/lon coverage)
- `ushcn-stations-{date}.parquet` - USHCN station coordinates (1,218 stations)
- `ghcnd-stations-{date}.parquet` - GHCN station coordinates (129,000+ stations)

The optimized parquet files work seamlessly with pandas and other Python data analysis tools. For a comprehensive example of their use, see the author's software package Urban Heat Island Contamination in USHCN Temperature Records.
Simple examples:
```python
import pandas as pd
import matplotlib.pyplot as plt

# Load daily data (long format with coordinates)
daily_df = pd.read_parquet("ushcn-daily-2025-06-27.parquet")
daily_df['date'] = pd.to_datetime(daily_df['date'])
print(f"Daily data: {len(daily_df):,} rows with {daily_df['lat'].notna().sum():,} coordinate pairs")
# Output: Daily data: 37,874,655 rows with 37,874,655 coordinate pairs

# Filter for temperature data and plot
tmax_data = daily_df[daily_df['element'] == 'TMAX'].set_index('date')
tmax_monthly = tmax_data.groupby(pd.Grouper(freq='M'))['value'].mean()  # use freq='ME' on pandas >= 2.2
tmax_monthly.plot(title="Average Monthly Maximum Temperature")

# Compare raw vs. corrected monthly data (both with full coordinates)
raw_monthly = pd.read_parquet("ushcn-monthly-raw-2025-06-27.parquet")
corrected_monthly = pd.read_parquet("ushcn-monthly-fls52-2025-06-27.parquet")
print(f"Monthly coverage: {raw_monthly['lat'].notna().sum() / len(raw_monthly) * 100:.1f}%")
# Output: Monthly coverage: 100.0%

# Geospatial analysis with complete coordinate data
import geopandas as gpd

station_coords = daily_df[['id', 'lat', 'lon']].drop_duplicates()
gdf = gpd.GeoDataFrame(
    station_coords,
    geometry=gpd.points_from_xy(station_coords.lon, station_coords.lat),
    crs="EPSG:4326",  # station coordinates are plain lat/lon (WGS84)
)
```
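Because the daily file is in long format (one row per measurement), a common first step is pivoting the elements into columns. A sketch using a few synthetic rows that mirror the daily schema shown above:

```python
import pandas as pd

# A few synthetic rows mirroring the daily file's long-format schema
daily_df = pd.DataFrame({
    "id": ["USC00324418"] * 4,
    "date": pd.to_datetime(["1898-06-14", "1898-06-14", "1898-06-14", "1898-06-15"]),
    "element": ["TMAX", "TMIN", "PRCP", "TMAX"],
    "value": [12.3, 6.7, 1.2, 14.4],
})

# One row per station-day, one column per element (NaN where no reading exists)
wide = daily_df.pivot_table(index=["id", "date"], columns="element", values="value")
print(wide)
```

The same `pivot_table` call works unchanged on the full ~37M-row daily file, yielding a station-by-day table with `TMAX`, `TMIN`, and `PRCP` columns.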