nppes

version: 0.0.7
created_at: 2025-05-24 19:49:48.781616+00
updated_at: 2025-06-07 11:13:41.371697+00
description: A Rust library for parsing, querying, and exporting NPPES healthcare provider data.
repository: https://github.com/nikothomas/nppes
id: 1687709
size: 352,056
Nikolas Yanek-Chrones (nikothomas)

NPPES Data Library

A comprehensive Rust library for working with National Plan and Provider Enumeration System (NPPES) healthcare provider data.

Overview

The NPPES dataset contains information about healthcare providers in the United States, including:

  • ~8 million healthcare provider records
  • 330+ data columns, including NPI numbers, provider information, and taxonomy codes
  • Entity types: Individual providers (code 1) vs Organizations (code 2)
  • Healthcare provider taxonomy codes for specialties
  • Geographic information and licensing data
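
The entity type codes listed above map naturally onto a Rust enum. The following is a minimal sketch, not the crate's own EntityType definition: the names EntityKind and parse_entity_code are hypothetical, and it assumes the raw CSV field holds "1" or "2", possibly padded with whitespace.

```rust
/// Hypothetical stand-in for an entity type; the crate's real `EntityType` may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityKind {
    Individual,   // NPPES entity type code 1
    Organization, // NPPES entity type code 2
}

/// Parse a raw entity type code field.
/// Returns `None` for blank or unrecognized codes instead of guessing.
pub fn parse_entity_code(raw: &str) -> Option<EntityKind> {
    match raw.trim() {
        "1" => Some(EntityKind::Individual),
        "2" => Some(EntityKind::Organization),
        _ => None,
    }
}
```

Returning an Option keeps blank or unexpected codes visible to the caller rather than silently defaulting to one of the two variants.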

Features

  • Type-safe data structures for all NPPES file formats
  • High-performance CSV parsing with parallel processing
  • Analytics and querying functionality for data analysis
  • Schema validation against official NPPES documentation
  • Support for all NPPES reference files (Other Names, Practice Locations, Endpoints)
  • Download capability to fetch the latest data from CMS

NPPES Data Files Supported

Main Data File

  • File: npidata_pfile_yyyymmdd-yyyymmdd.csv (approx. 10GB)
  • Contains: ~8M healthcare provider records with 330+ columns
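
The file name encodes the date range covered by the export. As a rough sketch of how that range could be recovered from a name of the form npidata_pfile_yyyymmdd-yyyymmdd.csv, the helper below uses only the standard library. The function names parse_file_dates and parse_yyyymmdd are hypothetical and are not part of the crate.

```rust
/// Extract the (start, end) date stamps, each as (year, month, day), from a
/// file name shaped like `npidata_pfile_yyyymmdd-yyyymmdd.csv`.
/// Hypothetical helper; returns `None` if the name does not match that shape.
pub fn parse_file_dates(name: &str) -> Option<((u16, u8, u8), (u16, u8, u8))> {
    let stem = name.strip_prefix("npidata_pfile_")?.strip_suffix(".csv")?;
    let (start, end) = stem.split_once('-')?;
    Some((parse_yyyymmdd(start)?, parse_yyyymmdd(end)?))
}

/// Parse an 8-digit `yyyymmdd` stamp into (year, month, day).
fn parse_yyyymmdd(s: &str) -> Option<(u16, u8, u8)> {
    if s.len() != 8 || !s.bytes().all(|b| b.is_ascii_digit()) {
        return None;
    }
    let year = s[0..4].parse().ok()?;
    let month: u8 = s[4..6].parse().ok()?;
    let day: u8 = s[6..8].parse().ok()?;
    // Range check only; this does not validate the number of days per month.
    if (1..=12).contains(&month) && (1..=31).contains(&day) {
        Some((year, month, day))
    } else {
        None
    }
}
```

For the file name used in the usage examples, npidata_pfile_20050523-20250511.csv, this yields the pair (2005, 5, 23) and (2025, 5, 11).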

Reference Files

  • Other Name Reference: Additional organization names for Type 2 NPIs
  • Practice Location Reference: Non-primary practice locations
  • Endpoint Reference: Healthcare endpoints associated with NPIs
  • Taxonomy Reference: Healthcare provider classification codes (NUCC)

Installation

Add this to your Cargo.toml:

[dependencies]
nppes = "0.0.7"

The CLI binary is called npcli.

Usage

Basic Usage

use nppes::prelude::*;
use nppes::Result;

fn main() -> Result<()> {
    // Load main NPPES data
    let reader = NppesReader::new();
    let providers = reader.load_main_data("data/npidata_pfile_20050523-20250511.csv")?;
    println!("Loaded {} providers", providers.len());

    // Load taxonomy reference data
    let taxonomy_data = reader.load_taxonomy_data("data/nucc_taxonomy_250.csv")?;
    Ok(())
}

Command Line Interface (CLI)

You can use the CLI tool npcli to download, query, and export NPPES data.

Example: Download the latest NPPES data

npcli download --out-dir ./data

Example: Show statistics for a dataset

npcli stats --data-dir ./data

Example: Query providers by state and specialty

npcli query --data-dir ./data --state CA --specialty Cardiology

Example: Export data to JSON

npcli export --data-dir ./data --output ca_cardiologists.json --format json --state CA --specialty Cardiology

Analytics and Querying

use nppes::prelude::*;
use nppes::Result;

fn main() -> Result<()> {
    let reader = NppesReader::new();
    let providers = reader.load_main_data("data/npidata_pfile_20050523-20250511.csv")?;
    let taxonomy_data = reader.load_taxonomy_data("data/nucc_taxonomy_250.csv")?;

    // Create analytics engine
    let analytics = NppesAnalytics::new(&providers)
        .with_taxonomy_reference(&taxonomy_data);

    // Get dataset statistics
    let stats = analytics.dataset_stats();
    stats.print_summary();

    // Find providers by state
    let ca_providers = analytics.find_by_state("CA");
    println!("California providers: {}", ca_providers.len());

    // Find providers by taxonomy code (207R00000X is the NUCC code for Internal Medicine)
    let physicians = analytics.find_by_taxonomy_code("207R00000X");
    println!("Internal Medicine physicians: {}", physicians.len());

    // Complex queries with builder pattern
    let query_results = ProviderQuery::new(&analytics)
        .entity_type(EntityType::Individual)
        .state("NY")
        .active_only()
        .execute();

    println!("Active individual providers in NY: {}", query_results.len());
    Ok(())
}

Working with Individual Records

use nppes::prelude::*;
use nppes::Result;

fn main() -> Result<()> {
    let reader = NppesReader::new();
    let providers = reader.load_main_data("data/npidata_pfile_20050523-20250511.csv")?;
    let analytics = NppesAnalytics::new(&providers);
    
    // Find a specific provider by NPI
    let npi = Npi::new("1234567890".to_string())?;
    if let Some(provider) = analytics.find_by_npi(&npi) {
        println!("Provider: {}", provider.display_name());
        println!("Entity Type: {:?}", provider.entity_type);
        println!("Active: {}", provider.is_active());
        
        // Get primary taxonomy
        if let Some(primary_taxonomy) = provider.primary_taxonomy() {
            println!("Primary specialty: {}", primary_taxonomy.code);
        }
    }
    Ok(())
}

Data Enrichment

use nppes::prelude::*;
use nppes::Result;

fn main() -> Result<()> {
    let reader = NppesReader::new();
    let providers = reader.load_main_data("data/npidata_pfile_20050523-20250511.csv")?;
    let taxonomy_data = reader.load_taxonomy_data("data/nucc_taxonomy_250.csv")?;
    let analytics = NppesAnalytics::new(&providers)
        .with_taxonomy_reference(&taxonomy_data);

    // Enrich providers with human-readable taxonomy descriptions
    let enriched_providers = analytics.enrich_with_taxonomy_descriptions()?;

    for enriched in enriched_providers.iter().take(10) {
        println!("Provider: {}", enriched.provider.display_name());
        
        for taxonomy in &enriched.enriched_taxonomies {
            if let Some(display_name) = &taxonomy.display_name {
                println!("  Specialty: {}", display_name);
            }
        }
    }
    Ok(())
}

Advanced Analytics

use nppes::prelude::*;
use nppes::Result;
use chrono::NaiveDate;

fn main() -> Result<()> {
    let reader = NppesReader::new();
    let providers = reader.load_main_data("data/npidata_pfile_20050523-20250511.csv")?;
    let taxonomy_data = reader.load_taxonomy_data("data/nucc_taxonomy_250.csv")?;
    let analytics = NppesAnalytics::new(&providers)
        .with_taxonomy_reference(&taxonomy_data);

    // Get top states by provider count
    let top_states = analytics.top_states_by_provider_count(10);
    for (state, count) in top_states {
        println!("{}: {} providers", state, count);
    }

    // Get top specialties
    let top_specialties = analytics.top_taxonomy_codes_by_provider_count(10);
    for (code, count) in top_specialties {
        if let Some(taxonomy_ref) = analytics.get_taxonomy_description(&code) {
            if let Some(display_name) = &taxonomy_ref.display_name {
                println!("{}: {} providers", display_name, count);
            }
        }
    }

    // Date-based queries
    let start_date = NaiveDate::from_ymd_opt(2023, 1, 1).unwrap();
    let end_date = NaiveDate::from_ymd_opt(2023, 12, 31).unwrap();

    let new_providers = analytics.providers_enumerated_between(start_date, end_date);
    println!("Providers enumerated in 2023: {}", new_providers.len());
    Ok(())
}

Configuration Options

Reader Configuration

use nppes::prelude::*;

fn main() {
    // Configure the reader before loading any files
    let _reader = NppesReader::new()
        .with_header_validation(true) // Validate CSV headers (default: true)
        .with_skip_invalid_records(false); // Skip invalid records (default: false)
}
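
To make the difference between skipping and rejecting invalid records concrete, here is a standalone sketch of the two policies. It makes no claim about the crate's internals: InvalidPolicy, load_rows, and parse_row are hypothetical names, and the two-column "npi,state" input is a simplified stand-in for a real NPPES row.

```rust
/// Hypothetical policy flag illustrating the idea behind skipping invalid records.
#[derive(Debug, Clone, Copy)]
pub enum InvalidPolicy {
    FailFast, // stop at the first invalid record
    Skip,     // drop invalid records and keep going
}

/// Load simplified "npi,state" rows under the given policy.
pub fn load_rows(input: &str, policy: InvalidPolicy) -> Result<Vec<(String, String)>, String> {
    let mut rows = Vec::new();
    for (idx, line) in input.lines().enumerate() {
        match parse_row(line) {
            Some(row) => rows.push(row),
            None => match policy {
                InvalidPolicy::Skip => continue,
                InvalidPolicy::FailFast => {
                    return Err(format!("invalid record on line {}", idx + 1));
                }
            },
        }
    }
    Ok(rows)
}

/// A row is valid here if the NPI is 10 ASCII digits and the state is 2 uppercase letters.
fn parse_row(line: &str) -> Option<(String, String)> {
    let (npi, state) = line.split_once(',')?;
    let npi_ok = npi.len() == 10 && npi.bytes().all(|b| b.is_ascii_digit());
    let state_ok = state.len() == 2 && state.bytes().all(|b| b.is_ascii_uppercase());
    (npi_ok && state_ok).then(|| (npi.to_string(), state.to_string()))
}
```

Failing fast suits pipelines where a malformed file should halt processing, while skipping suits exploratory work where a handful of bad rows is acceptable.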

Error Handling

The library uses a comprehensive error system with detailed error types. Refer to NppesError for all variants.

use nppes::prelude::*;
use nppes::Result;

fn main() -> Result<()> {
    let reader = NppesReader::new();
    match reader.load_main_data("invalid_path.csv") {
        Ok(providers) => println!("Loaded {} providers", providers.len()),
        Err(e) => match e {
            NppesError::FileNotFound { path, .. } => {
                eprintln!("File not found: {}", path.display());
            }
            NppesError::CsvParse { message, .. } => {
                eprintln!("CSV parsing error: {}", message);
            }
            NppesError::DataValidation { message, .. } => {
                eprintln!("Data validation error: {}", message);
            }
            _ => eprintln!("Other error: {}", e),
        }
    }
    Ok(())
}

Data Structures

Core Types

  • NppesRecord: Main provider record with all NPPES data
  • EntityType: Individual vs Organization provider type
  • Npi: Type-safe NPI number wrapper
  • TaxonomyCode: Healthcare specialty/taxonomy information
  • Address: Mailing and practice location addresses
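
One reason to wrap NPIs in a dedicated type is that an NPI is not an arbitrary string: it is a 10-digit identifier whose last digit is a Luhn check digit computed over the number prefixed with 80840. The sketch below shows that check using only the standard library. The function name npi_is_valid is hypothetical, and whether the crate's Npi::new performs the same check is an assumption not confirmed here.

```rust
/// Check that `npi` is 10 ASCII digits and passes the Luhn check
/// when prefixed with "80840" (the prefix used for US health identifiers).
/// Hypothetical helper, not the crate's own validation routine.
pub fn npi_is_valid(npi: &str) -> bool {
    if npi.len() != 10 || !npi.bytes().all(|b| b.is_ascii_digit()) {
        return false;
    }
    let full = format!("80840{npi}");
    let mut sum = 0u32;
    // Walk from the rightmost digit (the check digit), doubling every second digit.
    for (i, b) in full.bytes().rev().enumerate() {
        let digit = u32::from(b - b'0');
        if i % 2 == 1 {
            let doubled = digit * 2;
            // Doubled values above 9 contribute the sum of their two digits.
            sum += if doubled > 9 { doubled - 9 } else { doubled };
        } else {
            sum += digit;
        }
    }
    sum % 10 == 0
}
```

Under this check 1234567893 is valid, while 1234567890 is not, because the check digit for the base 123456789 is 3.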

Reference Types

  • TaxonomyReference: Healthcare taxonomy lookup data
  • OtherNameRecord: Additional organization names
  • PracticeLocationRecord: Secondary practice locations
  • EndpointRecord: Healthcare endpoints

Performance Considerations

  • Memory Usage: The main NPPES file is large (e.g., ~10GB with ~8M records). 16GB+ RAM is recommended for full dataset processing. Use NppesReader::estimate_memory_usage to check memory requirements before loading.
  • Parallel Processing: Parallel data loading and querying are enabled by default.
  • Indexing: For frequent lookups, build indexes on NPI, state, and taxonomy codes using NppesDataset::build_indexes().
  • Streaming: For memory-constrained environments, consider creating a custom streaming or chunked processing pipeline.
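
As an illustration of the chunked processing idea in the last bullet, the sketch below streams lines from any buffered reader and hands them to a callback in fixed-size batches, so only one batch is held in memory at a time. It is not part of the crate: for_each_chunk and count_by_state are hypothetical names, and the naive comma split assumes a simplified, unquoted CSV with the state in the second column. Real NPPES rows are quoted, so an actual pipeline would need a proper CSV parser.

```rust
use std::collections::HashMap;
use std::io::{self, BufRead};

/// Stream `reader` line by line, skipping the header row, and pass the data
/// lines to `handle` in batches of at most `chunk_size`.
/// Returns the total number of data lines seen.
pub fn for_each_chunk<R: BufRead, F: FnMut(&[String])>(
    reader: R,
    chunk_size: usize,
    mut handle: F,
) -> io::Result<usize> {
    assert!(chunk_size > 0, "chunk_size must be positive");
    let mut chunk = Vec::with_capacity(chunk_size);
    let mut total = 0;
    for line in reader.lines().skip(1) {
        chunk.push(line?);
        if chunk.len() == chunk_size {
            handle(&chunk);
            total += chunk.len();
            chunk.clear(); // reuse the buffer so memory stays bounded
        }
    }
    if !chunk.is_empty() {
        handle(&chunk);
        total += chunk.len();
    }
    Ok(total)
}

/// Example consumer: count rows per state without loading the whole file.
/// Assumes (hypothetically) that the state is the second column.
pub fn count_by_state<R: BufRead>(
    reader: R,
    chunk_size: usize,
) -> io::Result<HashMap<String, usize>> {
    let mut counts = HashMap::new();
    for_each_chunk(reader, chunk_size, |rows| {
        for row in rows {
            if let Some(state) = row.split(',').nth(1) {
                *counts.entry(state.trim().to_string()).or_insert(0) += 1;
            }
        }
    })?;
    Ok(counts)
}
```

Passing a std::io::BufReader wrapped around a std::fs::File as the reader keeps peak memory proportional to chunk_size rather than to the size of the file.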

License

MIT License

Contributing

Contributions welcome! Please see CONTRIBUTING.md for guidelines.

Commit count: 10
