pdbrust

Crates.io	pdbrust
lib.rs	pdbrust
version	0.6.0
created_at	2025-12-18 22:48:47.152776+00
updated_at	2026-01-21 13:40:49.47035+00
description	A comprehensive Rust library for parsing and analyzing Protein Data Bank (PDB) files
homepage
repository	https://github.com/hfooladi/pdbrust
max_upload_size
id	1993733
size	3,926,681

Hosein Fooladi (HFooladi)

documentation

https://docs.rs/pdbrust

README

PDBRust

A fast Rust library for parsing and analyzing PDB and mmCIF protein structure files. Also available as a Python package with 40-260x speedups over pure Python implementations.

Installation

Python

pip install pdbrust

Rust

[dependencies]
pdbrust = "0.6"

With optional features:

[dependencies]
pdbrust = { version = "0.5", features = ["filter", "descriptors", "rcsb", "gzip"] }

Quick Start

Python

import pdbrust

# Parse a PDB file
structure = pdbrust.parse_pdb_file("protein.pdb")
print(f"Atoms: {structure.num_atoms}")
print(f"Chains: {structure.get_chain_ids()}")

# Filter and analyze
cleaned = structure.remove_ligands().keep_only_chain("A")
rg = cleaned.radius_of_gyration()
print(f"Radius of gyration: {rg:.2f} Å")

# Get coordinates as numpy arrays (fast!)
import numpy as np
coords = structure.get_coords_array()      # Shape: (N, 3)
ca_coords = structure.get_ca_coords_array() # Shape: (CA, 3)

# Download from RCSB PDB
from pdbrust import download_structure, FileFormat
structure = download_structure("1UBQ", FileFormat.pdb())

Rust

use pdbrust::{parse_pdb_file, PdbStructure};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let structure = parse_pdb_file("protein.pdb")?;

    println!("Atoms: {}", structure.atoms.len());
    println!("Chains: {:?}", structure.get_chain_ids());

    Ok(())
}

Features

Feature	Description
`filter`	Filter atoms, extract chains, remove ligands, clean structures
`descriptors`	Radius of gyration, amino acid composition, geometric metrics
`quality`	Structure quality assessment (altlocs, missing residues, etc.)
`summary`	Combined quality + descriptors in one call
`geometry`	RMSD calculation, structure alignment (Kabsch), per-residue RMSD
`dssp`	DSSP 4-like secondary structure assignment (H, G, I, P, E, B, T, S, C)
`rcsb`	Search and download structures from RCSB PDB
`rcsb-async`	Async/concurrent bulk downloads with rate limiting
`gzip`	Parse gzip-compressed files (.ent.gz, .pdb.gz, .cif.gz)
`parallel`	Parallel processing with Rayon

Examples

Filter and Clean Structures

use pdbrust::parse_pdb_file;

let structure = parse_pdb_file("protein.pdb")?;

// Extract CA coordinates
let ca_coords = structure.get_ca_coords(None);

// Chain operations with fluent API
let chain_a = structure
    .remove_ligands()
    .keep_only_chain("A")
    .keep_only_ca();

Compute Structural Descriptors

let structure = parse_pdb_file("protein.pdb")?;

let rg = structure.radius_of_gyration();
let max_dist = structure.max_ca_distance();
let composition = structure.aa_composition();

// Or get everything at once
let descriptors = structure.structure_descriptors();

Parse Gzip-Compressed Files

use pdbrust::parse_gzip_pdb_file;

// Parse gzip-compressed PDB files from the PDB archive
let structure = parse_gzip_pdb_file("pdb1ubq.ent.gz")?;
println!("Atoms: {}", structure.atoms.len());

Geometry: RMSD and Alignment

use pdbrust::{parse_pdb_file, geometry::AtomSelection};

let structure1 = parse_pdb_file("model1.pdb")?;
let structure2 = parse_pdb_file("model2.pdb")?;

// Calculate RMSD (without alignment)
let rmsd = structure1.rmsd_to(&structure2)?;
println!("RMSD: {:.3} Å", rmsd);

// Align structures using Kabsch algorithm
let (aligned, result) = structure1.align_to(&structure2)?;
println!("Alignment RMSD: {:.3} Å ({} atoms)", result.rmsd, result.num_atoms);

// Per-residue RMSD for flexibility analysis
let per_res = structure1.per_residue_rmsd_to(&structure2)?;
for r in per_res.iter().filter(|r| r.rmsd > 2.0) {
    println!("Flexible: {}{} {:.2} Å", r.chain_id, r.residue_seq, r.rmsd);
}

// Different atom selections
let rmsd_bb = structure1.rmsd_to_with_selection(&structure2, AtomSelection::Backbone)?;
let rmsd_all = structure1.rmsd_to_with_selection(&structure2, AtomSelection::AllAtoms)?;

Download from RCSB PDB

use pdbrust::rcsb::{download_structure, rcsb_search, SearchQuery, FileFormat};

// Download a structure
let structure = download_structure("1UBQ", FileFormat::Pdb)?;

// Search RCSB
let query = SearchQuery::new()
    .with_text("kinase")
    .with_organism("Homo sapiens")
    .with_resolution_max(2.0);

let results = rcsb_search(&query, 10)?;

Bulk Downloads with Async

use pdbrust::rcsb::{download_multiple_async, AsyncDownloadOptions, FileFormat};

#[tokio::main]
async fn main() {
    let pdb_ids = vec!["1UBQ", "8HM2", "4INS", "1HHB", "2MBP"];

    // Download with default options (5 concurrent, 100ms rate limit)
    let results = download_multiple_async(&pdb_ids, FileFormat::Pdb, None).await;

    // Or with custom options
    let options = AsyncDownloadOptions::default()
        .with_max_concurrent(10)
        .with_rate_limit_ms(50);
    let results = download_multiple_async(&pdb_ids, FileFormat::Cif, Some(options)).await;

    for (pdb_id, result) in results {
        match result {
            Ok(structure) => println!("{}: {} atoms", pdb_id, structure.atoms.len()),
            Err(e) => eprintln!("{}: {}", pdb_id, e),
        }
    }
}

Python:

from pdbrust import download_multiple, AsyncDownloadOptions, FileFormat

# Download multiple structures concurrently
results = download_multiple(["1UBQ", "8HM2", "4INS"], FileFormat.pdb())

# With custom options
options = AsyncDownloadOptions(max_concurrent=10, rate_limit_ms=50)
results = download_multiple(pdb_ids, FileFormat.cif(), options)

for r in results:
    if r.success:
        print(f"{r.pdb_id}: {len(r.get_structure().atoms)} atoms")
    else:
        print(f"{r.pdb_id}: {r.error}")

Secondary Structure Assignment (DSSP)

use pdbrust::parse_pdb_file;

let structure = parse_pdb_file("protein.pdb")?;

// Compute DSSP-like secondary structure
let ss = structure.assign_secondary_structure();
println!("Helix: {:.1}%", ss.helix_fraction * 100.0);
println!("Sheet: {:.1}%", ss.sheet_fraction * 100.0);
println!("Coil:  {:.1}%", ss.coil_fraction * 100.0);

// Get as compact string (e.g., "HHHHEEEECCCC")
let ss_string = structure.secondary_structure_string();

// Get composition tuple
let (helix, sheet, coil) = structure.secondary_structure_composition();

Python:

import pdbrust

structure = pdbrust.parse_pdb_file("protein.pdb")

# Get secondary structure assignment
ss = structure.assign_secondary_structure()
print(f"Helix: {ss.helix_fraction*100:.1f}%")
print(f"Sheet: {ss.sheet_fraction*100:.1f}%")

# Compact string representation
ss_string = structure.secondary_structure_string()
print(f"SS: {ss_string}")  # e.g., "CCCCHHHHHHHCCEEEEEECCC"

# Iterate over residue assignments
for res in ss:
    print(f"{res.chain_id}{res.residue_seq}: {res.ss.code()}")

B-factor Analysis

use pdbrust::parse_pdb_file;

let structure = parse_pdb_file("protein.pdb")?;

// B-factor statistics
let mean_b = structure.b_factor_mean();
let mean_b_ca = structure.b_factor_mean_ca();
let std_b = structure.b_factor_std();
println!("Mean B-factor: {:.2} Å²", mean_b);
println!("Std deviation: {:.2} Å²", std_b);

// Per-residue B-factor profile
let profile = structure.b_factor_profile();
for res in &profile {
    println!("{}{}: mean={:.2}, max={:.2}",
        res.chain_id, res.residue_seq, res.mean, res.max);
}

// Identify flexible/rigid regions
let flexible = structure.flexible_residues(50.0);  // B > 50 Å²
let rigid = structure.rigid_residues(15.0);        // B < 15 Å²

// Normalize for cross-structure comparison
let normalized = structure.normalize_b_factors();

Python:

import pdbrust

structure = pdbrust.parse_pdb_file("protein.pdb")

# B-factor statistics
print(f"Mean B-factor: {structure.b_factor_mean():.2f} Å²")
print(f"CA mean: {structure.b_factor_mean_ca():.2f} Å²")

# Per-residue profile
profile = structure.b_factor_profile()
for res in profile:
    print(f"{res.chain_id}{res.residue_seq}: {res.mean:.2f} Å²")

# Flexible regions
flexible = structure.flexible_residues(50.0)
print(f"Found {len(flexible)} flexible residues")

Selection Language (PyMOL/VMD-style)

use pdbrust::parse_pdb_file;

let structure = parse_pdb_file("protein.pdb")?;

// Basic selections
let chain_a = structure.select("chain A")?;
let ca_atoms = structure.select("name CA")?;
let backbone = structure.select("backbone")?;

// Residue selections
let res_range = structure.select("resid 1:100")?;
let alanines = structure.select("resname ALA")?;

// Boolean operators
let chain_a_ca = structure.select("chain A and name CA")?;
let heavy_atoms = structure.select("protein and not hydrogen")?;
let complex = structure.select("(chain A or chain B) and backbone")?;

// Numeric comparisons
let low_bfactor = structure.select("bfactor < 30.0")?;
let high_occ = structure.select("occupancy >= 0.5")?;

// Validate without executing
pdbrust::PdbStructure::validate_selection("chain A and name CA")?;

Python:

import pdbrust

structure = pdbrust.parse_pdb_file("protein.pdb")

# Select atoms using familiar syntax
chain_a = structure.select("chain A")
ca_atoms = structure.select("name CA")
backbone = structure.select("backbone and not hydrogen")

# Complex selections
active_site = structure.select("resid 50:60 and chain A")
flexible = structure.select("chain A and bfactor > 40.0")

Common Workflows

See the examples/ directory for complete working code:

Workflow	Example	Features Used
Load, clean, analyze, export	analysis_workflow.rs	filter, descriptors, quality, summary
Filter and clean structures	filtering_demo.rs	filter
Selection language	selection_demo.rs	filter
B-factor analysis	b_factor_demo.rs	descriptors
Secondary structure (DSSP)	secondary_structure_demo.rs	dssp
RMSD and structure alignment	geometry_demo.rs	geometry
Search and download from RCSB	rcsb_workflow.rs	rcsb, descriptors
Async bulk downloads	async_download_demo.rs	rcsb-async, descriptors
Process multiple files	batch_processing.rs	descriptors, summary

Python examples are available in pdbrust-python/examples/:

basic_usage.py - Parsing and structure access
writing_files.py - Write PDB/mmCIF files
geometry_rmsd.py - RMSD and alignment
numpy_integration.py - Numpy arrays, distance matrices, contact maps
rcsb_search.py - RCSB search and download

Run Rust examples with:

cargo run --example analysis_workflow --features "filter,descriptors,quality,summary"
cargo run --example filtering_demo --features "filter"
cargo run --example selection_demo --features "filter"
cargo run --example b_factor_demo --features "descriptors"
cargo run --example secondary_structure_demo --features "dssp"
cargo run --example geometry_demo --features "geometry"
cargo run --example rcsb_workflow --features "rcsb,descriptors"
cargo run --example async_download_demo --features "rcsb-async,descriptors"
cargo run --example batch_processing --features "descriptors,summary"

For a complete getting started guide, see docs/GETTING_STARTED.md.

Performance

Benchmarks against equivalent Python code show 40-260x speedups for in-memory operations:

Operation	Speedup
Parsing	2-3x
get_ca_coords	240x
max_ca_distance	260x
radius_of_gyration	100x

Full PDB Archive Validation

PDBRust has been validated against the entire Protein Data Bank:

Metric	Value
Total Structures Tested	230,655
Success Rate	100%
Failed Parses	0
Total Atoms Parsed	2,057,302,767
Processing Rate	~92 files/sec
Largest Structure	2ku2 (1,290,100 atoms)

Run the full benchmark yourself:

cargo run --release --example full_pdb_benchmark \
    --features "gzip,parallel,descriptors,quality,summary" \
    -- /path/to/pdb/archive --output-dir ./results

Python Package

Pre-built wheels available for Linux, macOS, and Windows (Python 3.9-3.13):

pip install pdbrust

Platform Notes

The Python package includes full functionality on macOS and Windows. On Linux, the RCSB download/search features are not available in the pre-built wheels due to cross-compilation constraints. All other features (parsing, filtering, analysis, geometry, numpy arrays, etc.) work on all platforms.

Platform	Parsing	Filtering	Descriptors	Geometry	DSSP	RCSB
macOS	✓	✓	✓	✓	✓	✓
Windows	✓	✓	✓	✓	✓	✓
Linux	✓	✓	✓	✓	✓	-

See pdbrust-python/README.md for full Python API documentation.

Documentation

Citation

If you use PDBRust in your research, please cite it using the metadata in our CITATION.cff file:

@software{pdbrust,
  author = {Fooladi, Hosein},
  title = {PDBRust: A High-Performance Rust Library for PDB/mmCIF Parsing and Analysis},
  year = {2025},
  publisher = {Zenodo},
  doi = {10.5281/zenodo.18232203},
  url = {https://doi.org/10.5281/zenodo.18232203},
  version = {0.6.0}
}

Or in text format:

Fooladi, H. (2025). PDBRust: A High-Performance Rust Library for PDB/mmCIF Parsing and Analysis. Zenodo. https://doi.org/10.5281/zenodo.18232203

License

MIT

Commit count: 94

pdbrust

documentation

README

PDBRust

Installation

Python

Rust

Quick Start

Python

Rust

Features

Examples

Filter and Clean Structures

Compute Structural Descriptors

Parse Gzip-Compressed Files

Geometry: RMSD and Alignment

Download from RCSB PDB

Bulk Downloads with Async

Secondary Structure Assignment (DSSP)

B-factor Analysis

Selection Language (PyMOL/VMD-style)

Common Workflows

Performance

Full PDB Archive Validation

Python Package

Platform Notes

Documentation

Citation

License

cargo fmt