peacoqc-rs

Crates.io: peacoqc-rs
lib.rs: peacoqc-rs
version: 0.1.3
created_at: 2026-01-14 15:42:58.862413+00
updated_at: 2026-01-18 16:30:22.973968+00
description: PeacoQC quality control algorithms for flow cytometry
repository: https://github.com/jrmoynihan/flow/peacoqc-rs
id: 2043070
size: 471,161
author: James Moynihan (jrmoynihan)

PeacoQC-RS

Rust License: MIT

PeacoQC-RS is a Rust implementation of PeacoQC (Peak-based Quality Control) algorithms for flow cytometry data. This library provides efficient, trait-based quality control methods that work with any FCS data structure through a simple trait interface.

Core Features

  • Peak Detection: Automatic peak detection using kernel density estimation
  • Isolation Forest: Outlier detection using isolation tree method
  • MAD Outlier Detection: Median Absolute Deviation-based outlier identification
  • Margin Event Removal: Detection and removal of margin events
  • Doublet Detection: Identification of doublet/multiplet events
  • Monotonic Channel Detection: Detection of channels with monotonic trends (indicating technical issues)
  • Consecutive Bins Filtering: Removal of short consecutive regions
  • Trait-Based Design: Works with any data structure via PeacoQCData trait

Installation

Add this to your Cargo.toml:

[dependencies]
peacoqc-rs = { path = "../peacoqc-rs", version = "0.1.0", features = ["flow-fcs"] }

Or from crates.io:

[dependencies]
peacoqc-rs = { version = "0.1.0", features = ["flow-fcs"] }

Feature Flags

  • flow-fcs (default): Enable integration with the flow-fcs crate for FCS file support

Quick Start

Basic Usage

use peacoqc_rs::{FcsFilter, PeacoQCConfig, PeacoQCData, QCMode, peacoqc};

// `fcs` is a value of any type that implements PeacoQCData (and FcsFilter, which provides `filter`)
let config = PeacoQCConfig {
    channels: vec!["FL1-A".to_string(), "FL2-A".to_string()],
    determine_good_cells: QCMode::All,
    ..Default::default()
};

let result = peacoqc(&fcs, &config)?;

// Apply the `good_cells` boolean mask from the PeacoQCResult struct
let clean_fcs = fcs.filter(&result.good_cells)?;

println!("Removed {:.2}% of events", result.percentage_removed);

// Export QC results for downstream analysis
result.export_csv_boolean("qc_results.csv")?;
result.export_json_metadata(&config, "qc_metadata.json")?;

See examples/basic_usage.rs for a complete working example.

Interoperability via Traits

PeacoQC-RS uses trait-based design for maximum interoperability. To use PeacoQC with your own FCS data structure, simply implement the PeacoQCData trait:

use peacoqc_rs::{PeacoQCData, Result};

struct MyFcs {
    // your data fields
}

impl PeacoQCData for MyFcs {
    fn n_events(&self) -> usize {
        // return number of events
    }

    fn channel_names(&self) -> Vec<String> {
        // return channel names
    }

    fn get_channel_range(&self, channel: &str) -> Option<(f64, f64)> {
        // return channel range if available
    }

    fn get_channel_f64(&self, channel: &str) -> Result<Vec<f64>> {
        // return channel data as Vec<f64>
    }
}

Additionally, implement FcsFilter to enable filtering:

use peacoqc_rs::{FcsFilter, Result};

impl FcsFilter for MyFcs {
    fn filter(&self, mask: &[bool]) -> Result<Self> {
        // return a new instance with filtered data
    }
}
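As a rough illustration of what such an implementation might look like, the sketch below fills in both traits for a hypothetical in-memory container. To keep it compilable without the crate, it declares local stand-in traits that mirror the signatures shown above and uses `String` as a placeholder error type; `InMemoryFcs`, its `channels` field, and the choice to report the observed min/max as the channel range are all assumptions made for this example.

```rust
use std::collections::BTreeMap;

// Placeholder for the crate's Result alias (the real error type is PeacoQCError).
type Result<T> = std::result::Result<T, String>;

// Local mirrors of the trait signatures shown above, so the sketch stands alone.
trait PeacoQCData {
    fn n_events(&self) -> usize;
    fn channel_names(&self) -> Vec<String>;
    fn get_channel_range(&self, channel: &str) -> Option<(f64, f64)>;
    fn get_channel_f64(&self, channel: &str) -> Result<Vec<f64>>;
}

trait FcsFilter: Sized {
    fn filter(&self, mask: &[bool]) -> Result<Self>;
}

// Hypothetical container: one Vec<f64> per channel, all of equal length.
struct InMemoryFcs {
    channels: BTreeMap<String, Vec<f64>>,
}

impl PeacoQCData for InMemoryFcs {
    fn n_events(&self) -> usize {
        self.channels.values().next().map_or(0, |v| v.len())
    }

    fn channel_names(&self) -> Vec<String> {
        self.channels.keys().cloned().collect()
    }

    // Assumption: report the observed min/max; a real reader might use file metadata instead.
    fn get_channel_range(&self, channel: &str) -> Option<(f64, f64)> {
        let values = self.channels.get(channel)?;
        if values.is_empty() {
            return None;
        }
        let min = values.iter().copied().fold(f64::INFINITY, f64::min);
        let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
        Some((min, max))
    }

    fn get_channel_f64(&self, channel: &str) -> Result<Vec<f64>> {
        self.channels
            .get(channel)
            .cloned()
            .ok_or_else(|| format!("channel not found: {}", channel))
    }
}

impl FcsFilter for InMemoryFcs {
    // Keep only the events whose mask entry is `true`, in every channel.
    fn filter(&self, mask: &[bool]) -> Result<Self> {
        if mask.len() != self.n_events() {
            return Err(format!(
                "mask has {} entries but data has {} events",
                mask.len(),
                self.n_events()
            ));
        }
        let mut channels = BTreeMap::new();
        for (name, values) in &self.channels {
            let kept: Vec<f64> = values
                .iter()
                .zip(mask)
                .filter(|(_, keep)| **keep)
                .map(|(v, _)| *v)
                .collect();
            channels.insert(name.clone(), kept);
        }
        Ok(InMemoryFcs { channels })
    }
}
```

With the real crate, the local trait definitions and the `Result` alias would be replaced by the imports shown above, and the error values by the crate's own error type.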

Integration with flow-fcs

If you enable the flow-fcs feature flag, PeacoQC-RS provides trait implementations for the Fcs struct provided by it:

use flow_fcs::Fcs;
use peacoqc_rs::{FcsFilter, PeacoQCConfig, QCMode, peacoqc};

let fcs = Fcs::open("data.fcs")?;

let config = PeacoQCConfig {
    channels: fcs.get_fluorescence_channels(), // Auto-detect channels
    determine_good_cells: QCMode::All,
    ..Default::default()
};

let result = peacoqc(&fcs, &config)?;
// Apply the `good_cells` boolean mask from the PeacoQCResult struct
let clean_fcs = fcs.filter(&result.good_cells)?;

API Overview

Main Functions

fn peacoqc<T: PeacoQCData>(fcs: &T, config: &PeacoQCConfig) -> Result<PeacoQCResult>
  • Main quality control function that runs the complete PeacoQC pipeline
  • Processes channels and bins in parallel for optimal performance
fn remove_margins<T: PeacoQCData>(fcs: &T, config: &MarginConfig) -> Result<MarginResult>
  • Remove margin events from FCS data
fn remove_doublets<T: PeacoQCData>(fcs: &T, config: &DoubletConfig) -> Result<DoubletResult>
  • Detect and remove doublet/multiplet events

Configuration

  • PeacoQCConfig: Main configuration for quality control

    • channels: Channels to analyze
    • determine_good_cells: QC mode (All, IsolationTree, MAD, None)
    • mad: MAD threshold (default: 6.0)
    • it_limit: Isolation Tree limit (default: 0.6)
    • consecutive_bins: Consecutive bins threshold (default: 5)
  • MarginConfig: Configuration for margin event removal

  • DoubletConfig: Configuration for doublet detection

Results

  • PeacoQCResult: Complete QC results
    • good_cells: Boolean mask (true = keep, false = remove)
    • percentage_removed: Percentage of events removed
    • peaks: Peak detection results per channel
    • n_bins: Number of bins used
    • events_per_bin: Events per bin
    • export_csv_boolean(): Export as boolean CSV (0/1 values)
    • export_csv_numeric(): Export as numeric CSV (2000/6000 values, R-compatible)
    • export_json_metadata(): Export comprehensive QC metrics as JSON

Export Formats

PeacoQC-RS supports multiple export formats for QC results, enabling integration with various downstream analysis tools.

Boolean CSV (Recommended)

Export QC results as a CSV file with 0/1 values:

result.export_csv_boolean("qc_results.csv")?;

Format:

PeacoQC
1
1
0
1
  • 1 = good event (keep)
  • 0 = bad event (remove)

Use cases:

  • pandas: df[df['PeacoQC'] == 1]
  • R: df[df$PeacoQC == 1, ]
  • SQL: WHERE PeacoQC = 1
  • General data analysis workflows
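The same file can also be read back from Rust. The sketch below parses the single-column layout shown above into a `Vec<bool>` mask using only the standard library; `parse_boolean_csv` is a hypothetical helper written for this example and is not part of the crate's API.

```rust
/// Parse a single-column 0/1 CSV (header line followed by one value per row)
/// into the column name and a keep/remove mask. Blank lines are skipped.
fn parse_boolean_csv(text: &str) -> Result<(String, Vec<bool>), String> {
    let mut lines = text.lines().map(str::trim).filter(|l| !l.is_empty());
    let header = lines
        .next()
        .ok_or_else(|| "empty file".to_string())?
        .to_string();

    let mut mask = Vec::new();
    for (i, line) in lines.enumerate() {
        match line {
            "1" => mask.push(true),  // good event (keep)
            "0" => mask.push(false), // bad event (remove)
            other => {
                return Err(format!("row {}: expected 0 or 1, got {:?}", i + 1, other));
            }
        }
    }
    Ok((header, mask))
}
```

The returned mask has the same meaning as `good_cells`: `true` marks an event to keep.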

Numeric CSV (R-Compatible)

Export QC results as a CSV file with numeric codes matching the R PeacoQC package:

result.export_csv_numeric("qc_results_r.csv", 2000, 6000)?;

Format:

PeacoQC
2000
2000
6000
2000
  • 2000 (or custom good_value) = good event (keep)
  • 6000 (or custom bad_value) = bad event (remove)

Use cases:

  • Compatibility with existing R PeacoQC workflows
  • FlowJo CSV import
  • Legacy analysis pipelines
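For readers who want to see the mapping spelled out, the sketch below converts a boolean mask into numeric codes and renders the single-column layout shown above. Both functions are hypothetical helpers for illustration; the integer type used for the codes is an assumption, and the crate's own export method should be preferred in practice.

```rust
/// Map a keep/remove mask to numeric codes (for example 2000 = keep, 6000 = remove).
fn mask_to_codes(mask: &[bool], good_value: u32, bad_value: u32) -> Vec<u32> {
    mask.iter()
        .map(|&keep| if keep { good_value } else { bad_value })
        .collect()
}

/// Render the codes as a single-column CSV with the given header.
fn codes_to_csv(column: &str, codes: &[u32]) -> String {
    let mut out = String::from(column);
    out.push('\n');
    for code in codes {
        out.push_str(&code.to_string());
        out.push('\n');
    }
    out
}
```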

JSON Metadata

Export comprehensive QC metrics and configuration as JSON:

result.export_json_metadata(&config, "qc_metadata.json")?;

Format:

{
  "n_events_before": 713904,
  "n_events_after": 631400,
  "n_events_removed": 82504,
  "percentage_removed": 11.56,
  "it_percentage": 0.0,
  "mad_percentage": 11.56,
  "consecutive_percentage": 0.0,
  "n_bins": 1427,
  "events_per_bin": 500,
  "channels_analyzed": ["FL1-A", "FL2-A"],
  "config": {
    "qc_mode": "All",
    "mad": 6.0,
    "it_limit": 0.6,
    "consecutive_bins": 5,
    "remove_zeros": false
  }
}

Use cases:

  • Programmatic access to QC metrics
  • Reporting and documentation
  • Provenance tracking
  • Quality control dashboards
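The counts in the sample metadata are mutually consistent: 713904 - 631400 = 82504 events removed, which is about 11.56% of the input. A minimal sketch of that arithmetic follows; `percentage_removed` is a hypothetical helper, and rounding to two decimals is an assumption made to match the sample rather than a statement about how the crate computes the field.

```rust
/// Percentage of events removed, rounded to two decimal places.
/// Returns 0.0 for an empty input to avoid dividing by zero.
fn percentage_removed(n_before: u64, n_after: u64) -> f64 {
    if n_before == 0 {
        return 0.0;
    }
    let removed = n_before.saturating_sub(n_after) as f64;
    let pct = 100.0 * removed / n_before as f64;
    (pct * 100.0).round() / 100.0
}
```

A check like this can be useful when validating metadata files produced by different pipeline versions.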

Custom Column Names

You can specify custom column names for CSV exports:

result.export_csv_boolean_with_name("qc_results.csv", "QC_Status")?;
result.export_csv_numeric_with_name("qc_results_r.csv", 2000, 6000, "PeacoQC_Status")?;

Quality Control Methods

1. Peak Detection

Uses kernel density estimation (KDE) with Gaussian kernels to detect peaks in binned data. The KDE bandwidth is selected using Silverman's rule, and peaks are taken from the resulting density estimate.
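To make the idea concrete, here is a minimal standard-library sketch of Gaussian KDE peak finding. It assumes the common form of Silverman's rule of thumb, 0.9 * min(sd, IQR / 1.34) * n^(-1/5), and finds peaks as strict local maxima on an evenly spaced grid. The function names, the grid padding of three bandwidths, and the quantile interpolation are choices made for this example and may differ from what the crate does internally.

```rust
use std::f64::consts::PI;

/// Linearly interpolated quantile of an already sorted, non-empty slice (q in [0, 1]).
fn quantile_sorted(sorted: &[f64], q: f64) -> f64 {
    let pos = q * (sorted.len() - 1) as f64;
    let lo = pos.floor() as usize;
    let hi = pos.ceil() as usize;
    sorted[lo] + (sorted[hi] - sorted[lo]) * (pos - lo as f64)
}

/// Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5).
/// Requires at least two values.
fn silverman_bandwidth(data: &[f64]) -> f64 {
    let n = data.len() as f64;
    let mean = data.iter().sum::<f64>() / n;
    let var = data.iter().map(|x| (x - mean).powi(2)).sum::<f64>() / (n - 1.0);
    let sd = var.sqrt();

    let mut sorted = data.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let iqr = quantile_sorted(&sorted, 0.75) - quantile_sorted(&sorted, 0.25);

    // Fall back to the standard deviation alone when the IQR is zero.
    let spread = if iqr > 0.0 { sd.min(iqr / 1.34) } else { sd };
    0.9 * spread * n.powf(-0.2)
}

/// Gaussian kernel density estimate evaluated at `x`.
fn gaussian_kde(data: &[f64], bandwidth: f64, x: f64) -> f64 {
    let norm = 1.0 / (data.len() as f64 * bandwidth * (2.0 * PI).sqrt());
    let sum: f64 = data
        .iter()
        .map(|xi| (-0.5 * ((x - xi) / bandwidth).powi(2)).exp())
        .sum();
    sum * norm
}

/// Evaluate the density on an evenly spaced grid (grid_size >= 3) and
/// return the grid positions that are strict local maxima.
fn find_peaks(data: &[f64], grid_size: usize) -> Vec<f64> {
    let h = silverman_bandwidth(data);
    let min = data.iter().copied().fold(f64::INFINITY, f64::min) - 3.0 * h;
    let max = data.iter().copied().fold(f64::NEG_INFINITY, f64::max) + 3.0 * h;
    let step = (max - min) / (grid_size - 1) as f64;

    let xs: Vec<f64> = (0..grid_size).map(|i| min + i as f64 * step).collect();
    let ys: Vec<f64> = xs.iter().map(|&x| gaussian_kde(data, h, x)).collect();

    (1..grid_size - 1)
        .filter(|&i| ys[i] > ys[i - 1] && ys[i] > ys[i + 1])
        .map(|i| xs[i])
        .collect()
}
```

Calling `find_peaks` on the values of a single bin yields one position per density mode. A production implementation would typically also discard very low peaks, which this sketch does not attempt.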

2. Isolation Tree

An isolation forest-based outlier detection method. Events in bins with low isolation scores are flagged as outliers.

3. MAD (Median Absolute Deviation)

Detects outliers using the median absolute deviation (MAD). Events that deviate from the median by more than the configured MAD threshold (mad, default: 6.0) are flagged.
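The underlying test can be sketched in a few lines. The version below works on a plain, non-empty slice of values and scales the MAD by 1.4826, the default constant in R's `stats::mad`; whether the crate applies the same scaling, and exactly which per-bin values it feeds into the test, are assumptions not confirmed by this README.

```rust
/// Median of a non-empty slice (mean of the two middle values for even lengths).
fn median(values: &[f64]) -> f64 {
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    let mid = sorted.len() / 2;
    if sorted.len() % 2 == 0 {
        (sorted[mid - 1] + sorted[mid]) / 2.0
    } else {
        sorted[mid]
    }
}

/// Scaled median absolute deviation. The factor 1.4826 makes the MAD
/// comparable to a standard deviation for normally distributed data.
fn mad(values: &[f64]) -> f64 {
    let m = median(values);
    let deviations: Vec<f64> = values.iter().map(|v| (v - m).abs()).collect();
    1.4826 * median(&deviations)
}

/// Flag every value lying more than `threshold` MADs away from the median.
fn mad_outliers(values: &[f64], threshold: f64) -> Vec<bool> {
    let m = median(values);
    let limit = threshold * mad(values);
    values.iter().map(|v| (v - m).abs() > limit).collect()
}
```

With a threshold of 6.0, only values far outside the bulk of the data are flagged, which is why a larger `mad` setting makes the filter more permissive.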

4. Consecutive Bins Filtering

Removes short runs of consecutive bins, as governed by the consecutive_bins threshold (default: 5), that may represent artifacts rather than real biological populations.
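One plausible reading of this step is a run-length filter over the per-bin keep/remove flags: any run of good bins shorter than the threshold is treated as unreliable and removed along with its neighbours. The sketch below implements that interpretation; `remove_short_good_runs` is a hypothetical name, and the handling of runs at the start or end of the acquisition is an assumption.

```rust
/// Flip every run of `true` (good) bins shorter than `min_run` to `false`.
/// Runs at the very start or end of the slice are treated like any other run.
fn remove_short_good_runs(good_bins: &[bool], min_run: usize) -> Vec<bool> {
    let mut out = good_bins.to_vec();
    let mut start = 0;
    while start < out.len() {
        // Find the end (exclusive) of the run that begins at `start`.
        let value = out[start];
        let mut end = start;
        while end < out.len() && out[end] == value {
            end += 1;
        }
        // Short runs of good bins are marked as bad.
        if value && end - start < min_run {
            for flag in &mut out[start..end] {
                *flag = false;
            }
        }
        start = end;
    }
    out
}
```

For example, with `min_run = 5`, two good bins sandwiched between bad bins are removed, while a run of exactly five good bins is kept.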

5. Monotonic Channel Detection

Detects channels with monotonic trends (increasing or decreasing) which may indicate technical problems:

  • Increasing: Possible accumulation, clog developing
  • Decreasing: Possible depletion, pressure loss

Uses kernel smoothing (matching R's stats::ksmooth with bandwidth=50) to smooth bin medians, then checks if smoothed values satisfy monotonicity conditions using cummax/cummin. Channels are flagged if >75% of smoothed values are non-decreasing (increasing) or non-increasing (decreasing). This matches the original R implementation's algorithm.
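The cummax/cummin check described above can be sketched as follows. The input is assumed to be the already smoothed bin medians (the kernel smoothing step is omitted), and `Trend`, `classify_trend`, and `fraction_at_running_extreme` are hypothetical names introduced for this example.

```rust
#[derive(Debug, PartialEq)]
enum Trend {
    Increasing,
    Decreasing,
    NoTrend,
}

/// Fraction of positions where the value equals the running extreme
/// (running maximum with `f64::max`, running minimum with `f64::min`).
fn fraction_at_running_extreme(values: &[f64], pick: fn(f64, f64) -> f64) -> f64 {
    let mut running = values[0];
    let mut hits = 0usize;
    for &v in values {
        running = pick(running, v);
        if running == v {
            hits += 1;
        }
    }
    hits as f64 / values.len() as f64
}

/// Classify a smoothed series: increasing if more than `threshold` of the
/// values equal their cumulative maximum, decreasing if more than
/// `threshold` equal their cumulative minimum, otherwise no trend.
fn classify_trend(smoothed: &[f64], threshold: f64) -> Trend {
    if smoothed.is_empty() {
        return Trend::NoTrend;
    }
    if fraction_at_running_extreme(smoothed, f64::max) > threshold {
        Trend::Increasing
    } else if fraction_at_running_extreme(smoothed, f64::min) > threshold {
        Trend::Decreasing
    } else {
        Trend::NoTrend
    }
}
```

With a threshold of 0.75, a series that rises with a single small dip is still classified as increasing, because only the dipped value falls below the running maximum. Note that in this sketch a perfectly constant series satisfies the increasing condition first.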

Performance

PeacoQC-RS is optimized for performance:

  • Parallel Processing: Uses rayon for parallel computation:
    • Multiple channels processed in parallel (all channels simultaneously)
    • Multiple bins within each channel processed in parallel
    • Provides significant speedup on multi-core systems (typically 2-8x depending on core count)
  • Efficient Data Structures: Uses Polars DataFrames (via flow-fcs feature flag) for columnar storage
  • Minimal Allocations: Optimized to reduce memory allocations
  • SIMD Support: Leverages Polars' SIMD operations for fast numeric computations

Benchmarks

Run benchmarks with:

cargo bench --bench peacoqc_bench

Benchmarks are currently being developed and will provide performance metrics for various dataset sizes.

Test Coverage

The library includes comprehensive unit tests covering:

  • Peak detection accuracy
  • Isolation tree outlier detection
  • MAD outlier identification
  • Margin event removal
  • Doublet detection
  • Monotonic channel detection
  • Statistical functions (median, MAD, density estimation)

Run tests with:

cargo test

Examples

Basic Usage Example

See examples/basic_usage.rs for a complete example demonstrating:

  1. Creating synthetic FCS data
  2. Removing margin events
  3. Removing doublets
  4. Running full PeacoQC analysis
  5. Applying the quality control filter

Run with:

cargo run --example basic_usage

Error Handling

All functions return Result<T, PeacoQCError>. The PeacoQCError enum covers:

  • InvalidChannel: Invalid or non-numeric channel
  • ChannelNotFound: Channel not found in data
  • InsufficientData: Not enough events for analysis
  • StatsError: Statistical computation failed
  • ConfigError: Configuration error
  • NoPeaksDetected: No peaks detected in data
  • PolarsError: Polars DataFrame error (when using flow-fcs feature)

License

MIT License - see LICENSE file for details

Attribution

This Rust implementation is based on the original PeacoQC algorithm and R package. We gratefully acknowledge the original authors:

Original Paper:

Original R Implementation:

This Rust version provides:

  • Improved performance through native compilation
  • Better memory efficiency
  • Type safety
  • Trait-based extensibility

Contributing

Contributions are welcome! Please feel free to open an issue or submit a pull request on GitHub.
