emdoc

Crates.io: emdoc
lib.rs: emdoc
version: 0.1.1
created_at: 2026-01-10 12:54:22.447766+00
updated_at: 2026-01-10 12:58:01.64685+00
description: A fast, lossless SerialEM MDOC parser and writer for cryo-electron microscopy, e.g. for cryo-ET mdoc files.
repository: https://github.com/elemeng/emdoc
id: 2034165
size: 171,344
owner: elemeng


📦 emdoc

emdoc: a fast SerialEM MDOC parser and writer for cryo-EM ⚡

  • SerialEM + MDOC = emdoc
  • For Rust and Python users



✨ Features

| Feature | Description |
|---|---|
| ⚡ Streaming Parse | Process gigabyte-scale MDOC files with BufRead |
| 🔄 Lossless Round-trip | Preserve every character & comment |
| Order Preservation | Field order guaranteed |
| 🔧 No Schema Lock-in | Works with any MDOC variant |
| 🎯 Typed Access | get::<f32>() or get_checked::<f32>() for safety |
| 🐍 Python Integration | to_python_dict(), to_numpy_arrays() |
| 📊 Polars Support | Zero-copy DataFrame conversion |
| 🛠️ Normalization APIs | Clean up inconsistent formatting |
| 🚀 Zero-copy Streaming | Visitor pattern for huge files |

📥 Installation

Cargo.toml

[dependencies]
emdoc = { version = "0.1.0", features = ["serde"] }

# Optional features
# emdoc = { version = "0.1.0", features = ["serde", "python", "polars"] }

Python

# Coming soon! For now, build with maturin
maturin develop --features python

🚀 Quick Start

1️⃣ Parse a File

use emdoc::Mdoc;

// Load entire MDOC into memory
let mdoc = Mdoc::from_file("data.mdoc")?;
println!("📊 Found {} tilt images", mdoc.tilt_series().len());

// Access header
for entry in mdoc.header() {
    println!("🏷️  {:?}", entry);
}

2️⃣ Read Typed Fields

let z0 = mdoc.tilt_image(0).unwrap();
println!("🔬 Tilt Angle: {:?}", z0.tilt_angle());
println!("🎯 Defocus: {:?}", z0.defocus());
println!("Magnification: {:?}", z0.magnification());

// Type-safe access with error handling
let angle: f32 = z0.get_checked("TiltAngle")?;
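
The difference between lenient and checked access can be sketched without the crate. The block below is an illustration only: `SketchBlock` and `SketchFieldError` are hypothetical stand-ins (the error variants are modeled on the `Missing` and `InvalidType` names listed for `FieldError`), and it uses `FromStr` where the crate uses its own `FromMdocValue` trait.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Hypothetical error type, modeled on the documented `FieldError` variants.
#[derive(Debug, PartialEq)]
pub enum SketchFieldError {
    Missing(String),
    InvalidType { key: String, raw: String },
}

/// Hypothetical stand-in for a Z block: raw string values keyed by field name.
pub struct SketchBlock {
    pub fields: HashMap<String, String>,
}

impl SketchBlock {
    /// Lenient access: `None` for both a missing key and an unparsable value.
    pub fn get<T: FromStr>(&self, key: &str) -> Option<T> {
        self.fields.get(key)?.trim().parse().ok()
    }

    /// Checked access: the error says which of the two cases occurred.
    pub fn get_checked<T: FromStr>(&self, key: &str) -> Result<T, SketchFieldError> {
        let raw = self
            .fields
            .get(key)
            .ok_or_else(|| SketchFieldError::Missing(key.to_string()))?;
        raw.trim().parse().map_err(|_| SketchFieldError::InvalidType {
            key: key.to_string(),
            raw: raw.clone(),
        })
    }
}
```

With this shape, a caller that only wants a best-effort value uses `get`, while a caller that must report why a field is unusable uses `get_checked` and propagates the error with `?`.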

3️⃣ Modify & Save

let mut mdoc = Mdoc::from_file("input.mdoc")?;

// Update a field
mdoc.update_field(0, "Defocus", "-3.5");

// Add new tilt image
let new_image = mdoc.add_tilt_image(42);
new_image.set("TiltAngle", "45.0");
new_image.set("Magnification", "50000");

// Lossless write (preserves original formatting)
mdoc.write_lossless(std::fs::File::create("output.mdoc")?)?;
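
One common way to get a lossless write is to keep the original line next to each parsed field and regenerate text only for fields that were edited. The sketch below shows that idea under stated assumptions: `SketchField` and `write_fields` are hypothetical names, and nothing here is taken from the crate's actual implementation.

```rust
use std::io::{self, Write};

/// Hypothetical field record: the parsed pair plus the original line, if still valid.
pub struct SketchField {
    pub key: String,
    pub value: String,
    pub raw_line: Option<String>,
}

impl SketchField {
    /// Parse one `Key = value` line, remembering the exact original text.
    pub fn parse(line: &str) -> Option<Self> {
        let (key, value) = line.split_once('=')?;
        Some(Self {
            key: key.trim().to_string(),
            value: value.trim().to_string(),
            raw_line: Some(line.to_string()),
        })
    }

    /// Changing the value invalidates the stored original line.
    pub fn set(&mut self, value: &str) {
        self.value = value.to_string();
        self.raw_line = None;
    }
}

/// Untouched fields are written exactly as read; edited ones are regenerated.
pub fn write_fields<W: Write>(fields: &[SketchField], mut w: W) -> io::Result<()> {
    for f in fields {
        match &f.raw_line {
            Some(raw) => writeln!(w, "{raw}")?,
            None => writeln!(w, "{} = {}", f.key, f.value)?,
        }
    }
    Ok(())
}
```

In this sketch, a field with unusual spacing such as `TiltAngle   =  -60.0` survives a round trip unchanged, while a field passed through `set` comes out in the normalized `Key = value` form.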

📚 API Reference

🏗️ Core Types

| Type | Description | Key Methods / Variants |
|---|---|---|
| Mdoc | Root MDOC container | from_file(), write(), validate() |
| ZBlock | Tilt image metadata block | get(), set(), tilt_angle() |
| HeaderEntry | Header line enum | Comment, KeyValue, Unknown |
| ParseError | Parse failures | InvalidBlock, InvalidZValue |
| FieldError | Field access errors | Missing, InvalidType |

🔧 Mdoc Methods

📂 Constructors

| Method | Emoji | Signature | Description |
|---|---|---|---|
| from_reader | 📖 | from_reader<R: BufRead>(reader: R) → Result<Mdoc, ParseError> | Stream parse from any BufRead |
| from_file | 💾 | from_file<P: AsRef<Path>>(path: P) → Result<Mdoc, ParseError> | Parse from file path |

✏️ Writers

| Method | Emoji | Signature | Description |
|---|---|---|---|
| write | ✏️ | write<W: Write>(&self, w: W) → io::Result<()> | Write normalized format |
| write_lossless | 🔄 | write_lossless<W: Write>(&self, w: W) → io::Result<()> | Preserve original formatting |

🔍 Accessors

| Method | Emoji | Signature | Description |
|---|---|---|---|
| header | 🏷️ | header(&self) → &[HeaderEntry] | Get header entries |
| tilt_series | 📊 | tilt_series(&self) → &[ZBlock] | Get all Z blocks |
| tilt_image | 🎯 | tilt_image(&self, z: usize) → Option<&ZBlock> | Get specific Z block |
| tilt_image_mut | 🛠️ | tilt_image_mut(&mut self, z: usize) → Option<&mut ZBlock> | Get mutable Z block |

🛠️ Mutators

| Method | Emoji | Signature | Description |
|---|---|---|---|
| add_tilt_image | ➕ | add_tilt_image(&mut self, z: usize) → &mut ZBlock | Add or replace Z block |
| remove_tilt_image | 🗑️ | remove_tilt_image(&mut self, z: usize) → bool | Remove Z block |
| update_field | 📝 | update_field(&mut self, z: usize, key: K, value: V) → bool | Set single field |

✅ Validation & Normalization

| Method | Emoji | Signature | Description |
|---|---|---|---|
| validate | ✅ | validate(&self) → Result<(), Vec<ValidationError>> | Check for duplicate Z values |
| normalize_spaces | 🧹 | normalize_spaces(&mut self) | Collapse multiple spaces |
| normalize_format | 🎨 | normalize_format(&mut self) | Standardize Key = value; format |
| capture_raw_values | 📸 | capture_raw_values(&mut self) | Deprecated - now auto-captured |
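
The two checks described in the table, detecting duplicate Z values and collapsing repeated spaces, are simple enough to sketch with the standard library. The function names below are hypothetical and the exact behavior of the crate's `validate` and `normalize_spaces` may differ (for instance, whether leading and trailing whitespace is trimmed is an assumption here).

```rust
use std::collections::HashSet;

/// Return every Z value that appears more than once, in order of first repeat.
pub fn duplicate_z_values(z_values: &[usize]) -> Vec<usize> {
    let mut seen = HashSet::new();
    let mut duplicates = Vec::new();
    for &z in z_values {
        // `insert` returns false when the value was already present.
        if !seen.insert(z) && !duplicates.contains(&z) {
            duplicates.push(z);
        }
    }
    duplicates
}

/// Collapse runs of whitespace into single spaces and trim both ends.
pub fn collapse_spaces(line: &str) -> String {
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}
```

Returning the full list of duplicates rather than stopping at the first one mirrors the documented `Result<(), Vec<ValidationError>>` shape, where all problems are reported together.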

🔌 Serialization

| Method | Emoji | Feature | Signature | Description |
|---|---|---|---|---|
| to_json | 📤 | serde | to_json(&self) → Result<String, serde_json::Error> | Pretty JSON export |
| from_json | 📥 | serde | from_json(json: &str) → Result<Mdoc, serde_json::Error> | JSON import |

📊 Data Science

| Method | Emoji | Feature | Signature | Description |
|---|---|---|---|---|
| to_polars_df | 📈 | polars | to_polars_df(&self) → Result<DataFrame, PolarsError> | Zero-copy DataFrame |
| to_python_dict | 🐍 | python | to_python_dict(&self) → PyResult<PyObject> | Python dict conversion |
| to_numpy_arrays | 🔢 | python | to_numpy_arrays(&self) → PyResult<(PyObject, PyObject)> | (tilt_angles, defocus) arrays |

🔍 ZBlock Methods

📖 Readers

| Method | Emoji | Signature | Description |
|---|---|---|---|
| z | #️⃣ | z(&self) → usize | Get Z value |
| get_raw | 📝 | get_raw(&self, key: &str) → Option<&str> | Get raw string value |
| get | 🎯 | get<T: FromMdocValue>(&self, key: &str) → Option<T> | Typed access |
| get_checked | ✅ | get_checked<T: FromMdocValue>(&self, key: &str) → Result<T, FieldError> | Typed with error |

Convenience Fields

| Method | Emoji | Return Type | Description |
|---|---|---|---|
| tilt_angle | 📐 | Option<f32> | TiltAngle field |
| defocus | 🔬 | Option<f32> | Defocus field |
| magnification | 🔍 | Option<i32> | Magnification field |
| subframe_path | 📁 | Option<String> | SubFramePath field |
| basename | 🏷️ | Option<String> | Basename of SubFramePath |
| stage_position | 📍 | Option<(f32, f32)> | Parse StagePosition into XY |
| min_max_mean | 📊 | Option<(f32, f32, f32)> | Parse MinMaxMean |
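
Multi-value fields such as `StagePosition` and `MinMaxMean` hold whitespace-separated numbers in a single value. A minimal sketch of turning them into tuples follows; `parse_pair` and `parse_triple` are hypothetical helper names, not the crate's functions, and rejecting values with the wrong number of tokens is a design assumption.

```rust
/// Parse every whitespace-separated token as f32; any bad token yields `None`.
fn parse_floats(raw: &str) -> Option<Vec<f32>> {
    raw.split_whitespace()
        .map(|p| p.parse::<f32>().ok())
        .collect()
}

/// Parse exactly two floats, e.g. a `StagePosition` value.
pub fn parse_pair(raw: &str) -> Option<(f32, f32)> {
    match parse_floats(raw)?.as_slice() {
        [x, y] => Some((*x, *y)),
        _ => None,
    }
}

/// Parse exactly three floats, e.g. a `MinMaxMean` value.
pub fn parse_triple(raw: &str) -> Option<(f32, f32, f32)> {
    match parse_floats(raw)?.as_slice() {
        [a, b, c] => Some((*a, *b, *c)),
        _ => None,
    }
}
```

Matching on the slice pattern keeps the arity check explicit, so a value with a missing or extra number is reported as `None` instead of being silently truncated.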

✏️ Writers

| Method | Emoji | Signature | Description |
|---|---|---|---|
| set | 📝 | set(&mut self, key: K, value: V) | Set or add field |
| remove | 🗑️ | remove(&mut self, key: &str) → bool | Remove single field |
| retain_fields | 🧹 | retain_fields<F>(&mut self, f: F) | Batch remove (efficient!) |

🔄 Round-trip

| Method | Emoji | Signature | Description |
|---|---|---|---|
| has_raw_lines | 📸 | has_raw_lines(&self) → bool | Check if raw lines stored |
| get_raw_line | 📝 | get_raw_line(&self, key: &str) → Option<&str> | Get original line |

🌊 Streaming APIs

Visitor Pattern

pub trait MdocVisitor {
    fn header(&mut self, entry: &str);      // 🏷️ Header line
    fn begin_tilt(&mut self, z: usize);     // 🎬 Start Z block
    fn field(&mut self, key: &str, value: &str); // 📄 Field
    fn end_tilt(&mut self);                 // 🏁 End Z block
}

pub fn parse_stream<R: BufRead>(
    reader: R,
    visitor: &mut dyn MdocVisitor,
) -> Result<(), ParseError>
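
To make the event order concrete, here is a self-contained sketch of a line-by-line driver for a visitor with the same four callbacks. It is not the crate's `parse_stream`: `SketchVisitor`, `parse_stream_sketch`, and `parse_z_header` are hypothetical names, the sketch returns `io::Result` rather than `ParseError`, skips blank lines, and treats only `[ZValue = n]` lines as block starts.

```rust
use std::io::{self, BufRead};

/// Hypothetical callback trait with the same four events as `MdocVisitor`.
pub trait SketchVisitor {
    fn header(&mut self, _entry: &str) {}
    fn begin_tilt(&mut self, _z: usize) {}
    fn field(&mut self, _key: &str, _value: &str) {}
    fn end_tilt(&mut self) {}
}

/// Read line by line and fire callbacks; only one line is held at a time.
pub fn parse_stream_sketch<R: BufRead>(
    reader: R,
    visitor: &mut dyn SketchVisitor,
) -> io::Result<()> {
    let mut in_block = false;
    for line in reader.lines() {
        let line = line?;
        let text = line.trim();
        if text.is_empty() {
            continue;
        }
        if let Some(z) = parse_z_header(text) {
            // A new block implicitly closes the previous one.
            if in_block {
                visitor.end_tilt();
            }
            visitor.begin_tilt(z);
            in_block = true;
        } else if !in_block {
            visitor.header(text);
        } else if let Some((key, value)) = text.split_once('=') {
            visitor.field(key.trim(), value.trim());
        }
    }
    if in_block {
        visitor.end_tilt();
    }
    Ok(())
}

/// Recognize a `[ZValue = n]` section header and return `n`.
fn parse_z_header(text: &str) -> Option<usize> {
    let inner = text.strip_prefix('[')?.strip_suffix(']')?;
    let (key, value) = inner.split_once('=')?;
    if key.trim() == "ZValue" {
        value.trim().parse().ok()
    } else {
        None
    }
}
```

Because the driver never builds a document in memory, memory use stays bounded by the longest line, which is the property the visitor API is meant to provide for very large files.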

Transform API

pub fn transform<R: BufRead, W: Write>(
    reader: R,
    writer: W,
    f: impl FnMut(&mut FieldEdit),
) -> Result<(), ParseError>

// FieldEdit has: key, value, new_value (set via .set())
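
The same streaming idea extends to rewriting: read a line, optionally replace its value, write it out. The sketch below assumes a simpler callback than `FieldEdit` (a closure that returns `Some(new_value)` to rewrite a field or `None` to leave the line alone); `transform_sketch` is a hypothetical name and the error type is `io::Error`, not `ParseError`.

```rust
use std::io::{self, BufRead, Write};

/// Copy `reader` to `writer`, letting `edit` replace the value of any
/// `Key = value` line. Lines that are not edited are written as read.
pub fn transform_sketch<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    mut edit: impl FnMut(&str, &str) -> Option<String>,
) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        let rewritten = match line.split_once('=') {
            // Section headers such as `[ZValue = 0]` also contain '=', so skip them.
            Some((key, value)) if !line.trim_start().starts_with('[') => {
                edit(key.trim(), value.trim()).map(|v| format!("{} = {}", key.trim(), v))
            }
            _ => None,
        };
        writeln!(writer, "{}", rewritten.as_deref().unwrap_or(&line))?;
    }
    Ok(())
}
```

Note that `BufRead::lines` strips line terminators and this sketch always writes `\n`, so a file with CRLF endings would not round-trip byte for byte; a fully lossless version would need to preserve the original terminator.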

🎯 Performance Tips

| Pattern | ✅ Good | ❌ Bad | Why |
|---|---|---|---|
| Batch Removal | retain_fields() | Multiple remove() | Single index rebuild: O(n) vs O(n×m) |
| Streaming | parse_stream() | from_reader() on huge files | Constant memory for GB files |
| Typed Access | get_checked() | get().unwrap() | Proper error handling |
| Raw Lines | Auto-captured! 🎉 | Manual capture_raw_values() | No-op since v0.1.0 |
| Validation | validate() before save | Assume valid | Catches duplicates early |
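
The batch-removal row can be illustrated on a plain ordered field list. This is a sketch of the general principle, not the crate's storage: `Fields`, `remove_batch`, and `remove_one_by_one` are hypothetical, and the two functions are only equivalent when each key appears at most once.

```rust
use std::collections::HashSet;

/// Hypothetical ordered field list, standing in for a Z block's storage.
pub type Fields = Vec<(String, String)>;

/// One pass over the fields with O(1) set lookups: O(n) overall.
pub fn remove_batch(fields: &mut Fields, drop_keys: &[&str]) {
    let drop: HashSet<&str> = drop_keys.iter().copied().collect();
    fields.retain(|(key, _)| !drop.contains(key.as_str()));
}

/// One scan and one element shift per key: O(n × m) for m keys.
pub fn remove_one_by_one(fields: &mut Fields, drop_keys: &[&str]) {
    for key in drop_keys {
        if let Some(pos) = fields.iter().position(|(k, _)| k.as_str() == *key) {
            fields.remove(pos);
        }
    }
}
```

Both versions preserve the order of the remaining fields; the batch version simply avoids rescanning and reshifting the list once per removed key.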

🔬 Cryo-EM Specific Examples

📊 Plot Defocus vs Tilt Angle (Python)

import emdoc
import matplotlib.pyplot as plt

# Load MDOC
mdoc_data = emdoc.Mdoc.from_file("tilt_series.mdoc")
tilt_angles, defocus_values = mdoc_data.to_numpy_arrays()

# 📈 Quick plot
plt.figure(figsize=(10, 6))
plt.scatter(tilt_angles, defocus_values, alpha=0.7, s=50)
plt.xlabel("Tilt Angle (°)")
plt.ylabel("Defocus (μm)")
plt.title("Defocus vs Tilt Angle")
plt.grid(True, alpha=0.3)
plt.show()

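🔄 Streaming Filter (Rust)

use std::fs::File;
use std::io::BufReader;
use emdoc::{parse_stream, MdocVisitor};

struct TiltAngleFilter {
    min_angle: f32,
    max_angle: f32,
}

impl MdocVisitor for TiltAngleFilter {
    // Header lines are not needed for this filter.
    fn header(&mut self, _entry: &str) {}

    fn begin_tilt(&mut self, z: usize) {
        println!("🎬 Processing Z = {}", z);
    }

    fn field(&mut self, key: &str, value: &str) {
        if key == "TiltAngle" {
            // Report unparsable values instead of panicking mid-stream.
            match value.trim().parse::<f32>() {
                Ok(angle) if angle < self.min_angle || angle > self.max_angle => {
                    println!("⚠️  Tilt angle {} out of range!", angle);
                }
                Ok(_) => {}
                Err(e) => eprintln!("⚠️  Invalid TiltAngle {:?}: {}", value, e),
            }
        }
    }

    fn end_tilt(&mut self) {}
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Illustrative path and range; assumes ParseError implements std::error::Error.
    let reader = BufReader::new(File::open("tilt_series.mdoc")?);
    let mut filter = TiltAngleFilter { min_angle: -60.0, max_angle: 60.0 };
    parse_stream(reader, &mut filter)?;
    Ok(())
}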

🧪 Testing

# Run tests
cargo test

# Test Python integration
cargo test --features python

# Benchmark parsing
cargo bench --features bench

📜 License

MIT License - see LICENSE file for details.


🤝 Contributing

We love contributions! Please submit a pull request or open an issue on GitHub.
