emdoc
emdoc: a fast SerialEM MDOC parser and writer for cryo-EM
- serialem + mdoc = emdoc
- for Rust & Python users

Features

| Feature | Description |
|---|---|
| Streaming Parse | Process gigabyte-scale MDOC files with BufRead |
| Lossless Round-trip | Preserve every character & comment |
| Order Preservation | Field order guaranteed |
| No Schema Lock-in | Works with any MDOC variant |
| Typed Access | get::<f32>() or get_checked::<f32>() for safety |
| Python Integration | to_python_dict(), to_numpy_arrays() |
| Polars Support | Zero-copy DataFrame conversion |
| Normalization APIs | Clean up inconsistent formatting |
| Zero-copy Streaming | Visitor pattern for huge files |

Installation
Cargo.toml
[dependencies]
emdoc = { version = "0.1.0", features = ["serde"] }
# Optional features
# emdoc = { version = "0.1.0", features = ["serde", "python", "polars"] }
Python
# Coming soon! For now, build with maturin
maturin develop --features python
Quick Start
1. Parse a File
use emdoc::Mdoc;
// Load entire MDOC into memory
let mdoc = Mdoc::from_file("data.mdoc")?;
println!("Found {} tilt images", mdoc.tilt_series().len());
// Access header
for entry in mdoc.header() {
    println!("{:?}", entry);
}
2. Read Typed Fields
let z0 = mdoc.tilt_image(0).unwrap();
println!("Tilt Angle: {:?}", z0.tilt_angle());
println!("Defocus: {:?}", z0.defocus());
println!("Magnification: {:?}", z0.magnification());
// Type-safe access with error handling
let angle: f32 = z0.get_checked("TiltAngle")?;
3. Modify & Save
let mut mdoc = Mdoc::from_file("input.mdoc")?;
// Update a field
mdoc.update_field(0, "Defocus", "-3.5");
// Add new tilt image
let new_image = mdoc.add_tilt_image(42);
new_image.set("TiltAngle", "45.0");
new_image.set("Magnification", "50000");
// Lossless write (preserves original formatting)
mdoc.write_lossless(std::fs::File::create("output.mdoc")?)?;
API Reference
Core Types

| Type | Description | Key Methods / Variants |
|---|---|---|
| Mdoc | Root MDOC container | from_file(), write(), validate() |
| ZBlock | Tilt image metadata block | get(), set(), tilt_angle() |
| HeaderEntry | Header line enum | Comment, KeyValue, Unknown |
| ParseError | Parse failures | InvalidBlock, InvalidZValue |
| FieldError | Field access errors | Missing, InvalidType |

Mdoc Methods
Constructors

| Method | Signature | Description |
|---|---|---|
| from_reader | from_reader<R: BufRead>(reader: R) -> Result<Mdoc, ParseError> | Stream parse from any BufRead |
| from_file | from_file<P: AsRef<Path>>(path: P) -> Result<Mdoc, ParseError> | Parse from file path |

Writers

| Method | Signature | Description |
|---|---|---|
| write | write<W: Write>(&self, w: W) -> io::Result<()> | Write normalized format |
| write_lossless | write_lossless<W: Write>(&self, w: W) -> io::Result<()> | Preserve original formatting |

Accessors

| Method | Signature | Description |
|---|---|---|
| header | header(&self) -> &[HeaderEntry] | Get header entries |
| tilt_series | tilt_series(&self) -> &[ZBlock] | Get all Z blocks |
| tilt_image | tilt_image(&self, z: usize) -> Option<&ZBlock> | Get specific Z block |
| tilt_image_mut | tilt_image_mut(&mut self, z: usize) -> Option<&mut ZBlock> | Get mutable Z block |

Mutators

| Method | Signature | Description |
|---|---|---|
| add_tilt_image | add_tilt_image(&mut self, z: usize) -> &mut ZBlock | Add or replace Z block |
| remove_tilt_image | remove_tilt_image(&mut self, z: usize) -> bool | Remove Z block |
| update_field | update_field(&mut self, z: usize, key: K, value: V) -> bool | Set single field |

Validation & Normalization

| Method | Signature | Description |
|---|---|---|
| validate | validate(&self) -> Result<(), Vec<ValidationError>> | Check for duplicate Z values |
| normalize_spaces | normalize_spaces(&mut self) | Collapse multiple spaces |
| normalize_format | normalize_format(&mut self) | Standardize Key = value; format |
| capture_raw_values | capture_raw_values(&mut self) | Deprecated; raw values are now auto-captured |

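As a rough illustration of what these checks involve, here is a minimal std-only sketch. It is not emdoc's implementation: `DuplicateZ`, `find_duplicate_z`, and `normalize_line` are hypothetical names, and the assumption is that normalization means trimming around `=` and collapsing runs of whitespace in the value.

```rust
use std::collections::HashSet;

/// Hypothetical error type: a Z value that appeared more than once.
#[derive(Debug, PartialEq)]
struct DuplicateZ(usize);

/// Report every Z value that repeats, in the order the repeats are seen.
fn find_duplicate_z(z_values: &[usize]) -> Result<(), Vec<DuplicateZ>> {
    let mut seen = HashSet::new();
    let mut errors = Vec::new();
    for &z in z_values {
        // `insert` returns false when the value was already present.
        if !seen.insert(z) {
            errors.push(DuplicateZ(z));
        }
    }
    if errors.is_empty() { Ok(()) } else { Err(errors) }
}

/// Rewrite a line as `Key = value`, collapsing whitespace runs in the value.
/// Lines without `=` are only trimmed.
fn normalize_line(line: &str) -> String {
    match line.split_once('=') {
        Some((key, value)) => {
            let value = value.split_whitespace().collect::<Vec<_>>().join(" ");
            format!("{} = {}", key.trim(), value)
        }
        None => line.trim().to_string(),
    }
}

fn main() {
    println!("{:?}", find_duplicate_z(&[0, 1, 1, 2]));
    println!("{}", normalize_line("StagePosition =  1.0    2.0 "));
}
```

Collecting all duplicates instead of stopping at the first one mirrors the `Vec<ValidationError>` return type in the table above.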
Serialization

| Method | Feature | Signature | Description |
|---|---|---|---|
| to_json | serde | to_json(&self) -> Result<String, serde_json::Error> | Pretty JSON export |
| from_json | serde | from_json(json: &str) -> Result<Mdoc, serde_json::Error> | JSON import |

Data Science

| Method | Feature | Signature | Description |
|---|---|---|---|
| to_polars_df | polars | to_polars_df(&self) -> Result<DataFrame, PolarsError> | Zero-copy DataFrame |
| to_python_dict | python | to_python_dict(&self) -> PyResult<PyObject> | Python dict conversion |
| to_numpy_arrays | python | to_numpy_arrays(&self) -> PyResult<(PyObject, PyObject)> | (tilt_angles, defocus) arrays |

ZBlock Methods
Readers

| Method | Signature | Description |
|---|---|---|
| z | z(&self) -> usize | Get Z value |
| get_raw | get_raw(&self, key: &str) -> Option<&str> | Get raw string value |
| get | get<T: FromMdocValue>(&self, key: &str) -> Option<T> | Typed access |
| get_checked | get_checked<T: FromMdocValue>(&self, key: &str) -> Result<T, FieldError> | Typed access with error |

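To make the difference between `get` and `get_checked` concrete, the following is a self-contained sketch of the idea, not the crate's code. It assumes `std::str::FromStr` as a stand-in for `FromMdocValue`, a hypothetical `Block` type backed by a `HashMap`, and invented payloads for the `Missing` and `InvalidType` variants.

```rust
use std::collections::HashMap;
use std::str::FromStr;

/// Variant names follow the README; the payloads are assumptions.
#[derive(Debug, PartialEq)]
enum FieldError {
    Missing(String),
    InvalidType { key: String, raw: String },
}

/// Hypothetical stand-in for a Z block: raw string values keyed by field name.
struct Block {
    fields: HashMap<String, String>,
}

impl Block {
    /// Lenient access: a missing key and an unparseable value both give None.
    fn get<T: FromStr>(&self, key: &str) -> Option<T> {
        self.fields.get(key)?.trim().parse().ok()
    }

    /// Strict access: the caller learns why the lookup failed.
    fn get_checked<T: FromStr>(&self, key: &str) -> Result<T, FieldError> {
        let raw = self
            .fields
            .get(key)
            .ok_or_else(|| FieldError::Missing(key.to_string()))?;
        raw.trim().parse().map_err(|_| FieldError::InvalidType {
            key: key.to_string(),
            raw: raw.clone(),
        })
    }
}

fn sample() -> Block {
    let mut fields = HashMap::new();
    fields.insert("TiltAngle".to_string(), "-60.0".to_string());
    fields.insert("Magnification".to_string(), "not-a-number".to_string());
    Block { fields }
}

fn main() {
    let block = sample();
    println!("{:?}", block.get::<f32>("TiltAngle"));
    println!("{:?}", block.get_checked::<i32>("Magnification"));
    println!("{:?}", block.get_checked::<f32>("Defocus"));
}
```

With `get`, a typo in the key and a malformed value look identical; `get_checked` keeps them apart, which is usually what you want when processing acquisition metadata in bulk.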
Convenience Fields

| Method | Return Type | Description |
|---|---|---|
| tilt_angle | Option<f32> | TiltAngle field |
| defocus | Option<f32> | Defocus field |
| magnification | Option<i32> | Magnification field |
| subframe_path | Option<String> | SubFramePath field |
| basename | Option<String> | Basename of SubFramePath |
| stage_position | Option<(f32, f32)> | Parse StagePosition into XY |
| min_max_mean | Option<(f32, f32, f32)> | Parse MinMaxMean |

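The multi-value helpers boil down to splitting a raw value on whitespace. The sketch below shows one way to do that with only the standard library; the free functions are illustrative stand-ins for the methods above, and rejecting extra tokens and splitting paths on both `\` and `/` are assumptions rather than documented emdoc behavior.

```rust
/// Parse exactly N whitespace-separated floats; any other count gives None.
fn parse_floats<const N: usize>(raw: &str) -> Option<[f32; N]> {
    let mut out = [0.0_f32; N];
    let mut parts = raw.split_whitespace();
    for slot in out.iter_mut() {
        *slot = parts.next()?.parse::<f32>().ok()?;
    }
    // Trailing tokens mean the value does not have the expected shape.
    if parts.next().is_some() {
        return None;
    }
    Some(out)
}

/// `StagePosition = x y`
fn stage_position(raw: &str) -> Option<(f32, f32)> {
    let [x, y] = parse_floats::<2>(raw)?;
    Some((x, y))
}

/// `MinMaxMean = min max mean`
fn min_max_mean(raw: &str) -> Option<(f32, f32, f32)> {
    let [min, max, mean] = parse_floats::<3>(raw)?;
    Some((min, max, mean))
}

/// Last path component, splitting on both separators because the path may
/// have been written on a Windows acquisition machine.
fn basename(path: &str) -> Option<&str> {
    path.rsplit(|c: char| c == '\\' || c == '/')
        .next()
        .filter(|name| !name.is_empty())
}

fn main() {
    println!("{:?}", stage_position("12.5 -3.25"));
    println!("{:?}", min_max_mean("0 255 127.5"));
    println!("{:?}", basename("X:\\frames\\tilt_001.tif"));
}
```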
Writers

| Method | Signature | Description |
|---|---|---|
| set | set(&mut self, key: K, value: V) | Set or add field |
| remove | remove(&mut self, key: &str) -> bool | Remove single field |
| retain_fields | retain_fields<F>(&mut self, f: F) | Batch remove (efficient) |

Round-trip

| Method | Signature | Description |
|---|---|---|
| has_raw_lines | has_raw_lines(&self) -> bool | Check if raw lines are stored |
| get_raw_line | get_raw_line(&self, key: &str) -> Option<&str> | Get original line |

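One common way to get lossless round-tripping is to keep the original line next to the parsed key and value, and to fall back to a canonical rendering only after an edit. The sketch below illustrates that idea under stated assumptions; `Field`, `parse_field`, and `render` are hypothetical names and this is not necessarily how emdoc stores raw lines.

```rust
/// A parsed field that remembers the exact line it came from.
struct Field {
    key: String,
    value: String,
    /// Original text; cleared once the value is edited.
    raw: Option<String>,
}

/// Split `Key = value`, keeping the untouched line for later output.
fn parse_field(line: &str) -> Option<Field> {
    let (key, value) = line.split_once('=')?;
    Some(Field {
        key: key.trim().to_string(),
        value: value.trim().to_string(),
        raw: Some(line.to_string()),
    })
}

impl Field {
    /// Editing invalidates the stored line, since it no longer matches.
    fn set(&mut self, value: &str) {
        self.value = value.to_string();
        self.raw = None;
    }

    /// Emit the original line when untouched, canonical form otherwise.
    fn render(&self) -> String {
        match &self.raw {
            Some(raw) => raw.clone(),
            None => format!("{} = {}", self.key, self.value),
        }
    }
}

fn main() {
    let mut field = parse_field("TiltAngle   =  -60.00").expect("line has '='");
    println!("{}", field.render()); // original spacing preserved
    field.set("-57.0");
    println!("{}", field.render()); // canonical form after the edit
}
```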
Streaming APIs
Visitor Pattern
pub trait MdocVisitor {
    fn header(&mut self, entry: &str);           // Header line
    fn begin_tilt(&mut self, z: usize);          // Start of a Z block
    fn field(&mut self, key: &str, value: &str); // Field in the current Z block
    fn end_tilt(&mut self);                      // End of a Z block
}
pub fn parse_stream<R: BufRead>(
    reader: R,
    visitor: &mut dyn MdocVisitor,
) -> Result<(), ParseError>
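To show how a visitor keeps memory use constant, here is a self-contained sketch of a line-by-line driver. It is an approximation for illustration, not emdoc's `parse_stream`: the `Visitor` trait copies the method set above, `drive`, `parse_z`, and `Summary` are hypothetical, and the parser assumes blocks open with `[ZValue = n]` and contain `Key = value` lines.

```rust
use std::io::{self, BufRead};

/// Same callbacks as the MdocVisitor trait shown above.
trait Visitor {
    fn header(&mut self, entry: &str);
    fn begin_tilt(&mut self, z: usize);
    fn field(&mut self, key: &str, value: &str);
    fn end_tilt(&mut self);
}

/// Recognize `[ZValue = n]` and return n.
fn parse_z(line: &str) -> Option<usize> {
    let inner = line.strip_prefix('[')?.strip_suffix(']')?;
    let (key, value) = inner.split_once('=')?;
    if key.trim() == "ZValue" { value.trim().parse().ok() } else { None }
}

/// Feed one line at a time to the visitor; only the current line is held.
fn drive<R: BufRead>(reader: R, visitor: &mut dyn Visitor) -> io::Result<()> {
    let mut in_block = false;
    for line in reader.lines() {
        let line = line?;
        let text = line.trim();
        if let Some(z) = parse_z(text) {
            if in_block {
                visitor.end_tilt();
            }
            visitor.begin_tilt(z);
            in_block = true;
        } else if in_block {
            if let Some((key, value)) = text.split_once('=') {
                visitor.field(key.trim(), value.trim());
            }
        } else if !text.is_empty() {
            visitor.header(text);
        }
    }
    if in_block {
        visitor.end_tilt();
    }
    Ok(())
}

/// Example visitor: counts events and collects tilt angles.
#[derive(Default)]
struct Summary {
    headers: usize,
    begun: usize,
    ended: usize,
    angles: Vec<f32>,
}

impl Visitor for Summary {
    fn header(&mut self, _entry: &str) { self.headers += 1; }
    fn begin_tilt(&mut self, _z: usize) { self.begun += 1; }
    fn field(&mut self, key: &str, value: &str) {
        if key == "TiltAngle" {
            if let Ok(angle) = value.parse::<f32>() {
                self.angles.push(angle);
            }
        }
    }
    fn end_tilt(&mut self) { self.ended += 1; }
}

fn summarize(input: &str) -> Summary {
    let mut summary = Summary::default();
    drive(input.as_bytes(), &mut summary).expect("reading from memory cannot fail");
    summary
}

fn main() {
    let s = summarize("PixelSpacing = 1.35\n[ZValue = 0]\nTiltAngle = -60.0\n");
    println!("{} header line(s), {} block(s), angles {:?}", s.headers, s.begun, s.angles);
}
```

Because the driver never builds a document tree, peak memory depends on the longest line and on whatever the visitor chooses to keep, not on file size.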
Transform API
pub fn transform<R: BufRead, W: Write>(
    reader: R,
    writer: W,
    f: impl FnMut(&mut FieldEdit),
) -> Result<(), ParseError>
// FieldEdit has: key, value, new_value (set via .set())
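A transform of this shape can be pictured as a filter over lines: untouched lines are copied through as-is and only edited fields are re-rendered. The following std-only sketch works under those assumptions; `Edit`, `rewrite`, and `shift_defocus` are hypothetical names, the real `FieldEdit` may differ, and line endings are written back as `\n`.

```rust
use std::io::{self, BufRead, Write};

/// Hypothetical stand-in for FieldEdit: key, current value, optional replacement.
struct Edit<'a> {
    key: &'a str,
    value: &'a str,
    new_value: Option<String>,
}

impl Edit<'_> {
    fn set(&mut self, value: impl Into<String>) {
        self.new_value = Some(value.into());
    }
}

/// Copy `reader` to `writer`, letting `f` replace individual field values.
fn rewrite<R: BufRead, W: Write>(
    reader: R,
    mut writer: W,
    mut f: impl FnMut(&mut Edit<'_>),
) -> io::Result<()> {
    for line in reader.lines() {
        let line = line?;
        let trimmed = line.trim();
        // Bracketed lines such as `[ZValue = 0]` are never offered for editing.
        if !trimmed.starts_with('[') {
            if let Some((key, value)) = trimmed.split_once('=') {
                let mut edit = Edit { key: key.trim(), value: value.trim(), new_value: None };
                f(&mut edit);
                if let Some(new_value) = edit.new_value {
                    writeln!(writer, "{} = {}", edit.key, new_value)?;
                    continue;
                }
            }
        }
        // Unedited lines keep their original spacing.
        writeln!(writer, "{}", line)?;
    }
    Ok(())
}

/// Example: move every Defocus value 0.5 further underfocus.
fn shift_defocus(input: &str) -> String {
    let mut out = Vec::new();
    rewrite(input.as_bytes(), &mut out, |edit| {
        if edit.key == "Defocus" {
            if let Ok(defocus) = edit.value.parse::<f32>() {
                edit.set(format!("{:.1}", defocus - 0.5));
            }
        }
    })
    .expect("writing to a Vec cannot fail");
    String::from_utf8(out).expect("output was built from UTF-8 input")
}

fn main() {
    print!("{}", shift_defocus("[ZValue = 0]\nTiltAngle   = -60.0\nDefocus = -3.0\n"));
}
```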
Performance Tips

| Pattern | Good | Bad | Why |
|---|---|---|---|
| Batch Removal | retain_fields() | Multiple remove() calls | Single index rebuild: O(n) vs O(n×m) |
| Streaming | parse_stream() | from_reader() on huge files | Constant memory for GB-scale files |
| Typed Access | get_checked() | get().unwrap() | Proper error handling |
| Raw Lines | Auto-captured | Manual capture_raw_values() | No-op since v0.1.0 |
| Validation | validate() before save | Assume valid | Catches duplicates early |

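The batch-removal tip can be illustrated with a plain `Vec`, where each individual removal shifts the remaining elements and a single `retain` pass does not. This is an analogy under stated assumptions, not emdoc's internals: `Fields`, `remove_one_by_one`, and `remove_batch` are hypothetical, and the crate's index rebuild is modelled here by element shifting.

```rust
use std::collections::HashSet;

type Fields = Vec<(String, String)>;

/// m separate removals: each one searches and then shifts the tail, O(n) per key.
fn remove_one_by_one(fields: &mut Fields, keys: &[&str]) {
    for key in keys {
        if let Some(pos) = fields.iter().position(|(k, _)| k.as_str() == *key) {
            fields.remove(pos);
        }
    }
}

/// One pass over the fields with O(1) membership tests: O(n + m) overall.
fn remove_batch(fields: &mut Fields, keys: &[&str]) {
    let unwanted: HashSet<&str> = keys.iter().copied().collect();
    fields.retain(|(k, _)| !unwanted.contains(k.as_str()));
}

fn sample_fields() -> Fields {
    [
        ("TiltAngle", "-60.0"),
        ("Defocus", "-3.5"),
        ("DateTime", "01-Jan-24"),
        ("Magnification", "50000"),
    ]
    .iter()
    .map(|(k, v)| (k.to_string(), v.to_string()))
    .collect()
}

fn main() {
    let mut slow = sample_fields();
    let mut fast = sample_fields();
    remove_one_by_one(&mut slow, &["Defocus", "DateTime"]);
    remove_batch(&mut fast, &["Defocus", "DateTime"]);
    println!("same result: {}", slow == fast);
}
```

Both functions leave the same fields in the same order; the difference only shows up as the number of fields and removed keys grows.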
Cryo-EM Specific Examples
Plot Defocus vs Tilt Angle (Python)
import emdoc
import matplotlib.pyplot as plt
# Load MDOC
mdoc_data = emdoc.Mdoc.from_file("tilt_series.mdoc")
tilt_angles, defocus_values = mdoc_data.to_numpy_arrays()
# Quick plot
plt.figure(figsize=(10, 6))
plt.scatter(tilt_angles, defocus_values, alpha=0.7, s=50)
plt.xlabel("Tilt Angle (°)")
plt.ylabel("Defocus (μm)")
plt.title("Defocus vs Tilt Angle")
plt.grid(True, alpha=0.3)
plt.show()
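Streaming Filter (Rust)
use std::fs::File;
use std::io::BufReader;
use emdoc::{parse_stream, MdocVisitor};
struct TiltAngleFilter {
    min_angle: f32,
    max_angle: f32,
}
impl MdocVisitor for TiltAngleFilter {
    // The trait declares four methods; header and end_tilt are no-ops here.
    fn header(&mut self, _entry: &str) {}
    fn begin_tilt(&mut self, z: usize) {
        println!("Processing Z = {}", z);
    }
    fn field(&mut self, key: &str, value: &str) {
        if key != "TiltAngle" {
            return;
        }
        // Report unparseable values instead of panicking on them.
        match value.parse::<f32>() {
            Ok(angle) if angle < self.min_angle || angle > self.max_angle => {
                println!("Tilt angle {} out of range!", angle);
            }
            Ok(_) => {}
            Err(_) => println!("Could not parse tilt angle {:?}", value),
        }
    }
    fn end_tilt(&mut self) {}
}
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Example range and file name; assumes ParseError implements std::error::Error.
    let reader = BufReader::new(File::open("tilt_series.mdoc")?);
    let mut filter = TiltAngleFilter { min_angle: -60.0, max_angle: 60.0 };
    parse_stream(reader, &mut filter)?;
    Ok(())
}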
Testing
# Run tests
cargo test
# Test Python integration
cargo test --features python
# Benchmark parsing
cargo bench --features bench
License
MIT License - see LICENSE file for details.
Contributing
We love contributions! Please submit a pull request or open an issue on GitHub.