| Crates.io | kira-mmcif |
| lib.rs | kira-mmcif |
| version | 0.1.0 |
| created_at | 2026-01-18 17:17:18.7792+00 |
| updated_at | 2026-01-18 17:17:18.7792+00 |
| description | Low-level, streaming mmCIF parser focused on protein coordinates. |
| homepage | |
| repository | https://github.com/ARyaskov/kira-mmcif |
| max_upload_size | |
| id | 2052696 |
| size | 40,572 |
Low-level, streaming mmCIF parser focused on protein coordinates. The crate reads _atom_site data and exposes a stable, Gemmi-inspired API with a deterministic, protein-oriented data contract.
Scope (by design):
- `_atom_site` only.
- altLoc `.` or `A` is kept; others are ignored.
Usage:
use kira_mmcif::{read_structure, MmCifError, Structure};
let structure: Structure = read_structure("input.cif")?;
Signature:
pub fn read_structure<P: AsRef<Path>>(path: P) -> Result<Structure, MmCifError>;
Add to Cargo.toml:
[dependencies]
kira-mmcif = "0.1"
Data model:
pub struct Structure {
pub models: Vec<Model>,
}
pub struct Model {
pub chains: Vec<Chain>,
}
pub struct Chain {
pub id: ChainId,
pub residues: Vec<Residue>,
}
pub struct Residue {
pub name: ResidueName,
pub seq_id: i32,
pub atoms: SmallVec<[Atom; 4]>,
}
pub struct Atom {
pub name: AtomName,
pub x: f32,
pub y: f32,
pub z: f32,
}
Enums and IDs:
pub enum AtomName { N, CA, C, O }
pub enum ResidueName {
ALA, ARG, ASN, ASP, CYS, GLN, GLU, GLY, HIS, ILE,
LEU, LYS, MET, PHE, PRO, SER, THR, TRP, TYR, VAL,
UNK,
}
pub struct ChainId(pub u8); // 'A'..='Z' or 'a'..='z' => 0..=25
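As a rough illustration of how the hierarchical model can be walked, the sketch below collects CA coordinates from the first model. It declares local stand-in types rather than importing the crate: `atoms` is a plain `Vec` instead of `SmallVec`, the chain id is a bare `u8`, and the residue name is omitted. The helper `ca_trace` is hypothetical and not part of kira-mmcif.

```rust
// Local stand-ins that mirror the structs above (assumptions: Vec instead
// of SmallVec, bare u8 chain id, residue name left out).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum AtomName { N, CA, C, O }

pub struct Atom { pub name: AtomName, pub x: f32, pub y: f32, pub z: f32 }
pub struct Residue { pub seq_id: i32, pub atoms: Vec<Atom> }
pub struct Chain { pub id: u8, pub residues: Vec<Residue> }
pub struct Model { pub chains: Vec<Chain> }
pub struct Structure { pub models: Vec<Model> }

/// Hypothetical helper: returns (chain id, seq_id, [x, y, z]) for every
/// CA atom found in the first model.
pub fn ca_trace(structure: &Structure) -> Vec<(u8, i32, [f32; 3])> {
    let mut out = Vec::new();
    // Only the first model is walked; an empty structure yields an empty trace.
    if let Some(model) = structure.models.first() {
        for chain in &model.chains {
            for residue in &chain.residues {
                // A residue may lack CA, so search instead of indexing.
                if let Some(ca) = residue.atoms.iter().find(|a| a.name == AtomName::CA) {
                    out.push((chain.id, residue.seq_id, [ca.x, ca.y, ca.z]));
                }
            }
        }
    }
    out
}
```

Searching by name rather than by position keeps the walk correct for residues with missing backbone atoms.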
Utility mapping (public methods):
impl AtomName {
pub fn from_label_atom_id(label: &str) -> Option<Self>;
pub fn as_u8(self) -> u8; // N=0, CA=1, C=2, O=3
}
impl ResidueName {
pub fn from_label_comp_id(label: &str) -> Self; // unknown => UNK
pub fn as_u8(self) -> u8; // AA index, UNK=255
}
impl ChainId {
pub fn from_label_asym_id(label: &str) -> Option<Self>;
pub fn as_u8(self) -> u8;
}
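The signatures above leave the mapping rules implicit, so here is a minimal sketch of how two of them could behave, based only on the comments in this section (N=0, CA=1, C=2, O=3; letters map to 0..=25). The bodies are assumptions, not the crate's source. In particular, rejecting multi-character chain ids and folding case are guesses.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AtomName { N, CA, C, O }

impl AtomName {
    /// Maps a label_atom_id to a backbone atom; anything else is None.
    pub fn from_label_atom_id(label: &str) -> Option<Self> {
        match label {
            "N" => Some(AtomName::N),
            "CA" => Some(AtomName::CA),
            "C" => Some(AtomName::C),
            "O" => Some(AtomName::O),
            _ => None,
        }
    }

    /// Numeric code as documented: N=0, CA=1, C=2, O=3.
    pub fn as_u8(self) -> u8 {
        match self {
            AtomName::N => 0,
            AtomName::CA => 1,
            AtomName::C => 2,
            AtomName::O => 3,
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ChainId(pub u8);

impl ChainId {
    /// Assumption: only a single ASCII letter is accepted, and upper and
    /// lower case map to the same index.
    pub fn from_label_asym_id(label: &str) -> Option<Self> {
        let mut chars = label.chars();
        match (chars.next(), chars.next()) {
            (Some(c), None) if c.is_ascii_alphabetic() => {
                Some(ChainId(c.to_ascii_uppercase() as u8 - b'A'))
            }
            _ => None,
        }
    }

    pub fn as_u8(self) -> u8 {
        self.0
    }
}
```

Under these assumptions, side-chain atoms such as `CB` map to `None`, and chain ids like `AA` are rejected rather than truncated.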
ProteinIR is the stable contract for downstream analysis pipelines: a flat, struct-of-arrays view of the parsed structure.
pub struct ProteinIR {
pub atoms: AtomSoA,
pub residues: Vec<ResidueIR>,
pub chains: Vec<ChainIR>,
}
pub struct AtomSoA {
pub x: Vec<f32>,
pub y: Vec<f32>,
pub z: Vec<f32>,
pub residue_idx: Vec<u32>,
pub atom_kind: Vec<u8>, // N=0, CA=1, C=2, O=3
}
pub struct ResidueIR {
pub chain_id: u8,
pub residue_name: u8, // AA index
pub residue_number: i32,
pub atom_offset: u32,
pub atom_count: u8,
pub has_n: bool,
pub has_ca: bool,
pub has_c: bool,
pub has_o: bool,
}
pub struct ChainIR {
pub chain_id: u8,
pub residue_start: u32,
pub residue_end: u32, // inclusive
}
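To show how the index fields fit together, the following sketch extracts the CA coordinates of one chain from the flat layout. It uses local mirrors of the structs with only the fields it needs. It also assumes that a residue's atoms occupy the contiguous range `atom_offset .. atom_offset + atom_count` in `AtomSoA`, which the field names suggest but this section does not state. The function `chain_ca_coords` is hypothetical.

```rust
// Local mirrors of the ProteinIR structs above (only the fields used here).
pub struct AtomSoA {
    pub x: Vec<f32>,
    pub y: Vec<f32>,
    pub z: Vec<f32>,
    pub atom_kind: Vec<u8>,
}
pub struct ResidueIR { pub atom_offset: u32, pub atom_count: u8, pub has_ca: bool }
pub struct ChainIR { pub chain_id: u8, pub residue_start: u32, pub residue_end: u32 }
pub struct ProteinIR {
    pub atoms: AtomSoA,
    pub residues: Vec<ResidueIR>,
    pub chains: Vec<ChainIR>,
}

/// atom_kind code for CA, per the field comment above.
const CA: u8 = 1;

/// Hypothetical helper: CA coordinates of one chain, in residue order.
pub fn chain_ca_coords(ir: &ProteinIR, chain: &ChainIR) -> Vec<[f32; 3]> {
    let mut coords = Vec::new();
    // residue_end is documented as inclusive, hence `..=`.
    for r in chain.residue_start..=chain.residue_end {
        let res = &ir.residues[r as usize];
        if !res.has_ca {
            continue; // the flag lets us skip the atom scan entirely
        }
        // Assumption: atoms of a residue are stored contiguously.
        let start = res.atom_offset as usize;
        let end = start + res.atom_count as usize;
        for i in start..end {
            if ir.atoms.atom_kind[i] == CA {
                coords.push([ir.atoms.x[i], ir.atoms.y[i], ir.atoms.z[i]]);
            }
        }
    }
    coords
}
```

The `has_*` flags make the common "is this residue complete?" check a single field read, without touching the coordinate arrays.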
Adapter usage:
use kira_mmcif::{ProteinIR, Structure};
let protein_ir = ProteinIR::try_from(&structure)?;
Error type:
pub enum MmCifError {
Io(std::io::Error),
Parse(String),
MissingField(&'static str),
InvalidChainId(String),
InvalidModelCount(usize),
}
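Callers will typically want to match on the variants. The sketch below mirrors the enum locally and turns each variant into a short message. The message wording, and the reading of what each variant signals, are inferred from the variant names only. The function `describe` is hypothetical.

```rust
use std::io;

// Local mirror of the enum above, so the example has no dependencies.
#[derive(Debug)]
pub enum MmCifError {
    Io(io::Error),
    Parse(String),
    MissingField(&'static str),
    InvalidChainId(String),
    InvalidModelCount(usize),
}

/// Hypothetical helper: a one-line message for logs or CLI output.
pub fn describe(err: &MmCifError) -> String {
    match err {
        MmCifError::Io(e) => format!("could not read file: {e}"),
        MmCifError::Parse(msg) => format!("malformed mmCIF: {msg}"),
        MmCifError::MissingField(name) => format!("required column missing: {name}"),
        MmCifError::InvalidChainId(id) => format!("unsupported chain id: {id}"),
        MmCifError::InvalidModelCount(n) => format!("unexpected model count: {n}"),
    }
}
```

Matching exhaustively, without a `_` arm, means the compiler flags this function if a variant is added later.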
Required _atom_site fields:
- `_atom_site.group_PDB`
- `_atom_site.label_atom_id`
- `_atom_site.label_comp_id`
- `_atom_site.label_asym_id`
- `_atom_site.label_seq_id`
- `_atom_site.Cartn_x`
- `_atom_site.Cartn_y`
- `_atom_site.Cartn_z`
Supported extras (optional):
- `_atom_site.label_alt_id` (altLoc filter)
- `_atom_site.pdbx_PDB_model_num` (MODEL filter)
Filtering behavior:
- Only rows with `group_PDB == "ATOM"` are kept.
- Only model `1` is kept if the model column is present.
- Only altLoc `.` or `A` (with `?` treated as missing) is kept if the altLoc column is present.
- Only backbone atoms are kept (`AtomName::from_label_atom_id` must match).
Ordering guarantees:
- Chains follow `label_asym_id` ordering as they appear in the file.
- Residues follow `label_seq_id` order within each chain.
Example:
use kira_mmcif::{read_structure, ProteinIR};
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Parse the file, then convert to the flat ProteinIR contract.
    let structure = read_structure("protein.cif")?;
    let protein_ir = ProteinIR::try_from(&structure)?;
    println!("chains: {}", protein_ir.chains.len());
    println!("residues: {}", protein_ir.residues.len());
    println!("atoms: {}", protein_ir.atoms.x.len());
    Ok(())
}
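The filtering behavior described in this section can be read as a single row predicate. The sketch below is one way to express it, not the crate's implementation. The `Row` struct and `keep_row` function are hypothetical, and optional columns are modeled as `None` when the file does not contain them.

```rust
/// Hypothetical view of one `_atom_site` row, reduced to the columns
/// that drive filtering. Optional columns are None when absent.
pub struct Row<'a> {
    pub group_pdb: &'a str,
    pub label_atom_id: &'a str,
    pub label_alt_id: Option<&'a str>,
    pub model_num: Option<&'a str>,
}

/// Returns true if the row survives all four filters.
pub fn keep_row(row: &Row) -> bool {
    // HETATM and other record groups are dropped.
    let is_atom = row.group_pdb == "ATOM";
    // Without a model column every row passes; with one, only model 1 does.
    let model_ok = row.model_num.map_or(true, |m| m == "1");
    // "." and "?" both mean "no altLoc"; "A" is the only alternate kept.
    let alt_ok = row.label_alt_id.map_or(true, |a| matches!(a, "." | "?" | "A"));
    // Backbone atoms only, matching the AtomName variants.
    let backbone = matches!(row.label_atom_id, "N" | "CA" | "C" | "O");
    is_atom && model_ok && alt_ok && backbone
}
```

Treating an absent column as "pass" matches the "if the column is present" wording of the rules for models and altLoc.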