| Crates.io | zinchi |
| lib.rs | zinchi |
| version | 0.2.0 |
| created_at | 2025-10-06 17:26:02.077806+00 |
| updated_at | 2025-10-09 10:46:48.885754+00 |
| description | A compact binary representation for InChI Keys, reducing their size from 27 bytes to 9-14 bytes |
| homepage | https://github.com/OliverBScott/zinchi |
| repository | https://github.com/OliverBScott/zinchi |
| max_upload_size | |
| id | 1870456 |
| size | 42,565 |
A compact binary representation for InChI Keys.
This crate provides a space-efficient binary encoding for International Chemical Identifier (InChI) keys, reducing their size from the standard 27-byte ASCII representation to either 9 or 14 bytes. The implementation is based on the work by John Mayfield (NextMove Software): Data Compression of InChI Keys and 2D Coordinates.
Note: This is a personal project created for fun and to explore Rust. While it implements a real compression algorithm, it's primarily a learning exercise rather than a production-critical library.
Add this to your Cargo.toml:
[dependencies]
zinchi = "0.2"
use zinchi::InChIKey;
// Parse an InChI key from a string
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse().expect("Failed to parse InChIKey");
// Convert back to string
println!("{}", key);
// Access individual components
println!("Standard: {}", key.is_standard());
println!("Version: {}", key.version());
println!("Protonation: {}", key.get_protonation());
use zinchi::InChIKey;
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse()?;
// Pack to binary (9 or 14 bytes)
let packed = key.packed_bytes();
println!("Packed size: {} bytes", packed.len());
// Unpack from binary
let unpacked = InChIKey::unpack_from(&packed)?;
assert_eq!(key, unpacked);
use zinchi::InChIKey;
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse()?;
// Pack into an existing buffer
let mut buffer = [0u8; 14];
let size = key.pack_into(&mut buffer);
// Use only the relevant bytes
let packed_data = &buffer[..size];
An InChI key has the format: AAAAAAAAAAAAAA-BBBBBBBBFV-P
A (14 characters): first hash block
B (8 characters): second hash block
F: flag character, S for standard, N for non-standard
V: version character, A
P: protonation character, N for neutral, or A-M for protonated states
Standard InChI keys with the common second block UHFFFAOYSA (empty stereochemistry hash) are packed into just 9 bytes. All other InChI keys require 14 bytes.
This is a size reduction of about 48% (14 bytes) to 67% (9 bytes) relative to the 27-byte ASCII representation.
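The layout and size rule above can be sketched with the standard library alone. This is a minimal illustration, not the crate's implementation: `KeyParts`, `split_key`, `predicted_packed_len`, and `reduction_percent` are hypothetical names introduced here, and the validation only checks length, hyphen positions, and uppercase ASCII letters.

```rust
/// Hypothetical view of the pieces of an InChI key (not a zinchi type).
#[derive(Debug, PartialEq)]
struct KeyParts<'a> {
    first_block: &'a str,  // 14 letters
    second_block: &'a str, // 8 letters
    flag: char,            // 'S' (standard) or 'N' (non-standard)
    version: char,         // 'A'
    protonation: char,     // 'N' for neutral, other letters otherwise
}

/// Splits a 27-character key of the form AAAAAAAAAAAAAA-BBBBBBBBFV-P.
/// Returns None if the length, hyphens, or letter case do not match.
fn split_key(key: &str) -> Option<KeyParts<'_>> {
    let b = key.as_bytes();
    if b.len() != 27 || b[14] != b'-' || b[25] != b'-' {
        return None;
    }
    let all_upper = b
        .iter()
        .enumerate()
        .all(|(i, c)| i == 14 || i == 25 || c.is_ascii_uppercase());
    if !all_upper {
        return None;
    }
    // Every byte is ASCII at this point, so slicing by byte index is safe.
    Some(KeyParts {
        first_block: &key[..14],
        second_block: &key[15..23],
        flag: b[23] as char,
        version: b[24] as char,
        protonation: b[26] as char,
    })
}

/// Size rule from the text: a standard key whose second block is
/// UHFFFAOYSA needs 9 bytes, everything else needs 14.
fn predicted_packed_len(parts: &KeyParts) -> usize {
    if parts.second_block == "UHFFFAOY" && parts.flag == 'S' && parts.version == 'A' {
        9
    } else {
        14
    }
}

/// Percentage saved relative to the 27-byte ASCII form.
fn reduction_percent(packed_len: usize) -> f64 {
    (27 - packed_len) as f64 / 27.0 * 100.0
}

fn main() {
    for key in ["ZZJLMZYUGLJBSO-UHFFFAOYSA-N", "QNAYBMKLOCPYGJ-REOHZSNKSA-N"] {
        let parts = split_key(key).expect("well-formed key");
        let len = predicted_packed_len(&parts);
        println!("{:?} -> {} bytes ({:.1}% smaller)", parts, len, reduction_percent(len));
    }
}
```

The first key has the common second block and falls into the 9-byte case; the second key has a different second block and falls into the 14-byte case.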
The first block (14 characters) is decoded as four three-letter groups of 14 bits each plus one two-letter group of 9 bits (65 bits in total) and packed into 9 bytes. The second block (8 characters) is decoded as two three-letter groups of 14 bits each plus one two-letter group of 9 bits (37 bits in total) and packed into 5 bytes. Additional metadata (standard flag, version, protonation) is encoded in the spare bits.
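To make the bit budget concrete, here is a small sketch of packing fixed-width fields into bytes. It is an assumption-laden illustration rather than the crate's actual layout: `pack_fields` and `unpack_fields` are hypothetical helpers, the MSB-first bit order is chosen arbitrarily, and the sample values stand in for already-decoded letter groups (the letter-to-bits decoding itself is not shown).

```rust
/// Packs `(value, width_in_bits)` fields MSB-first into whole bytes.
/// Bit order and field order are assumptions made for this sketch.
fn pack_fields(fields: &[(u16, u32)]) -> Vec<u8> {
    let total_bits: u32 = fields.iter().map(|&(_, width)| width).sum();
    let mut out = vec![0u8; ((total_bits + 7) / 8) as usize];
    let mut pos = 0u32; // index of the next free bit in `out`
    for &(value, width) in fields {
        for i in (0..width).rev() {
            if ((value >> i) & 1) == 1 {
                out[(pos / 8) as usize] |= 0x80u8 >> (pos % 8);
            }
            pos += 1;
        }
    }
    out
}

/// Reads fields of the given widths back out, in the same order.
fn unpack_fields(bytes: &[u8], widths: &[u32]) -> Vec<u16> {
    let mut pos = 0u32;
    widths
        .iter()
        .map(|&width| {
            let mut value = 0u16;
            for _ in 0..width {
                let bit = (bytes[(pos / 8) as usize] >> (7 - pos % 8)) & 1;
                value = (value << 1) | bit as u16;
                pos += 1;
            }
            value
        })
        .collect()
}

fn main() {
    // First block: four 14-bit fields and one 9-bit field = 65 bits.
    let first: [(u16, u32); 5] =
        [(0x3FFF, 14), (0x0001, 14), (0x2AAA, 14), (0x1555, 14), (0x01FF, 9)];
    let packed_first = pack_fields(&first);
    println!("first block: 65 bits in {} bytes, {} spare bits",
             packed_first.len(), packed_first.len() * 8 - 65);

    // Second block: two 14-bit fields and one 9-bit field = 37 bits.
    let second: [(u16, u32); 3] = [(0x1234, 14), (0x0ABC, 14), (0x0100, 9)];
    let packed_second = pack_fields(&second);
    println!("second block: 37 bits in {} bytes, {} spare bits",
             packed_second.len(), packed_second.len() * 8 - 37);

    // Round trip to confirm nothing is lost.
    assert_eq!(unpack_fields(&packed_first, &[14, 14, 14, 14, 9]),
               vec![0x3FFF, 0x0001, 0x2AAA, 0x1555, 0x01FF]);
}
```

Under these widths the first block leaves 7 unused bits in its 9 bytes and the second block leaves 3 in its 5 bytes, which is the spare room the text refers to for metadata.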
When the serde feature is enabled, InChIKey implements Serialize and Deserialize:
use zinchi::InChIKey;
let key: InChIKey = "ZZJLMZYUGLJBSO-UHFFFAOYSA-N".parse()?;
// JSON serialization (human-readable)
let json = serde_json::to_string(&key)?;
assert_eq!(json, "\"ZZJLMZYUGLJBSO-UHFFFAOYSA-N\"");
// Binary serialization with bincode (compact)
let bytes = bincode::serde::encode_to_vec(&key, bincode::config::standard())?;
// Uses the 9 or 14 byte packed representation
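The serialization format adapts automatically: human-readable formats such as JSON use the 27-character string, while binary formats such as bincode use the 9 or 14 byte packed representation.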
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.