Molecular Graph
The molecular graph is the central data structure in molprint. Every algorithm — ring perception, fingerprinting, SMARTS matching — operates on it.
Type definition
#![allow(unused)] fn main() { // mol/graph.rs pub type MolGraph = UnGraph<Atom, BondType>; }
MolGraph is a type alias for petgraph's UnGraph<Atom, BondType>, an undirected graph where each node carries an Atom and each edge carries a BondType. Hydrogen atoms are not stored as nodes in the graph (they are implicit); instead, Atom::h_count records the total hydrogen count.
The Atom struct
#![allow(unused)] fn main() { pub struct Atom { pub element: Element, // atomic number-backed enum pub charge: i8, // formal charge pub isotope: Option<u16>, pub aromatic: bool, pub h_count: u8, // total H count (implicit + explicit) pub explicit_h: Option<u8>, // Some if bracket atom with H spec pub map_num: u16, // atom-map number (for reaction SMILES) pub in_ring: bool, // set by ring perception pub ring_sizes: Vec<u8>, // one entry per SSSR ring containing this atom } }
h_count is computed after the full graph is constructed by assign_implicit_hydrogens(). For bracket atoms (e.g., [NH3]), the H count is explicit; for organic subset atoms (e.g., N in CC(N)C), it is inferred from the atom's default valence minus its current bond order sum.
Bond types
#![allow(unused)] fn main() { pub enum BondType { Single, Double, Triple, Aromatic, } }
Each BondType has a valence_contribution() method that returns its integer bond order (Single=1, Double=2, Triple=3, Aromatic=1). The aromatic contribution of 1 is intentional: the additional electron from the pi system is accounted for separately in the implicit H calculation.
The MolGraphExt trait
petgraph's UnGraph is a generic graph; it doesn't know about chemistry. The MolGraphExt trait adds chemistry-aware methods:
#![allow(unused)] fn main() { pub trait MolGraphExt { fn num_atoms(&self) -> usize; fn num_bonds(&self) -> usize; fn atom(&self, idx: NodeIndex) -> &Atom; fn atom_mut(&mut self, idx: NodeIndex) -> &mut Atom; fn bond_between(&self, a: NodeIndex, b: NodeIndex) -> Option<BondType>; fn heavy_neighbors(&self, idx: NodeIndex) -> Vec<NodeIndex>; fn degree(&self, idx: NodeIndex) -> usize; fn compute_implicit_h(&self, idx: NodeIndex) -> u8; fn assign_implicit_hydrogens(&mut self); } }
compute_implicit_h is the most subtle method. It handles three cases:
- Bracket atom with H spec (e.g.,
[NH3]): return the explicit count directly. - Non-organic-subset element (e.g.,
[Fe]): return 0. These atoms get no implicit Hs. - Organic subset atom: find the smallest standard valence ≥ current bond order sum; return
valence − bond_sum − |charge|. For aromatic atoms, an extra +1 is added to the effective bond sum to account for the pi-electron contribution.
Implicit hydrogen calculation for aromatic atoms
This was the single most difficult piece to get right. The logic is:
effective_bond_sum = bond_sum + 1 (add 1 for the pi electron)
h_count = valence - effective_bond_sum
But this +1 must only be applied when the atom's valence is not already fully saturated by its sigma bonds. For example, trimethylamine-oxide's nitrogen in an aromatic context is a different case than a pyridine nitrogen. The code checks whether valence >= bond_sum before deciding to apply the adjustment. If the valence is already fully consumed by heavy-atom bonds, no pi adjustment is added, and 0 implicit Hs are assigned.
Node and edge indexing
petgraph assigns NodeIndex and EdgeIndex values sequentially as nodes and edges are added. These indices are stable within a single MolGraph lifetime (no removals happen in molprint). The SMILES parser, ring perception, and all fingerprint algorithms use NodeIndex::new(i) to iterate over atoms.