Bayesian Model Selection

HARP performs Bayesian model selection (BMS) between an atomic-level model (\(M_0\)) and a residue-level model (\(M_1\)) to compute the probability \(P(M_0 | \mathcal{Y})\), where \(\mathcal{Y}\) is the local cryoEM map for a residue, according to

\[ P(M_0 | \mathcal{Y}) = \frac{P(\mathcal{Y} | M_0) P(M_0)}{P(\mathcal{Y} | M_0) P(M_0) + P(\mathcal{Y} | M_1) P(M_1)} \]

where the priors are \(P(M_0) = P(M_1) = \frac{1}{2}\), and \(P(\mathcal{Y} | M_0)\) and \(P(\mathcal{Y} | M_1)\) are the likelihoods that the latent structural information in \(\mathcal{Y}\) is explained by an atomic-level template (for \(M_0\)) or a residue-level template (for \(M_1\)), according to Equation 2.2.2 in Ray et al.
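The posterior above can be evaluated numerically from two log-likelihood values. The following is a minimal sketch and not HARP's own implementation: the function name `posterior_m0` is hypothetical, and it assumes that a single log-likelihood per model is already available (how HARP combines the values over the \(\sigma_0\) and \(\sigma_1\) grids is not covered here).

```python
import numpy as np

def posterior_m0(ln_ev_m0, ln_ev_m1, prior_m0=0.5):
    """Hypothetical helper: P(M0 | Y) from ln P(Y | M0) and ln P(Y | M1)."""
    # Work in log space: ln[P(Y | M_n) P(M_n)] for each model.
    ln_a = ln_ev_m0 + np.log(prior_m0)
    ln_b = ln_ev_m1 + np.log(1.0 - prior_m0)
    # Subtract the larger term before exponentiating so that very
    # negative log-likelihoods do not underflow to zero.
    ln_max = max(ln_a, ln_b)
    a = np.exp(ln_a - ln_max)
    b = np.exp(ln_b - ln_max)
    return a / (a + b)

# With equal priors, M0 being e^2 times more likely than M1 gives
# a posterior of 1 / (1 + e^-2).
p = posterior_m0(-1000.0, -1002.0)
```

Working with differences of log values matters because the log-likelihood of a whole local map is typically a large negative number, and exponentiating it directly would lose all precision.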

bms_residue

harp.bayes_model_select.bms_residue(grid, data, subresidue, adfs, blobs, subgrid_size=8., sigmacutoff=5, offset=0.5, atom_types=None, atom_weights=None)

This function runs the HARP calculation at the local residue level by performing BMS between \(M_0\) and \(M_1\) for a single residue of the cryoEM structure. It is typically called by bms_molecule (see below) and is therefore unlikely to be used on its own.

Returns a tuple of (record_prob_good, record_ln_ev).

Arguments

Variable Description
grid harp.density.gridclass
The grid defining the voxel locations for a density map.
data numpy.ndarray
The experimental cryoEM density map for a molecule.
subresidue harp.molecule.atomcollection
The collection of atoms that correspond to the specific residue.
adfs numpy.ndarray
A 1-D array of Gaussian widths \(\sigma_0\) (in \(\unicode{x212B}\)) for the atomic profiles in \(M_0\). The default is an array of 10 \(\log_{10}\)-spaced points between 0.25 and 1.0.
blobs numpy.ndarray
A 1-D array of Gaussian widths \(\sigma_1\) (in \(\unicode{x212B}\)) for the residue super-atom profiles in \(M_1\). The default is an array of 20 \(\log_{10}\)-spaced points between 0.25 and 2.8.
subgrid_size float
The size of the cubic subgrid (in \(\unicode{x212B}\)) from the residue center-of-mass that defines the local cryoEM density map for the residue. The default value is 8.0.
sigmacutoff float
The cutoff distance (in units of \(\sigma\)) from the position of each atom (or super-atom) up to which the template is calculated. The default value is 5.
offset float
The offset from the grid voxel edge (in terms of grid voxel size) where the integrated template for the density for that voxel is assigned. The default value is 0.5.
atom_types numpy.ndarray
A 1-D array of strings for the elements present in a residue. The default is ['H', 'C', 'N', 'O', 'P', 'S'].
atom_weights numpy.ndarray
A 1-D array of element-wise weights, normalized to C, in the same order as atom_types. The default is [0.05, 1., 1., 1., 2., 2.].
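Because adfs and blobs are required arguments of bms_residue, it can help to see how arrays matching the documented defaults might be built. This is a sketch using `numpy.logspace` that follows the descriptions in the table above; whether HARP constructs its defaults in exactly this way is an assumption.

```python
import numpy as np

# 10 log10-spaced widths between 0.25 and 1.0 (atomic profiles, M0).
adfs = np.logspace(np.log10(0.25), np.log10(1.0), 10)

# 20 log10-spaced widths between 0.25 and 2.8 (super-atom profiles, M1).
blobs = np.logspace(np.log10(0.25), np.log10(2.8), 20)

# Element types and their weights, in matching order, as documented above.
atom_types = np.array(['H', 'C', 'N', 'O', 'P', 'S'])
atom_weights = np.array([0.05, 1., 1., 1., 2., 2.])
```

In a log-spaced grid, consecutive widths differ by a constant ratio rather than a constant step, so small widths are sampled more densely than large ones.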

Output

Variable Description
record_prob_good numpy.ndarray
A 1-D array of the same size as the number of atoms in the residue, containing the probability \(P(M_0| \mathcal{Y})\) for a specific residue.
Note: Every element of the array contains the same probability value which is evaluated for the entire residue. This value is assigned to each atom of the residue.
record_ln_ev numpy.ndarray
A 2-D array whose size in the first dimension is equal to the number of atoms in the residue and whose size in the second dimension is (adfs.size + blobs.size), containing the log-probabilities, \(\log P(\mathcal{Y}|M_n)\), for the specific residue. record_ln_ev[:, :adfs.size] contains the log-probabilities for each \(\sigma_0\) in adfs and record_ln_ev[:, adfs.size:] contains the log-probabilities for each \(\sigma_1\) in blobs.
Note: Every record_ln_ev[i] contains the same 1-D array, which is evaluated for the entire residue. This 1-D array is assigned to each atom of the residue.
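The column layout of record_ln_ev described above can be illustrated with a small sketch. The array here is synthetic (random numbers standing in for real log-probabilities), and the 5-atom residue is an assumption made only for the example; only the slicing follows the documented layout.

```python
import numpy as np

adfs = np.logspace(np.log10(0.25), np.log10(1.0), 10)
blobs = np.logspace(np.log10(0.25), np.log10(2.8), 20)

# Synthetic stand-in for the output of bms_residue on a 5-atom residue:
# one row of log-probabilities, repeated for every atom.
n_atoms = 5
rng = np.random.default_rng(0)
row = rng.normal(size=adfs.size + blobs.size)
record_ln_ev = np.tile(row, (n_atoms, 1))

# Split the columns into the M0 (adfs) and M1 (blobs) parts.
ln_ev_m0 = record_ln_ev[:, :adfs.size]
ln_ev_m1 = record_ln_ev[:, adfs.size:]

# All rows are identical, so row 0 is enough to find, for example,
# the sigma_0 with the highest log-probability.
best_sigma0 = adfs[np.argmax(ln_ev_m0[0])]
```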

bms_molecule

harp.bayes_model_select.bms_molecule(grid, data, mol, adfs=None, blobs=None, subgrid_size=8., sigmacutoff=5, offset=0.5, emit=print, chains=None, atom_types=None, atom_weights=None)

This function runs the HARP calculation for a specific molecular model by calling bms_residue (see above) in a loop for all the residues in all the chains of the molecule.

Returns a tuple of (record_prob_good, record_ln_ev).

Arguments

Variable Description
grid harp.density.gridclass
The grid defining the voxel locations for a density map.
data numpy.ndarray
The experimental cryoEM density map for a molecule.
mol harp.molecule.atomcollection
The collection of atoms that correspond to the molecule.
adfs numpy.ndarray
A 1-D array of Gaussian widths \(\sigma_0\) (in \(\unicode{x212B}\)) for the atomic profiles in \(M_0\). The default is an array of 10 \(\log_{10}\)-spaced points between 0.25 and 1.0.
blobs numpy.ndarray
A 1-D array of Gaussian widths \(\sigma_1\) (in \(\unicode{x212B}\)) for the residue super-atom profiles in \(M_1\). The default is an array of 20 \(\log_{10}\)-spaced points between 0.25 and 2.8.
subgrid_size float
The size of the cubic subgrid (in \(\unicode{x212B}\)) from the residue center-of-mass that defines the local cryoEM density map for the residue. The default value is 8.0.
sigmacutoff float
The cutoff distance (in units of \(\sigma\)) from the position of each atom (or super-atom) up to which the template is calculated. The default value is 5.
offset float
The offset from the grid voxel edge (in terms of grid voxel size) where the integrated template for the density for that voxel is assigned. The default value is 0.5.
emit Python function
The function that determines how the HARP result will be displayed. The default is print.
chains numpy.ndarray
A 1-D array of strings specifying the chains of the molecule the HARP calculation is executed for. The default is None, which corresponds to the calculation being run for all chains.
atom_types numpy.ndarray
A 1-D array of strings for the elements present in a residue. The default is ['H', 'C', 'N', 'O', 'P', 'S'].
atom_weights numpy.ndarray
A 1-D array of element-wise weights, normalized to C, in the same order as atom_types. The default is [0.05, 1., 1., 1., 2., 2.].
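Since emit defaults to print, any callable with a print-like interface could presumably be passed in its place. The sketch below shows one such replacement that stores messages in a list instead of writing them to the console. The helper name `make_collector` is hypothetical, and the assumption that emit receives print-style positional arguments is not confirmed by this page.

```python
def make_collector():
    """Hypothetical helper: build a print-like callable that records messages."""
    messages = []

    def emit(*args, **kwargs):
        # Join the arguments the way print would, but keep the text
        # in memory instead of writing it to standard output.
        messages.append(" ".join(str(a) for a in args))

    return emit, messages

emit, messages = make_collector()
emit("residue", 42, "done")  # stored, not printed
```

The resulting `emit` would then be passed as the emit argument of bms_molecule, and `messages` inspected after the calculation finishes.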

Output

Variable Description
record_prob_good numpy.ndarray
A 1-D array of the same size as the number of atoms in the molecule, containing the probability \(P(M_0| \mathcal{Y})\) for each residue in the molecule.
Note: Every atom of a particular residue has the same probability value.
record_ln_ev numpy.ndarray
A 2-D array whose size in the first dimension is equal to the number of atoms in the molecule and whose size in the second dimension is (adfs.size + blobs.size), containing the log-probabilities, \(\log P(\mathcal{Y}|M_n)\), for each residue in the molecule. record_ln_ev[:, :adfs.size] contains the log-probabilities for each \(\sigma_0\) in adfs and record_ln_ev[:, adfs.size:] contains the log-probabilities for each \(\sigma_1\) in blobs.
Note: Every atom i of a particular residue has the same record_ln_ev[i, :].
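Because every atom of a residue carries the same value, the per-atom output can be collapsed to one value per residue. The sketch below uses synthetic data; the per-atom residue label array `resid` is an assumption (this page does not say how such labels are obtained from the molecule), and the approach assumes labels are unique across the atoms considered, for example within a single chain.

```python
import numpy as np

# Synthetic per-atom residue labels and per-atom probabilities for
# three residues with 3, 2, and 4 atoms respectively.
resid = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
record_prob_good = np.array([0.9] * 3 + [0.2] * 2 + [0.75] * 4)

# Index of the first atom of each residue; since all atoms in a residue
# share one value, that atom's entry represents the whole residue.
unique_ids, first_atom = np.unique(resid, return_index=True)
prob_per_residue = record_prob_good[first_atom]
```

The same indexing applies to record_ln_ev, where `record_ln_ev[first_atom]` would give one row of log-probabilities per residue.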