Bayesian Model Selection
HARP performs Bayesian model selection (BMS) between an atomic-level model (\(M_0\)) and a residue-level model (\(M_1\)) to compute the probability \(P(M_0 | \mathcal{Y})\), where \(\mathcal{Y}\) is the local cryoEM map for a residue, according to
\[ P(M_0 | \mathcal{Y}) = \frac{P(\mathcal{Y} | M_0) P(M_0)}{P(\mathcal{Y} | M_0) P(M_0) + P(\mathcal{Y} | M_1) P(M_1)} \]
where the priors are \(P(M_0) = P(M_1) = \frac{1}{2}\), and \(P(\mathcal{Y} | M_0)\) and \(P(\mathcal{Y} | M_1)\) are the likelihoods that the latent structural information in \(\mathcal{Y}\) is explained by an atomic-level template (for \(M_0\)) or a residue-level template (for \(M_1\)), according to Equation 2.2.2 in Ray et al.
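As a minimal numerical sketch of the expression above, the posterior can be evaluated from log-evidences rather than raw likelihoods, which avoids underflow when the evidences are very small. The function name `posterior_m0` and its arguments are illustrative and not part of HARP. How HARP obtains the two evidences (Equation 2.2.2 in Ray et al.) is not reproduced here.

```python
import numpy as np

def posterior_m0(ln_ev_m0, ln_ev_m1, prior_m0=0.5):
    """P(M0 | Y) from the log-evidences ln P(Y | M0) and ln P(Y | M1)."""
    # Log of each numerator term, ln[P(Y | Mn) P(Mn)].
    ln_joint_m0 = ln_ev_m0 + np.log(prior_m0)
    ln_joint_m1 = ln_ev_m1 + np.log(1.0 - prior_m0)
    # np.logaddexp computes ln(exp(a) + exp(b)) without overflow or underflow.
    ln_norm = np.logaddexp(ln_joint_m0, ln_joint_m1)
    return float(np.exp(ln_joint_m0 - ln_norm))

# Equal evidences with equal priors give no preference for either model.
p_equal = posterior_m0(-1000.0, -1000.0)
# M0 favoured by one natural-log unit of evidence.
p_m0_better = posterior_m0(-1000.0, -1001.0)
```

With equal priors the result depends only on the difference of the two log-evidences, so the large negative offset in the example inputs has no effect on the posterior.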
bms_residue
harp.bayes_model_select.bms_residue (grid, data, subresidue, adfs, blobs, subgrid_size=8., sigmacutoff=5, offset=0.5, atom_types=None, atom_weights=None)
This function runs the HARP calculation at the local residue level by executing BMS between \(M_0\) and \(M_1\) for a single residue of the cryoEM structure. It is typically called by bms_molecule (see below) and is therefore unlikely to be used independently.
Returns a tuple of (record_prob_good, record_ln_ev).
Arguments
Variable | Description |
---|---|
grid | harp.density.gridclass: The grid defining the voxel locations for a density map. |
data | numpy.ndarray: The experimental cryoEM density map for a molecule. |
subresidue | harp.molecule.atomcollection: The collection of atoms that correspond to the specific residue. |
adfs | numpy.ndarray: A 1-D array of Gaussian widths \(\sigma_0\) (in \(\unicode{x212B}\)) for the atomic profiles in \(M_0\). The default is an array of 10 \(\log_{10}\)-spaced points between 0.25 and 1.0. |
blobs | numpy.ndarray: A 1-D array of Gaussian widths \(\sigma_1\) (in \(\unicode{x212B}\)) for the residue super-atom profiles in \(M_1\). The default is an array of 20 \(\log_{10}\)-spaced points between 0.25 and 2.8. |
subgrid_size | float: The size of the cubic subgrid (in \(\unicode{x212B}\)) from the residue center-of-mass that defines the local cryoEM density map for the residue. The default value is 8.0. |
sigmacutoff | float: The cutoff distance (in units of \(\sigma\)) from the position of each atom (or super-atom) up to which the template is calculated. The default value is 5. |
offset | float: The offset from the grid voxel edge (in units of the grid voxel size) at which the integrated template density for that voxel is assigned. The default value is 0.5. |
atom_types | numpy.ndarray: A 1-D array of strings for the elements present in a residue. The default is ['H', 'C', 'N', 'O', 'P', 'S']. |
atom_weights | numpy.ndarray: A 1-D array of element-wise weights, normalized to C, in the same order as atom_types. The default is [0.05, 1., 1., 1., 2., 2.]. |
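The default values described in the table can be written out explicitly with NumPy, which is convenient when only one of them needs to be changed. This is a sketch under the assumption that "\(\log_{10}\)-spaced" means endpoint-inclusive spacing as produced by `np.logspace`. HARP may construct its defaults differently.

```python
import numpy as np

# 10 log10-spaced atomic widths (sigma_0, in angstroms) from 0.25 to 1.0.
adfs = np.logspace(np.log10(0.25), np.log10(1.0), 10)
# 20 log10-spaced super-atom widths (sigma_1, in angstroms) from 0.25 to 2.8.
blobs = np.logspace(np.log10(0.25), np.log10(2.8), 20)

# Element symbols and their weights (normalized to C), in matching order.
atom_types = np.array(['H', 'C', 'N', 'O', 'P', 'S'])
atom_weights = np.array([0.05, 1.0, 1.0, 1.0, 2.0, 2.0])
```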
Output
Variable | Description |
---|---|
record_prob_good | numpy.ndarray: A 1-D array of the same size as the number of atoms in the residue, containing the probability \(P(M_0 | \mathcal{Y})\) for a specific residue. Note: Every element of the array contains the same probability value, which is evaluated for the entire residue and then assigned to each atom of the residue. |
record_ln_ev | numpy.ndarray: A 2-D array whose size in the first dimension is equal to the number of atoms in the residue and whose size in the second dimension is (adfs.size + blobs.size), containing the log-probabilities, \(\log P(\mathcal{Y} | M_n)\), for the specific residue. record_ln_ev[:, :adfs.size] contains the log-probabilities for each \(\sigma_0\) in adfs and record_ln_ev[:, adfs.size:] contains the log-probabilities for each \(\sigma_1\) in blobs. Note: Every record_ln_ev[i] contains the same 1-D array, which is evaluated for the entire residue and then assigned to each atom of the residue. |
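The column layout of record_ln_ev described above can be illustrated with a synthetic array. The values below are random stand-ins, not HARP output, and the choice of five atoms is arbitrary.

```python
import numpy as np

adfs = np.logspace(np.log10(0.25), np.log10(1.0), 10)
blobs = np.logspace(np.log10(0.25), np.log10(2.8), 20)

# Synthetic stand-in for the record_ln_ev of a 5-atom residue:
# one row of log-probabilities, repeated for every atom.
rng = np.random.default_rng(0)
row = rng.normal(loc=-500.0, scale=5.0, size=adfs.size + blobs.size)
record_ln_ev = np.tile(row, (5, 1))

# All rows are identical, so the first row describes the whole residue.
ln_ev_m0 = record_ln_ev[0, :adfs.size]   # one value per sigma_0 in adfs
ln_ev_m1 = record_ln_ev[0, adfs.size:]   # one value per sigma_1 in blobs

# Width with the largest log-probability under each model.
best_sigma0 = adfs[np.argmax(ln_ev_m0)]
best_sigma1 = blobs[np.argmax(ln_ev_m1)]
```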
bms_molecule
harp.bayes_model_select.bms_molecule (grid, data, mol, adfs=None, blobs=None, subgrid_size=8., sigmacutoff=5, offset=0.5, emit=print, chains=None, atom_types=None, atom_weights=None)
This function runs the HARP calculation for a specific molecular model by calling bms_residue (see above) in a loop over all the residues in all the chains of the molecule.
Returns a tuple of (record_prob_good, record_ln_ev).
Arguments
Variable | Description |
---|---|
grid | harp.density.gridclass: The grid defining the voxel locations for a density map. |
data | numpy.ndarray: The experimental cryoEM density map for a molecule. |
mol | harp.molecule.atomcollection: The collection of atoms that correspond to the molecule. |
adfs | numpy.ndarray: A 1-D array of Gaussian widths \(\sigma_0\) (in \(\unicode{x212B}\)) for the atomic profiles in \(M_0\). The default is an array of 10 \(\log_{10}\)-spaced points between 0.25 and 1.0. |
blobs | numpy.ndarray: A 1-D array of Gaussian widths \(\sigma_1\) (in \(\unicode{x212B}\)) for the residue super-atom profiles in \(M_1\). The default is an array of 20 \(\log_{10}\)-spaced points between 0.25 and 2.8. |
subgrid_size | float: The size of the cubic subgrid (in \(\unicode{x212B}\)) from the residue center-of-mass that defines the local cryoEM density map for the residue. The default value is 8.0. |
sigmacutoff | float: The cutoff distance (in units of \(\sigma\)) from the position of each atom (or super-atom) up to which the template is calculated. The default value is 5. |
offset | float: The offset from the grid voxel edge (in units of the grid voxel size) at which the integrated template density for that voxel is assigned. The default value is 0.5. |
emit | Python function: The function that determines how the HARP result is displayed. The default is print. |
chains | numpy.ndarray: A 1-D array of strings specifying the chains of the molecule for which the HARP calculation is executed. The default is None, which runs the calculation for all chains. |
atom_types | numpy.ndarray: A 1-D array of strings for the elements present in a residue. The default is ['H', 'C', 'N', 'O', 'P', 'S']. |
atom_weights | numpy.ndarray: A 1-D array of element-wise weights, normalized to C, in the same order as atom_types. The default is [0.05, 1., 1., 1., 2., 2.]. |
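Because emit defaults to print, any callable with a print-like interface can presumably be substituted, for example to collect progress messages instead of writing them to the console. The sketch below assumes that emit is called with positional arguments in the same way as print, which this page does not state. The helper name `collect` and the chain identifiers 'A' and 'B' are illustrative.

```python
import numpy as np

messages = []

def collect(*args, **kwargs):
    """Print-like callable that stores each message instead of displaying it."""
    messages.append(' '.join(str(a) for a in args))

# Restrict the calculation to two (hypothetical) chain identifiers.
chains = np.array(['A', 'B'])

# Hypothetical call, shown for orientation only (requires HARP objects):
# record_prob_good, record_ln_ev = bms_molecule(
#     grid, data, mol, emit=collect, chains=chains)

# Stand-alone demonstration of the collector.
collect('residue', 1, 'done')
```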
Output
Variable | Description |
---|---|
record_prob_good | numpy.ndarray: A 1-D array of the same size as the number of atoms in the molecule, containing the probability \(P(M_0 | \mathcal{Y})\) for each residue in the molecule. Note: Every atom of a particular residue has the same probability value. |
record_ln_ev | numpy.ndarray: A 2-D array whose size in the first dimension is equal to the number of atoms in the molecule and whose size in the second dimension is (adfs.size + blobs.size), containing the log-probabilities, \(\log P(\mathcal{Y} | M_n)\), for each residue in the molecule. record_ln_ev[:, :adfs.size] contains the log-probabilities for each \(\sigma_0\) in adfs and record_ln_ev[:, adfs.size:] contains the log-probabilities for each \(\sigma_1\) in blobs. Note: Every atom i of a particular residue has the same record_ln_ev[i, :]. |
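Since every atom of a residue carries the same value, the per-atom output can be collapsed to one value per residue. The sketch below uses synthetic data and assumes a per-atom array of residue labels, here called `residue_ids`, is available. That name is hypothetical and is not taken from the HARP API. The 0.5 threshold is an illustrative choice, and labels that repeat across chains would need to be combined with a chain identifier first.

```python
import numpy as np

# Synthetic stand-ins: 7 atoms belonging to 3 residues.
residue_ids = np.array([10, 10, 10, 11, 11, 12, 12])  # hypothetical per-atom labels
record_prob_good = np.array([0.9, 0.9, 0.9, 0.2, 0.2, 0.7, 0.7])

# Index of the first atom of each residue; its value represents the residue.
unique_ids, first_atom = np.unique(residue_ids, return_index=True)
prob_per_residue = record_prob_good[first_atom]

# Residues for which the atomic-level model M0 is less probable than M1.
flagged = unique_ids[prob_per_residue < 0.5]
```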