Bayesian Model Selection

HARP performs Bayesian model selection (BMS) between an atomic-level model (\(M_0\)) and a residue-level model (\(M_1\)) to compute the probability \(P(M_0 | \mathcal{Y})\), where \(\mathcal{Y}\) is the local cryoEM map for a residue, according to

\[ P(M_0 | \mathcal{Y}) = \frac{P(\mathcal{Y} | M_0) P(M_0)}{P(\mathcal{Y} | M_0) P(M_0) + P(\mathcal{Y} | M_1) P(M_1)} \]

where the priors are \(P(M_0) = P(M_1) = \frac{1}{2}\), and \(P(\mathcal{Y} | M_0)\) and \(P(\mathcal{Y} | M_1)\) are the likelihoods that the latent structural information in \(\mathcal{Y}\) is explained by an atomic-level template (for \(M_0\)) or a residue-level template (for \(M_1\)), according to Equation 2.2.2 in Ray et al.
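As a numerical illustration of the expression above, the following minimal sketch computes \(P(M_0 | \mathcal{Y})\) from two log-likelihoods. It is not HARP's implementation: the function name `posterior_m0` and the input values are hypothetical, and working in log space is simply a common way to avoid underflow when the likelihoods are very small.

```python
import numpy as np

def posterior_m0(ln_ev_m0, ln_ev_m1, prior_m0=0.5):
    """Return P(M0 | Y) from ln P(Y | M0) and ln P(Y | M1) (hypothetical helper)."""
    prior_m1 = 1.0 - prior_m0
    # Log of each term in the denominator: ln[P(Y | M) P(M)].
    a = ln_ev_m0 + np.log(prior_m0)
    b = ln_ev_m1 + np.log(prior_m1)
    # Subtract the larger value before exponentiating to avoid underflow.
    m = max(a, b)
    return float(np.exp(a - m) / (np.exp(a - m) + np.exp(b - m)))

# Made-up log-likelihoods, for illustration only.
p_equal = posterior_m0(-1200.0, -1200.0)   # both models explain the map equally well
p_better = posterior_m0(-1200.0, -1203.0)  # M0 is favored by 3 log units
```

With equal priors the result depends only on the difference between the two log-likelihoods, so equal values give a probability of one half.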

bms_residue

harp.bayes_model_select.bms_residue (grid, data, subresidue, adfs, blobs, subgrid_size=8., sigmacutoff=5, offset=0.5, atom_types=None, atom_weights=None)

This function runs the HARP calculation at the local residue level by executing BMS between \(M_0\) and \(M_1\) for a single residue of the cryoEM structure. It is typically called by bms_molecule (see below) and is therefore unlikely to be used on its own.

Returns a tuple of (record_prob_good, record_ln_ev).

Arguments

Variable	Description
`grid`	`harp.density.gridclass` The grid defining the voxel locations for a density map.
`data`	`numpy.ndarray` The experimental cryoEM density map for a molecule.
`subresidue`	`harp.molecule.atomcollection` The collection of atoms that correspond to the specific residue.
`adfs`	`numpy.ndarray` A 1-D array of Gaussian widths \(\sigma_0\) (in \(\unicode{x212B}\)) for the atomic profiles in \(M_0\). The default is an array of 10 \(\log_{10}\)-spaced points between 0.25 and 1.0.
`blobs`	`numpy.ndarray` A 1-D array of Gaussian widths \(\sigma_1\) (in \(\unicode{x212B}\)) for the residue super-atom profiles in \(M_1\). The default is an array of 20 \(\log_{10}\)-spaced points between 0.25 and 2.8.
`subgrid_size`	`float` The size of the cubic subgrid (in \(\unicode{x212B}\)) from the residue center-of-mass that defines the local cryoEM density map for the residue. The default value is 8.0.
`sigmacutoff`	`float` The cutoff distance (in units of \(\sigma\)) from the position of each atom (or super-atom) up to which the template is calculated. The default value is 5.
`offset`	`float` The offset from the grid voxel edge (in terms of grid voxel size) where the integrated template for the density for that voxel is assigned. The default value is 0.5.
`atom_types`	`numpy.ndarray` A 1-D array of strings for the elements present in a residue. The default is `['H', 'C', 'N', 'O', 'P', 'S']`.
`atom_weights`	`numpy.ndarray` A 1-D array of element-wise weights, normalized to C, in the same order as `atom_types`. The default is `[0.05, 1., 1., 1., 2., 2.]`.
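The default values described in the table can be written out explicitly with NumPy. This is a sketch under the assumption that both endpoints of each range are included; how HARP constructs these arrays internally is not stated here. The `weight_of` lookup is a hypothetical convenience, not part of HARP.

```python
import numpy as np

# 10 log10-spaced widths between 0.25 and 1.0 (atomic profiles, M0).
adfs = np.logspace(np.log10(0.25), np.log10(1.0), 10)

# 20 log10-spaced widths between 0.25 and 2.8 (super-atom profiles, M1).
blobs = np.logspace(np.log10(0.25), np.log10(2.8), 20)

# Element types and their weights relative to carbon, in matching order.
atom_types = np.array(['H', 'C', 'N', 'O', 'P', 'S'])
atom_weights = np.array([0.05, 1., 1., 1., 2., 2.])

# Hypothetical lookup table pairing each element with its weight.
weight_of = dict(zip(atom_types.tolist(), atom_weights.tolist()))
```

Passing arrays like these explicitly makes it easy to try a finer or wider range of widths without changing anything else in the call.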

Output

Variable	Description
`record_prob_good`	`numpy.ndarray` A 1-D array of the same size as the number of atoms in the residue, containing the probability \(P(M_0 | \mathcal{Y})\) for a specific residue. Note: Every element of the array contains the same probability value, which is evaluated for the entire residue and assigned to each atom of the residue.
`record_ln_ev`	`numpy.ndarray` A 2-D array whose size in the first dimension is equal to the number of atoms in the residue and whose size in the second dimension is `adfs.size + blobs.size`, containing the log-probabilities, \(\log P(\mathcal{Y} | M_n)\), for the specific residue. `record_ln_ev[:, :adfs.size]` contains the log-probabilities for each \(\sigma_0\) in `adfs` and `record_ln_ev[:, adfs.size:]` contains the log-probabilities for each \(\sigma_1\) in `blobs`. Note: Every `record_ln_ev[i]` contains the same 1-D array, which is evaluated for the entire residue and assigned to each atom of the residue.
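The column layout of `record_ln_ev` can be illustrated by splitting the array into its \(M_0\) and \(M_1\) parts. The sketch below uses a synthetic array with made-up values in place of real HARP output; the sizes and variable names other than `record_ln_ev` are hypothetical.

```python
import numpy as np

n_atoms, n_adfs, n_blobs = 4, 10, 20

# Synthetic stand-in for real output: one row of log-probabilities,
# repeated for every atom of the residue.
rng = np.random.default_rng(0)
row = rng.normal(-1000.0, 5.0, size=n_adfs + n_blobs)
record_ln_ev = np.tile(row, (n_atoms, 1))

# The first adfs.size columns belong to M0, the remaining ones to M1.
ln_ev_m0 = record_ln_ev[:, :n_adfs]
ln_ev_m1 = record_ln_ev[:, n_adfs:]

# All rows are identical, so the first row is enough to find which
# width gives the largest log-probability under each model.
best_adf_index = int(np.argmax(ln_ev_m0[0]))
best_blob_index = int(np.argmax(ln_ev_m1[0]))
```

The resulting indices can be used to look up the corresponding widths in `adfs` and `blobs`.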

bms_molecule

harp.bayes_model_select.bms_molecule (grid, data, mol, adfs=None, blobs=None, subgrid_size=8., sigmacutoff=5, offset=0.5, emit=print, chains=None, atom_types=None, atom_weights=None)

This function runs the HARP calculation for a specific molecular model by calling bms_residue (see above) in a loop for all the residues in all the chains of the molecule.

Returns a tuple of (record_prob_good, record_ln_ev).

Arguments

Variable	Description
`grid`	`harp.density.gridclass` The grid defining the voxel locations for a density map.
`data`	`numpy.ndarray` The experimental cryoEM density map for a molecule.
`mol`	`harp.molecule.atomcollection` The collection of atoms that correspond to the molecule.
`adfs`	`numpy.ndarray` A 1-D array of Gaussian widths \(\sigma_0\) (in \(\unicode{x212B}\)) for the atomic profiles in \(M_0\). The default is an array of 10 \(\log_{10}\)-spaced points between 0.25 and 1.0.
`blobs`	`numpy.ndarray` A 1-D array of Gaussian widths \(\sigma_1\) (in \(\unicode{x212B}\)) for the residue super-atom profiles in \(M_1\). The default is an array of 20 \(\log_{10}\)-spaced points between 0.25 and 2.8.
`subgrid_size`	`float` The size of the cubic subgrid (in \(\unicode{x212B}\)) from the residue center-of-mass that defines the local cryoEM density map for the residue. The default value is 8.0.
`sigmacutoff`	`float` The cutoff distance (in units of \(\sigma\)) from the position of each atom (or super-atom) up to which the template is calculated. The default value is 5.
`offset`	`float` The offset from the grid voxel edge (in terms of grid voxel size) where the integrated template for the density for that voxel is assigned. The default value is 0.5.
`emit`	Python `function` The function that determines how the HARP results are displayed. The default is `print`.
`chains`	`numpy.ndarray` A 1-D array of strings specifying the chains of the molecule the HARP calculation is executed for. The default is `None`, which corresponds to the calculation being run for all chains.
`atom_types`	`numpy.ndarray` A 1-D array of strings for the elements present in a residue. The default is `['H', 'C', 'N', 'O', 'P', 'S']`.
`atom_weights`	`numpy.ndarray` A 1-D array of element-wise weights, normalized to C, in the same order as `atom_types`. The default is `[0.05, 1., 1., 1., 2., 2.]`.
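The `emit` and `chains` arguments can be combined to run the calculation on selected chains while capturing progress messages rather than printing them. The sketch below assumes that `emit` is called the way `print` is, with positional arguments; the `collect` callback and the `run_selected_chains` wrapper are hypothetical names, and `grid`, `data`, and `mol` are expected to have been prepared elsewhere.

```python
import numpy as np

log_lines = []

def collect(*args):
    """Print-like callback that stores messages instead of writing to stdout."""
    log_lines.append(" ".join(str(a) for a in args))

def run_selected_chains(grid, data, mol, chain_ids):
    """Hypothetical wrapper: run bms_molecule on a subset of chains."""
    # Imported lazily so the callback above can be tried without HARP installed.
    from harp.bayes_model_select import bms_molecule
    chains = np.array(chain_ids)  # 1-D array of chain identifier strings
    return bms_molecule(grid, data, mol, chains=chains, emit=collect)

# The callback can be exercised on its own:
collect("residue", 12, "done")
```

A call such as `run_selected_chains(grid, data, mol, ['A', 'B'])` would then return the `(record_prob_good, record_ln_ev)` tuple, with any emitted messages accumulated in `log_lines`.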

Output

Variable	Description
`record_prob_good`	`numpy.ndarray` A 1-D array of the same size as the number of atoms in the molecule, containing the probability \(P(M_0 | \mathcal{Y})\) for each residue in the molecule. Note: Every atom of a particular residue has the same probability value.
`record_ln_ev`	`numpy.ndarray` A 2-D array whose size in the first dimension is equal to the number of atoms in the molecule and whose size in the second dimension is `adfs.size + blobs.size`, containing the log-probabilities, \(\log P(\mathcal{Y} | M_n)\), for each residue in the molecule. `record_ln_ev[:, :adfs.size]` contains the log-probabilities for each \(\sigma_0\) in `adfs` and `record_ln_ev[:, adfs.size:]` contains the log-probabilities for each \(\sigma_1\) in `blobs`. Note: Every atom `i` of a particular residue has the same `record_ln_ev[i, :]`.
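Because every atom of a residue carries the same value, the per-atom output can be reduced to one probability per residue. The sketch below assumes a per-atom array of residue labels is available; the `resid` array, its values, and the 0.5 threshold are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical per-atom residue labels and matching per-atom probabilities.
resid = np.array([1, 1, 1, 2, 2, 3, 3, 3, 3])
record_prob_good = np.array([0.9, 0.9, 0.9, 0.2, 0.2, 0.7, 0.7, 0.7, 0.7])

# Index of the first atom of each residue; one atom is enough because
# all atoms of a residue share the same probability.
residues, first_atom = np.unique(resid, return_index=True)
prob_per_residue = record_prob_good[first_atom]

# Residues where the residue-level model is favored (illustrative threshold).
flagged = residues[prob_per_residue < 0.5]
```

For a molecule with several chains, the labels would need to combine chain and residue number so that residues with the same number in different chains are kept apart.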