Molecules

This page describes how molecular models are handled in HARP.

atomcollection

class harp.molecule.atomcollection (atomid, resid, resname, atomnamechain, element, conf, xyz, occupancy, bfactor, hetatom, modelnum, authresid=None)

An atomcollection object stores all information about the molecular models and the methods to manipulate them in HARP.

Attributes

The attributes are named following the PDB naming convention

Variable Descrption
self.natoms int
Number of atoms in the molecule.
self.atomid numpy.ndarray
A 1-D int array of size natoms, containing a numerical id for each individual atom in the molecule.
self.ind numpy.ndarray
A 1-D int array of size natoms, containing the index for each individual atom in the molecule. Is equal to numpy.arange(natoms). Used internally in HARP for indexing.
self.resid numpy.ndarray
A 1-D int array of size natoms, containing a numerical id for the residue for individual atom in the molecule.
self.resname numpy.ndarray
A 1-D string array of size natoms, containing the identity of the residue for each individual atom in the molecule (e.g., Gly, Ala, etc.).
self.element numpy.ndarray
A 1-D string array of size natoms, containing the element identity for each individual atom in the molecule (e.g., C, H, O, N, etc.).
self.atomname numpy.ndarray
A 1-D string array of size natoms, containing the atom identity of each individual atom in a particular residue in the molecule (e.g., CA, CB, N, etc.).
self.chain numpy.ndarray
A 1-D string array of size natoms, containing the identity of the chain for each individual atom in the molecule.
self.xyz numpy.ndarray
A 2-D double array of size [natoms, 3], containing the Cartesian co-ordinates of each individual atom in the molecule.
Note: This must be a double array!
self.occupancy numpy.ndarray
A 1-D double array of size natoms, containing the occupancy for each individual atom in the molecule.
self.bfactor numpy.ndarray
A 1-D double array of size natoms, containing the B-factor for each individual atom in the molecule.
self.hetatom numpy.ndarray
A 1-D bool array of size natoms, indicating whether the atom is a heteroatom entry.
self.authresid numpy.ndarray
A 1-D int array of size natoms, containing the author-provided id for the residue for individual atom in the molecule. Defaults to resid if not provided.
self.conf numpy.ndarray
A 1-D string array of size natoms, containing the conformation for individual atom in the molecule.
self.modelnum numpy.ndarray
A 1-D int array of size natoms, containing the model number for individual atom in the molecule.
self.unique_residues numpy.ndarray
A 1-D int array of the unique resid in the molecule.
self.unique_chains numpy.ndarray
A 1-D string array of the unique chain in the molecule.
self.unique_confs numpy.ndarray
A 1-D string array of the unique conf in the molecule.

Functions

The functions associated with atomcollection are

Function Descrption
self.get_residue Returns a particular residue.
Input: int number (the specific resid to be returned).
Output: atomcollection.
self.get_chain Returns a particular chain.
Input: string chain (the specific chain to be returned).
Output: atomcollection.
self.get_chains Returns multiple chains.
Input: numpy.ndarray/list chains (the array/list of chain to be returned).
Output: atomcollection.
self.get_atomname Returns a collection of specific atoms based on their identity in residues.
Input: string atomname (the specific atomname to be returned).
Output: atomcollection.
self.get_atomids Returns a collection of specific atoms.
Input: numpy.ndarray/list atomids (the array/list of atomid to be returned).
Output: atomcollection.
self.get_conformation Returns a specific conformation of the model.
Input: string conf (the specifc conf to be returned).
Output: atomcollection.
self.dehydrogen Removes hydrogens from self
Input: None.
Output: atomcollection.
self.remove_hetatoms Removes heteroatoms from self
Input: None.
Output: atomcollection.
self.split_residue Splits an atomcollection of a single residue into component parts. For a nucleotide, returns a list of phosphate, sugar, and base. For an amino-acid, returns backbone and sidechain.
Input: None (Note: self must be a single residue!).
Output: A list of atomcollection.
self.com Returns the Cartesian co-ordinate center-of-mass (centroid) of an atomcollection, i.e. the mean of self.xyz.
Input: None.
Output: A numpy.ndarray of size 3.

A separate function is involved in loading an atomcollection from a file.

load

harp.molecule.load (fname, only_polymers = False, firstmodel = True, authid = False)

Argument

Variable Descrption
fname string
The filename of the mmCIF file to load.
Note: This function currently only handles the file suffixes .mmcif, .cif, or_ .cif.gz.
only_polymers bool
A flag for whether to only use entities labeled as polymers (i.e. not water, ions, metals, or ligands, etc.).
firstmodel bool
A flag for whether to take the only first model. Useful for an ensemble of models (e.g., as in NMR).
authid bool
A flag for whether to use authid as atomcollection.resid or not. Useful for when people populate the wrong column (e.g., during model building).

Output

Variable Descrption
atomcollection atomcollection
Contains the molecular model written in the .mmCIF file