Molecules

This page describes how molecular models are handled in HARP.

atomcollection

class harp.molecule.atomcollection (atomid, resid, resname, atomname, chain, element, conf, xyz, occupancy, bfactor, hetatom, modelnum, authresid=None)

An atomcollection object stores all information about the molecular models and the methods to manipulate them in HARP.

Attributes

The attributes are named following the PDB naming convention.

Variable	Description
`self.natoms`	`int` Number of atoms in the molecule.
`self.atomid`	`numpy.ndarray` A 1-D `int` array of size `natoms`, containing a numerical id for each individual atom in the molecule.
`self.ind`	`numpy.ndarray` A 1-D `int` array of size `natoms`, containing the index for each individual atom in the molecule. Is equal to `numpy.arange(natoms)`. Used internally in HARP for indexing.
`self.resid`	`numpy.ndarray` A 1-D `int` array of size `natoms`, containing a numerical id for the residue of each individual atom in the molecule.
`self.resname`	`numpy.ndarray` A 1-D `string` array of size `natoms`, containing the identity of the residue for each individual atom in the molecule (e.g., Gly, Ala, etc.).
`self.element`	`numpy.ndarray` A 1-D `string` array of size `natoms`, containing the element identity for each individual atom in the molecule (e.g., C, H, O, N, etc.).
`self.atomname`	`numpy.ndarray` A 1-D `string` array of size `natoms`, containing the atom identity of each individual atom in a particular residue in the molecule (e.g., CA, CB, N, etc.).
`self.chain`	`numpy.ndarray` A 1-D `string` array of size `natoms`, containing the identity of the chain for each individual atom in the molecule.
`self.xyz`	`numpy.ndarray` A 2-D `double` array of size `[natoms, 3]`, containing the Cartesian co-ordinates of each individual atom in the molecule. Note: This must be a `double` array!
`self.occupancy`	`numpy.ndarray` A 1-D `double` array of size `natoms`, containing the occupancy for each individual atom in the molecule.
`self.bfactor`	`numpy.ndarray` A 1-D `double` array of size `natoms`, containing the B-factor for each individual atom in the molecule.
`self.hetatom`	`numpy.ndarray` A 1-D `bool` array of size `natoms`, indicating whether the atom is a heteroatom entry.
`self.authresid`	`numpy.ndarray` A 1-D `int` array of size `natoms`, containing the author-provided id for the residue of each individual atom in the molecule. Defaults to `resid` if not provided.
`self.conf`	`numpy.ndarray` A 1-D `string` array of size `natoms`, containing the conformation of each individual atom in the molecule.
`self.modelnum`	`numpy.ndarray` A 1-D `int` array of size `natoms`, containing the model number of each individual atom in the molecule.
`self.unique_residues`	`numpy.ndarray` A 1-D `int` array of the unique `resid` values in the molecule.
`self.unique_chains`	`numpy.ndarray` A 1-D `string` array of the unique `chain` values in the molecule.
`self.unique_confs`	`numpy.ndarray` A 1-D `string` array of the unique `conf` values in the molecule.
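
Because every per-atom attribute is an array of length `natoms`, a boolean mask built from one attribute can index any of the others. The sketch below illustrates this with plain NumPy arrays holding made-up values as stand-ins for the attributes. It does not construct an `atomcollection`, and computing `unique_residues` with `numpy.unique` is an assumption about the behavior, not HARP's own code.

```python
import numpy as np

# Made-up stand-ins for atomcollection attributes (four atoms, two residues).
resid = np.array([1, 1, 2, 2])
atomname = np.array(["N", "CA", "N", "CA"])
xyz = np.array([[0.0, 0.0, 0.0],
                [1.5, 0.0, 0.0],
                [3.0, 1.0, 0.0],
                [4.5, 1.0, 0.0]], dtype=np.double)

natoms = resid.size
ind = np.arange(natoms)        # matches the documented definition of `ind`

# A mask built from one attribute selects the matching entries of any other.
ca_mask = atomname == "CA"
ca_xyz = xyz[ca_mask]          # coordinates of the CA atoms, shape (2, 3)
ca_ind = ind[ca_mask]          # positions of the CA atoms in the arrays

# Assumed equivalent of `unique_residues`: the sorted unique residue ids.
unique_residues = np.unique(resid)
```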

Functions

The functions associated with atomcollection are:

Function	Description
`self.get_residue`	Returns a particular residue. Input: `int` number (the specific `resid` to be returned). Output: `atomcollection`.
`self.get_chain`	Returns a particular chain. Input: `string` chain (the specific `chain` to be returned). Output: `atomcollection`.
`self.get_chains`	Returns multiple chains. Input: `numpy.ndarray`/`list` chains (the array/list of `chain` to be returned). Output: `atomcollection`.
`self.get_atomname`	Returns a collection of specific atoms based on their identity in residues. Input: `string` atomname (the specific `atomname` to be returned). Output: `atomcollection`.
`self.get_atomids`	Returns a collection of specific atoms. Input: `numpy.ndarray`/`list` atomids (the array/list of `atomid` to be returned). Output: `atomcollection`.
`self.get_conformation`	Returns a specific conformation of the model. Input: `string` conf (the specific `conf` to be returned). Output: `atomcollection`.
`self.dehydrogen`	Removes hydrogens from `self`. Input: None. Output: `atomcollection`.
`self.remove_hetatoms`	Removes heteroatoms from `self`. Input: None. Output: `atomcollection`.
`self.split_residue`	Splits an `atomcollection` of a single residue into component parts. For a nucleotide, returns a list of phosphate, sugar, and base. For an amino-acid, returns backbone and sidechain. Input: None (Note: `self` must be a single residue!). Output: A `list` of `atomcollection`.
`self.com`	Returns the Cartesian co-ordinate center-of-mass (centroid) of an `atomcollection`, i.e. the mean of `self.xyz`. Input: None. Output: A `numpy.ndarray` of size 3.
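
The selection functions all return a new `atomcollection`, so they can be chained, and `com` is described as the mean of `self.xyz`. The sketch below mimics those two behaviors with a hypothetical `ToyCollection` class. It is a stand-in written for illustration under those stated assumptions, not HARP's implementation, and it carries only two of the attributes.

```python
import numpy as np


class ToyCollection:
    """Hypothetical stand-in for atomcollection; NOT HARP's implementation."""

    def __init__(self, resid, xyz):
        self.resid = np.asarray(resid, dtype=int)
        self.xyz = np.asarray(xyz, dtype=np.double)
        self.natoms = self.resid.size

    def get_residue(self, number):
        # Keep only the atoms whose resid matches; return a new collection.
        keep = self.resid == number
        return ToyCollection(self.resid[keep], self.xyz[keep])

    def com(self):
        # Unweighted centroid: the mean of the coordinates over all atoms.
        return self.xyz.mean(axis=0)


mol = ToyCollection([1, 1, 2],
                    [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
res1 = mol.get_residue(1)      # sub-collection holding the two atoms of residue 1
center = res1.com()            # centroid of residue 1
```

Note that the centroid here is unweighted: every atom contributes equally, regardless of element.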

A separate function is involved in loading an atomcollection from a file.

load

harp.molecule.load (fname, only_polymers = False, firstmodel = True, authid = False)

Arguments

Variable	Description
`fname`	`string` The filename of the mmCIF file to load. Note: This function currently only handles the file suffixes `.mmcif`, `.cif`, or `.cif.gz`.
`only_polymers`	`bool` A flag for whether to only use entities labeled as polymers (i.e. not water, ions, metals, or ligands, etc.).
`firstmodel`	`bool` A flag for whether to take only the first model. Useful for an ensemble of models (e.g., as in NMR).
`authid`	`bool` A flag for whether to use the author-provided residue ids as `atomcollection.resid` or not. Useful for when people populate the wrong column (e.g., during model building).

Output

Variable	Description
`atomcollection`	`atomcollection` Contains the molecular model written in the mmCIF file.
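
A minimal usage sketch, assuming HARP is installed and importable as `harp`. The helpers `has_supported_suffix` and `load_polymers` are hypothetical names introduced here: the first only mirrors the suffix restriction noted for `fname`, and the second wraps `harp.molecule.load` with the keyword arguments listed above. Neither is part of HARP.

```python
def has_supported_suffix(fname):
    """Hypothetical helper: True if fname ends with a documented suffix."""
    return fname.endswith((".mmcif", ".cif", ".cif.gz"))


def load_polymers(fname):
    """Load only the polymer entities of the first model in an mmCIF file."""
    if not has_supported_suffix(fname):
        raise ValueError(f"unsupported file suffix: {fname}")
    # Imported lazily so the suffix check can be used without HARP installed.
    from harp import molecule
    return molecule.load(fname, only_polymers=True, firstmodel=True, authid=False)
```

Calling `load_polymers("model.cif")`, where `model.cif` is a placeholder file name, would then be expected to return an `atomcollection` without waters, ions, or ligands.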