pyemma.coordinates.featurizer

pyemma.coordinates.featurizer(topfile)

Featurizer to select features from MD data.

Parameters:topfile (str or mdtraj.Topology instance) – path to topology file (e.g pdb file) or a mdtraj.Topology object
Returns:feat
Return type:Featurizer

Examples

Create a featurizer and add backbone torsion angles to active features. Then use it in source()

>>> import pyemma.coordinates 
>>> feat = pyemma.coordinates.featurizer('my_protein.pdb') 
>>> feat.add_backbone_torsions() 
>>> reader = pyemma.coordinates.source(["my_traj01.xtc", "my_traj02.xtc"], features=feat) 

or

>>> traj = mdtraj.load('my_protein.pdb') # 
>>> feat = pyemma.coordinates.featurizer(traj.topology) 
class pyemma.coordinates.data.featurization.featurizer.MDFeaturizer(topfile, use_cache=True)

Extracts features from MD trajectories.

Methods

add_all() Adds all atom coordinates to the feature list.
add_angles(indexes[, deg, cossin, periodic]) Adds the list of angles to the feature list
add_backbone_torsions([selstr, deg, cossin, ...]) Adds all backbone phi/psi angles or the ones specified in selstr to the feature list.
add_chi1_torsions([selstr, deg, cossin, ...]) Adds all chi1 angles or the ones specified in selstr to the feature list.
add_contacts(indices[, indices2, threshold, ...]) Adds the contacts to the feature list.
add_custom_feature(feature) Adds a custom feature to the feature list.
add_custom_func(func, dim, *args, **kwargs) adds a user defined function to extract features
add_dihedrals(indexes[, deg, cossin, periodic]) Adds the list of dihedrals to the feature list
add_distances(indices[, periodic, indices2]) Adds the distances between atoms to the feature list.
add_distances_ca([periodic, excluded_neighbors]) Adds the distances between all Ca’s to the feature list.
add_group_mindist(group_definitions[, ...]) Adds the minimum distance between groups of atoms to the feature list.
add_inverse_distances(indices[, periodic, ...]) Adds the inverse distances between atoms to the feature list.
add_minrmsd_to_ref(ref[, ref_frame, ...]) Adds the minimum root-mean-square-deviation (minrmsd) with respect to a reference structure to the feature list.
add_residue_mindist([residue_pairs, scheme, ...]) Adds the minimum distance between residues to the feature list.
add_selection(indexes) Adds the coordinates of the selected atom indexes to the feature list.
describe() Returns a list of strings, one for each feature selected, with human-readable descriptions of the features.
dimension() current dimension due to selected features
pairs(sel[, excluded_neighbors]) Creates all pairs between indexes.
select(selstring) Returns the indexes of atoms matching the given selection
select_Backbone() Returns the indexes of backbone C, CA and N atoms
select_Ca() Returns the indexes of all Ca-atoms
select_Heavy([exclude_symmetry_related]) Returns the indexes of all heavy atoms (Mass >= 2), optionally excluding symmetry-related heavy atoms.
transform(traj) Maps an mdtraj Trajectory object to the selected output features

Attributes

logger The logger for this class instance
name The name of this instance
add_all()

Adds all atom coordinates to the feature list. The coordinates are flattened as follows: [x1, y1, z1, x2, y2, z2, ...]

add_angles(indexes, deg=False, cossin=False, periodic=True)

Adds the list of angles to the feature list

Parameters:
  • indexes (np.ndarray, shape=(num_pairs, 3), dtype=int) – an array with triplets of atom indices
  • deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
  • cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the mean (e.g TICA/PCA, clustering) in that space.
  • periodic (bool, optional, default = True) – If periodic is True and the trajectory contains unitcell information, we will treat dihedrals that cross periodic images using the minimum image convention.
add_backbone_torsions(selstr=None, deg=False, cossin=False, periodic=True)

Adds all backbone phi/psi angles or the ones specified in selstr to the feature list.

Parameters:
  • selstr (str, optional, default = "") – selection string specifying the atom selection used to specify a specific set of backbone angles If “” (default), all phi/psi angles found in the topology will be computed
  • deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
  • cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the mean (e.g TICA/PCA, clustering) in that space.
  • periodic (bool, optional, default = True) – If periodic is True and the trajectory contains unitcell information, we will treat dihedrals that cross periodic images using the minimum image convention.
add_chi1_torsions(selstr='', deg=False, cossin=False, periodic=True)

Adds all chi1 angles or the ones specified in selstr to the feature list.

Parameters:
  • selstr (str, optional, default = "") – selection string specifying the atom selection used to specify a specific set of backbone angles If “” (default), all chi1 angles found in the topology will be computed
  • deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
  • cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the mean (e.g TICA/PCA, clustering) in that space.
  • periodic (bool, optional, default = True) – If periodic is True and the trajectory contains unitcell information, we will treat dihedrals that cross periodic images using the minimum image convention.
add_contacts(indices, indices2=None, threshold=0.3, periodic=True, count_contacts=False)

Adds the contacts to the feature list.

Parameters:
  • indices (can be of two types:) –
    ndarray((n, 2), dtype=int):
    n x 2 array with the pairs of atoms between which the contacts shall be computed
    iterable of integers (either list or ndarray(n, dtype=int)):
    indices (not pairs of indices) of the atoms between which the contacts shall be computed.
  • indices2 (iterable of integers (either list or ndarray(n, dtype=int)), optional:) – Only has effect if indices is an iterable of integers. Instead of the above behaviour, only the contacts between the atoms in indices and indices2 will be computed.
  • threshold (float, optional, default = .3) – distances below this threshold (in nm) will result in a feature 1.0, distances above will result in 0.0. The default is set to .3 nm (3 Angstrom)
  • periodic (boolean, default True) – use the minimum image convention if unitcell information is available
  • count_contacts (boolean, default False) – If set to true, this feature will return the number of formed contacts (and not feature values with either 1.0 or 0) The ouput of this feature will be of shape (Nt,1), and not (Nt, nr_of_contacts)
:param .. note::: When using the iterable of integers input, indices and indices2
will be sorted numerically and made unique before converting them to a pairlist. Please look carefully at the output of describe() to see what features exactly have been added.
add_custom_feature(feature)

Adds a custom feature to the feature list.

Parameters:feature (object) – an object with interface like CustomFeature (map, describe methods)
add_custom_func(func, dim, *args, **kwargs)

adds a user defined function to extract features

Parameters:
  • func (function) – a user-defined function, which accepts mdtraj.Trajectory object as first parameter and as many optional and named arguments as desired. Has to return a numpy.ndarray ndim=2.
  • dim (int) – output dimension of function
  • args (any number of positional arguments) – these have to be in the same order as func is expecting them
  • kwargs (dictionary) – named arguments passed to func

Notes

You can pass a description list to describe the output of your function by element, by passing a list of strings with the same lengths as dimensions. Alternatively a single element list or str will be expanded to match the output dimension.

add_dihedrals(indexes, deg=False, cossin=False, periodic=True)

Adds the list of dihedrals to the feature list

Parameters:
  • indexes (np.ndarray, shape=(num_pairs, 4), dtype=int) – an array with quadruplets of atom indices
  • deg (bool, optional, default = False) – If False (default), angles will be computed in radians. If True, angles will be computed in degrees.
  • cossin (bool, optional, default = False) – If True, each angle will be returned as a pair of (sin(x), cos(x)). This is useful, if you calculate the mean (e.g TICA/PCA, clustering) in that space.
  • periodic (bool, optional, default = True) – If periodic is True and the trajectory contains unitcell information, we will treat dihedrals that cross periodic images using the minimum image convention.
add_distances(indices, periodic=True, indices2=None)

Adds the distances between atoms to the feature list.

Parameters:
  • indices (can be of two types:) –
    ndarray((n, 2), dtype=int):
    n x 2 array with the pairs of atoms between which the distances shall be computed
    iterable of integers (either list or ndarray(n, dtype=int)):
    indices (not pairs of indices) of the atoms between which the distances shall be computed.
  • periodic (optional, boolean, default is True) – If periodic is True and the trajectory contains unitcell information, distances will be computed under the minimum image convention.
  • indices2 (iterable of integers (either list or ndarray(n, dtype=int)), optional:) – Only has effect if indices is an iterable of integers. Instead of the above behaviour, only the distances between the atoms in indices and indices2 will be computed.

Note

When using the iterable of integers input, indices and indices2 will be sorted numerically and made unique before converting them to a pairlist. Please look carefully at the output of describe() to see what features exactly have been added.

add_distances_ca(periodic=True, excluded_neighbors=2)

Adds the distances between all Ca’s to the feature list.

Parameters:
  • periodic (boolean, default is True) – Use the minimum image convetion when computing distances
  • excluded_neighbors (int, default is 2) – Number of exclusions when compiling the list of pairs. Two CA-atoms are considered neighbors if they belong to adjacent residues.
add_group_mindist(group_definitions, group_pairs='all', threshold=None, periodic=True)

Adds the minimum distance between groups of atoms to the feature list. If the groups of atoms are identical to residues, use add_residue_mindist.

Parameters:
  • group_definition (list of 1D-arrays/iterables containing the group definitions via atom indices.) – If there is only one group_definition, it is assumed the minimum distance within this group (excluding the self-distance) is wanted. In this case, group_pairs is ignored.
  • group_pairs (Can be of two types:) –
    ‘all’
    Computes minimum distances between all pairs of groups contained in the group definitions
    ndarray((n, 2), dtype=int):
    n x 2 array with the pairs of groups for which the minimum distances will be computed.
  • threshold (float, optional, default is None) – distances below this threshold (in nm) will result in a feature 1.0, distances above will result in 0.0. If left to None, the numerical value will be returned
  • periodic (bool, optional, default = True) – If periodic is True and the trajectory contains unitcell information, we will treat dihedrals that cross periodic images using the minimum image convention.
add_inverse_distances(indices, periodic=True, indices2=None)

Adds the inverse distances between atoms to the feature list.

Parameters:
  • indices (can be of two types:) –
    ndarray((n, 2), dtype=int):
    n x 2 array with the pairs of atoms between which the inverse distances shall be computed
    iterable of integers (either list or ndarray(n, dtype=int)):
    indices (not pairs of indices) of the atoms between which the inverse distances shall be computed.
  • periodic (optional, boolean, default is True) – If periodic is True and the trajectory contains unitcell information, distances will be computed under the minimum image convention.
  • indices2 (iterable of integers (either list or ndarray(n, dtype=int)), optional:) – Only has effect if indices is an iterable of integers. Instead of the above behaviour, only the inverse distances between the atoms in indices and indices2 will be computed.

Note

When using the iterable of integers input, indices and indices2 will be sorted numerically and made unique before converting them to a pairlist. Please look carefully at the output of describe() to see what features exactly have been added.

add_minrmsd_to_ref(ref, ref_frame=0, atom_indices=None, precentered=False)

Adds the minimum root-mean-square-deviation (minrmsd) with respect to a reference structure to the feature list.

Parameters:
  • ref

    Reference structure for computing the minrmsd. Can be of two types:

    1. mdtraj.Trajectory object
    2. filename for mdtraj to load. In this case, only the ref_frame of that file will be used.
  • ref_frame (integer, default=0) – Reference frame of the filename specified in ref. This parameter has no effect if ref is not a filename.
  • atom_indices (array_like, default=None) –

    Atoms that will be used for:

    1. aligning the target and reference geometries.
    2. computing rmsd after the alignment.

    If left to None, all atoms of ref will be used.

  • precentered (bool, default=False) – Use this boolean at your own risk to let mdtraj know that the target conformations are already centered at the origin, i.e., their (uniformly weighted) center of mass lies at the origin. This will speed up the computation of the rmsd.
add_residue_mindist(residue_pairs='all', scheme='closest-heavy', ignore_nonprotein=True, threshold=None, periodic=True)

Adds the minimum distance between residues to the feature list. See below how the minimum distance can be defined. If the topology generated out of topfile contains information on periodic boundary conditions, the minimum image convention will be used when computing distances.

Parameters:
  • residue_pairs (can be of two types:) –
    ‘all’
    Computes distances between all pairs of residues excluding first and second neighbors
    ndarray((n, 2), dtype=int):
    n x 2 array with the pairs residues for which distances will be computed
  • scheme ('ca', 'closest', 'closest-heavy', default is closest-heavy) – Within a residue, determines the sub-group atoms that will be considered when computing distances
  • ignore_nonprotein (boolean, default True) – Ignore residues that are not of protein type (e.g. water molecules, post-traslational modifications etc)
  • threshold (float, optional, default is None) – distances below this threshold (in nm) will result in a feature 1.0, distances above will result in 0.0. If left to None, the numerical value will be returned
  • periodic (bool, optional, default = True) – If periodic is True and the trajectory contains unitcell information, we will treat dihedrals that cross periodic images using the minimum image convention.
:param .. note::: Using scheme = ‘closest’ or ‘closest-heavy’ with residue pairs = ‘all’
will compute nearly all interatomic distances, for every frame, before extracting the closest pairs. This can be very time consuming. Those schemes are intended to be used with a subset of residues chosen via residue_pairs.
add_selection(indexes)

Adds the coordinates of the selected atom indexes to the feature list. The coordinates of the selection [1, 2, ...] are flattened as follows: [x1, y1, z1, x2, y2, z2, ...]

Parameters:indexes (ndarray((n), dtype=int)) – array with selected atom indexes
describe()

Returns a list of strings, one for each feature selected, with human-readable descriptions of the features.

Returns:labels – An ordered list of strings, one for each feature selected, with human-readable descriptions of the features.
Return type:list of str
dimension()

current dimension due to selected features

Returns:dim – total dimension due to all selection features
Return type:int
logger

The logger for this class instance

name

The name of this instance

static pairs(sel, excluded_neighbors=0)

Creates all pairs between indexes. Will exclude closest neighbors up to excluded_neighbors The self-pair (i,i) is always excluded

Parameters:
  • sel (ndarray((n), dtype=int)) – array with selected atom indexes
  • excluded_neighbors (int, default = 0) – number of neighbors that will be excluded when creating the pairs
Returns:

sel – m x 2 array with all pair indexes between different atoms that are at least excluded_neighbors indexes apart, i.e. if i is the index of an atom, the pairs [i,i-2], [i,i-1], [i,i], [i,i+1], [i,i+2], will not be in sel (n=excluded_neighbors) if excluded_neighbors = 2. Moreover, the list is non-redundant,i.e. if [i,j] is in sel, then [j,i] is not.

Return type:

ndarray((m,2), dtype=int)

select(selstring)

Returns the indexes of atoms matching the given selection

Parameters:selstring (str) – Selection string. See mdtraj documentation for details: http://mdtraj.org/latest/atom_selection.html
Returns:indexes – array with selected atom indexes
Return type:ndarray((n), dtype=int)
select_Backbone()

Returns the indexes of backbone C, CA and N atoms

Returns:indexes – array with selected atom indexes
Return type:ndarray((n), dtype=int)
select_Ca()

Returns the indexes of all Ca-atoms

Returns:indexes – array with selected atom indexes
Return type:ndarray((n), dtype=int)
select_Heavy(exclude_symmetry_related=False)

Returns the indexes of all heavy atoms (Mass >= 2), optionally excluding symmetry-related heavy atoms.

Parameters:exclude_symmetry_related (boolean, default=False) – if True, exclude symmetry-related heavy atoms.
Returns:indexes – array with selected atom indexes
Return type:ndarray((n), dtype=int)
transform(traj)

Maps an mdtraj Trajectory object to the selected output features

Parameters:traj (mdtraj Trajectory) – Trajectory object used as an input
Returns:out – Output features: For each of T time steps in the given trajectory, a vector with all n output features selected.
Return type:ndarray((T, n), dtype=float32)