pyemma.coordinates.data.DataInMemory

class pyemma.coordinates.data.DataInMemory(data, chunksize=5000)

Multi-dimensional data fully stored in memory.

Used to pass arbitrary coordinates to a pipeline. The data is flattened to two dimensions to ensure compatibility.

Parameters:data (ndarray (nframe, ndim) or list of ndarrays (nframe, ndim)) – Either a single 2D array that stores the frames along the first dimension and the coordinates/features along the second dimension, or a list of such arrays.
__init__(data, chunksize=5000)
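
The flattening to two dimensions can be pictured with plain NumPy. This is a minimal sketch, not the class's own code: it assumes the frame axis stays first and all remaining axes are merged into one feature axis, and the array shapes are hypothetical.

```python
import numpy as np

# Hypothetical input: 100 frames of 10 atoms with xyz coordinates.
xyz = np.zeros((100, 10, 3), dtype=np.float32)

# Keep the frame axis and merge all remaining axes into one feature axis.
flat = xyz.reshape(xyz.shape[0], -1)
print(flat.shape)

# A list of trajectories is treated one array at a time.
trajs = [np.zeros((100, 10, 3)), np.zeros((250, 10, 3))]
flat_trajs = [t.reshape(t.shape[0], -1) for t in trajs]
print([t.shape for t in flat_trajs])
```

Arrays shaped like `flat` or lists shaped like `flat_trajs` match the (nframe, ndim) layout described for the data parameter.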

Methods

__init__(data[, chunksize])
describe()
dimension() Returns the number of output dimensions
get_output([dimensions, stride]) Maps all input data of this transformer and returns it as an array or list of arrays.
iterator([stride, lag]) Returns an iterator that provides access to the transformed data.
load_from_files(files) Construct an instance by loading all files into memory
map(X)
n_frames_total([stride]) Returns the total number of frames, over all trajectories
number_of_trajectories() Returns the number of trajectories
output_type() By default transformers return single precision floats.
parametrize([stride]) Parametrize this Transformer
trajectory_length(itraj[, stride]) Returns the length of trajectory
trajectory_lengths([stride]) Returns the length of each trajectory

Attributes

chunksize chunksize defines how much data is being processed at once.
data_producer where the transformer obtains its data.
in_memory are results stored in memory?
chunksize

chunksize defines how much data is being processed at once.

data_producer

where the transformer obtains its data.

dimension()

Returns the number of output dimensions

Returns:the number of output dimensions
get_output(dimensions=slice(0, None, None), stride=1)

Maps all input data of this transformer and returns it as an array or list of arrays.

Parameters:
  • dimensions (list-like of indexes or slice) – indices of dimensions you like to keep, default = all
  • stride (int) – only take every n’th frame, default = 1
Returns:

output – the mapped data, where T is the number of time steps of the input data or, if stride > 1, floor(T_in / stride), and d is the output dimension of this transformer. If the input consists of a list of trajectories, the output will be a corresponding list of arrays, one per trajectory.

Return type:

ndarray(T, d) or list of ndarray(T_i, d)

Notes

  • This function may be RAM intensive if stride is too small or too many dimensions are selected.
  • If the in_memory attribute is True, the results of this method are cached.
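
A back-of-the-envelope estimate shows how stride and the number of selected dimensions scale the size of the returned array. This is a rough sketch: `estimate_output_bytes` is a hypothetical helper, it takes the documented frame count floor(T_in / stride) at face value, and it assumes 4 bytes per value (single precision floats).

```python
def estimate_output_bytes(n_frames, n_dims, stride=1, bytes_per_value=4):
    """Rough size in bytes of a (T, d) single-precision output array."""
    frames_kept = n_frames // stride  # floor(T_in / stride), as documented
    return frames_kept * n_dims * bytes_per_value

# Hypothetical trajectory: one million frames, 50 output dimensions.
full = estimate_output_bytes(1_000_000, 50)
strided = estimate_output_bytes(1_000_000, 50, stride=100)
one_dim = estimate_output_bytes(1_000_000, 1, stride=100)

print(full, strided, one_dim)
```

The estimate ignores caching and any intermediate copies, so actual memory use may be higher.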

Example

plotting trajectories

>>> import pyemma.coordinates as coor
>>> import matplotlib.pyplot as plt
>>> # In an IPython notebook, run "%matplotlib inline" first.
>>>
>>> tica = coor.tica()  # fill with some actual data!
>>> trajs = tica.get_output(dimensions=(0,), stride=100)
>>> for traj in trajs:
...     plt.figure()
...     plt.plot(traj[:, 0])
in_memory

are results stored in memory?

iterator(stride=1, lag=0)

Returns an iterator that provides access to the transformed data.

Parameters:
  • stride (int) – Only transform every N’th frame, default = 1
  • lag (int) – Configure the iterator such that it will return time-lagged data with a lag time of lag. If lag is used together with stride the operation will work as if the striding operation is applied before the time-lagged trajectory is shifted by lag steps. Therefore the effective lag time will be stride*lag.
Returns:

iterator – If lag = 0, a call to the .next() method of this iterator will return the pair (itraj, X) : (int, ndarray(n, m)), where itraj corresponds to the input sequence number (e.g. trajectory index) and X is the transformed data, with n = chunksize, or n < chunksize at the end of the input.

If lag > 0, a call to the .next() method of this iterator will return the tuple (itraj, X, Y) : (int, ndarray(n, m), ndarray(p, m)), where itraj and X are the same as above and Y contains the time-lagged data.

Return type:

a pyemma.coordinates.transform.TransformerIterator transformer iterator
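
The interplay of chunksize, stride and lag described above can be sketched with a plain generator. This is not the library's iterator: `lagged_chunks` is a hypothetical stand-in that works on Python lists, and it assumes the documented order of operations (stride first, then shift by lag).

```python
def lagged_chunks(trajs, chunksize=5000, stride=1, lag=0):
    """Yield (itraj, X) if lag == 0, otherwise (itraj, X, Y), chunk by chunk."""
    for itraj, traj in enumerate(trajs):
        strided = traj[::stride]  # striding is applied first
        for start in range(0, len(strided), chunksize):
            X = strided[start:start + chunksize]
            if lag == 0:
                yield itraj, X
            else:
                # Shift by `lag` strided frames; Y may be shorter than X
                # (or empty) near the end of the trajectory.
                Y = strided[start + lag:start + lag + chunksize]
                yield itraj, X, Y

# One hypothetical trajectory whose "frames" are just their own indices.
trajs = [list(range(10))]
chunks = list(lagged_chunks(trajs, chunksize=4, stride=2, lag=1))
for itraj, X, Y in chunks:
    print(itraj, X, Y)
```

In the first chunk, every frame in Y lies two original frames after its partner in X, which is the effective lag time stride*lag.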

classmethod load_from_files(files)

Construct an instance by loading all files into memory.

Parameters:files (str or list of str) – filenames to read from
n_frames_total(stride=1)

Returns the total number of frames, over all trajectories

Parameters:stride – return value is the number of frames in trajectories when running through them with a step size of stride
Returns:the total number of frames, over all trajectories
number_of_trajectories()

Returns the number of trajectories

Returns:number of trajectories
output_type()

By default transformers return single precision floats.

parametrize(stride=1)

Parametrize this Transformer

trajectory_length(itraj, stride=1)

Returns the length of trajectory

Parameters:
  • itraj – trajectory index
  • stride – return value is the number of frames in trajectory when running through it with a step size of stride
Returns:

length of trajectory

trajectory_lengths(stride=1)

Returns the length of each trajectory

Parameters:stride – return value is the number of frames in trajectories when running through them with a step size of stride
Returns:list containing length of each trajectory
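
The three length queries relate to each other as in the following sketch. The functions are plain stand-ins named after the methods, not the library's implementation, and they assume that stepping through a trajectory with step size stride behaves like Python's `range`. The example lengths are hypothetical and chosen as multiples of the stride, so rounding conventions at the end of a trajectory do not matter here.

```python
def trajectory_length(lengths, itraj, stride=1):
    """Frames visited in trajectory `itraj` when stepping by `stride`."""
    return len(range(0, lengths[itraj], stride))

def trajectory_lengths(lengths, stride=1):
    """Strided length of every trajectory."""
    return [trajectory_length(lengths, i, stride) for i in range(len(lengths))]

def n_frames_total(lengths, stride=1):
    """Total number of strided frames over all trajectories."""
    return sum(trajectory_lengths(lengths, stride))

lengths = [1000, 500, 2000]  # hypothetical unstrided trajectory lengths
per_traj = trajectory_lengths(lengths, stride=10)
total = n_frames_total(lengths, stride=10)
print(per_traj, total)
```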