Date help created: 06 Apr 2010
Date last updated: 06 Apr 2010

DataRows is a Python module which allows access to blocked data files a row at a time.
The module contains a class, also called DataRows. To access it, type (in Python):
>>> from DataRows import DataRows
This assumes that the lib directory is on your PYTHONPATH (the DataRows.so file lives inside this directory as a symbolic link).
The constructor has three mandatory arguments:
par_file: the file name for the par file of the blocked data file
dim: the dimension in which the rows are directed (1 <= dim <= ndim)
mode: 'r' for reading and 'w' for writing
and one optional argument:
in_order: 0: the rows are indexed in block order
          1: the rows are indexed in point order
The default is in_order = 0. This is the more efficient choice, and is what you would use if you do not mind the order in which the rows are returned (it happens to return them a block at a time in the orthogonal dimensions). If you need the rows in the natural point order (in the orthogonal dimensions), you must use in_order = 1.
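The difference between the two orderings can be pictured with a small pure-Python sketch. It does not use the DataRows module: the function names, the example sizes, and the choice of letting the first orthogonal dimension vary fastest are all assumptions made for illustration, and the real block traversal may differ.

```python
from itertools import product

def point_order(npoints):
    """Orthogonal points in natural order (first dimension fastest; an assumption)."""
    # product() varies its last factor fastest, so reverse on the way in and out.
    ranges = [range(n) for n in reversed(npoints)]
    return [tuple(reversed(p)) for p in product(*ranges)]

def block_order(npoints, block_size):
    """Orthogonal points visited one block at a time (illustrative only)."""
    nblocks = [(n + b - 1) // b for n, b in zip(npoints, block_size)]
    points = []
    for block in point_order(nblocks):
        for offset in point_order(block_size):
            point = tuple(bl * bs + off
                          for bl, bs, off in zip(block, block_size, offset))
            # Skip padding points in a partially filled final block.
            if all(x < n for x, n in zip(point, npoints)):
                points.append(point)
    return points

# Hypothetical orthogonal plane: 4 x 4 points stored in 2 x 2 blocks.
natural = point_order((4, 4))
blocked = block_order((4, 4), (2, 2))
```

With these sizes the natural order starts (0, 0), (1, 0), (2, 0), (3, 0), whereas the blocked order starts (0, 0), (1, 0), (0, 1), (1, 1). Both visit the same 16 points, which is why block order is fine whenever each row can be processed independently.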
It is perhaps slightly confusing that, if you want to open a DataRows object for writing, the par file must already exist, because this is where all the information about the number of points, etc., lives. (An alternative design would have been to pass all that information into the constructor.)
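One way to make this requirement harder to trip over is to check for the par file before opening for writing. The following is only a sketch: check_par_file is a hypothetical helper, not part of the DataRows module, and the file name in the comment is made up.

```python
import os

def check_par_file(par_file):
    """Return par_file unchanged, or raise IOError if it does not exist.

    Hypothetical helper; the DataRows module does not provide it.
    """
    if not os.path.isfile(par_file):
        raise IOError('par file "%s" must exist before opening for writing'
                      % par_file)
    return par_file

# Intended use, assuming DataRows is importable and the (made-up) par file
# already describes the output data:
# dataRows = DataRows(check_par_file('output.spc.par'), 1, 'w')
```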
For example:
dataRows = DataRows('edl387_5.spc.par', 1, 'r')
or:
dataRows = DataRows('edl387_5.spc.par', 1, 'r', 1)
These both access the rows in the dim 1 direction, with the second form returning the rows in natural point order.
Some attributes of the dataRows object:
name: the name of the par file
dim: the dimension in which the rows are directed
ndim: the number of dimensions for the data file
row_size: the size of a given row
nrows: the total number of rows in the data file
npoints: the number of points in each of the dimensions
block_size: the block size in each of the dimensions
For example, to get the number of points in every dimension of the spectrum (not just in the dim being considered) you can do:
npoints = dataRows.npoints
Note that dataRows.nrows is the same as len(dataRows).
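The attributes would be expected to be related to each other in a simple way. The sketch below assumes, without confirmation from the module itself, that row_size equals the number of points along dim and that nrows is the product of the number of points in the other dimensions. The function name and the example sizes are hypothetical.

```python
def expected_shape(npoints, dim):
    """Return (row_size, nrows) for rows along dimension dim (1-based).

    Assumes row_size = npoints along dim and nrows = product of the rest.
    """
    row_size = npoints[dim - 1]
    nrows = 1
    for d, n in enumerate(npoints, start=1):
        if d != dim:
            nrows *= n
    return row_size, nrows

# Hypothetical 3D file with 512 x 64 x 32 points, rows along dim 1.
row_size, nrows = expected_shape((512, 64, 32), 1)
```

If the assumption holds, this example gives a row_size of 512 and 64 * 32 = 2048 rows.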
To access the n^th row in a spectrum you can do:
data = dataRows[n]
You must have 0 <= n < nrows. The result is a list of length dataRows.row_size.
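Because a dataRows object supports len() and integer indexing, looping over every row needs nothing more than a range. The sketch below uses a plain list of lists as a stand-in so that it runs without the DataRows module; row_maxima and fake_rows are hypothetical names.

```python
def row_maxima(rows):
    """Return the largest value found in each row.

    rows may be any object supporting len() and integer indexing,
    which is the interface described above for a dataRows object.
    """
    return [max(rows[n]) for n in range(len(rows))]

# Stand-in data; with the real module this would be a dataRows object.
fake_rows = [[0.0, 1.5, -2.0], [3.0, 0.5, 0.25]]
maxima = row_maxima(fake_rows)
```

Since each row is handled on its own, this kind of loop works equally well with in_order = 0.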
There are two additional functions available on a dataRows object:
index(point): converts from an N-dimensional point to the corresponding index (0 <= index < dataRows.nrows).
point(index): converts from an index to an N-dimensional point where the element dataRows.dim is 0.
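To show what such a conversion involves, here is a pure-Python model of the two functions for the point order case only. It is not the module's implementation: it assumes the first orthogonal dimension varies fastest, treats dim as 1-based, and places the 0 at position dim - 1 of the returned point. All names and sizes are hypothetical.

```python
def point_to_index(point, npoints, dim):
    """Row index of an N-dimensional point, ignoring its coordinate along dim."""
    index = 0
    stride = 1
    for d, (p, n) in enumerate(zip(point, npoints), start=1):
        if d == dim:
            continue  # the coordinate along the row does not select the row
        index += p * stride
        stride *= n
    return index

def index_to_point(index, npoints, dim):
    """Inverse of point_to_index; the coordinate along dim is set to 0."""
    point = []
    for d, n in enumerate(npoints, start=1):
        if d == dim:
            point.append(0)
        else:
            point.append(index % n)
            index //= n
    return tuple(point)

# Hypothetical 3D file with 8 x 4 x 3 points, rows along dim 1.
npoints = (8, 4, 3)
idx = point_to_index((5, 2, 1), npoints, 1)
pt = index_to_point(idx, npoints, 1)
```

In this model the point (5, 2, 1) maps to row 6, and row 6 maps back to (0, 2, 1): the coordinate along the row is lost, as expected, while the orthogonal coordinates survive the round trip.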
If a dataRows object is open for writing then you can set the rows of data. So for the n^th row you can do:
dataRows[n] = data
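Reading and writing can be combined to process one file into another a row at a time. The sketch below uses plain lists as stand-ins for a dataRows object opened with 'r' and one opened with 'w'; copy_scaled is a hypothetical helper, not part of the module.

```python
def copy_scaled(source, dest, scale):
    """Write every row of source into dest, multiplied by scale.

    source and dest may be any objects supporting len() and indexing.
    """
    if len(source) != len(dest):
        raise ValueError('source and destination must have the same number of rows')
    for n in range(len(source)):
        dest[n] = [scale * x for x in source[n]]

# Stand-ins for a readable and a writable dataRows object.
source = [[1.0, 2.0], [3.0, 4.0]]
dest = [None, None]
copy_scaled(source, dest, 2.0)
```

With real files, both objects would presumably need the same dim and in_order so that row n refers to the same position in each.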
The documentation is also available via a module doc string:
import DataRows
print DataRows.__doc__

Azara help: DataRows / W. Boucher / azara@bioc.cam.ac.uk