I’m interested in using numpy arrays of somewhat inhomogeneous data types. Since numpy requires the data in an array to be homogeneous, this would be accomplished by defining a super-dtype that acts as a union wrapper over all the sub-dtypes. Accessing the fields of the sub-dtypes then gives a different interpretation of the underlying data.
There’s already some support for this. For example,
dtype(('|S2', [('x', '|i1'), ('y', '|i1')]))
refers to an array of two-byte strings, but the first and second bytes can also be interpreted as integers through the ‘x’ and ‘y’ field names. I can’t figure out how to assign a field label to the two-byte string, though.
Can this be made more general, so that we can overlay any number of different field specifications on the data?
My first try was to specify the field offsets in the dtype, but it failed with a complaint that the offsets must be ordered (i.e. non-overlapping data).
dtype1 = np.dtype(dict(
names=['a','b'],
formats=['|a2','<i2'],
offsets=[0,0]))
Another technique works, but is cumbersome. I can define several variables as views onto the same underlying data, and change the dtype of each variable to access the data in a different format, e.g.
a=np.zeros(3, dtype='<a2')
b=a[:]
b.dtype='<i2'
This lets me access the data as either strings or integers, depending on whether I use a or b, but it means juggling a separate variable for every interpretation. Ideally, I’d like to be able to specify a variety of different fields with arbitrary offsets. Is there any way to do this?
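For reference, the same view technique can also be written with ndarray.view instead of assigning to .dtype. This is only a minimal sketch, assuming two-byte strings ('S2') and a little-endian int16 interpretation:

```python
import numpy as np

a = np.zeros(3, dtype='S2')   # three two-byte strings
b = a.view('<i2')             # same buffer, read as little-endian int16

b[0] = 0x4142                 # stored in memory as bytes 0x42, 0x41
print(a[0])                   # the write through b is visible through a
```

Both names still refer to one buffer, so this does not remove the need for one variable per interpretation.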
Union dtypes have been allowed since June 2011: https://github.com/numpy/numpy/pull/94
You’ll need NumPy 1.7 or later to use this.
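A minimal sketch of such a union dtype, assuming a NumPy version that accepts overlapping offsets. The field names 's' and 'i' and the sample values are illustrative only:

```python
import numpy as np

# Two fields that both start at offset 0, so they overlay the same two bytes.
union = np.dtype({
    'names':   ['s', 'i'],
    'formats': ['S2', '<i2'],
    'offsets': [0, 0],
})

arr = np.zeros(3, dtype=union)
arr['i'] = [0x4142, 0x4344, 0x4546]   # write through the integer field
as_bytes = arr['s']                   # read the same bytes as two-byte strings
print(as_bytes)
```

Because the fields share storage, assigning to one field overwrites the data seen through the other.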
However, in previous versions you can use the overlay dtype constructor, which takes a (base_dtype, new_dtype) tuple like the one in the question.
This is documented at http://docs.scipy.org/doc/numpy-dev/reference/arrays.dtypes.html#specifying-and-constructing-data-types (search for (base_dtype, new_dtype)).
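As a rough illustration of the (base_dtype, new_dtype) form, the sketch below overlays two one-byte fields on a little-endian int16. The base type, the field names 'x' and 'y', and the sample values are assumptions chosen for the example:

```python
import numpy as np

# The base dtype is a 2-byte integer; the field list reinterprets its two bytes.
overlay = np.dtype(('<i2', [('x', 'i1'), ('y', 'i1')]))

raw = np.array([0x0102, 0x0304], dtype='<i2')
v = raw.view(overlay)     # same data, now with named byte fields

x_vals = v['x']           # low byte of each integer (little-endian)
y_vals = v['y']           # high byte of each integer
print(x_vals, y_vals)
```

Both arguments must describe the same total size, here two bytes.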