I'm relatively new to Python, so excuse me if this has an obvious answer that I haven't found.
I am reading some temporally contiguous binary files into numpy record arrays, with the end goal of storing them in a PyTables table. The problem I anticipate is that the files may not all have the same fields, or the same field order. I have been looking for a numpy function that will sort the columns (NOT the rows) of a recarray using either the field labels or an index. Even better would be a function that does this for you, and accounts for missing columns, when you append one recarray to another. Below is a sample of what I had in mind:
#-------script------------
import numpy as np

Myarray1 = np.array([(1,2,3),(1,2,3),(1,2,3)], {'names': ('a','b','c'), 'formats': ('f4', 'f4', 'f4')})
Myarray2 = np.array([(2,1,4,3),(2,1,4,3),(2,1,4,3)], {'names': ('b','a','d','c'), 'formats': ('f4', 'f4', 'f4', 'f4')})
# The next two functions are placeholders for what I am looking for; they do not exist.
Myarray3 = SomeColumnSortFunction(Myarray2, sortorder=[2,1,4,3])
Myarray4 = SomeBetterVerticalStackFunction(Myarray1,Myarray2)
#
print(Myarray1)
print()
print(Myarray2)
print()
print(Myarray3)
print()
print(Myarray4)
#---------- Wished for Output -------------
[(1.0, 2.0, 3.0) (1.0, 2.0, 3.0) (1.0, 2.0, 3.0)],
dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4')]
[(2.0, 1.0, 4.0, 3.0) (2.0, 1.0, 4.0, 3.0) (2.0, 1.0, 4.0, 3.0)],
dtype=[('b', 'i4'), ('a', 'i4'), ('d', 'i4'), ('c', 'i4')]
[(1.0, 2.0, 3.0, 4.0) (1.0, 2.0, 3.0, 4.0) (1.0, 2.0, 3.0, 4.0)]
dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4'), ('d', 'i4')]
[(1.0, 2.0, 3.0, NaN) (1.0, 2.0, 3.0, NaN) (1.0, 2.0, 3.0, NaN),
(1.0, 2.0, 3.0, 4.0) (1.0, 2.0, 3.0, 4.0) (1.0, 2.0, 3.0, 4.0)]
dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'i4'), ('d', 'i4')]
If you want to reorder the fields of your structured array, just use fancy indexing:
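A minimal sketch of that indexing, assuming the Myarray2 from the question (the name reordered is only for illustration):

```python
import numpy as np

# Same layout as Myarray2 in the question: fields stored as b, a, d, c.
Myarray2 = np.array(
    [(2, 1, 4, 3)] * 3,
    dtype={'names': ('b', 'a', 'd', 'c'), 'formats': ('f4',) * 4},
)

# Index with a list of field names, in the order you want them back.
reordered = Myarray2[['a', 'b', 'c', 'd']]
print(reordered.dtype.names)
print(reordered)
```

On recent NumPy versions this kind of multi-field indexing gives a view that keeps the original memory offsets; if a packed copy is needed, numpy.lib.recfunctions.repack_fields should do that.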
If you want to use integers to sort your fields, you can use something like:
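One possible way to do it, as a sketch: look the names up by position in dtype.names and index with the resulting list. The sortorder below is a zero-based version of the one in the question, and the variable names are only illustrative.

```python
import numpy as np

Myarray2 = np.array(
    [(2, 1, 4, 3)] * 3,
    dtype={'names': ('b', 'a', 'd', 'c'), 'formats': ('f4',) * 4},
)

names = Myarray2.dtype.names        # ('b', 'a', 'd', 'c')
sortorder = [1, 0, 3, 2]            # zero-based positions into names

# Translate positions to field names, then index with that list.
Myarray3 = Myarray2[[names[i] for i in sortorder]]

# Alphabetical order needs no index list at all.
alphabetical = Myarray2[sorted(names)]

print(Myarray3.dtype.names)
print(alphabetical.dtype.names)
```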
(In your sortorder=[2,1,4,3], you probably forgot that the first index of an iterable is 0…)
For stacking structured arrays, have a look at the numpy.lib.recfunctions submodule, the stack_arrays function in particular. Note that you have to use import numpy.lib.recfunctions explicitly.
Here's the docstring
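As a rough illustration of stack_arrays on the two arrays from the question, here is a minimal sketch. It relies on the default usemask=True, so the missing column comes back masked; filling it with NaN afterwards is my own choice, made to match the wished-for output.

```python
import numpy as np
import numpy.lib.recfunctions as rfn  # must be imported explicitly

Myarray1 = np.array(
    [(1, 2, 3)] * 3,
    dtype={'names': ('a', 'b', 'c'), 'formats': ('f4',) * 3},
)
Myarray2 = np.array(
    [(2, 1, 4, 3)] * 3,
    dtype={'names': ('b', 'a', 'd', 'c'), 'formats': ('f4',) * 4},
)

# Fields are matched by name; the result takes the first array's field
# order and appends fields that only appear in later arrays.
Myarray4 = rfn.stack_arrays((Myarray1, Myarray2))

print(Myarray4.dtype.names)
print(Myarray4)

# 'd' is masked for the rows that came from Myarray1; replace the masked
# entries with NaN to get a plain float column.
d_column = Myarray4['d'].filled(np.nan)
print(d_column)
```

Since the fields are matched by name, there is no need to reorder Myarray2 before stacking.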