With Python’s standard max function, I can also pass in a key parameter:
s = numpy.array(['one','two','three'])
max(s) # 'two' (lexicographically last)
max(s, key=len) # 'three' (longest string)
With a larger (multi-dimensional) array, I can no longer use max, so I tried numpy.amax; however, I can’t seem to use amax with strings…
t = numpy.array([['one','two','three'],['four','five','six']])
t.dtype # dtype('|S5')
numpy.amax(t, axis=1) # Error! Hoping for: ['two', 'six']
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/numpy/core/fromnumeric.py", line 1833, in amax
    return amax(axis, out)
TypeError: cannot perform reduce with flexible type
Is it possible to use amax (am I using it incorrectly?), or is there some other numpy tool to do this?
Instead of storing your strings in the numpy array with a flexible string dtype (such as the |S5 above), you could try storing them as Python objects instead. Numpy will treat these as references to the original Python string objects, and you can then treat them like you might expect.
Keep in mind that the np.min and np.max calls order the strings lexicographically, so “two” does indeed come after “five”. To change the comparison to look at the length of each string, you could try creating a new numpy array identical in shape, but containing each string’s length instead of its reference. You could then do a numpy.argmin call on that array (which returns the index of the minimum) and look up the value of the string in the original array.
Example code:
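A minimal sketch of both suggestions, assuming Python 3 and a reasonably recent NumPy. The variable names (row_max, lengths, shortest and so on) are illustrative and not from the original post, and np.vectorize(len) is just one way to build the array of lengths:

```python
import numpy as np

# Assumption: dtype=object makes NumPy store references to Python str objects
# instead of a flexible string dtype such as '|S5'.
t = np.array([['one', 'two', 'three'],
              ['four', 'five', 'six']], dtype=object)

# Reductions now fall back to Python's own (lexicographic) string comparison.
row_max = np.max(t, axis=1)   # array(['two', 'six'], dtype=object)
row_min = np.min(t, axis=1)   # array(['one', 'five'], dtype=object)

# To compare by length, build an integer array of the same shape.
lengths = np.vectorize(len)(t)            # [[3, 3, 5], [4, 4, 3]]

# argmin/argmax give the column index of the shortest/longest string per row.
rows = np.arange(t.shape[0])
shortest = t[rows, np.argmin(lengths, axis=1)]   # ['one', 'six']
longest = t[rows, np.argmax(lengths, axis=1)]    # ['three', 'four']

print(row_max, row_min, shortest, longest)
```

Note that argmin and argmax return the first match when several strings share the same length, which is why the second row gives 'four' rather than 'five' as its longest string.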