I have a large collection (thousands) of word : value (float) pairs. I need to find the maximum value and extract the associated word. For example, given (a,2.4),(b,5.2),(c,1.2),(d,9.2),(e,6.3),(f,0.4), I would like (d,9.2) as the output.
Currently, I am using a dictionary to store these pairs and the built-in `max` function to retrieve the entry with the maximum value. I was wondering whether a NumPy array would be more efficient. Soliciting expert opinions here.
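For reference, a minimal sketch of the dictionary approach described above, assuming the pairs are stored as `word -> value` (the name `scores` is illustrative). `max` ranks the items by their value and returns the whole `(word, value)` pair:

```python
# Word -> value pairs from the example above (illustrative name)
scores = {"a": 2.4, "b": 5.2, "c": 1.2, "d": 9.2, "e": 6.3, "f": 0.4}

# max() over the (word, value) items, ranked by the value
best_word, best_value = max(scores.items(), key=lambda kv: kv[1])
print(best_word, best_value)  # d 9.2
```

If only the word is needed, `max(scores, key=scores.get)` does the same ranking over the keys.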
Using NumPy here would involve keeping the float values in a separate ndarray: find the index of the maximum value using `argmax`, then get the word at that index from a separate list. This is very fast, but constructing the ndarray only to find the maximum is not. Timings: `fa` 67 µs, `fb` 2300 µs, `fc` 2580 µs, `fd` 3780 µs.
So, using NumPy (`fa`) is over 30 times faster than using a plain list (`fb`) or a dictionary (`fc`) when the time to construct the NumPy array is not taken into account; `fd` includes that construction time.
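The benchmark code behind these timings is not shown, so here is a hypothetical sketch of the four cases, not the original code. It assumes 100,000 random words and values (the data, sizes and function names are all illustrative), and absolute timings will depend on the machine and data size. `with_numpy` is `argmax` on a prebuilt array, `with_list` and `with_dict` use `max`, and `with_numpy_build` also pays for converting the list to an ndarray:

```python
import random
import timeit

import numpy as np

# Hypothetical test data: n unique words with random float values
n = 100_000
words = [f"w{i}" for i in range(n)]
values = [random.random() for _ in range(n)]
pairs = list(zip(words, values))
table = dict(pairs)
arr = np.array(values)  # built once, outside the timed functions


def with_numpy():
    # argmax on the prebuilt array, then look up the word by index
    i = int(np.argmax(arr))
    return words[i], values[i]


def with_list():
    # max over a plain list of (word, value) tuples
    return max(pairs, key=lambda p: p[1])


def with_dict():
    # max over the dictionary's (word, value) items
    return max(table.items(), key=lambda kv: kv[1])


def with_numpy_build():
    # same as with_numpy, but includes the list -> ndarray conversion
    a = np.array(values)
    i = int(np.argmax(a))
    return words[i], values[i]


for f in (with_numpy, with_list, with_dict, with_numpy_build):
    t = timeit.timeit(f, number=20) / 20
    print(f"{f.__name__:18s} {t * 1e6:8.0f} µs")
```

All four return the same `(word, value)` pair; on ties, both `np.argmax` and `max` return the first occurrence. The NumPy route only pays off if the array is built once and queried many times.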