Is there any fast way to obtain unique elements in numpy? I have code

Asked: May 27, 2026

Is there any fast way to obtain unique elements in numpy? I have code similar to this (the last line is the one in question):

import numpy

tab = numpy.arange(100000000)

indices1 = numpy.random.permutation(10000)
indices2 = indices1.copy()
indices3 = indices1.copy()
indices4 = indices1.copy()

result = numpy.unique(numpy.array([tab[indices1], tab[indices2], tab[indices3], tab[indices4]]))

This is just an example; in my situation indices1, indices2, ..., indices4 contain different sets of indices and have various sizes. The last line is executed many times, and I noticed that it’s actually the bottleneck in my code ({numpy.core.multiarray.arange}, to be precise). Besides, ordering is not important, and the elements of the index arrays are of int32 type. I was thinking about using a hash table with the element value as key, and tried:

import itertools
seq = itertools.chain(tab[indices1].flatten(), tab[indices2].flatten(), tab[indices3].flatten(), tab[indices4].flatten())
# Insert every value as a dict key so that duplicates collapse.
myset = dict.fromkeys(seq)
result = numpy.array(list(myset))

but it was even worse.

Is there any way to speed this up? I guess the performance penalty comes from ‘fancy indexing’, which copies the array, but I only need to read the resulting elements (I don’t modify anything).

1 Answer

Editorial Team · Answer 1 · 2026-05-27T18:00:00+00:00

Sorry, I don’t completely understand your question, but I’ll do my best to help.

First, {numpy.core.multiarray.arange} is numpy.arange, not fancy indexing. Unfortunately, fancy indexing does not show up as a separate line item in the profiler. If you’re calling np.arange in the loop, you should see if you can move it outside.

In [27]: prun tab[tab]
     2 function calls in 1.551 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    1.551    1.551    1.551    1.551 <string>:1(<module>)
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

In [28]: prun numpy.arange(10000000)
     3 function calls in 0.051 CPU seconds

Ordered by: internal time

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
    1    0.047    0.047    0.047    0.047 {numpy.core.multiarray.arange}
    1    0.003    0.003    0.051    0.051 <string>:1(<module>)
    1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
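
The suggestion to move np.arange out of the loop can be sketched roughly as follows. The function names, the loop structure, and the smaller table size are assumptions made for illustration; they are not taken from the question's real code.

```python
import numpy as np

rng = np.random.default_rng(0)
n_table = 1_000_000  # smaller than the question's table, to keep the sketch quick

def lookup_rebuilding(indices_list):
    # Slow pattern: the lookup table is rebuilt on every iteration.
    out = []
    for idx in indices_list:
        tab = np.arange(n_table)
        out.append(tab[idx])
    return out

def lookup_hoisted(indices_list):
    # The table is built once and is only read inside the loop.
    tab = np.arange(n_table)
    return [tab[idx] for idx in indices_list]

indices_list = [rng.permutation(10_000) for _ in range(5)]
slow = lookup_rebuilding(indices_list)
fast = lookup_hoisted(indices_list)

# Both variants return the same values; only the amount of work differs.
same = all(np.array_equal(a, b) for a, b in zip(slow, fast))
print(same)
```

Timing the two functions (for example with timeit) should show where the arange cost goes, although the exact numbers depend on the machine and the table size.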

Second, I assume that tab is not np.arange(a, b) in your real code, because if it is, then tab[index] == index + a and no lookup is needed; I assume that was just for your example.
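
A small check of that identity, with made-up bounds a and b and a made-up index array (none of these values come from the question):

```python
import numpy as np

a, b = 5, 105                  # hypothetical bounds
tab = np.arange(a, b)          # tab[i] == a + i for every valid i
index = np.array([0, 3, 3, 42, 99])

lookup = tab[index]            # fancy indexing, copies the selected elements
shortcut = index + a           # plain arithmetic, no table needed

print(np.array_equal(lookup, shortcut))
```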

Third, np.concatenate is about 10 times faster than np.array here:

In [47]: timeit numpy.array([tab[indices1], tab[indices2], tab[indices3], tab[indices4]])
100 loops, best of 3: 5.11 ms per loop

In [48]: timeit numpy.concatenate([tab[indices1], tab[indices2], tab[indices3], tab[indices4]])
1000 loops, best of 3: 544 us per loop

(Also, np.concatenate gives a (4*n,) array and np.array gives a (4, n) array, where n is the length of each of indices1-4. The latter only works if indices1-4 all have the same length.)
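
The shape difference, and the fact that it does not change what np.unique returns, can be seen in a short sketch. The table size and the way the index arrays are generated are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
tab = np.arange(1_000_000)
n = 10_000

# Four equal-length index arrays with partly overlapping ranges.
indices1 = rng.permutation(n)
indices2 = rng.permutation(n) + 5_000
indices3 = rng.permutation(n) + 10_000
indices4 = rng.permutation(n) + 15_000

parts = [tab[indices1], tab[indices2], tab[indices3], tab[indices4]]

stacked = np.array(parts)        # shape (4, n); needs equal lengths
flat = np.concatenate(parts)     # shape (4*n,); lengths may differ

# np.unique flattens its input by default, so both give the same result.
unique_from_stacked = np.unique(stacked)
unique_from_flat = np.unique(flat)

# Concatenation also works when the pieces have different lengths.
ragged = np.concatenate([tab[indices1[:10]], tab[indices2[:7]]])

print(stacked.shape, flat.shape, ragged.shape)
```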

And last, you could also save even more time if you can do the following:

indices = np.unique(np.concatenate((indices1, indices2, indices3, indices4)))
result = tab[indices]

Doing it in this order is faster because you reduce the number of indices you need to look up in tab, but it’ll only work if you know that the elements of tab are unique (otherwise you could get repeats in result even if the indices are unique).
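
The caveat about repeated values in tab is easiest to see on a toy example. The arrays below are invented for illustration and are much smaller than anything in the question.

```python
import numpy as np

indices1 = np.array([0, 1, 2])
indices2 = np.array([2, 3])
indices = np.unique(np.concatenate((indices1, indices2)))

# Case 1: all elements of tab are distinct, so unique indices give unique values.
tab_unique = np.array([10, 20, 30, 40, 50])
result_ok = tab_unique[indices]
reference_ok = np.unique(np.concatenate((tab_unique[indices1], tab_unique[indices2])))

# Case 2: tab contains a repeated value (10 at positions 0 and 2),
# so looking up unique indices can still return duplicates.
tab_repeats = np.array([10, 20, 10, 40, 50])
result_bad = tab_repeats[indices]
reference_bad = np.unique(np.concatenate((tab_repeats[indices1], tab_repeats[indices2])))

print(result_ok, reference_ok)
print(result_bad, reference_bad)
```

In the first case the shortcut and the original approach agree as sets of values; in the second, the shortcut returns 10 twice, so a final np.unique on the result would still be needed.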

Hope that helps
