I’m running filter on an interable and want to store the result in a sequence (I need a sequence so that I can use random.choice on it). I noticed that creating a set from a filter object is a lot faster than creating a list or a tuple. Why is that? I first though that the filter type is a subtype of set, which would explain this, but the filter function is actually identical to a generator expression, so it cannot really be a set internally.
I ran the following test to check the speed:
import time
def test ( n, seq ):
for method in ( set, list, tuple ):
t = time.time()
for i in range( n ):
method( seq )
print( method.__name__, ( time.time() - t ) )
someFilter = filter( lambda x: x % 3 == 0, range( 1000 ) )
test( 10000000, someFilter )
And the results were speaking clearly for using a set:
set 1.9240000247955322
list 8.82200002670288
tuple 7.031999826431274
So why is creating a set from a filter so much faster? Shouldn’t it usually take as long as creating a set from a sequence, where every element has to be hashed? Or is it somehow getting a boost from the internal filter representation?
For comparison, when running the test on a range expression, set takes about twice as long as list and tuple (which are both nearly identical in speed).
edit:
Sven’s answer is totally right, but for the completeness an updated test that will run on an actual filter:
import time
def testFilter ( n, test, rangeSize ):
for method in ( set, list, tuple ):
t = time.time()
for i in range( n ):
method( filter( test, range( rangeSize ) ) )
print( method.__name__, ( time.time() - t ) )
testFilter( 100000, lambda x: x % 3 == 0, 1000 )
The result actually shows what makes more sense with list and tuple both being the fastest, although set isn’t really slower, so it won’t make any difference what to use:
set 27.868000030517578
list 27.131999969482422
tuple 27.138000011444092
filter()returns an iterator in Python 3, and this iterator will be consumed on the first run of the inner for-loop. After that, you are only measuring the speed of the contructor — that’s why you have to repeat it so often to make it consume at least a bit of time.So it seems that the constructor of
set()is the fastest one in dealing with an empty iterator.