I am trying to filter some data to take out artifacts such as negative numbers and errors from my measuring devices. I have been playing with the idea of using a generator to do this. I am using Python 2.7.2.
testlist = [12,2,1,1,1,0,-3,-3,-1]
gen = (i for i, x in enumerate(testlist) if x < 0 or x > 2.5)
for i in gen: testlist.pop(i)
print testlist
This returns:
[2, 1, 1, 1, 0, -3]
My question is: why is the -3 value still showing up in the updated “testlist”?
When you remove items from your list, the indexes of the items after them change (they all shift down by one). As a result, the generator skips over some items. Try adding some more print statements so that you can see what is going on:
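For example, a minimal sketch of such instrumentation. The yielded list and the wording of the messages are illustrative additions, not part of the original code; the list and the generator are taken from the question unchanged.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]
gen = (i for i, x in enumerate(testlist) if x < 0 or x > 2.5)

yielded = []  # illustrative helper: records every index the generator produces

for i in gen:
    # The generator is lazy, so it inspects the list as it is *now*,
    # after any earlier pops have already shifted the later items down.
    print("generator yielded index %d, value there is %r" % (i, testlist[i]))
    testlist.pop(i)
    yielded.append(i)
    print("list after pop: %r" % (testlist,))

print("indexes yielded: %r" % (yielded,))
```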
Output:
You would have needed to delete the items at indexes 0, 5, 5, 5. The generator instead produces the indexes 0, 5, 6. That makes sense because enumerate returns 0, 1, 2, ... and so on. It won’t return the same index twice in a row.
It’s also very inefficient to remove the elements one at a time. This requires moving data around multiple times, with a worst-case performance of O(n^2). You can instead use a list comprehension.
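A sketch of the list comprehension approach, assuming the goal is to keep exactly the values the original condition would not have removed (0 through 2.5 inclusive). The name filtered is illustrative.

```python
testlist = [12, 2, 1, 1, 1, 0, -3, -3, -1]

# Keep the complement of the removal condition (x < 0 or x > 2.5).
# This builds a new list in a single pass instead of popping repeatedly.
filtered = [x for x in testlist if 0 <= x <= 2.5]

print(filtered)
```

If other references to the same list object need to see the change, assigning to the full slice (testlist[:] = filtered) replaces the contents in place rather than rebinding the name.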