I have about half a million items that need to be placed in a list. I can't have duplicates, and if an item is already there I need to get its index. So far I have:
    if Item in List:
        ItemNumber=List.index(Item)
    else:
        List.append(Item)
        ItemNumber=List.index(Item)
The problem is that as the list grows it gets progressively slower, until at some point it just isn't worth doing. I am limited to Python 2.5 because it is an embedded system.
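You can use a set (built into CPython since version 2.4) to efficiently look up duplicate values: a membership test on a set takes constant time on average, while "in" on a list scans the elements one by one. If you really need an indexed system as well, you can use both a set and a list.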
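A minimal sketch of the set-plus-list idea, assuming the items are hashable (a requirement for storing them in a set). The helper name add_item is hypothetical, and the code avoids newer syntax so it should also run on Python 2.5:

```python
def add_item(item, seen, items):
    """Append item to items if it is new; return its index either way.

    seen is a set used only for the fast duplicate check.
    items is the list that gives each item its position.
    """
    if item in seen:
        # Fast membership test, but index() is still a linear scan.
        return items.index(item)
    seen.add(item)
    items.append(item)
    # The new item is always last, so no search is needed here.
    return len(items) - 1


# Usage: the repeated 'spam' is not appended a second time.
seen = set()
items = []
for word in ['spam', 'eggs', 'spam']:
    add_item(word, seen, items)
# items is now ['spam', 'eggs']
```

The set only speeds up the "is it already there?" question; finding the position of an existing item still costs a full list.index() scan.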
Doing your lookups using a set will remove the overhead of if Item in List, but not that of List.index(Item).

Please note that ItemNumber=List.index(Item) will be very inefficient to do after List.append(Item). You already know the length of the list, so the new item's index is simply ItemNumber = len(List)-1.

To completely remove the overhead of List.index (that method searches through the list item by item, which is very inefficient on larger lists), you can use a dict mapping Items back to their index. I might rewrite it as follows:
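A sketch of the dict-based rewrite, again assuming hashable items; the helper name get_index and the variable index_of are hypothetical. The dict answers both "is it there?" and "where is it?" in constant time on average, while the list keeps the items in insertion order. It sticks to syntax available in Python 2.5:

```python
def get_index(item, index_of, items):
    """Return the index of item in items, appending it first if it is new.

    index_of is a dict mapping each item to its position in items,
    so no linear search through the list is ever needed.
    """
    if item in index_of:
        return index_of[item]
    items.append(item)
    # The appended item sits at the end of the list.
    index_of[item] = len(items) - 1
    return index_of[item]


# Usage: duplicates map back to the index of their first occurrence.
index_of = {}
items = []
numbers = [get_index(w, index_of, items) for w in ['spam', 'eggs', 'spam', 'ham']]
# numbers == [0, 1, 0, 2]
# items == ['spam', 'eggs', 'ham']
```

If you never need the items by position, the dict alone is enough: len(index_of) gives the next free index.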