I have this for loop:
    people = queue.Queue()
    for person in set(list_):
        first_name, last_name = re.split(',| | ', person)
        people.put([first_name, last_name])
The list being iterated has 1,000,000+ items. It works, but takes a couple of seconds to complete.
What changes can I make to help the processing speed?
Edit: I should add that the queue here comes from Gevent’s queue library.
The question is: what is your queue being used for? If it isn’t really necessary for threading purposes (or you can work around the threaded access), then in this kind of situation you want to switch to generators. You can think of them as the Python version of Unix shell pipes. So, your loop would look like:
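A minimal sketch of such a generator, assuming each entry holds exactly two name parts separated by a comma or a single space (the same separators as the pattern in the question). The name `iter_people` is made up for illustration, and this version does not remove duplicates the way `set(list_)` does:

```python
import re

# Compile once so the pattern is not re-parsed on every iteration.
SEPARATOR = re.compile(r',| ')

def iter_people(list_):
    """Yield one [first_name, last_name] pair at a time.

    Nothing is stored: each pair is produced only when the caller asks for it.
    """
    for person in list_:
        first_name, last_name = SEPARATOR.split(person)
        yield [first_name, last_name]
```

Calling `iter_people(list_)` does no work by itself; the splitting happens as the result is iterated.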
and you would use this generator like this:
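As a rough usage sketch: the consumer just loops over the generator, so only one pair of strings exists at any moment. The generator is repeated here so the snippet runs on its own; `iter_people`, `sample` and the initials logic are placeholders for illustration, not part of the question's code:

```python
import re

def iter_people(list_):
    """Yield [first_name, last_name] pairs lazily (hypothetical helper)."""
    for person in list_:
        first_name, last_name = re.split(r',| ', person)
        yield [first_name, last_name]

# Stand-in data; the real list_ would have 1,000,000+ entries.
sample = ["Ada,Lovelace", "Alan Turing"]

initials = []
for first_name, last_name in iter_people(sample):
    # Each pair is handled and then discarded before the next one is produced.
    initials.append(first_name[0] + last_name[0])
```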
This approach avoids what are probably your biggest performance hits: allocating memory to build a queue and a set with 1,000,000+ items each. Instead, it works with one pair of strings at a time.
UPDATE
Based on more information about how threads play a role in this, I’d use this solution instead.
This replaces the set() operation with something that should be more efficient.
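A rough sketch of one way to drop `set(list_)` while still feeding a queue: keep a `seen` set and skip names that were already handled, so deduplication and queueing happen in a single pass. This is an assumption about what such a replacement could look like; whether it is actually faster needs measuring on the real data. The names `fill_queue` and `seen` are made up, and the standard-library `queue.Queue` stands in for Gevent's queue here so the snippet runs without Gevent installed (`gevent.queue.Queue` offers the same `put()` and `get()` calls):

```python
import queue
import re

def fill_queue(list_, people):
    """Put each distinct name into `people` once, in first-seen order."""
    seen = set()
    for person in list_:
        if person in seen:
            continue  # duplicate entry, already queued
        seen.add(person)
        first_name, last_name = re.split(r',| ', person)
        people.put([first_name, last_name])
    return people

people = fill_queue(["Ada,Lovelace", "Alan Turing", "Ada,Lovelace"], queue.Queue())
```

Unlike iterating over `set(list_)`, this keeps the original order of the names. It still stores every distinct name in `seen`, so the memory saving is smaller than with the pure generator approach.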