I never actually thought I’d run into speed-issues with python, but I have. I’m trying to compare really big lists of dictionaries to each other based on the dictionary values. I compare two lists, with the first like so
biglist1=[{'transaction':'somevalue', 'id':'somevalue', 'date':'somevalue' ...}, {'transactio':'somevalue', 'id':'somevalue', 'date':'somevalue' ...}, ...]
With ‘somevalue’ standing for a user-generated string, int or decimal. Now, the second list is pretty similar, except the id-values are always empty, as they have not been assigned yet.
biglist2=[{'transaction':'somevalue', 'id':'', 'date':'somevalue' ...}, {'transactio':'somevalue', 'id':'', 'date':'somevalue' ...}, ...]
So I want to get a list of the dictionaries in biglist2 that match the dictionaries in biglist1 for all other keys except id.
I’ve been doing
for item in biglist2: for transaction in biglist1: if item['transaction'] == transaction['transaction']: list_transactionnamematches.append(transaction) for item in biglist2: for transaction in list_transactionnamematches: if item['date'] == transaction['date']: list_transactionnamematches.append(transaction)
… and so on, not comparing id values, until I get a final list of matches. Since the lists can be really big (around 3000+ items each), this takes quite some time for python to loop through.
I’m guessing this isn’t really how this kind of comparison should be done. Any ideas?
Index on the fields you want to use for lookup. O(n+m)
This is probably thousands of times faster than what you’re doing now.