Background
I have two lists. The first, items, contains around 250 tuples, each with 3 elements:
(path_to_a_file, size_in_bytes, modified_time)
The second list, result, contains up to 250 elements. It is the result of a database query that looks up rows based on the paths in the items list. The number of elements in result depends on whether those files are already in the database.
Each element in result is a row object returned from an SQLAlchemy query, with attributes for the row values (path, mtime and hash are the ones I’m interested in here).
What I’m trying to do is filter out all the elements in items that are in result with the same mtime (keeping track of how many were filtered and their total size), and build a new list of the items that either have a different mtime or don’t exist in result. Items with a different mtime need to be stored as (path, size, mtime_from_result, hash_from_result), and items that weren’t in the database as (path, size, mtime, None).
I hope I’m not making this too localised but I felt I needed to explain what I’m trying to accomplish to ask the question.
Problem
I want to try and make this loop as fast as possible but the most important part is making it work as expected.
Is it safe to remove items from the lists as I iterate over them? I noticed iterating forwards has a weird outcome but iterating backwards seems to be ok. Is there a better approach?
I’m removing items that I’ve matched up (i.path == j[0]) because I know the relationship is 1 to 1 and it’s not going to match again. By shrinking the lists I can get through the next iteration faster and, more importantly, I’m left with all the unmatched items.
I can’t help feeling there’s a much nicer solution that I’m overlooking, perhaps with a list comprehension or generators.
send_items = []
for i in result[::-1]:
    for j in items[::-1]:
        if i.path == j[0]:
            result.remove(i)  # I think this remove is possibly pointless?
            items.remove(j)
            if i.mtime == j[2]:
                self.num_skipped += 1
                self.size_skipped += j[1]
            else:
                send_items.append((j[0], j[1], i.mtime, i.hash))
            break
send_items.extend(((j[0], j[1], j[2], None) for j in items))
I’d do this as:
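A minimal sketch of one way to do it, assuming paths are unique within result so they can serve as dictionary keys. Row and build_send_items are hypothetical names used only for illustration (Row stands in for the SQLAlchemy row objects), and the counters are returned rather than stored on self:

```python
from collections import namedtuple

# Hypothetical stand-in for the SQLAlchemy row objects in the question.
Row = namedtuple("Row", ["path", "mtime", "hash"])


def build_send_items(items, result):
    """Return (send_items, num_skipped, size_skipped) without mutating the inputs."""
    # One pass over result builds a path -> row lookup table.
    rows_by_path = {row.path: row for row in result}

    send_items = []
    num_skipped = 0
    size_skipped = 0
    for path, size, mtime in items:
        row = rows_by_path.get(path)
        if row is None:
            # Not in the database yet, so there is no stored hash.
            send_items.append((path, size, mtime, None))
        elif row.mtime == mtime:
            # Unchanged file: count it and leave it out.
            num_skipped += 1
            size_skipped += size
        else:
            # Changed file: carry the stored mtime and hash, as in the question.
            send_items.append((path, size, row.mtime, row.hash))
    return send_items, num_skipped, size_skipped


# Made-up sample data for illustration.
items = [("a.txt", 10, 100), ("b.txt", 20, 200), ("c.txt", 30, 300)]
result = [Row("a.txt", 100, "h1"), Row("b.txt", 999, "h2")]
send_items, num_skipped, size_skipped = build_send_items(items, result)
```

Each item costs one dictionary lookup, so the whole pass is roughly linear in the length of the two lists. Unlike the original loop, send_items comes out in the order of items rather than with all the unmatched entries at the end.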
Here is my analysis of your solution (assuming both result and items are of length N):
result[::-1] creates a copy of result, so calling result.remove(i) doesn’t affect the iteration. You only loop over result once, so removing elements from it is pointless and only creates extra work. If all you want is a copy, result[::] is enough.
items.remove(j) costs more than it looks. remove() takes O(N) time, because it searches the list for the element and then shifts everything after it. It runs at most once per element of result (the break follows it), so the nested loop stays O(N^2) overall rather than getting faster, and items[::-1] copies the list again on every pass of the outer loop.
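The point about copies can be seen in a small sketch. The list names here are made up for illustration and have nothing to do with the data in the question:

```python
# Removing while iterating forwards over the same list skips elements,
# because the remaining items shift left under the iterator's index.
values = [1, 2, 3, 4]
for v in values:
    values.remove(v)
# Only 1 and 3 were visited; 2 and 4 are still in the list.

# Iterating over a slice copy visits every element, because the copy
# is not touched by the removals on the original list.
safe = [1, 2, 3, 4]
for v in safe[:]:
    safe.remove(v)
# Every element was visited, so safe is now empty.
```

This is why iterating over result[::-1] or items[::-1] appeared to behave: the loop walks the copy, not the list being modified.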