I’ve got a data source that provides a list of objects and their properties (a CSV file, but that doesn’t matter). Each time my program runs, it needs to pull a new copy of the list of objects, compare it to the list of objects (and their properties) stored in the database, and update the database as needed.
Dealing with new objects is easy: the data source gives each object a sequential ID number, so you just check the highest ID in the new data against the database, and you’re done. I’m looking for suggestions for the other cases – when some of an object’s properties have changed, or when an object has been deleted.
A naive solution would be to pull all the objects from the database, compute the set differences between old and new (and compare properties on the overlap), and then examine those results, but that seems like it wouldn’t be very efficient once the sets get large. Any ideas?
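For what it’s worth, here is a minimal sketch of that naive approach, assuming each object can be reduced to an ID mapped to its properties (the dicts below are illustrative placeholders, not my real data):

```python
# Old state from the database and new state from the data source,
# keyed by each object's sequential ID.
old = {1: "a", 2: "b", 3: "c"}   # database copy
new = {1: "a", 2: "x", 4: "d"}   # fresh copy from the data source

# Dict key views support set operations directly.
added   = new.keys() - old.keys()                      # IDs only in the new data
deleted = old.keys() - new.keys()                      # IDs gone from the new data
changed = {k for k in old.keys() & new.keys()
           if old[k] != new[k]}                        # same ID, different properties
```

This works, but it holds both full copies in memory, which is the inefficiency I’m worried about.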
The standard approach for huge piles of data amounts to this.
We’ll assume that list_1 is the “master” (without duplicates) and list_2 is the “updates” which may have duplicates.
Yes, it involves a potential sort. If list_1 is kept in sorted order when you write it back to the file system, that saves considerable time; likewise if list_2 can be accumulated in a structure that keeps it sorted.
Sorry about the wordiness, but you need to know which iterator raised the StopIteration, so you can’t (trivially) wrap the whole while loop in one big try block.