I am looking for a way to quickly compare the state of a database table with the results of a Web service call.
I need to make sure that all records returned by the Web service call exist in the database, and any records in the database that are no longer in the Web service response are removed from the table.
I have two problems to solve:
1. How do I quickly compare a data structure with the contents of a database table?
2. When I find a difference, how do I quickly add what’s new and remove what’s gone?
For number 1, I was thinking of doing an MD5 of a data structure and storing it in the database. If the MD5 is different, then I’d move to step 2. Are there better ways of comparing response data with the state of a database?
I need more guidance on number 2. I can easily retrieve all records from a table (SELECT * FROM users WHERE user_id = 1) and then loop through an array, adding what’s not in the DB and building another array of items to be removed in a subsequent call, but I’m hoping for a better (faster) way of doing this. What is the best way to compare and sync a data structure with a subset of a database table?
Thanks for any insight into these issues!
I’ve recently been caught up in a similar problem. Our very simple solution was to load the web service data into a table with the same structure as the DB table. The DB table keeps a hash of its most important columns, and the same hash function is applied to the corresponding columns in the web service table.
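As a rough sketch of the per-row hash, assuming Python and MD5 (as mentioned in the question): the helper name row_hash, the column names and the separator below are made up for illustration, not what we actually ran.

```python
import hashlib

def row_hash(row, columns):
    """Return an MD5 hex digest over the chosen columns of one record."""
    # Join the values with a separator that is unlikely to appear in the data,
    # so that ("ab", "c") and ("a", "bc") do not produce the same digest.
    payload = "\x1f".join(str(row[col]) for col in columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# Hypothetical record; only the "important" columns feed the hash.
record = {"user_id": 1, "name": "Ada", "email": "ada@example.com", "last_seen": "2024-01-01"}
key_columns = ["user_id", "name", "email"]
digest = row_hash(record, key_columns)
print(digest)
```

The same function has to be applied on both sides (DB rows and web service rows), otherwise the hashes will never match.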
The ‘sync’ logic then goes like this:
1. Delete any rows from the DB table with hashes not found in the web service table. These records are no longer in the web service data. Do this first, while the web service table still holds the full response.

    DELETE FROM db_table WHERE hash NOT IN (SELECT hash FROM ws_table);

2. Delete any rows from the web service table with hashes that do exist in the DB table. This is duplicate data that doesn’t need synchronizing.

    DELETE FROM ws_table WHERE hash IN (SELECT hash FROM db_table);

3. Anything left over in the web service table is new data, and should now be inserted into the DB table.

    INSERT INTO db_table SELECT ... FROM ws_table;

It’s a pretty brute-force approach, and if done transactionally (even just steps 1 and 3) locks up the DB table for the duration, but it’s very simple.
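Here is a minimal runnable sketch of the whole sync, assuming SQLite through Python's sqlite3 module. The two-column schema and the sample rows are invented for the example. Note that the order matters: rows missing from the web service data are removed from the DB table before the duplicates are cleared out of the web service table, since otherwise the NOT IN delete would also remove unchanged rows.

```python
import sqlite3

def sync(conn):
    """Bring db_table in line with ws_table, matching rows by hash."""
    with conn:  # one transaction: commits on success, rolls back on error
        # Remove DB rows that are no longer in the web service data.
        conn.execute(
            "DELETE FROM db_table WHERE hash NOT IN (SELECT hash FROM ws_table)"
        )
        # Drop web service rows the DB already has (nothing to synchronize).
        conn.execute(
            "DELETE FROM ws_table WHERE hash IN (SELECT hash FROM db_table)"
        )
        # Whatever is left in ws_table is new data.
        conn.execute(
            "INSERT INTO db_table (hash, name) SELECT hash, name FROM ws_table"
        )

# Hypothetical schema and data, just enough to exercise the three statements.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_table (hash TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE ws_table (hash TEXT PRIMARY KEY, name TEXT);
    INSERT INTO db_table VALUES ('h1', 'alice'), ('h2', 'bob');
    INSERT INTO ws_table VALUES ('h2', 'bob'), ('h3', 'carol');
""")

sync(conn)
rows = conn.execute("SELECT hash, name FROM db_table ORDER BY hash").fetchall()
print(rows)  # [('h2', 'bob'), ('h3', 'carol')]
```

In this run 'h1' is removed, 'h2' is left untouched and 'h3' is added.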
One refinement would be to deal with changed records using UPDATE statements, but that adds a good deal of complexity, and may not be any faster than a DELETE followed by an INSERT.

Another possible optimization would be to set a flag instead of deleting rows. The rows could then be deleted later on. However, any logic using the DB table would have to ignore rows with a set flag.
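The flag variant might look roughly like this, again assuming SQLite through Python's sqlite3 module. The is_deleted column name, the schema and the data are hypothetical, and the boolean expression in the UPDATE relies on SQLite evaluating comparisons to 0 or 1, so other databases may need a CASE expression instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE db_table (
        hash TEXT PRIMARY KEY,
        name TEXT,
        is_deleted INTEGER NOT NULL DEFAULT 0  -- hypothetical flag column
    );
    CREATE TABLE ws_table (hash TEXT PRIMARY KEY, name TEXT);
    INSERT INTO db_table (hash, name) VALUES ('h1', 'alice'), ('h2', 'bob');
    INSERT INTO ws_table VALUES ('h2', 'bob'), ('h3', 'carol');
""")

with conn:
    # Flag rows missing from the web service data instead of deleting them;
    # rows that reappear get their flag cleared by the same statement.
    conn.execute(
        "UPDATE db_table "
        "SET is_deleted = (hash NOT IN (SELECT hash FROM ws_table))"
    )
    # Insert rows the DB table has never seen.
    conn.execute(
        "INSERT INTO db_table (hash, name) "
        "SELECT hash, name FROM ws_table "
        "WHERE hash NOT IN (SELECT hash FROM db_table)"
    )

# Every reader of db_table now has to filter on the flag.
live = conn.execute(
    "SELECT hash FROM db_table WHERE is_deleted = 0 ORDER BY hash"
).fetchall()

# Later, e.g. in a quiet period, the flagged rows can be purged for real.
with conn:
    purged = conn.execute("DELETE FROM db_table WHERE is_deleted = 1").rowcount

print(live, purged)
```

The expensive DELETE moves out of the sync itself, at the cost of the extra WHERE is_deleted = 0 in every query that reads the table.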