I’ve been grappling with what appears to be a bug between Django and MySQL, but is perhaps just my own misunderstanding of the nuances of threaded applications.
First, a bit of information on my application. I have a multithreaded application written in Python that uses Django’s models. There are three different types of threads that pass information down a pipeline through the use of Queues. Thread one pulls a bunch of objects from the database and puts them into a queue. The next thread (the main workhorse) takes an item off the queue, makes an HTTP request, and puts the response onto a queue for the third thread. The third thread does some processing on the HTML and updates some database values.
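For reference, the shape of that pipeline is roughly the following. This is only a minimal sketch in Python 3 with stand-in work at each stage: there is no database or HTTP access here, and the function names, the sentinel convention, and the results dict are all made up for illustration rather than taken from the real application.

```python
import queue
import threading

SENTINEL = None  # hypothetical end-of-stream marker passed down the pipeline


def producer(rows, fetch_queue):
    # Stage 1: stands in for pulling the level = 0 rows from the database.
    for row in rows:
        fetch_queue.put(row)
    fetch_queue.put(SENTINEL)


def fetcher(fetch_queue, parse_queue):
    # Stage 2: stands in for the HTTP request made for each row.
    while True:
        row = fetch_queue.get()
        if row is SENTINEL:
            parse_queue.put(SENTINEL)  # pass the shutdown signal along
            break
        parse_queue.put((row, "<html>%s</html>" % row))


def parser(parse_queue, results):
    # Stage 3: stands in for parsing the HTML and saving level = 1.
    while True:
        item = parse_queue.get()
        if item is SENTINEL:
            break
        row, _html = item
        results[row] = 1


def run_pipeline(rows):
    fetch_queue = queue.Queue()
    parse_queue = queue.Queue()
    results = {}
    threads = [
        threading.Thread(target=producer, args=(rows, fetch_queue)),
        threading.Thread(target=fetcher, args=(fetch_queue, parse_queue)),
        threading.Thread(target=parser, args=(parse_queue, results)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling run_pipeline(["a", "b", "c"]) pushes three items through all three stages and returns a dict mapping each one to its new level.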
Here’s the weird part. I have a MySQL column called “level.” The first thread pulls rows where level = 0. After parsing the HTTP response, the final thread is supposed to update the row in the database with level = 1, along with all manner of data parsed out of the response. At full speed the script says it’s processing about 1,000 rows a minute, but the number of rows with level = 1 increases at only about a third of that rate. Here’s an excerpt of the output when running slowly.
[Image: program output showing correct behavior]
The important part is the lines that say “Updating level one entry.” The numbers at the end are displaying the number of level 1 rows in the database, followed by the current “level” status of the working data object. This output is when it is functioning correctly. It is produced by this code block:
# update our current record to reflect having run here
current.update = datetime.now()
# this prints out the "updating level one" text with debugging information
self.send_message(304, str(Scrape.objects.filter(level=1).count()) + ":" + str(current.level))
current.level = 1
current.save()
# and after saving the information to the db, prints it out again
self.send_message(304, str(Scrape.objects.filter(level=1).count()) + ":" + str(current.level))
self.send_message(308, str(current.asin)) # send out a consuming message
However, after running for a little while I will get output that is basically identical, except the count of objects at level = 1 will not increase. This makes absolutely no sense to me. If the value was 0 before the save and is 1 now, then the number of level = 1 entries should have gone up!
I don’t believe this is merely caching but more likely either an error that I’ve made or some sort of unexpected behavior out of the components I’m using. Any advice from more experienced eyes would be greatly appreciated.
My immediate guess would be a transaction issue. Since these are running in separate threads, they’ll have their own transactions, and therefore will be subject to transaction isolation. If the tables are InnoDB, the default isolation level is REPEATABLE READ, under which a transaction keeps reading from the snapshot taken at its first read. So even if the thread doing the updating commits its transaction and starts a new one, the thread outputting the count will not necessarily see that update until it too starts a new transaction.
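To illustrate the effect, here is a minimal sketch that reproduces a stale count with two connections. It is not Django or MySQL: it uses the standard-library sqlite3 module in WAL mode as a stand-in, on the assumption that its snapshot reads behave similarly enough to show the idea. The scrape table and the function names are hypothetical.

```python
import os
import sqlite3
import tempfile

COUNT_SQL = "SELECT COUNT(*) FROM scrape WHERE level = 1"


def count_level_one(conn):
    # fetchall() consumes the whole result so no statement is left open.
    return conn.execute(COUNT_SQL).fetchall()[0][0]


def demo_stale_count():
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None stops the sqlite3 module from opening
    # transactions implicitly; BEGIN and COMMIT are issued by hand.
    writer = sqlite3.connect(path, isolation_level=None)
    writer.execute("PRAGMA journal_mode=WAL").fetchall()
    writer.execute("CREATE TABLE scrape (id INTEGER PRIMARY KEY, level INTEGER)")
    writer.execute("INSERT INTO scrape (level) VALUES (0)")

    reader = sqlite3.connect(path, isolation_level=None)

    # The reader opens a transaction; its first read fixes its snapshot.
    reader.execute("BEGIN")
    before = count_level_one(reader)

    # The writer updates the row and commits (autocommit mode).
    writer.execute("UPDATE scrape SET level = 1 WHERE id = 1")

    # Still inside the old transaction, the reader sees the old snapshot.
    stale = count_level_one(reader)

    # Ending the transaction lets the next read see the committed update.
    reader.execute("COMMIT")
    fresh = count_level_one(reader)

    writer.close()
    reader.close()
    return before, stale, fresh
```

Here the second count stays at its old value even though the update has been committed, and only the read made after the reader's COMMIT picks it up. In the MySQL case the equivalent remedies would be to end the counting thread's transaction before it counts, or to run at READ COMMITTED; which one is appropriate depends on the application.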