Situation: A user submits a URL, and my PHP script adds that URL to a “queue” table in the database. Suppose that at least 1000 URLs will be inserted into that “queue” table per minute. For each URL, I want to fetch its contents and then do some quick parsing of them.
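Here is a minimal sketch of the enqueue side. It is written in Python, with the standard-library sqlite3 module standing in for MySQL/InnoDB. The asker's PHP script would presumably issue an equivalent INSERT through PDO or mysqli. The table layout and the names `open_queue` and `enqueue` are hypothetical.

```python
import sqlite3

def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical `queue` table holding submitted URLs."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS queue ("
        " id INTEGER PRIMARY KEY,"
        " url TEXT NOT NULL,"
        " created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn

def enqueue(conn: sqlite3.Connection, url: str) -> int:
    """Insert one submitted URL and return its new row id."""
    cur = conn.execute("INSERT INTO queue (url) VALUES (?)", (url,))
    conn.commit()
    return cur.lastrowid
```

Each submission is one short INSERT. At 1000 URLs per minute, that is roughly 17 inserts per second.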
My solution: I was thinking of creating a daemon that keeps polling the “queue” table and grabs all the available rows each time it checks. It then works with the retrieved data, updates another table, deletes the rows once that cycle completes, and repeats. Each row may take roughly 1 to 3 ms to complete. (By the way, I’m using InnoDB tables.)
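Here is a quick back-of-the-envelope check of that load, taking the upper 3 ms estimate at face value:

```latex
\[
1000\,\frac{\text{rows}}{\text{min}} \times 3\,\frac{\text{ms}}{\text{row}}
  = 3000\,\frac{\text{ms}}{\text{min}}
  = 3\,\frac{\text{s}}{\text{min}},
\qquad
\frac{3\,\text{s}}{60\,\text{s}} = 5\%
\]
```

If 3 ms really covers the whole job, a single worker handling one row at a time would be busy only about 5% of the time. Fetching a URL over the network usually takes much longer than a few milliseconds, so it is worth measuring the real per-row time before deciding how many workers to run.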
Question: Would you say this is a good way of doing it, or is there something better? I don’t want to use any heavy systems, though; I’d like to keep things short and simple if possible 🙂
I would suggest grabbing only one row at a time instead of all of them. Here’s why:
Say you have 1000 entries in your table. Your script starts, loads all 1000 into memory (warning sign 1: high memory usage) and begins processing them. Processing 1000 entries takes 5 minutes, but your script runs every 3 minutes. By the time your first thread is processing, say, row 674, your second thread starts processing row 1, because the processed rows haven’t been deleted from the database yet (warning sign 2: multithreaded behaviour, with the same rows handled twice).
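A small simulation reproduces the double-processing problem described above. It is a sketch in Python that uses an in-memory sqlite3 database as a stand-in for the InnoDB table. The names `make_queue`, `fetch_batch` and `delete_batch` are hypothetical. Two overlapping runs of the batch approach both read the queue before the first run has deleted anything, so they see exactly the same rows.

```python
import sqlite3

def make_queue(n: int) -> sqlite3.Connection:
    """Build a hypothetical queue table pre-filled with n URLs."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, url TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO queue (url) VALUES (?)",
        [(f"http://example.com/{i}",) for i in range(n)],
    )
    conn.commit()
    return conn

def fetch_batch(conn: sqlite3.Connection) -> list[int]:
    """Batch approach from the question: read every row currently queued."""
    return [row[0] for row in conn.execute("SELECT id FROM queue ORDER BY id")]

def delete_batch(conn: sqlite3.Connection, ids: list[int]) -> None:
    """Rows are only removed once the whole cycle has completed."""
    conn.executemany("DELETE FROM queue WHERE id = ?", [(i,) for i in ids])
    conn.commit()

conn = make_queue(1000)
first_run = fetch_batch(conn)    # run 1 loads all 1000 rows into memory
# ... run 1 is still working through its batch when run 2 starts ...
second_run = fetch_batch(conn)   # run 2 sees the same, not-yet-deleted rows
overlap = set(first_run) & set(second_run)
print(len(overlap))  # 1000
delete_batch(conn, first_run)    # run 1 finally cleans up
```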
Grabbing one row at a time (and marking it before you process it) also works when you let multiple threads process your queue at once.
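Application flow: grab one unprocessed row and flag it as being processed, process it, delete the row (or flag it as done), then repeat.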
Instead of using flags, you could also use row locking in your database (for example, SELECT ... FOR UPDATE in InnoDB). But this can easily lead to deadlocks, so be careful.