I currently have a script that takes 1,000 rows at a time from a MySQL table, loops through them, and does some processing. Easy stuff. Right now, though, it’s not automated: every time I want to run it, I connect via the terminal, run php myscript.php, and wait for it to finish. The problem is that it’s not fast enough. The processing the script does is scraping, and I have been asked to find out how to run multiple scraping instances at once to speed things up.
So I started trying to plan out how to do this, and realized after a couple of Google searches that I honestly don’t even know what the correct terminology for this actually is.
Am I looking to make a service with Apache? Or a daemon?
What I want my script to do is this:
- Some kind of “controller” that looks up a main table, gets X rows (could be tens or hundreds of thousands) that haven’t had a particular flag set
- Counts the rows in the result set and works out how many “children” it would need in order to send rows in batches of, say, 5,000 to each “child”
- Those “children” each get a group of rows. Say Child1 gets rows 1 to 5,000, Child2 gets rows 5,001 to 10,000, etc.
- After each “child” runs its batch of rows, it needs to tell the “controller” that it has finished, so the “controller” can then tell our Sphinx indexer to re-index, and then send a new batch of rows to the child that just completed (assuming there are still more rows to do)
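A minimal sketch of the controller’s batching step, assuming it first pulls the IDs of every unflagged row and then splits that fixed list into batches of 5,000. Splitting a list of IDs, rather than re-running the query with LIMIT offsets, keeps batches from shifting as children set the flag on rows they finish. planBatches() and the 12,000-row example are hypothetical.

```php
<?php
// Hypothetical helper: $pendingIds stands in for the result of something like
// "SELECT id FROM main_table WHERE processed = 0" (table/column names assumed).
function planBatches(array $pendingIds, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('batchSize must be at least 1');
    }
    // array_chunk keeps the ID order and puts any remainder in the last batch.
    return array_chunk($pendingIds, $batchSize);
}

$pendingIds = range(1, 12000);  // pretend 12,000 rows still need scraping
$batches = planBatches($pendingIds, 5000);

printf("Children needed: %d\n", count($batches));
foreach ($batches as $i => $ids) {
    printf("Child%d: ids %d-%d (%d rows)\n", $i + 1, $ids[0], end($ids), count($ids));
}
```

With 12,000 pending rows this plans three batches: two of 5,000 and a final one of 2,000.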
My main concern here is how to automate all of this, and how to get two or more PHP scripts to “talk” to each other, or at the very least, how the children can notify the controller that they have finished and are waiting for new batches of rows.
I’m also wondering whether I should worry about MySQL problems such as row locking with this many scripts running at once. And if the table the finished rows go into just uses auto_increment, could the concurrent inserts end up with conflicting ID numbers?
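One way to get the controller/child handshake without any extra services is to let the controller start each child with proc_open and treat the child exiting as its “I’m done” message. The sketch below assumes PHP 7.4+ on the CLI with proc_open available; CHILD_CODE is a hypothetical inline stand-in for a real child script, and runController(), startChild() and the $onBatchDone callback (where a Sphinx re-index could be triggered) are illustrative names, not an existing API.

```php
<?php
// Hypothetical child: reads its batch of IDs as JSON on stdin, "processes"
// them, and reports back by printing how many rows it handled.
const CHILD_CODE = <<<'CODE'
$ids = json_decode(file_get_contents('php://stdin'), true);
// ... scrape each row and set its flag here ...
echo count($ids);
CODE;

function startChild(array $ids): array
{
    $spec = [0 => ['pipe', 'r'], 1 => ['pipe', 'w']];
    $proc = proc_open([PHP_BINARY, '-r', CHILD_CODE], $spec, $pipes);
    if (!is_resource($proc)) {
        throw new RuntimeException('could not start child process');
    }
    fwrite($pipes[0], json_encode($ids));
    fclose($pipes[0]);  // closing stdin signals the end of the batch
    return ['proc' => $proc, 'out' => $pipes[1]];
}

// Keeps up to $maxChildren children busy. When one exits, its result is
// passed to $onBatchDone and the free slot is given the next batch.
function runController(array $batches, int $maxChildren, callable $onBatchDone): int
{
    $running = [];
    $handled = 0;
    while ($batches !== [] || $running !== []) {
        while ($batches !== [] && count($running) < $maxChildren) {
            $running[] = startChild(array_shift($batches));
        }
        foreach ($running as $key => $child) {
            if (proc_get_status($child['proc'])['running']) {
                continue;  // still working
            }
            $count = (int) stream_get_contents($child['out']);
            fclose($child['out']);
            proc_close($child['proc']);
            unset($running[$key]);
            $handled += $count;
            $onBatchDone($count);
        }
        usleep(100000);  // poll ten times a second instead of busy-looping
    }
    return $handled;
}

$finished = [];
$total = runController(
    array_chunk(range(1, 12000), 5000),
    2,
    function (int $rows) use (&$finished): void {
        $finished[] = $rows;
        echo "child finished $rows rows\n";
    }
);
echo "total: $total\n";
```

Rather than keeping a long-lived child per slot, this starts a fresh process for every batch and reuses the slot, so no back-and-forth protocol is needed. Long-lived children fed over stream_socket_pair(), or forking with the pcntl extension’s pcntl_fork() and pcntl_waitpid(), are alternatives if process start-up cost matters.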
You might want to look into turning that script into a daemon. With a bit of research and tinkering, you can get the System_Daemon PEAR package set up to do just that. Here is an article that I used to help me write my first PHP daemon:
Create daemons in PHP (09 Jan 2009; by Kevin van Zonneveld)
You could also follow the suggestion in the comment above and run your script in the background, having it loop indefinitely and wait a set interval between passes.
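A minimal sketch of that kind of loop, assuming the MySQL query and the scraping code are passed in as callables. runWorker(), $fetchBatch and $processBatch are hypothetical names, and $maxPasses only exists so the demo can stop; leave it null to loop forever.

```php
<?php
// Worker that loops indefinitely, sleeping when there is nothing to do.
function runWorker(
    callable $fetchBatch,
    callable $processBatch,
    int $idleSeconds,
    ?int $maxPasses = null
): int {
    $processed = 0;
    for ($pass = 0; $maxPasses === null || $pass < $maxPasses; $pass++) {
        $rows = $fetchBatch();
        if ($rows === []) {
            sleep($idleSeconds);  // nothing pending: wait, then check again
            continue;
        }
        $processBatch($rows);
        $processed += count($rows);
    }
    return $processed;
}

// Demo: an in-memory queue of 2,500 "rows" fetched 1,000 at a time.
$queue = range(1, 2500);
$done = runWorker(
    function () use (&$queue): array {
        return array_splice($queue, 0, 1000);
    },
    function (array $rows): void {
        // ... scrape each row and set its flag here ...
    },
    0,  // no real wait in the demo; something like 60 in practice
    4   // three passes with work, then one idle pass
);
echo "processed $done rows\n";
```

To keep it running after you log out, you could start it with something like nohup php myscript.php > scraper.log 2>&1 & or run it under a process supervisor. Since MySQL closes connections that sit idle longer than wait_timeout, a loop like this should be prepared to reconnect after a long sleep.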