I have a script that transfers about 1.5 million rows (~400 MB of data) from one table to another; along the way, some data is converted, modified, and placed in the correct field. It’s a simple script: it recursively loads data, then places it in the new tables under the correct fields and formats. As an example, it pulls all of the users from the table and loops through them. For each user, it inserts the user into the new table, pulls all of that user’s posts and inserts them into the correct table, pulls the comments for each post and inserts those, then jumps back up and pulls all of the contacts for that user. Then it moves on to the next user and repeats the process.
My problem is the immense amount of data being transferred. Because the data set is so large and PHP has no memory management besides garbage collection (that I know of), I’m unable to complete the script: it gets through about 15,000 connections and items transferred before it maxes out at 200 MB of memory.
This is a one-time thing, so I’m doing it on my local computer, not an actual server.
Since unset() doesn’t seem to actually free up memory, is there any other way to free up the data in a variable? One thing I tried was overwriting the variable with NULL, but that didn’t seem to help.
Any advice would be awesome, because man, this stinks.
If you’re actually doing this recursively, then that’s your problem: you should be doing it iteratively. Each recursive call leaves overhead (plus garbage) behind until it returns, so eventually you hit the memory limit. An iterative approach doesn’t have that problem, and the garbage collector can reclaim memory as the loop goes.
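To make the iterative idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: fetchUsers() is a hypothetical generator standing in for a database cursor, convertUser() is a made-up conversion step, and the batch size of 1000 is arbitrary. The point is only that each row is handled inside a flat loop and goes out of scope before the next one is read.

```php
<?php
// Hypothetical stand-in for a database cursor: yields one user row at a
// time instead of building an array that holds every row.
function fetchUsers(int $count): Generator
{
    for ($id = 1; $id <= $count; $id++) {
        yield ['id' => $id, 'name' => "user{$id}", 'bio' => str_repeat('x', 1024)];
    }
}

// Hypothetical conversion step: maps an old row to the new table's format.
function convertUser(array $row): array
{
    return ['user_id' => $row['id'], 'display_name' => ucfirst($row['name'])];
}

// Flat loop: no function calls itself, so no call frames pile up.
function migrateUsers(int $count, int $batchSize = 1000): int
{
    $migrated = 0;
    foreach (fetchUsers($count) as $row) {
        $converted = convertUser($row);
        // A real script would insert $converted into the new table here.
        $migrated++;

        // Optionally ask PHP to collect reference cycles every batch.
        if ($migrated % $batchSize === 0) {
            gc_collect_cycles();
        }
    }
    return $migrated;
}

$before = memory_get_usage();
$total = migrateUsers(50000);
$after = memory_get_usage();
printf("migrated %d rows, memory grew by %d bytes\n", $total, $after - $before);
```

Posts, comments, and contacts could be handled the same way, as nested loops inside the user loop rather than as functions that call each other. Comparing memory_get_usage() before and after a run is a cheap way to check whether memory really stays flat.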
You’re also talking about a mind-numbing number of connections. Why are there so many? I guess I don’t completely understand your process, or why it needs this approach rather than one connection for retrieving and one for storing. Even if you were, say, reconnecting for each row, you should look at persistent connections, which let a later connection to the same database reuse the earlier one. Persistent connections aren’t a great idea for a web app with multiple users (for scalability reasons), but in your very targeted case they should be fine.
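Here is a rough sketch of the "one retrieve connection, one store connection" setup using PDO. The in-memory SQLite databases and the table and column names (old_users, users, and so on) are placeholders I made up so the example runs on its own; your real DSNs and schema will differ.

```php
<?php
// Assumption: in-memory SQLite stands in for the real source and target
// databases. Replace both DSNs with your own.
$options = [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION];
// With a real server DSN, adding PDO::ATTR_PERSISTENT => true to $options
// asks PDO to reuse an already-open connection instead of opening a new one.
$source = new PDO('sqlite::memory:', null, null, $options);
$target = new PDO('sqlite::memory:', null, null, $options);

// Hypothetical schemas and sample rows, only so the sketch has data.
$source->exec('CREATE TABLE old_users (id INTEGER PRIMARY KEY, name TEXT)');
$target->exec('CREATE TABLE users (user_id INTEGER PRIMARY KEY, display_name TEXT)');
$seed = $source->prepare('INSERT INTO old_users (id, name) VALUES (?, ?)');
foreach ([[1, 'alice'], [2, 'bob'], [3, 'carol']] as $sample) {
    $seed->execute($sample);
}

// Open each connection once, prepare the insert once, then reuse both
// for every row instead of reconnecting inside the loop.
$select = $source->query('SELECT id, name FROM old_users ORDER BY id');
$insert = $target->prepare('INSERT INTO users (user_id, display_name) VALUES (?, ?)');

$target->beginTransaction();
$copied = 0;
// fetch() hands back one row per call, so only the current row is held
// in a PHP variable at any time.
while ($row = $select->fetch(PDO::FETCH_ASSOC)) {
    $insert->execute([$row['id'], ucfirst($row['name'])]);
    $copied++;
}
$target->commit();

$names = $target->query('SELECT display_name FROM users ORDER BY user_id')
    ->fetchAll(PDO::FETCH_COLUMN);
printf("copied %d rows\n", $copied);
```

The design choice is simply to move connection setup and statement preparation out of the loop, so the per-row work is one fetch and one execute.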