Long story short, if I break up a PHP script into tiny chunks, I can eventually get all of my code to run. However, I’ve got a script right now which reads a huge CSV file and inserts each row into a MySQL database. Rather than having to go through the onerous process of splitting up the file every time I want to update my site, I just want to get this script to work the way I know it should.
I’ve gotten it to insert around 10,000 rows before on a different web server, but there are at least 7 times that in the file, and it craps out before it’s done.
So, the story is, on one server it stops before it’s supposed to, and on another it doesn’t run at all… it just chugs its way to a 500 error after about 30 seconds.
The Apache error log gives me these lines when the script dies:
[Tue Aug 23 13:09:04 2011] [warn] [client 71.168.85.72] mod_fcgid: read data timeout in 40 seconds
[Tue Aug 23 13:09:04 2011] [error] [client 71.168.85.72] Premature end of script headers: newcsvupdater.php
I’ve got these two lines going at the top of the script:
set_time_limit(0);
ini_set('memory_limit','256M');
because I was previously getting a fatal memory allocation error; apparently splitting a large file up into arrays is memory-intensive.
Here’s the insertion code:
$file = "./bigdumbfile.csv"; // roughly 30mb
$handle = fopen($file, 'r'); // the mode has to be a quoted string
$firstentry = 0;
while (($csv = fgetcsv($handle)) !== false)
{
    if ($firstentry == 0)
    {
        $firstentry++; // skips the top row of field names
    }
    else
    {
        // unimportant conditional code omitted
        $checkforexisting = mysql_query("SELECT * FROM DB_TABLE WHERE ".
            "id_one = '".$csv[0]."' AND id_two = '".$csv[2]."'");
        $checknum = mysql_num_rows($checkforexisting);
        if ($checknum == 0)
        {
            if (!mysql_query("INSERT INTO DB_TABLE ".
                "(id_one, data_one, id_two, data_two, ".
                /* so on for 22 total fields */")
                VALUES ('".addslashes($csv[0])."', '".
                addslashes($csv[1])."', '".
                addslashes($csv[2])."', '".
                addslashes($csv[3])."' "/* ditto, as above */))
            {
                exit("<br>" . mysql_error());
            }
            else
            {
                print_r($csv);
                echo " insert complete<br><br>";
            }
        }
    }
}
echo "<br><b>DB_TABLE UPDATED</b>";
I’ve had to split up large tasks because of this before, and I’m pretty tired of it. I’m sure I’m doing plenty wrong, as I’m totally self-taught and generally write what amounts to spaghetti, so don’t hold back.
To increase the time limit for your script, you will need to edit the virtual host configuration for your site:
http://www.moe.co.uk/2009/08/17/php-running-under-mod_fcgid-read-data-timeout-in-40-seconds-on-plesk/
(mod_fcgid’s timeout is overriding PHP’s timeout)
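As a rough sketch of what that change could look like: this assumes mod_fcgid 2.3.6 or later, where the relevant directive is named FcgidIOTimeout (older releases call it IPCCommTimeout), and it assumes you are allowed to edit the virtual host at all. The 40 seconds in the log matches that directive's default. The value 600 is only an example, not a recommendation.

```apache
<IfModule mod_fcgid.c>
    # How long mod_fcgid waits for output from the PHP process, in seconds.
    # The default is 40, which is where "read data timeout in 40 seconds" comes from.
    FcgidIOTimeout 600
</IfModule>
```

Apache needs a reload after the change. set_time_limit(0) in the script has no effect on this limit, since it is enforced by Apache rather than by PHP.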
To make your script faster (so you might not need to perform the above step, which might not be possible on shared hosting), try this:
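Prepare all of the information to be inserted in advance and do a bulk insert. The query should look something like this:
INSERT IGNORE INTO DB_TABLE (id_one, data_one, id_two, data_two, ...)
VALUES ('...', '...', '...', '...', ...),
       ('...', '...', '...', '...', ...),
       ...;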
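The IGNORE part should have the same effect as checking in advance whether the record already exists: if it does, that row just won’t be inserted and MySQL will continue on to the next one. This relies on DB_TABLE having a PRIMARY KEY or UNIQUE index that covers id_one and id_two; without one, IGNORE has nothing to detect duplicates against.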
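Here is a minimal sketch of how the bulk insert could be wired up in PHP. It makes several assumptions that are not in the question: it uses PDO with placeholders instead of the mysql_* functions (which were removed in PHP 7), the helper names buildBulkInsert and importCsv are made up for illustration, the batch size of 500 is arbitrary, and the CSV columns are assumed to be in the same order as the table columns you pass in. Each batch becomes one statement of the form INSERT IGNORE INTO DB_TABLE (...) VALUES (?, ...), (?, ...), so roughly 70,000 rows turn into a few hundred queries rather than two queries per row.

```php
<?php
// Hypothetical helper: builds one multi-row INSERT IGNORE statement with "?"
// placeholders, plus the flat list of values to bind to it.
function buildBulkInsert(string $table, array $columns, array $rows): array
{
    $rowPlaceholder = '(' . implode(', ', array_fill(0, count($columns), '?')) . ')';
    $sql = 'INSERT IGNORE INTO ' . $table
         . ' (' . implode(', ', $columns) . ') VALUES '
         . implode(', ', array_fill(0, count($rows), $rowPlaceholder));

    $params = [];
    foreach ($rows as $row) {
        foreach ($row as $value) {
            $params[] = $value;
        }
    }
    return [$sql, $params];
}

// Hypothetical driver: streams the CSV and sends rows to MySQL in batches.
// Returns the number of rows that were actually inserted.
function importCsv(PDO $pdo, string $path, array $columns, int $batchSize = 500): int
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    fgetcsv($handle); // skip the top row of field names

    $batch = [];
    $inserted = 0;
    $flush = function () use ($pdo, $columns, &$batch, &$inserted) {
        if ($batch === []) {
            return;
        }
        [$sql, $params] = buildBulkInsert('DB_TABLE', $columns, $batch);
        $stmt = $pdo->prepare($sql);
        $stmt->execute($params);
        $inserted += $stmt->rowCount(); // rows skipped by IGNORE are not counted
        $batch = [];
    };

    while (($csv = fgetcsv($handle)) !== false) {
        if ($csv === [null]) {
            continue; // fgetcsv returns array(null) for a blank line
        }
        // Pad or trim so every row has exactly one value per column.
        $batch[] = array_slice(array_pad($csv, count($columns), null), 0, count($columns));
        if (count($batch) >= $batchSize) {
            $flush();
        }
    }
    $flush(); // send whatever is left over
    fclose($handle);

    return $inserted;
}
```

Calling it would look something like importCsv($pdo, './bigdumbfile.csv', array('id_one', 'data_one', 'id_two', 'data_two' /* and so on */)), where $pdo is an already-connected PDO instance. Dropping the per-row print_r output also helps, since echoing 70,000 rows to the browser is slow in its own right.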