There is a performance issue where that I have stuck with my application which is based on PHP & MySql. The application is for Data Migration where data has to be uploaded and after various processes (Cleaning from foreign characters, duplicate check, id generation) it has to be inserted into one central table and then to 5 different tables. There, an id is generated and that id has to be updated to central table.
There are different sets of records and validation rules. The problem I am facing is that when I insert say(4K) rows file (containing 20 columns) it is working fine within 15 min it gets inserted everywhere. But, when I insert the same records again then at this time it is taking one hour to insert (ideally it should get inserted by marking earlier inserted data as duplicate).
After going through the log file, I noticed is that there is a Mysql select statement where I am checking the duplicates and getting ID which are duplicates. Then I am calling a function inside for loop which is basically inserting records into 5 tables and updates id to central table. This Calling function is major time of whole process. P.S. The records has to be inserted record by record.. Kindly Suggest some solution..
//This is that sample code
$query=mysql_query("SELECT DISTINCT p1.ID
FROM table1 p1, table2 p2, table3 a
WHERE p2.datatype =0
AND (p1.datatype =1 || p1.datatype=2)
AND p2.ID =0
AND p1.ID = a.ID
AND p1.coulmn1 = p2.column1
AND p1.coulmn2 = p2.coulmn2
AND a.coulmn3 = p2.column3");
$num=mysql_num_rows($query);
for($i=0;$i<$num;$i++) {
$f=mysql_result($query,$i,"ID");
//calling function
RecordInsert($f);
}
Without your code I can only give you a general guideline for the solution. Basically, fetch all the existing values before you start inserting. While inserting, check this cached list to see if there is a duplicate, rather than doing a query. After inserting, add the inserted item to the list of cached items, just in case it comes up again. If your tables might be added to outside of your script, while the script is running, then just lock the tables you’re working with until you’re done.