I am making a piece of PHP code that takes user XML input (containing multiple records/items, usually around 20 to 100), parses it, and then checks it against a database of records. If a record is not in the database, the PHP script should INSERT it. If the record is in the database, the script should either discard the user’s input or run an UPDATE of that record, depending on whether the user has a ‘replace records’ checkbox checked.
My question is, which is faster: to SELECT the columns of the entire table that determine uniqueness, then sort through them in PHP? Or, for each record, to do a SELECT COUNT(*) FROM table WHERE name=(input name) AND region=(input region) and see whether the count is greater than zero?
One big SQL query + quite a lot of PHP sorting time, or 100 small SQL queries and one PHP comparison?
EDIT: People have been requesting details, so:
Database size: 250 records or less
Columns indexed: I haven’t put indexes in YET, but I will set up the name and region columns with an index in the production version.
Format of returned SELECT: If I do the big SELECT, it’ll be returned in an associative array of row objects, due to the DB class I’m using (WPDB).
What constitutes uniqueness: The name and region columns determine if a record is unique. If a name-region combination is not in the database, then the record is unique.
As an example, name: “Paris” region: “France” and name: “Paris” region: “Texas” are two unique records. But so are name: “Paris” region: “France” and name: “Marseilles” region: “France”.
It’s faster to use “REPLACE INTO”. The syntax is exactly the same as “INSERT INTO”, but if the new row would duplicate a value that must be unique because of an index (either a Primary Key or a Unique Index, here a unique index across name and region together), the existing record is deleted and the new one is inserted in its place. No need to check in advance, and it’s one query.
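As a rough sketch of what that could look like: the table name places, the population column and all the values below are made up for illustration, and only name and region come from the question. It assumes MySQL (which is what WPDB talks to) and that the unique index across both columns exists, since without it REPLACE just behaves like a plain INSERT.

```sql
-- Hypothetical table: only the name and region columns come from the question.
CREATE TABLE IF NOT EXISTS places (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    region     VARCHAR(100) NOT NULL,
    population INT,                          -- illustrative extra column
    UNIQUE KEY name_region (name, region)    -- the pair must be unique, not each column
);

-- No matching name+region yet, so this behaves like a normal INSERT.
REPLACE INTO places (name, region, population)
VALUES ('Paris', 'France', 2100000);

-- Same name+region: the old row is deleted and this one is inserted instead.
REPLACE INTO places (name, region, population)
VALUES ('Paris', 'France', 2200000);

-- Different region, so this is a separate record and nothing is replaced.
REPLACE INTO places (name, region, population)
VALUES ('Paris', 'Texas', 25000);
```

Because the old row is deleted and re-inserted, an auto-increment id like the one above will change on replace, which matters if other tables point at it.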
Note that REPLACE will replace the ENTIRE record. If you want to simply update one field, run “INSERT INTO … ON DUPLICATE KEY UPDATE field=value” instead; this will insert the new record or update the existing one. Still one query.
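Something along these lines should work for the update-one-field case. Again, places, population and the values are hypothetical names for illustration, and this assumes MySQL with a unique index across name and region.

```sql
-- Hypothetical table: only the name and region columns come from the question.
CREATE TABLE IF NOT EXISTS places (
    id         INT AUTO_INCREMENT PRIMARY KEY,
    name       VARCHAR(100) NOT NULL,
    region     VARCHAR(100) NOT NULL,
    population INT,                          -- illustrative extra column
    UNIQUE KEY name_region (name, region)
);

-- Inserts the row if the name+region pair is new;
-- otherwise leaves the existing row in place and only changes population.
INSERT INTO places (name, region, population)
VALUES ('Paris', 'France', 2200000)
ON DUPLICATE KEY UPDATE population = 2200000;
```

Unlike REPLACE, the existing row keeps its id and any columns you don’t list after UPDATE.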
As for the other part (removing records if the checkbox is checked), simply run “DELETE”; if the record is not there, nothing is deleted. Again, one query.
This way, you don’t risk a record being added/deleted between your check and your update, plus you save queries, plus it’s easier for someone else to read what’s going on.