Let’s say I have a table tbl with columns id and title. I need to change all values of title column:
- from ‘a-1’ to ‘a1’,
- from ‘a.1’ to ‘a1’,
- from ‘b-1’ to ‘b1’,
- from ‘b.1’ to ‘b1’.
Right now, I’m performing two UPDATE statements:
UPDATE tbl SET title='a1' WHERE title IN ('a-1', 'a.1') UPDATE tbl SET title='b1' WHERE title IN ('b-1', 'b.1')
This isn’t at all a problem, if the table is small, and the single statement completes in less than a second and you only need a few statements to execute.
You probably guested it – I have a huge table to deal with (one statement completes in about 90 seconds), and I have a huge number of updates to perform.
So, is it possible to merge the updates so it would only scan the table once? Or perhaps, there’s a better way to deal with in a situation like this.
EDIT: Note, that the real data I’m working with and the changes to the data I have to perform are not really that simple – the strings are longer and they don’t follow any pattern (it is user data, so no assumptions can be made – it can be anything).
In a more general case, where there could be many hundreds of mappings to each of the new values, you would create a separate table of the old and new values, and then use that in the UPDATE statement. In one dialect of SQL:
Both select statements are crucial. The first is a correlated sub-query (not necessarily fast, but faster than most of the alternatives if the mapper table has thousands of rows) that pulls the new value out of the mapping table that corresponds to the old value. The second ensures that only those rows which have a value in the mapping table are modified; this is crucial as otherwise, the title will be set to null for those rows without a mapping entry (and those were the records that were OK before you started out).
For a few alternatives, the CASE operations are OK. But if you have hundreds or thousands or millions of mappings to perform, then you are likely to exceed the limits of the SQL statement length in your DBMS.