I’ve this query:
DELETE FROM f
WHERE ft != 'f'
OR fs NOT IN ( SELECT fs
FROM f
GROUP BY fs
HAVING COUNT(fs) >1)
It is doing its job nicely except that it takes much more time than I expected. I’m talking about 2.25 secs to down ~209,000 records to ~187,000. I think it could be improved, and I would like to know how.
Query EXPLAIN:
addr opcode p1 p2 p3 p4 p5 comment ---- ------------- ---- ---- ---- ----------------- -- ------------- 0 Trace 0 0 0 00 1 Goto 0 82 0 00 2 Null 0 1 0 00 3 String8 0 3 0 f 00 4 OpenRead 0 2 0 4 00 5 Rewind 0 69 0 00 6 Column 0 2 4 00 7 Ne 3 66 4 collseq(BINARY) 63 8 If 7 53 0 00 9 Integer 1 7 0 00 10 Null 0 6 0 00 11 OpenEphemeral 4 1 0 keyinfo(1,BINARY) 00 12 OpenEphemeral 5 2 0 keyinfo(1,BINARY) 00 13 Integer 0 11 0 00 14 Integer 0 10 0 00 15 Gosub 13 50 0 00 16 OpenRead 2 2 0 4 00 17 Rewind 2 23 0 00 18 Column 2 3 16 00 19 Sequence 5 17 0 00 20 MakeRecord 16 2 4 00 21 IdxInsert 5 4 0 00 22 Next 2 18 0 01 23 Close 2 0 0 00 24 Sort 5 53 0 00 25 Column 5 0 15 00 26 Compare 14 15 1 keyinfo(1,BINARY) 00 27 Jump 28 32 28 00 28 Move 15 14 1 00 29 Gosub 12 41 0 00 30 IfPos 11 53 0 00 31 Gosub 13 50 0 00 32 Column 5 0 16 00 33 AggStep 0 16 9 count(1) 01 34 Column 5 0 8 00 35 Integer 1 10 0 00 36 Next 5 25 0 00 37 Gosub 12 41 0 00 38 Goto 0 53 0 00 39 Integer 1 11 0 00 40 Return 12 0 0 00 41 IfPos 10 43 0 00 42 Return 12 0 0 00 43 AggFinal 9 1 0 count(1) 00 44 Integer 1 4 0 00 45 Le 4 42 9 6a 46 SCopy 8 18 0 00 47 MakeRecord 18 1 4 c 00 48 IdxInsert 4 4 0 00 49 Return 12 0 0 00 50 Null 0 8 0 00 51 Null 0 9 0 00 52 Return 13 0 0 00 53 Column 0 3 4 00 54 NotNull 4 57 0 00 55 Rewind 4 66 0 00 56 Goto 0 68 0 00 57 Affinity 4 1 0 c 00 58 Found 4 65 4 1 00 59 NotNull 6 63 0 00 60 Found 4 62 6 1 00 61 Integer -1 6 0 00 62 AddImm 6 1 0 00 63 If 6 68 0 00 64 Goto 0 66 0 00 65 Goto 0 68 0 00 66 Rowid 0 2 0 00 67 RowSetAdd 1 2 0 00 68 Next 0 6 0 01 69 Close 0 0 0 00 70 OpenWrite 0 2 0 4 00 71 OpenWrite 1 3 0 keyinfo(1,BINARY) 00 72 RowSetRead 1 79 2 00 73 NotExists 0 78 2 00 74 Rowid 0 20 0 00 75 Column 0 1 19 00 76 IdxDelete 1 19 2 00 77 Delete 0 1 0 f 00 78 Goto 0 72 0 00 79 Close 1 3 0 00 80 Close 0 0 0 00 81 Halt 0 0 0 00 82 Transaction 0 1 0 00 83 VerifyCookie 0 2 0 00 84 TableLock 0 2 1 f 00 85 Goto 0 2 0 00
Table definition (no indexes yet):
CREATE TABLE f (fi INTEGER PRIMARY KEY AUTOINCREMENT,
fn STRING,
ft STRING,
fs INTEGER)
I’m not sure that two seconds is a “killer” query (though this, of course, depends on your circumstances and needs) but one thing you could test is the effect of splitting the query into two.
That’s because it will currently delete records matching either condition so it’s an easy transformation into two
deletestatements (within a transaction if you want to ensure it’s atomic). You could try:and see if that improves things. It may or it may not, depending on your DBMS and the makeup of your data. It’s likely to get rid of the crossover records (those satisfying both conditions) in the most-likely-faster first query.
But, as with all database optimisations, measure, don’t guess!
Then you can either recombine and re-evaluate, or concentrate on speeding up the second simpler query if it’s still a problem.
One thing to particularly make sure of: have indexes on both the
ftandfscolumns. That should speed up your queries quite a bit if you don’t already have them.