I’m seeing some odd behavior with SELECT statements in SQLite. I have one table with 3 million records. For example,
SELECT * FROM table1 WHERE cond1;
reduces the output to 10000 records and finishes instantly. Same with
SELECT * FROM table1 WHERE cond1 ORDER BY col1;
But
SELECT * FROM table1 WHERE cond1 AND cond2 ORDER BY col1;
seems to take forever. The CPU is busy for about 2 seconds; after that there is only I/O, the CPU is idle, and memory is free.
What am I doing wrong?
I hope this isn’t a newbie question where all I have to do is add an index (but if so, why?).
Thanks for your help!
More concretely:
The table structure (as shown by PRAGMA table_info):
0|url|TEXT|0||1
1|date|DATE|0||1
2|md5sum|TEXT|0||0
3|size|INTEGER|0||0
4|archive|TEXT|0||0
5|numScripts|INTEGER|0||0
6|numScriptBytes|INTEGER|0||0
7|numLinesBehaviour|INTEGER|0||0
8|state|TEXT|0||0
The statement:
SELECT * FROM t1 WHERE md5sum LIKE "00%" AND state=="okay" ORDER BY md5sum;
There is no connection between md5sum and state.
I haven’t created any indexes.
I also forgot to mention: the problem occurs only when the statement includes two or more string comparisons and an ORDER BY. So
SELECT * FROM t1 WHERE md5sum LIKE "00%" AND state=="okay";
also works fine.
Update 2:
An obvious workaround:
CREATE TABLE temp (url TEXT, date DATE, ...
INSERT INTO temp SELECT * FROM t1 WHERE state=="okay" AND md5sum LIKE "00%";
SELECT * FROM temp ORDER BY md5sum;
But, damn, there must be an easier way.
Having no indexes implies that the DBMS has to inspect every row of your table just to make the selection.
The ORDER BY implies that the DBMS has to sort the result set (typically an N log(N) operation).
Adding indexes may help, either by making the condition cheaper to check or by making the sort unnecessary (and maybe both).
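To see whether an index would help, you can compare query plans before and after creating one. The following is only a sketch using Python's sqlite3 module on an empty in-memory copy of the table; the index name idx_t1_md5sum is made up for illustration. It also rewrites the prefix match as an explicit range, which assumes the md5sum values are lowercase hex: LIKE is case-insensitive by default in SQLite, so a plain index usually cannot serve LIKE '00%' directly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same columns as in the question; constraints are omitted in this sketch.
conn.execute(
    "CREATE TABLE t1 (url TEXT, date DATE, md5sum TEXT, size INTEGER, "
    "archive TEXT, numScripts INTEGER, numScriptBytes INTEGER, "
    "numLinesBehaviour INTEGER, state TEXT)"
)


def plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as a list of strings."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


# The query from the question, with standard single-quoted string literals.
original = ("SELECT * FROM t1 WHERE md5sum LIKE '00%' AND state = 'okay' "
            "ORDER BY md5sum")
plan_before = plan(original)  # no index: full scan plus a separate sort step

# Hypothetical index name; any name would do.
conn.execute("CREATE INDEX idx_t1_md5sum ON t1 (md5sum)")

# Prefix match written as a range (assumes lowercase hex md5sum values).
ranged = ("SELECT * FROM t1 WHERE md5sum >= '00' AND md5sum < '01' "
          "AND state = 'okay' ORDER BY md5sum")
plan_after = plan(ranged)     # index range search, rows already in order

for line in plan_before + ["--"] + plan_after:
    print(line)
```

If the second plan mentions the index and no longer lists a temporary B-tree for the ORDER BY, the index covers both the selection and the sort. The exact wording of the plan lines differs between SQLite versions.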
UPDATE (added):
Since md5sum is part of both the selection condition and the ORDER BY expression, you might try to fool the query planner by adding a bogus term to the sorting expression.
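One possible form of such a bogus term, purely as an assumption for illustration, is to sort on md5sum || '' instead of the bare column. The sketch below uses Python's sqlite3 module with a few made-up rows and only checks that the result order stays the same; whether it changes the plan or the run time on the real table is something you would have to measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, made-up version of the table: only the columns the query touches.
conn.execute("CREATE TABLE t1 (url TEXT, md5sum TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("http://a.example", "00ff", "okay"),
        ("http://b.example", "00a1", "okay"),
        ("http://c.example", "0012", "failed"),
        ("http://d.example", "7c3e", "okay"),
        ("http://e.example", "0003", "okay"),
    ],
)

where = "FROM t1 WHERE md5sum LIKE '00%' AND state = 'okay'"

# Plain ordering on the column.
plain = conn.execute(f"SELECT md5sum {where} ORDER BY md5sum").fetchall()

# Bogus term: concatenating an empty string turns the sort key into an
# expression without changing its value.
bogus = conn.execute(f"SELECT md5sum {where} ORDER BY md5sum || ''").fetchall()

print(plain)
print(bogus)
```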
No guarantees, YMMV.