I have a table whose primary key is made up of 6 numeric columns:
CREATE TABLE table1 ( num1 decimal, num2 int, num3 int, num4 bigint, num5 bigint, num6 bigint,
PRIMARY KEY (num1, num2, num3, num4, num5, num6))
I need to access the table in sorted order, and I often need to query it for the next N larger keys, in order, along with their associated data.
So the query I wrote was something like this:
SELECT * FROM table1 WHERE
num1 >? OR (
(num1 == ? AND num2 > ?) OR (
(num1 == ? AND num2 == ? AND num3 > ?) OR (
(num1 == ? AND num2 == ? AND num3 == ? AND num4 > ?) OR (
(num1 == ? AND num2 == ? AND num3 == ? AND num4 == ? AND num5 > ?) OR (
(num1 == ? AND num2 == ? AND num3 == ?
AND num4 == ? AND num5 == ? AND num6 > ?)))))) ORDER BY num1, num2, num3, num4, num5, num6
LIMIT ?;
This was the best way I could see to find the next largest key, and it does query in the order of the index. However, the query takes a few seconds, which I’m not too fond of.
Is there any way to improve the performance? The query takes a few seconds on a table of 10 million rows, and I need it to run in roughly 100 ms.
Query Plan:
"SEARCH TABLE table1 USING INDEX sqlite_autoindex_table1_1 (num1>?) (~250000 rows)"
"SEARCH TABLE table1 USING INDEX sqlite_autoindex_table1_1 (num1=? AND num2>?) (~2 rows)"
"SEARCH TABLE table1 USING INDEX sqlite_autoindex_table1_1 (num1=? AND num2=? AND num3>?) (~2 rows)"
"SEARCH TABLE table1 USING INDEX sqlite_autoindex_table1_1 (num1=? AND num2=? AND num3=? AND num4>?) (~2 rows)"
"SEARCH TABLE table1 USING INDEX sqlite_autoindex_table1_1 (num1=? AND num2=? AND num3=? AND num4=? AND num5>?) (~1 rows)"
"SEARCH TABLE table1 USING INDEX sqlite_autoindex_table1_1 (num1=? AND num2=? AND num3=? AND num4=? AND num5=? AND num6>?) (~1 rows)"
"USE TEMP B-TREE FOR ORDER BY"
Edit:
Why is this not possible? I literally want to get things in the INDEXED ORDER, the same order generated by the ORDER BY keyword?
Unlike more sophisticated RDBMSs, sqlite has a rule-based query optimizer, meaning that the execution plan mostly depends on how the query is written (and on the order of the clauses). This makes the optimizer quite predictable, and if you know how sqlite generates execution plans, you can take advantage of this predictability to solve your issue.
A first idea is to note that the various clauses, such as (num1>?) or (num1=? and num2>?), produce disjoint results, and that these results are naturally sorted relative to each other. If the query is divided into subqueries (each handling one part of the condition) that produce sorted results, then the concatenation of all the result sets is also sorted, provided the subqueries are executed in the correct order.
For example, consider the following queries:
The two result sets produced by these queries are disjoint, and the rows of the first result-set are always ordered before the rows of the second result-set.
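As a rough illustration of that point, here is a minimal sketch using Python's built-in sqlite3 module. The table definition comes from the question; the sample rows and the starting key (num1 = 1, num2 = 1) are made up for the demonstration, and the two queries are only my guess at the kind of pair meant above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (num1 decimal, num2 int, num3 int, "
    "num4 bigint, num5 bigint, num6 bigint, "
    "PRIMARY KEY (num1, num2, num3, num4, num5, num6))"
)
# Hypothetical sample data: 3 * 3 * 3 = 27 rows, last three columns fixed at 0.
rows = [(a, b, c, 0, 0, 0) for a in range(3) for b in range(3) for c in range(3)]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)", rows)

order = "ORDER BY num1, num2, num3, num4, num5, num6"

# Rows that share num1 with the starting key but have a larger num2.
same_num1 = conn.execute(
    f"SELECT * FROM table1 WHERE num1 = ? AND num2 > ? {order}", (1, 1)
).fetchall()

# Rows whose num1 is strictly larger than the starting key's num1.
larger_num1 = conn.execute(
    f"SELECT * FROM table1 WHERE num1 > ? {order}", (1,)
).fetchall()

# Concatenating in this order keeps the overall result sorted.
combined = same_num1 + larger_num1
print(combined == sorted(combined))
```

The order of concatenation matters: the more specific condition (equal prefix, larger next column) has to come first.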
The second idea is to understand how sqlite handles the LIMIT clause. It declares a counter at the beginning of the query, then decrements and tests this counter for each selected row, so it can stop a query early.
For instance, consider the following query:
sqlite will evaluate the subqueries in the order specified in the query. If the first subquery returns more than 10 rows, the second subquery will not even be executed.
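A query of this shape could look like the sketch below, again through Python's sqlite3 module. The sample data and bound values are invented; they are chosen so that the first subquery alone matches 20 rows, more than the limit of 10. The script only checks the returned rows, it does not prove by itself that the second subquery was skipped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (num1 decimal, num2 int, num3 int, "
    "num4 bigint, num5 bigint, num6 bigint, "
    "PRIMARY KEY (num1, num2, num3, num4, num5, num6))"
)
# Hypothetical sample data: 3 * 5 * 5 = 75 rows.
rows = [(a, b, c, 0, 0, 0) for a in range(3) for b in range(5) for c in range(5)]
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)", rows)

order = "ORDER BY num1, num2, num3, num4, num5, num6"

# Two sorted subqueries glued with UNION ALL; LIMIT applies to the whole compound.
query = f"""
SELECT * FROM (SELECT * FROM table1 WHERE num1 = ? AND num2 > ? {order})
UNION ALL
SELECT * FROM (SELECT * FROM table1 WHERE num1 > ? {order})
LIMIT 10
"""
result = conn.execute(query, (1, 0, 1)).fetchall()

# All 10 rows come from the first subquery (num1 = 1), already in key order.
for row in result:
    print(row)
```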
It can be easily checked by displaying the plan:
The counter is declared at step 1, and decremented/tested at steps 21, 24 and 40.
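To look at the program on your own build, something like the following sketch can be used. It prefixes the statement with EXPLAIN and prints the bytecode listing. Step numbers and opcode names differ between sqlite versions, so the listing you get will not match the step numbers quoted above; as far as I know, recent builds show the limit counter through an opcode such as DecrJumpZero, but treat that name as an assumption and check your own output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (num1 decimal, num2 int, num3 int, "
    "num4 bigint, num5 bigint, num6 bigint, "
    "PRIMARY KEY (num1, num2, num3, num4, num5, num6))"
)

order = "ORDER BY num1, num2, num3, num4, num5, num6"
query = f"""
SELECT * FROM (SELECT * FROM table1 WHERE num1 = ? AND num2 > ? {order})
UNION ALL
SELECT * FROM (SELECT * FROM table1 WHERE num1 > ? {order})
LIMIT 10
"""

# EXPLAIN returns one row per bytecode instruction:
# column 0 is the step address, column 1 the opcode name, columns 2-4 are p1-p3.
program = conn.execute("EXPLAIN " + query, (1, 0, 1)).fetchall()
opcodes = [row[1] for row in program]

for row in program:
    print(row[0], row[1], row[2], row[3], row[4])
```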
By combining these two remarks, we can propose a query which is not pretty, but will produce an efficient execution plan:
Note that because the “order by” clause is not required in the outer query, there is no need for sqlite to execute all the subqueries. So it can just stop when it has the correct number of rows. The order of the subqueries is critical.
The second level inner subqueries are needed because it is not possible to use “order by” before “union all”. They are optimized away by sqlite, so it is not an issue.
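Putting the pieces together, the query can be generated rather than typed by hand. The sketch below is my reconstruction of the shape described above, not necessarily the exact text of the original query: six sorted subqueries, most specific first, each wrapped in a second-level SELECT, joined with UNION ALL and capped by a single outer LIMIT. The helper name next_rows and the sample data are hypothetical.

```python
import itertools
import sqlite3

COLUMNS = ["num1", "num2", "num3", "num4", "num5", "num6"]
ORDER = "ORDER BY " + ", ".join(COLUMNS)


def next_rows(conn, key, n):
    """Return up to n rows whose key is strictly greater than `key`, in key order."""
    branches, params = [], []
    # Most specific branch first: equal on the first i columns, larger on column i.
    for i in range(len(COLUMNS) - 1, -1, -1):
        where = [f"{c} = ?" for c in COLUMNS[:i]] + [f"{COLUMNS[i]} > ?"]
        branches.append(
            f"SELECT * FROM (SELECT * FROM table1 WHERE {' AND '.join(where)} {ORDER})"
        )
        params.extend(key[: i + 1])
    # No outer ORDER BY: the branch order already yields sorted output.
    sql = "\nUNION ALL\n".join(branches) + "\nLIMIT ?"
    params.append(n)
    return conn.execute(sql, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (num1 decimal, num2 int, num3 int, "
    "num4 bigint, num5 bigint, num6 bigint, "
    "PRIMARY KEY (num1, num2, num3, num4, num5, num6))"
)
# Hypothetical sample data: every 0/1 combination of the six columns (64 rows).
rows = list(itertools.product(range(2), repeat=6))
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?, ?, ?)", rows)

key = (0, 1, 1, 0, 1, 1)
result = next_rows(conn, key, 5)
for row in result:
    print(row)
```

Each call binds the starting key once per branch (6 + 5 + 4 + 3 + 2 + 1 values) plus the limit, so the last row returned can be fed back in as the next starting key to page through the table.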
On a dummy table containing 777K rows, the initial query costs:
while mine only costs: