1) Are SQL query execution times O(n) in the number of joins if indexes are not used? If not, what kind of relationship are we likely to expect? And can indexing improve the actual big-O time complexity, or does it only reduce the entire query time by some constant factor?
This is a slightly vague question, and I’m sure it varies a lot, but I’m talking in a general sense here.
2) If you have a query like:
SELECT T1.name, T2.date
FROM T1, T2
WHERE T1.id=T2.id
AND T1.color='red'
AND T2.type='CAR'
Am I right in assuming the DB will do single-table filtering first on T1.color and T2.type, before evaluating multi-table conditions? In such a case, could making the query more complex make it faster, because fewer rows are subjected to the join-level tests?
This depends on the query plan used.
Even without indexes, modern servers can use HASH JOIN and MERGE JOIN, which are faster than O(N * M).
More specifically, the complexity of a HASH JOIN is O(N + M), where N is the size of the hashed table and M is the size of the lookup table. Hashing and hash lookups have constant complexity.
The complexity of a MERGE JOIN is O(N*Log(N) + M*Log(M)): it’s the sum of the times to sort both tables plus the time to scan them.
If there are no indexes defined, the engine will select either a HASH JOIN or a MERGE JOIN.
The HASH JOIN works as follows:
1) The hashed table is chosen (usually it’s the table with fewer records). Say it’s t1.
2) All records from t1 are scanned. If a record holds color='red', it goes into the hash table with id as the key and name as the value.
3) All records from t2 are scanned. If a record holds type='CAR', its id is searched in the hash table, and the values of name from all hash hits are returned along with the current value of date.
The MERGE JOIN works as follows:
1) A copy of t1 (id, name) is created, sorted on id.
2) A copy of t2 (id, date) is created, sorted on id.
3) The pointers are set to the minimal values in both tables.
4) The pointers are compared in a loop, and if they match, the records are returned. If they don’t match, the pointer with the minimal value is advanced.
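To make the two strategies concrete, here is a minimal Python sketch of both joins over in-memory lists. It is only an illustration, not how any particular engine implements them: the sample rows and the function names hash_join and merge_join are made up for this example, and the merge join applies the color and type filters while building its sorted copies, which is an assumption on my part.

```python
from collections import defaultdict

# Hypothetical sample rows: t1 = (id, name, color), t2 = (id, date, type).
t1 = [(1, "alpha", "red"), (2, "beta", "blue"), (3, "gamma", "red")]
t2 = [(1, "2009-01-01", "CAR"), (3, "2009-02-01", "BIKE"),
      (3, "2009-03-01", "CAR"), (4, "2009-04-01", "CAR")]


def hash_join(t1, t2):
    # Build phase: scan t1 once, keep only red rows, key them by id.
    table = defaultdict(list)
    for id_, name, color in t1:
        if color == "red":
            table[id_].append(name)
    # Probe phase: scan t2 once; each lookup is O(1) on average.
    result = []
    for id_, date, type_ in t2:
        if type_ == "CAR":
            for name in table.get(id_, []):
                result.append((name, date))
    return result


def merge_join(t1, t2):
    # Filtered copies (id, name) and (id, date), each sorted on id.
    left = sorted(((i, n) for i, n, c in t1 if c == "red"),
                  key=lambda row: row[0])
    right = sorted(((i, d) for i, d, t in t2 if t == "CAR"),
                   key=lambda row: row[0])
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i][0] < right[j][0]:
            i += 1  # advance the pointer holding the smaller id
        elif left[i][0] > right[j][0]:
            j += 1
        else:
            # Ids match: find the run of equal ids on both sides.
            key = left[i][0]
            i_end, j_end = i, j
            while i_end < len(left) and left[i_end][0] == key:
                i_end += 1
            while j_end < len(right) and right[j_end][0] == key:
                j_end += 1
            # Emit every pairing within the two runs.
            for _, name in left[i:i_end]:
                for _, date in right[j:j_end]:
                    result.append((name, date))
            i, j = i_end, j_end
    return result


print(hash_join(t1, t2))
print(merge_join(t1, t2))
```

Both functions return the same rows for this data. The hash join touches each input row once, while the merge join pays for the two sorts up front and then makes a single pass with two pointers.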
Sure.
Your query without the WHERE clause is simpler, but it returns more results and runs longer.
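A quick way to see the difference in result size is to run both forms against a throwaway database. The sketch below uses Python's built-in sqlite3 module as a stand-in engine with made-up sample rows. It assumes "without the WHERE clause" means dropping the whole clause, join condition included, and it only compares row counts, not timings.

```python
import sqlite3

# In-memory database with hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (id INTEGER, name TEXT, color TEXT);
    CREATE TABLE T2 (id INTEGER, date TEXT, type TEXT);
    INSERT INTO T1 VALUES (1, 'alpha', 'red'), (2, 'beta', 'blue'),
                          (3, 'gamma', 'red');
    INSERT INTO T2 VALUES (1, '2009-01-01', 'CAR'), (3, '2009-02-01', 'BIKE'),
                          (3, '2009-03-01', 'CAR'), (4, '2009-04-01', 'CAR');
""")

# No WHERE clause: every row of T1 is paired with every row of T2.
unfiltered = conn.execute("SELECT T1.name, T2.date FROM T1, T2").fetchall()

# The original query: join condition plus both single-table filters.
filtered = conn.execute("""
    SELECT T1.name, T2.date
    FROM T1, T2
    WHERE T1.id = T2.id
      AND T1.color = 'red'
      AND T2.type = 'CAR'
""").fetchall()

print(len(unfiltered))  # 3 rows * 4 rows = 12
print(len(filtered))    # 2
```

With 3 rows in T1 and 4 in T2, the unfiltered form produces all 12 combinations, while the filtered query returns only the 2 matching rows.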