I’m looking for some “inference rules” (similar to set operation rules or logic rules) which I can use to reduce a SQL query in complexity or size.
Does there exist something like that? Any papers, any tools? Any equivalencies that you found on your own? It’s somehow similar to query optimization, but not in terms of performance.
To state it differently: given a (complex) query with JOINs, SUBSELECTs, and UNIONs, is it possible (or not) to reduce it to a simpler, equivalent SQL statement that produces the same result by applying some transformation rules?
So, I’m looking for equivalence-preserving transformations of SQL statements, such as the fact that most SUBSELECTs can be rewritten as a JOIN.
This answer was written in 2009. Some of the query optimization tricks described here are obsolete by now, others can be made more efficient, yet others still apply. The statements about feature support by different database systems apply to versions that existed at the time of this writing.
That’s exactly what optimizers do for a living (not that I’m saying they always do this well).
Since SQL is a set-based language, there is usually more than one way to transform one query into another.
Like this query:
can be transformed into this one (provided that mytable has a primary key):
or this one:
, which look uglier but can yield better execution plans.
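The queries themselves are not shown above, so the following is only an illustrative guess at the kind of rewrite that depends on a primary key: an OR condition split into two branches combined with UNION. The table layout, the column names col1 and col2, and the sample values are assumptions made for this sketch, and SQLite (through Python's sqlite3 module) stands in for whatever engine the original targeted.

```python
import sqlite3

# Hypothetical schema and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, col1 INTEGER, col2 INTEGER);
    INSERT INTO mytable VALUES (1, 10, 1), (2, 20, 9), (3, 5, 2), (4, 30, 0);
""")

# One scan with an OR condition.
with_or = conn.execute(
    "SELECT * FROM mytable WHERE col1 > ? OR col2 < ? ORDER BY id",
    (15, 2),
).fetchall()

# One branch per condition, combined with UNION.
# UNION removes duplicates, so a row matching both branches (id 4 here)
# still appears once. This is only safe because the primary key makes
# every row of mytable distinct.
with_union = conn.execute(
    """
    SELECT * FROM mytable WHERE col1 > ?
    UNION
    SELECT * FROM mytable WHERE col2 < ?
    ORDER BY id
    """,
    (15, 2),
).fetchall()

print(with_or == with_union)
```

Without a key, UNION would also collapse rows that are genuinely duplicated in the table, and the two forms would no longer be equivalent.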
One of the most common things to do is replacing this query:
with this one:
In some RDBMSs (like PostgreSQL 8.4), DISTINCT and GROUP BY use different execution plans, so sometimes it’s better to replace the one with the other:
vs.
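As a minimal sketch of this equivalence, the two forms below return the same set of values. The table and column names are assumptions, and the example runs on SQLite through Python's sqlite3 module, so it only demonstrates that the results match; comparing the actual plans would require running EXPLAIN on the target database.

```python
import sqlite3

# Hypothetical table with repeated values in col.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE mytable (id INTEGER PRIMARY KEY, col INTEGER);
    INSERT INTO mytable (col) VALUES (3), (1), (3), (2), (1);
""")

# Neither query guarantees an output order, so sort before comparing.
distinct_rows = sorted(conn.execute("SELECT DISTINCT col FROM mytable").fetchall())
grouped_rows = sorted(conn.execute("SELECT col FROM mytable GROUP BY col").fetchall())

print(distinct_rows == grouped_rows)
```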
In PostgreSQL, DISTINCT sorts and GROUP BY hashes.
MySQL 5.6 lacks FULL OUTER JOIN, so it can be rewritten as follows:
vs.
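One common way to emulate a full outer join, sketched here under assumed table names a and b, is a LEFT JOIN in each direction glued with UNION ALL, where the second branch keeps only the rows the first one could not produce. This is not necessarily the form the original used. The sketch runs on SQLite through Python's sqlite3 module and sticks to LEFT JOIN so that it does not depend on the engine supporting RIGHT or FULL joins.

```python
import sqlite3

# Hypothetical tables: id 2 is present in both, 1 only in a, 3 only in b.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER PRIMARY KEY, a_val TEXT);
    CREATE TABLE b (id INTEGER PRIMARY KEY, b_val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

full_join_rows = conn.execute("""
    -- Every row of a, with its match from b if there is one.
    SELECT a.id, a.a_val, b.id, b.b_val
    FROM a LEFT JOIN b ON b.id = a.id
    UNION ALL
    -- Rows of b that have no match in a (the first branch missed them).
    SELECT a.id, a.a_val, b.id, b.b_val
    FROM b LEFT JOIN a ON a.id = b.id
    WHERE a.id IS NULL
""").fetchall()

for row in full_join_rows:
    print(row)
```

UNION ALL is used rather than UNION because the WHERE a.id IS NULL filter already keeps the two branches disjoint, so no duplicate elimination is needed.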
However, see this article in my blog on how to do this more efficiently in MySQL:
FULL OUTER JOIN in MySQL
This hierarchical query in Oracle 11g:
can be transformed to this:
, the latter one being more efficient.
See this article in my blog for the execution plan details:
To find all ranges that overlap the given range, you can use the following query:
, but in SQL Server this more complex query yields the same results faster:
, and believe it or not, I have an article in my blog on this too:
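For reference, the straightforward overlap test can be sketched as follows: two closed ranges overlap when each one starts no later than the other ends. The table name, column names, and sample data are assumptions, and the sketch runs on SQLite through Python's sqlite3 module. It shows only the simple form and makes no attempt to reproduce the faster SQL Server variant.

```python
import sqlite3

# Hypothetical table of closed integer ranges [range_start, range_end].
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ranges (
        id INTEGER PRIMARY KEY,
        range_start INTEGER,
        range_end INTEGER
    );
    INSERT INTO ranges VALUES
        (1, 1, 5), (2, 4, 10), (3, 11, 20), (4, 6, 7), (5, 0, 30);
""")


def overlapping(conn, start, end):
    """Return ids of ranges that share at least one point with [start, end]."""
    rows = conn.execute(
        """
        SELECT id
        FROM ranges
        WHERE range_start <= ?   -- the range starts before the given one ends
          AND range_end >= ?     -- and ends after the given one starts
        ORDER BY id
        """,
        (end, start),
    )
    return [row[0] for row in rows]


print(overlapping(conn, 6, 10))
```

Both conditions are inequalities on different columns, which is what makes this form hard to serve from a single index.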
SQL Server 2008 also lacks an efficient way to do cumulative aggregates, so this query:
can be more efficiently rewritten using, Lord help me, cursors (you heard me right: "cursors", "more efficiently" and "SQL Server" in one sentence).
See this article in my blog on how to do it:
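The contrast can be sketched as follows, assuming a hypothetical payments table: the set-based running total re-sums all earlier rows for every row, while the cursor-style version walks the rows once in order and carries an accumulator. The sketch uses SQLite through Python's sqlite3 module, with a Python loop playing the role of the cursor; it illustrates the idea and is not the SQL Server code from the article.

```python
import sqlite3

# Hypothetical data, ordered by id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER);
    INSERT INTO payments VALUES (1, 10), (2, 5), (3, 20), (4, 1);
""")

# Set-based: the correlated subquery sums every row up to the current one,
# so the work grows roughly quadratically unless the engine optimizes it.
set_based = conn.execute("""
    SELECT p.id,
           (SELECT SUM(q.amount) FROM payments q WHERE q.id <= p.id)
    FROM payments p
    ORDER BY p.id
""").fetchall()

# Cursor-style: a single ordered pass with a running accumulator.
cursor_based = []
running_total = 0
for row_id, amount in conn.execute("SELECT id, amount FROM payments ORDER BY id"):
    running_total += amount
    cursor_based.append((row_id, running_total))

print(set_based == cursor_based)
```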
There is a certain kind of query, commonly met in financial applications, that pulls the effective exchange rate for a currency, like this one in Oracle 11g:
This query can be heavily rewritten to use an equality condition which allows a HASH JOIN instead of NESTED LOOPS:
Despite being bulky as hell, the latter query is six times as fast.
The main idea here is replacing <= with =, which requires building an in-memory calendar table to join with.
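The calendar idea can be sketched as follows. The schema, names, and data are assumptions, and the calendar is filled from Python into a temporary SQLite table rather than built inside a single Oracle query, so this shows the shape of the transformation only and says nothing about the timings quoted above.

```python
import sqlite3
from datetime import date, timedelta

# Hypothetical rates and transactions; dates are ISO strings.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE rates (currency TEXT, rate_date TEXT, rate REAL);
    CREATE TABLE transactions (id INTEGER PRIMARY KEY, currency TEXT, tx_date TEXT);
    INSERT INTO rates VALUES
        ('EUR', '2009-01-01', 1.40), ('EUR', '2009-01-04', 1.35),
        ('GBP', '2009-01-02', 1.50);
    INSERT INTO transactions VALUES
        (1, 'EUR', '2009-01-03'), (2, 'EUR', '2009-01-04'),
        (3, 'GBP', '2009-01-05'), (4, 'EUR', '2009-01-06');
""")

# Range form: latest rate with rate_date <= tx_date, looked up per row.
by_range = conn.execute("""
    SELECT t.id,
           (SELECT r.rate FROM rates r
            WHERE r.currency = t.currency AND r.rate_date <= t.tx_date
            ORDER BY r.rate_date DESC LIMIT 1)
    FROM transactions t
    ORDER BY t.id
""").fetchall()

# Calendar form: expand each rate into one row per day it is effective.
conn.execute("""
    CREATE TEMP TABLE rate_calendar (
        currency TEXT, day TEXT, rate REAL, PRIMARY KEY (currency, day))
""")
last_day = date.fromisoformat(
    conn.execute("SELECT MAX(tx_date) FROM transactions").fetchone()[0])
rates = conn.execute(
    "SELECT currency, rate_date, rate FROM rates ORDER BY currency, rate_date"
).fetchall()
for i, (currency, rate_date, rate) in enumerate(rates):
    nxt = rates[i + 1] if i + 1 < len(rates) else None
    if nxt is not None and nxt[0] == currency:
        # Effective until the day before the next rate of the same currency.
        end = date.fromisoformat(nxt[1]) - timedelta(days=1)
    else:
        # Latest rate for this currency: effective up to the last transaction.
        end = last_day
    day = date.fromisoformat(rate_date)
    while day <= end:
        conn.execute("INSERT INTO rate_calendar VALUES (?, ?, ?)",
                     (currency, day.isoformat(), rate))
        day += timedelta(days=1)

# The lookup is now a plain equality join on (currency, day).
by_equality = conn.execute("""
    SELECT t.id, c.rate
    FROM transactions t
    JOIN rate_calendar c ON c.currency = t.currency AND c.day = t.tx_date
    ORDER BY t.id
""").fetchall()

print(by_range == by_equality)
```

One difference to keep in mind: a transaction dated before the first known rate gets a NULL rate in the range form, while the inner join in the calendar form drops it. The sample data avoids that case.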