I have a query that looks a bit like this (note: the actual query is generated by Hibernate and is a bit more complicated):
```sql
select * from outage_revisions orev
join outages o
  on orev.outage = o.id
where o.observed_end is null
  and orev.observation_date =
      (select max(observation_date)
       from outage_revisions orev2
       where orev2.observation_date <= '2011-11-21 00:00:00'
         and orev2.outage = orev.outage);
```
This query runs very slowly (about 15 minutes). However, if I take out the part of the where clause with the subquery, it comes back almost instantly (about 83 milliseconds) with only about 14 rows.
Furthermore, the subquery itself is very fast (about 31 milliseconds):
```sql
select max(observation_date) from outage_revisions orev2
where orev2.observation_date <= '2011-11-21 00:00:00'
  and orev2.outage = 1;
```
My question is this: if the full query returns only 14 rows without the subquery filter, why does adding the subquery slow it down so much? Shouldn't the subquery add at most about 31 × 14 ≈ 434 milliseconds?
Here is the plan for the full query:
```
Nested Loop  (cost=0.00..71078813.16 rows=1 width=115)
  ->  Seq Scan on outagerevisions orev  (cost=0.00..71077624.67 rows=284 width=79)
        Filter: (observationdate = (SubPlan 2))
        SubPlan 2
          ->  Result  (cost=1250.56..1250.57 rows=1 width=0)
                InitPlan 1 (returns $1)
                  ->  Limit  (cost=0.00..1250.56 rows=1 width=8)
                        ->  Index Scan Backward using idx_observationdate on outagerevisions orev2  (cost=0.00..2501.12 rows=2 width=8)
                              Index Cond: (observationdate <= '2011-11-21 00:00:00'::timestamp without time zone)
                              Filter: ((observationdate IS NOT NULL) AND (outage = $0))
  ->  Index Scan using outages_pkey on outages o  (cost=0.00..4.17 rows=1 width=36)
        Index Cond: (o.id = orev.outage)
        Filter: (o.observedend IS NULL)
```
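My guess is that PostgreSQL is just making a poor choice about how to execute the query. It seems obvious that it should narrow the result down to the 14 rows first and only then run the correlated subquery, but the plan shows it doing the opposite: the Filter containing SubPlan 2 is attached to the Seq Scan on outagerevisions, underneath the Nested Loop, while the observedend IS NULL check only happens afterwards on the outages side. So the subquery runs once for every row of outagerevisions (around 60,000 times) instead of 14 times, and each run is an Index Scan Backward on idx_observationdate that checks outage as a row-by-row Filter rather than as part of the Index Cond.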
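The plan's own numbers give a rough sanity check on that. Assuming nearly all of the Seq Scan's cost comes from evaluating SubPlan 2 (the scan's own per-row cost is small by comparison), dividing the two costs approximates how many times the subquery is executed. Planner costs are arbitrary units, so only the ratio is meaningful here; the timing line reuses the 15 minutes and 31 ms reported in the question.

```latex
\begin{aligned}
N_{\text{calls}} &\approx \frac{\text{Seq Scan cost}}{\text{SubPlan 2 cost per call}}
  = \frac{71\,077\,624.67}{1250.57} \approx 5.7 \times 10^{4} \\
\text{time per call} &\approx \frac{15\ \text{min}}{5.7 \times 10^{4}}
  = \frac{900\,000\ \text{ms}}{5.7 \times 10^{4}} \approx 16\ \text{ms} \\
\text{expected with 14 calls} &\approx 14 \times 31\ \text{ms} = 434\ \text{ms}
\end{aligned}
```

Roughly 57,000 calls at a few tens of milliseconds each is enough to account for the 15 minutes, whereas 14 calls would indeed cost well under a second.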
Here are a couple of other ways that you could try to write it:
or
(assuming that PostgreSQL and Hibernate support joining subqueries)
You can play around with the order of the joins in that last query.
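As a minimal sketch of the join-against-a-subquery idea (not necessarily the exact query intended above), the per-outage maximum can be computed once per outage with GROUP BY and then joined back, instead of being re-evaluated for every revision row. The harness below uses an in-memory SQLite database purely as a stand-in for PostgreSQL, with a hypothetical miniature schema and made-up rows, to check that the rewrite returns the same rows as the original correlated query. It says nothing about the plan PostgreSQL would choose; that still has to be checked with EXPLAIN ANALYZE on the real data.

```python
import sqlite3

# Hypothetical miniature version of the question's tables (SQLite stand-in).
SCHEMA = """
CREATE TABLE outages (id INTEGER PRIMARY KEY, observed_end TEXT);
CREATE TABLE outage_revisions (
    id INTEGER PRIMARY KEY,
    outage INTEGER REFERENCES outages(id),
    observation_date TEXT
);
"""

OUTAGES = [(1, None), (2, "2011-11-01 00:00:00"), (3, None), (4, None)]
REVISIONS = [
    (1, 1, "2011-11-10 00:00:00"),
    (2, 1, "2011-11-20 00:00:00"),  # latest on or before the cutoff for outage 1
    (3, 1, "2011-11-25 00:00:00"),  # after the cutoff, ignored
    (4, 2, "2011-11-15 00:00:00"),  # outage 2 has ended, filtered out
    (5, 3, "2011-11-18 00:00:00"),  # outage 3 has two revisions tied
    (6, 3, "2011-11-18 00:00:00"),  # for the latest date
    (7, 4, "2011-12-01 00:00:00"),  # outage 4 has nothing before the cutoff
]
CUTOFF = "2011-11-21 00:00:00"

# The original correlated form from the question.
ORIGINAL = """
SELECT orev.id
FROM outage_revisions orev
JOIN outages o ON orev.outage = o.id
WHERE o.observed_end IS NULL
  AND orev.observation_date = (
      SELECT max(observation_date)
      FROM outage_revisions orev2
      WHERE orev2.observation_date <= :cutoff
        AND orev2.outage = orev.outage)
ORDER BY orev.id
"""

# Rewrite: compute each outage's latest date once in a derived table,
# then join it back to the revisions and to the still-open outages.
REWRITE = """
SELECT orev.id
FROM outages o
JOIN (SELECT outage, max(observation_date) AS max_date
      FROM outage_revisions
      WHERE observation_date <= :cutoff
      GROUP BY outage) latest
  ON latest.outage = o.id
JOIN outage_revisions orev
  ON orev.outage = latest.outage
 AND orev.observation_date = latest.max_date
WHERE o.observed_end IS NULL
ORDER BY orev.id
"""


def run(query):
    """Load the toy data into a fresh in-memory database and return matching revision ids."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(SCHEMA)
        conn.executemany("INSERT INTO outages VALUES (?, ?)", OUTAGES)
        conn.executemany("INSERT INTO outage_revisions VALUES (?, ?, ?)", REVISIONS)
        return [row[0] for row in conn.execute(query, {"cutoff": CUTOFF})]
    finally:
        conn.close()


print(run(ORIGINAL))  # [2, 5, 6]
print(run(REWRITE))   # [2, 5, 6]
```

Both forms keep ties on the latest date and drop open outages that have no revision on or before the cutoff. The derived table here aggregates over all outages; restricting it to open outages first is another variation worth timing.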