Currently I have two different queries that return exactly the same results however, changing the parameter from which the results are being filtered make them behave in a very different manner.
Results when searching for cartography
query #1: 22 rows / ~860ms;
SELECT eid FROM t_entidades
WHERE eid IN (
SELECT eid
FROM t_entidades
WHERE entidade_t LIKE '%cartography%'
)
OR eid IN (
SELECT entidade as eid
FROM t_entidade_actividade ea
LEFT JOIN t_actividades a ON a.aid = ea.actividade
WHERE a.actividade LIKE '%cartography%'
)
query #2: 22 rows / ~430ms;
SELECT eid FROM t_entidades WHERE entidade_t LIKE '%cartography%'
UNION
SELECT entidade as eid
FROM t_entidade_actividade ea
LEFT JOIN t_actividades a ON a.aid = ea.actividade
WHERE a.actividade LIKE '%cartography%'
Results when searching for cart
query #1: 715 rows / ~870ms;
query #2: 715 rows / ~450ms
Results when searching for car
query #1: never waited long enough… it seems it takes forever and over 1s would be too much
-- EXPLAIN OUTPUT:
"QUERY PLAN"
"Seq Scan on t_entidades (cost=44997.40..219177315.47 rows=500127 width=4)"
" Filter: ((SubPlan 1) OR (hashed SubPlan 2))"
" SubPlan 1"
" -> Materialize (cost=37712.46..38269.55 rows=40009 width=4)"
" -> Seq Scan on t_entidades (cost=0.00..37515.45 rows=40009 width=4)"
" Filter: ((entidade_t)::text ~~ '%car%'::text)"
" SubPlan 2"
" -> Hash Join (cost=36.48..7284.20 rows=298 width=4)"
" Hash Cond: (ea.actividade = a.aid)"
" -> Seq Scan on t_entidade_actividade ea (cost=0.00..5826.63 rows=378163 width=8)"
" -> Hash (cost=36.46..36.46 rows=1 width=4)"
" -> Seq Scan on t_actividades a (cost=0.00..36.46 rows=1 width=4)"
" Filter: ((actividade)::text ~~ '%car%'::text)"
query #2: 23661 rows / ~860ms
-- EXPLAIN OUTPUT:
"QUERY PLAN"
"HashAggregate (cost=45303.48..45706.55 rows=40307 width=4)"
" -> Append (cost=0.00..45202.72 rows=40307 width=4)"
" -> Seq Scan on t_entidades (cost=0.00..37515.45 rows=40009 width=4)"
" Filter: ((entidade_t)::text ~~ '%car%'::text)"
" -> Hash Join (cost=36.48..7284.20 rows=298 width=4)"
" Hash Cond: (ea.actividade = a.aid)"
" -> Seq Scan on t_entidade_actividade ea (cost=0.00..5826.63 rows=378163 width=8)"
" -> Hash (cost=36.46..36.46 rows=1 width=4)"
" -> Seq Scan on t_actividades a (cost=0.00..36.46 rows=1 width=4)"
" Filter: ((actividade)::text ~~ '%car%'::text)"
So, searching car using query #1 seems to take forever… Which is funny considering that SELECT eid FROM t_entidades takes only around 4s returning all 350k+ rows…
The only difference between EXPLAINs for query #1 at the different steps is that for car the following row appears: ” -> Materialize (cost=37712.46..38269.55 rows=40009 width=4)”
If someone would care to explain why query #1 takes so long to execute at the last example and exactly what’s happening at each step of the explain, it would be very appreciated because I never seem to get it…
The plan for query#1 reads to me as:
An “explain analyze” would be able to tell you how often steps 1.1 and 1.2 were actually run for the query… if the scan in step 1.1 is being done for each row from step 1, then your query time will grow O(n^2) where n is the number of rows in t_entidades, and the temp space used for each iteration of 1.1 will increase as the number of matches in that table increases.
Your query 2 is much better written, IMHO. Each of the two sets of IDs are produced in quite different ways, so put them in separate queries and use a UNION to merge them together at the end. It also cuts out the useless outer scan of t_entidades in query 1 that just passes through IDs from the where clause. (Not that it’s relevant to PostgreSQL, but it also makes it clear that the two scans could be run in parallel and then merged, but never mind).
t_entidade_actividade.actividademight need an index?