I’m working with two extremely large tables (A is ~20E6 rows, B is ~65E3 rows) and I have very elaborate where clauses to get just the items I need. One thing that could speed it up is to first evaluate the date the entry was created (info goes back 10+ years, only need last 3 years); this would eliminate most of the data before doing expensive string processing. Is there a way to force the evaluation of the WHERE date > cutoff_date to happen before the other clauses?
Here is what my query looks like.
SELECT A.C1, A.C2,
SUM(CASE WHEN B.C1 = '[condition 1]' OR ... [condition n] THEN 1 ELSE NULL END)
AS SUM_OF_B_C1, ... [other sums]
FROM DB.TABLE_A A LEFT JOIN DB.TABLE_B AS B ON B.COMMON_COL = A.COMMON_COL
WHERE B.DATE > DATE('[cutoff date]')
AND [complex substr conditions]
AND NOT [other complex string conditions]
GROUP BY A.C1, A.C2
One problem, is that I only have read permission for the DB. Looking through Control Center, it seems that there is no index on either tables A or B, so that optimization route is out.
Any help you could provide would be great; my small test queries are taking about 5 minutes and in product queries will be about 20 times larger.
Trying to out-think modern query optimizers isn’t usually very productive. I think you should spend your time doing two other things.
columns that should have indexes.
EXPLAIN PLAN. Thislets you approach management with numbers that show your code is
dead slow not because you’re incompetent, but because
theyDB2 is forced to do table scans all over the place.are
Tables that have no indexes at all are often generated by programs either to simplify querying and reporting or to reduce the load on the main database server. Find out who’s responsible for those programs. Befriend them. Buy them a pizza. Think “social engineering”.