I'm trying to optimize a query that takes a long time to run. The goal of the query is to find the F2 values that are most similar to a given F2 value (specifically, a similarity measure).
This is an example of what I have:
CREATE TABLE Test
(
F1 varchar(124),
F2 varchar(124),
F3 varchar(124)
)
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'A', 'B', 'C' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'D', 'B', 'E' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'F', 'I', 'G' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'F', 'I', 'G' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'D', 'B', 'C' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'F', 'B', 'G' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'D', 'I', 'C' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'A', 'B', 'C' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'A', 'B', 'K' )
INSERT INTO Test ( F1, F2, F3 ) VALUES ( 'A', 'K', 'K' )
Now if I run this query:
SELECT B.F2, COUNT(*) AS CNT
FROM
(
    SELECT F1, F3 FROM Test
    WHERE F2 = 'B'
) AS A
INNER JOIN Test AS B
    ON A.F1 = B.F1 AND A.F3 = B.F3
GROUP BY B.F2
ORDER BY CNT DESC
The table has more than 1 million rows.
What would be a better way to do this?
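To see concretely what the query returns, here is a minimal sketch that loads the ten sample rows into an in-memory SQLite database and runs the same query. SQLite is only a stand-in (an assumption; the question does not name the database engine), and `run_question_query` is a hypothetical helper name.

```python
import sqlite3

# Sample rows copied from the question's INSERT statements.
ROWS = [
    ("A", "B", "C"), ("D", "B", "E"), ("F", "I", "G"), ("F", "I", "G"),
    ("D", "B", "C"), ("F", "B", "G"), ("D", "I", "C"), ("A", "B", "C"),
    ("A", "B", "K"), ("A", "K", "K"),
]

# The question's query, reformatted but otherwise unchanged.
QUERY = """
SELECT B.F2, COUNT(*) AS CNT
FROM (
    SELECT F1, F3 FROM Test
    WHERE F2 = 'B'
) AS A
INNER JOIN Test AS B
    ON A.F1 = B.F1 AND A.F3 = B.F3
GROUP BY B.F2
ORDER BY CNT DESC
"""


def run_question_query():
    """Hypothetical helper: load the sample rows into memory and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Test (F1 VARCHAR(124), F2 VARCHAR(124), F3 VARCHAR(124))"
        )
        conn.executemany("INSERT INTO Test (F1, F2, F3) VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


print(run_question_query())  # [('B', 8), ('I', 3), ('K', 1)]
```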
A filtered search for all rows WHERE F2 = 'B' will incur a full table scan unless you create an index that has F2 as its first or only column. Further down, the join condition involves columns F1 and F3, which you mention are already part of an index that begins with F1.
I also notice that the first part of your query doesn't eliminate duplicates from the set of (F1, F3) pairs where F2 = 'B', as one might expect when joining that set right back against the same table. Every duplicate pair is joined again, which inflates the counts: in your sample data the row ('A', 'B', 'C') appears twice, so each match on (A, C) is counted twice. You may have a reason for doing this, but we can't know for sure until you provide some details about the similarity measure you're trying to implement.
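Both points can be illustrated with a minimal sketch, again using an in-memory SQLite database as a stand-in for the real engine (an assumption). It creates an index with F2 as its leading column (the name `IX_Test_F2` and the trailing F1, F3 columns, which let the index cover the subquery, are assumptions) and compares the original query with a variant that applies `DISTINCT` to the (F1, F3) pairs. Whether de-duplicating is what the similarity measure actually needs is exactly the open question above; `compare_queries` is a hypothetical helper.

```python
import sqlite3

# Sample rows copied from the question's INSERT statements.
ROWS = [
    ("A", "B", "C"), ("D", "B", "E"), ("F", "I", "G"), ("F", "I", "G"),
    ("D", "B", "C"), ("F", "B", "G"), ("D", "I", "C"), ("A", "B", "C"),
    ("A", "B", "K"), ("A", "K", "K"),
]

# Template for the question's query; {distinct} is "" or "DISTINCT".
TEMPLATE = """
SELECT B.F2, COUNT(*) AS CNT
FROM (
    SELECT {distinct} F1, F3 FROM Test
    WHERE F2 = 'B'
) AS A
INNER JOIN Test AS B
    ON A.F1 = B.F1 AND A.F3 = B.F3
GROUP BY B.F2
ORDER BY CNT DESC
"""


def compare_queries():
    """Hypothetical helper: return (plan details, original rows, de-duplicated rows)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Test (F1 VARCHAR(124), F2 VARCHAR(124), F3 VARCHAR(124))"
        )
        conn.executemany("INSERT INTO Test (F1, F2, F3) VALUES (?, ?, ?)", ROWS)
        # Assumed index shape: F2 first so WHERE F2 = 'B' can seek instead of scan.
        conn.execute("CREATE INDEX IX_Test_F2 ON Test (F2, F1, F3)")
        # The last column of each EXPLAIN QUERY PLAN row is the human-readable
        # detail; its exact wording varies between SQLite versions.
        plan = [
            row[-1]
            for row in conn.execute(
                "EXPLAIN QUERY PLAN SELECT F1, F3 FROM Test WHERE F2 = 'B'"
            )
        ]
        original = conn.execute(TEMPLATE.format(distinct="")).fetchall()
        deduped = conn.execute(TEMPLATE.format(distinct="DISTINCT")).fetchall()
        return plan, original, deduped
    finally:
        conn.close()


plan, original, deduped = compare_queries()
print(plan)
print(original)  # [('B', 8), ('I', 3), ('K', 1)]
print(deduped)   # [('B', 6), ('I', 3), ('K', 1)]
```

With the sample data only B's count changes (8 to 6), because the duplicated ('A', 'B', 'C') row no longer causes the (A, C) matches to be counted twice.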
Your ORDER BY clause also affects the query run time, because it incurs a potentially large internal sort of the final result set.
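If only the best few matches are needed (an assumption; the question does not say how much of the result is used), limiting the output lets the engine return just the top rows rather than the whole sorted list. The sketch below uses SQLite's `LIMIT` as a stand-in; in SQL Server the equivalent would be `SELECT TOP (1) ... ORDER BY CNT DESC`. The helper `top_matches` and its `probe` and `k` parameters are hypothetical names.

```python
import sqlite3

# Sample rows copied from the question's INSERT statements.
ROWS = [
    ("A", "B", "C"), ("D", "B", "E"), ("F", "I", "G"), ("F", "I", "G"),
    ("D", "B", "C"), ("F", "B", "G"), ("D", "I", "C"), ("A", "B", "C"),
    ("A", "B", "K"), ("A", "K", "K"),
]

# The question's query with the probe value parameterized and a row limit added.
TOP_K_QUERY = """
SELECT B.F2, COUNT(*) AS CNT
FROM (
    SELECT F1, F3 FROM Test
    WHERE F2 = ?
) AS A
INNER JOIN Test AS B
    ON A.F1 = B.F1 AND A.F3 = B.F3
GROUP BY B.F2
ORDER BY CNT DESC
LIMIT ?
"""


def top_matches(probe, k):
    """Hypothetical helper: return the k highest-count F2 values for probe."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE Test (F1 VARCHAR(124), F2 VARCHAR(124), F3 VARCHAR(124))"
        )
        conn.executemany("INSERT INTO Test (F1, F2, F3) VALUES (?, ?, ?)", ROWS)
        return conn.execute(TOP_K_QUERY, (probe, k)).fetchall()
    finally:
        conn.close()


print(top_matches("B", 1))  # [('B', 8)]
```

Calling `top_matches("B", 2)` returns the two highest counts, B itself followed by I.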