I have a poor man’s replication setup that I can’t do anything about. Some identifying data (basically the primary key) from a call_table is copied into a queue table via a simple trigger, and then the “replication server” runs a stored procedure to copy the data from the queue table to a #temp table (the case made to me was that this prevents locking in SQL 6.5). Finally, a query uses the key data from the temp table to pull data back to the replication server from the call_table using this query:
/* select the data to return to poor man replication server */
SELECT c.id,
       c.date,
       c.time,
       c.duration,
       c.location
FROM #tmp q, call_table c (NOLOCK)
WHERE q.id=c.id
  AND q.date=c.date
  AND q.time=c.time
  AND q.duration=c.duration
  AND q.location=c.location
GROUP BY c.id,
         c.date,
         c.time,
         c.duration,
         c.location
Once a night the queue table is purged and this starts over. While investigating this, the implicit cross join jumped out at me (I’m on the side that considers them usually evil), but then I read The power of the Cross Join. I’m here because I’m not quite convinced. Say the temp table has about 10,000 rows for the day and the call_table has about 100,000 for the month so far. How is this query going to work? Does it mash the two tables together for a total of 1,000,000,000 rows in memory, then use the GROUP BY clause to trim it back down? Could you explain what steps SQL Server takes to compile the results?
Execution Plans:
My Query:
|--Hash Match Root(Aggregate, HASH:([c].[id], [c].[date], [c].[location], [c].[time], [c].[duration]), RESIDUAL:(((((((((((((((((((((([c].[id]=[c].[id] AND [c].[PIN]=[c].[PIN]) AND [c].[ORIG]=[c].[ORIG]) AND [c].[date]=[c].[date]) AND [c].[CTIME]=[c].[CTIME
|--Hash Match Team(Inner Join, HASH:([q].[id], [q].[date], [q].[location], [q].[time], [q].[duration])=([c].[id], [c].[date], [c].[location], [c].[time], [c].[duration]), RESIDUAL:(((([c].[id]=[q].[id] AND [c].[location]=[q].[location]) AND [c].[duration]=[q].[duration]) AND [
|--Table Scan(OBJECT:([db].[dbo].[queue] AS [q]))
|--Table Scan(OBJECT:([db].[dbo].[call_table] AS [c]))
Yours:
|--Merge Join(Right Semi Join, MERGE:([q].[id], [q].[date], [q].[time], [q].[duration], [q].[location])=([c].[id], [c].[date], [c].[time], [c].[duration], [c].[location]), RESIDUAL:(((([q].[id]=[c].[id] AND [q].[location]=[c].[location]) AND [q].[duration]=[c].[duration]) AND [q].[
|--Index Scan(OBJECT:([db].[dbo].[queue].[PK_queue] AS [q]), ORDERED)
|--Sort(ORDER BY:([c].[id] ASC, [c].[date] ASC, [c].[time] ASC, [c].[duration] ASC, [c].[location] ASC))
|--Table Scan(OBJECT:([db].[dbo].[call_table] AS [c]))
The query you described is in no way a CROSS JOIN. SQL Server is smart enough to transform the WHERE conditions into JOIN conditions.

However, I see no point in the GROUP BY here. The query can easily be rewritten as a semi join without it, provided that c.id is a PRIMARY KEY. If it’s not, just add DISTINCT to the SELECT.

Update:

From your plan I see that your query uses a HASH JOIN, while mine uses a MERGE SEMI JOIN.

The latter is usually more efficient if you have an ordered set, but for some reason the query does not use the composite index you created and instead performs a full table scan followed by a sort.

This is strange, since all your values are contained within the index.
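For comparison, the two query shapes behind these plans can be sketched as follows. The first is the original comma join written with explicit JOIN syntax, which is how the optimizer treats it anyway. The second is a semi join written with EXISTS; that form is an assumption about what produces the Right Semi Join in the plan, not necessarily the exact text of the rewrite.

```sql
-- (1) The original query with its WHERE conditions written as an explicit JOIN.
--     This is a regular inner join, not a cross join; the GROUP BY only
--     collapses duplicate rows.
SELECT  c.id, c.date, c.time, c.duration, c.location
FROM    #tmp q
JOIN    call_table c (NOLOCK)
ON      q.id = c.id
        AND q.date = c.date
        AND q.time = c.time
        AND q.duration = c.duration
        AND q.location = c.location
GROUP BY c.id, c.date, c.time, c.duration, c.location

-- (2) Semi-join sketch (assumed form): return each call_table row that has
--     at least one match in #tmp. If c.id is not unique, add DISTINCT.
SELECT  c.id, c.date, c.time, c.duration, c.location
FROM    call_table c (NOLOCK)
WHERE   EXISTS
        (
        SELECT  1
        FROM    #tmp q
        WHERE   q.id = c.id
                AND q.date = c.date
                AND q.time = c.time
                AND q.duration = c.duration
                AND q.location = c.location
        )
```

Because the semi join stops at the first match per call_table row, duplicates in #tmp cannot multiply the output, which is why the GROUP BY becomes unnecessary.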
Probably this is because your fields allow NULLs.

Make sure that you use only the fields from the composite index in both the WHERE conditions and the SELECT clause and, if possible, make them NOT NULL.

This should make your query use preordered resultsets in the MERGE SEMI JOIN. You can tell it worked if you see neither TABLE SCAN nor SORT in the plan, just two INDEX SCANs.

And two more questions:

- Is c.id a PRIMARY KEY on call_table?
- Is q.id a PRIMARY KEY on #tmp?

If the answer to both questions is yes, then you will benefit from doing two things:

- Declaring the PRIMARY KEY as CLUSTERED on both tables
- Rewriting your query as this:
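One possible form of that rewrite is a plain join on id, sketched below. It assumes that id alone is the primary key of both #tmp and call_table, so a match on id already identifies the row and the other four comparisons, the GROUP BY, and DISTINCT all become unnecessary.

```sql
-- Sketch only: assumes id alone is the PRIMARY KEY of both #tmp and call_table.
-- Each #tmp row then matches at most one call_table row, so no duplicates
-- can appear and no GROUP BY or DISTINCT is needed.
SELECT  c.id,
        c.date,
        c.time,
        c.duration,
        c.location
FROM    #tmp q
JOIN    call_table c (NOLOCK)
ON      c.id = q.id
```

With both primary keys clustered, both inputs are already ordered by id, which is the situation where a merge join can avoid the sort seen in the earlier plan.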