It’s crazy, but query performance is about 50% WORSE after I add a primary XML index to my xml field.
Here’s what I’m doing.
- I have a table, ActivityStepLog, containing an XML field (LogContent, of type XML).
- I generate sample data to insert into this table by running the following:
INSERT INTO dbo.ActivityStepLog (
LogGUID
,LogContextID
,LogTypeID
,LogSourceName
,LogContent
,LogDate
,CreateDate
,CreatedBy
)
select
LogGUID = newid()
,LogContextID = newid()
,LogTypeID = 2
,LogSourceName = 'test test test'
,LogContent = (SELECT top 1 * FROM ##SampleData SampleData1 where DecisionLogID = SampleData.DecisionLogID FOR XML AUTO, ELEMENTS, ROOT('BusinessRule') )
,LogDate = current_timestamp
,CreateDate = current_timestamp
,CreatedBy = 'test create by'
from ##SampleData SampleData
##SampleData has 100,000 rows. I run the insert in a loop 5 times, so I end up with 500,000 rows.
- The LogContent field will end up having data such as the following:
-2147483643
0569281A-D1A3-49E3-9E68-BCAC62E2C1C3
1016
2
0
-2147483495
1
2009-05-18T11:47:00
none
(sorry, not sure if this will be formatted properly – it’s just basically a short set of elements).
And then I just run a very simple query:
SELECT *
FROM ActivityStepLog
WHERE LogContent.value('(/BusinessRuleDecisionLog/SampleData1/DecisionLogID)[1]', 'int') = -2147483535
Before creating the primary XML index on LogContent, the query takes 8 seconds; after, it takes about 12 seconds. I’ve cleared the caches (DBCC DROPCLEANBUFFERS and DBCC FREEPROCCACHE); that affects the overall time, but it doesn’t seem to affect the proportions.
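For anyone trying to reproduce the comparison, the test described above can be sketched roughly as follows. This is only a sketch under assumptions: the index name is hypothetical, and it assumes dbo.ActivityStepLog already has a clustered primary key (which a primary XML index requires). The cache-clearing commands should only be run on a test server.

```sql
-- Hypothetical index name; requires a clustered primary key on the table.
CREATE PRIMARY XML INDEX PXML_ActivityStepLog_LogContent
    ON dbo.ActivityStepLog (LogContent);

-- Start each timed run from a cold cache (test servers only).
CHECKPOINT;              -- flush dirty pages so DROPCLEANBUFFERS can evict them
DBCC DROPCLEANBUFFERS;   -- empty the buffer pool
DBCC FREEPROCCACHE;      -- discard cached query plans

-- Report I/O and timing for the query under test.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT *
FROM dbo.ActivityStepLog
WHERE LogContent.value('(/BusinessRuleDecisionLog/SampleData1/DecisionLogID)[1]', 'int') = -2147483535;

-- To time the "without index" case, drop the index and repeat the steps above.
DROP INDEX PXML_ActivityStepLog_LogContent ON dbo.ActivityStepLog;
```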
Here are my statistics:
WITH XML index:
Table 'xml_index_nodes_325576198_256000'. Scan count 1000000, logical reads 3517272, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ActivityStepLog'. Scan count 1, logical reads 71694, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
WITHOUT XML index:
(5 row(s) affected)
Table 'ActivityStepLog'. Scan count 1, logical reads 71694, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
So, there are far fewer logical reads withOUT the XML index. I also tried adding ALL the available secondary XML indexes, but that didn’t improve performance over having only the primary XML index.
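For reference, "all the available secondary indexes" presumably means the three secondary XML index types (PATH, VALUE and PROPERTY). A minimal sketch of creating them is shown here; all index names are hypothetical, and it assumes a primary XML index named PXML_ActivityStepLog_LogContent already exists on the column.

```sql
-- Each secondary XML index is built on top of an existing primary XML index.
-- All index names below are hypothetical.
CREATE XML INDEX SXML_ActivityStepLog_LogContent_Path
    ON dbo.ActivityStepLog (LogContent)
    USING XML INDEX PXML_ActivityStepLog_LogContent
    FOR PATH;      -- aimed at path-based predicates

CREATE XML INDEX SXML_ActivityStepLog_LogContent_Value
    ON dbo.ActivityStepLog (LogContent)
    USING XML INDEX PXML_ActivityStepLog_LogContent
    FOR VALUE;     -- aimed at value lookups where the path is not fully known

CREATE XML INDEX SXML_ActivityStepLog_LogContent_Property
    ON dbo.ActivityStepLog (LogContent)
    USING XML INDEX PXML_ActivityStepLog_LogContent
    FOR PROPERTY;  -- aimed at retrieving several values from the same row
```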
I’ll be doing some more research on this, but I would really appreciate any pointers or comments.
thanks,
Sylvia
After doing more research on this, it appears that for UNTYPED xml fields, at least in my test case, the XML indexes degrade performance. This appears to be different for typed xml, though I didn’t look into it much.
One thing that DID improve performance tremendously (thank you to wBob on the MSDN SQL XML forum for the idea!) was creating a full-text index on the xml field. I got subsecond performance at that point. I also included an XML filter in the query for accuracy.
I still need to research whether this fits all my filtering needs, but so far it looks good.
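The exact statements used are not shown in the post, so the following is only a guess at what the full-text approach might look like. The catalog name and the key index name are hypothetical; the key index must be an existing unique, single-column, non-nullable index on the table. The CONTAINS predicate narrows the candidate rows through the full-text index, and the XQuery predicate then checks the exact element, since a full-text match alone could hit the same number in any element. How the word breaker treats the leading minus sign has not been verified here.

```sql
-- Hypothetical catalog name.
CREATE FULLTEXT CATALOG ftcActivityStepLog;

-- PK_ActivityStepLog is a hypothetical name for the table's unique key index.
CREATE FULLTEXT INDEX ON dbo.ActivityStepLog (LogContent)
    KEY INDEX PK_ActivityStepLog
    ON ftcActivityStepLog;

-- Full-text predicate finds candidate rows; the XML predicate keeps it accurate.
SELECT *
FROM dbo.ActivityStepLog
WHERE CONTAINS(LogContent, '"-2147483535"')
  AND LogContent.value('(/BusinessRuleDecisionLog/SampleData1/DecisionLogID)[1]', 'int') = -2147483535;
```

Note that a full-text index is populated asynchronously, so newly inserted rows may not be searchable immediately.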
Sylvia