Using SQL Server 2008 R2
I’d like to create a table with the following columns:
[id] INT IDENTITY(1,1) NOT NULL,
[user_id] INT NOT NULL,
[date] DATE NOT NULL,
[timestamp] DATETIME NOT NULL,
[xml_data] XML NOT NULL
with the primary key on the identity column and a non-clustered index on user_id and date that covers xml_data and timestamp.
However, I notice that I can’t add xml_data to the INCLUDE clause of the index. Sad face, since that’s going to result in a lookup back to the base table (a key lookup, assuming the primary key is clustered) when a user searches on user_id and date.
What’s the best way to store xml data that will be queried?
I figure my choices are:
- Stick with xml and have well-formatted data but take the query hit
- Use a VARCHAR(MAX) with unknown pros/cons
- Use a VARBINARY(MAX) with unknown pros/cons
Note: I doubt I’ll be able to restrict the length of the string to even something like 8000.
If you have XML, store it as XML, for two main reasons:
- it’s optimized for XML storage: it’s not stored as plain text, it’s tokenized and stored more efficiently than plain text
- you can actually query the XML when it’s stored as type XML
But: you cannot just index an XML column like that. An index key in SQL Server can be at most 900 bytes long, while an XML column can hold up to 2 GB.
If you want to index your XML column, have a look at XML Indexes in SQL Server 2005 – it’s a separate type of index designed to handle queries into XML very efficiently.
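As a rough sketch of what XML indexes look like in practice, the script below creates a primary XML index, an optional secondary PATH index, and a query that could benefit from them. The table name dbo.UserXmlDemo, the index names, and the /Order/OrderNumber shape of the XML are assumptions made for illustration; only the column list comes from the question.

```sql
-- Hypothetical table; the columns mirror the ones in the question.
CREATE TABLE dbo.UserXmlDemo
(
    [id]        INT IDENTITY(1,1) NOT NULL,
    [user_id]   INT      NOT NULL,
    [date]      DATE     NOT NULL,
    [timestamp] DATETIME NOT NULL,
    [xml_data]  XML      NOT NULL,
    -- A primary XML index requires a clustered primary key on the base table.
    CONSTRAINT PK_UserXmlDemo PRIMARY KEY CLUSTERED ([id])
);

-- The primary XML index shreds the XML into an internal, indexed representation.
CREATE PRIMARY XML INDEX PXML_UserXmlDemo_xml_data
    ON dbo.UserXmlDemo ([xml_data]);

-- Optional secondary index built on top of the primary one;
-- PATH is aimed at path-based predicates such as exist().
CREATE XML INDEX SXML_UserXmlDemo_xml_data_path
    ON dbo.UserXmlDemo ([xml_data])
    USING XML INDEX PXML_UserXmlDemo_xml_data FOR PATH;

-- Example query; the element names and values are assumed, not taken from the question.
SELECT [id], [timestamp]
FROM dbo.UserXmlDemo
WHERE [user_id] = 42
  AND [date] = '20120101'
  AND [xml_data].exist('/Order[OrderNumber = "A-1001"]') = 1;
```

XML indexes can take up considerably more space than the XML itself, so it is worth measuring whether the queries you actually run benefit from them.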
Another way to speed up your XML queries could be to “surface” certain pieces of your XML that you query on often onto the parent table, by means of a stored function that extracts that piece of information from the XML and stores it as a computed persisted column on the parent table. Once it’s stored there, you can query it just like any other column, and you can index it, too! It only works for single pieces of information, however (e.g. the OrderNumber from your order, since you only ever have one of those); it can’t be applied to collections of data.
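A minimal sketch of the “surfacing” approach, assuming the XML contains a single /Order/OrderNumber element. The function name dbo.GetOrderNumber, the table name dbo.UserOrderXml, the column name order_number, and the VARCHAR(50) length are all hypothetical. GO is the batch separator used by SSMS and sqlcmd, needed here because CREATE FUNCTION must be the first statement in its batch.

```sql
-- Hypothetical scalar function that pulls one value out of the XML.
-- SCHEMABINDING lets SQL Server treat the function as deterministic,
-- which a PERSISTED computed column requires.
CREATE FUNCTION dbo.GetOrderNumber (@xml XML)
RETURNS VARCHAR(50)
WITH SCHEMABINDING
AS
BEGIN
    -- [1] guarantees a single node, which value() requires.
    RETURN @xml.value('(/Order/OrderNumber)[1]', 'VARCHAR(50)');
END;
GO

-- Hypothetical table; the base columns mirror the ones in the question.
CREATE TABLE dbo.UserOrderXml
(
    [id]        INT IDENTITY(1,1) NOT NULL,
    [user_id]   INT      NOT NULL,
    [date]      DATE     NOT NULL,
    [timestamp] DATETIME NOT NULL,
    [xml_data]  XML      NOT NULL,
    -- Surfaced value: computed from the XML once and stored with the row.
    order_number AS dbo.GetOrderNumber([xml_data]) PERSISTED,
    CONSTRAINT PK_UserOrderXml PRIMARY KEY CLUSTERED ([id])
);

-- The surfaced column can now be indexed like any ordinary column.
CREATE NONCLUSTERED INDEX IX_UserOrderXml_order_number
    ON dbo.UserOrderXml (order_number);

-- Queries filter on the plain column instead of reaching into the XML.
SELECT [id], [user_id], [date]
FROM dbo.UserOrderXml
WHERE order_number = 'A-1001';
```

The trade-off is that the value is recomputed on every insert or update of xml_data, so writes get a little slower in exchange for cheap, indexable reads.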