In a recent CODE Magazine article, John Petersen shows how to use bitwise operators in TSQL in order to store a list of attributes in one column of a db table.
Article here.
In his example he’s using one integer column to hold how a customer wants to be contacted (email,phone,fax,mail). The query for pulling out customers that want to be contacted by email would look like this:
SELECT C.*
FROM dbo.Customers C
,(SELECT 1 AS donotcontact
,2 AS email
,4 AS phone
,8 AS fax
,16 AS mail) AS contacttypes
WHERE ( C.contactmethods & contacttypes.email <> 0 )
AND ( C.contactmethods & contacttypes.donotcontact = 0 )
Afterwards he shows how to encapsulate this in to a table function.
My questions are these:
1. Is this a good idea? Any drawbacks? What problems might I run in to using this approach of storing attributes versus storing them in two extra tables (Customer_ContactType, ContactType) and doing a join with the Customer table? I guess one problem might be if my attribute list gets too long. If the column is an integer then my attribute list could only be at most 32.
2. What is the performance of doing these bitwise operations in queries as you move in to the tens of thousands of records? I’m guessing that it would not be any more expensive than any other comparison operation.
If you wish to filter your query based on the value of any of those bit values, then yes this is a very bad idea, and is likely to cause performance problems.
Besides, there simply isn’t any need – just use the bit data type.
The reason why using bitwise operators in this way is a bad idea is that SQL server maintains statistics on various columns in order to improve query performance – for example if you have an email column, SQL server can tell you roughly what percentage of values that email column are true and select an appropriate execution plan based on that knowledge.
If however you have flags column, SQL server will have absolutely no idea how many records in a table match
flags & 2(email) – it doesn’t maintain these sorts of indexes. Without this sort of information available to it SQL server is far more likely to choose a poor execution plan.