I’m currently in the initial phases of planning a rewrite for a large module in our CRM application.
One area I'm looking into is database optimization. I haven't made any decisions yet, but I want to make sure I understand the concept of ROW_OVERFLOW_DATA properly: http://msdn.microsoft.com/en-us/library/ms186981.aspx
We are using SQL Server 2005. My understanding is that the row size limit is 8,060 bytes and that rows larger than that will overflow.
I ran a query to get the maximum (declared) row size of each table in a particular read-intensive database:
SELECT OBJECT_NAME (sc.[id]) tablename
, COUNT (1) nr_columns
, SUM (sc.length) maxrowlength
FROM syscolumns sc
join sysobjects so
on sc.[id] = so.[id]
WHERE so.xtype = 'U'
GROUP BY OBJECT_NAME (sc.[id])
ORDER BY SUM (sc.length) desc
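For comparison, here is a sketch of the same declared-width calculation against the SQL Server 2005 catalog views (sys.tables, sys.columns) instead of the older syscolumns/sysobjects compatibility views. One caveat with summing lengths: varchar(max), nvarchar(max), varbinary(max) and xml report a length of -1, so in the original query they reduce the total instead of increasing it. This version counts them separately; the nr_max_columns column is an addition, the other names mirror the query above.

```sql
-- Declared (worst-case) row width per user table, using the SQL Server 2005
-- catalog views instead of the older syscolumns/sysobjects views.
-- max_length is -1 for the MAX types and xml, so those columns are counted
-- in nr_max_columns instead of being summed into maxrowlength.
SELECT  SCHEMA_NAME(t.schema_id) + '.' + t.name      AS tablename,
        COUNT(*)                                     AS nr_columns,
        SUM(CASE WHEN c.max_length > 0
                 THEN c.max_length ELSE 0 END)       AS maxrowlength,
        SUM(CASE WHEN c.max_length = -1
                 THEN 1 ELSE 0 END)                  AS nr_max_columns
FROM    sys.tables  AS t
JOIN    sys.columns AS c
        ON c.object_id = t.object_id
GROUP BY t.schema_id, t.name
ORDER BY maxrowlength DESC;
```

Either way, this is a worst case: it says what could overflow, not how wide the rows actually are.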
This gave me a few tables with a maxrowlength slightly above 8,000 but under 10,000. Another query shows that the average row size is actually quite small, around 1,000 bytes.
My question is: is ROW_OVERFLOW_DATA applied per row or per column? Once a row exceeds the 8,060-byte limit, is the entire column that caused the overflow moved to another page, or only that column's value in the specific row?
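To see actual rather than declared sizes, a sketch like the following reads the physical stats for one table, assuming SQL Server 2005 or later; dbo.YourTable is a placeholder name. A ROW_OVERFLOW_DATA row with a page_count above zero suggests that at least some values have been pushed off-row.

```sql
-- Actual (not declared) record sizes and row-overflow usage for one table.
-- dbo.YourTable is a placeholder. 'DETAILED' reads every page, so run it
-- off-peak on large tables.
SELECT  index_id,
        alloc_unit_type_desc,      -- IN_ROW_DATA, ROW_OVERFLOW_DATA or LOB_DATA
        page_count,
        record_count,
        avg_record_size_in_bytes,
        max_record_size_in_bytes
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.YourTable'), NULL, NULL, 'DETAILED')
WHERE   index_level = 0;           -- leaf level only
```

Double-check the table name: if it is wrong, OBJECT_ID returns NULL and the function scans every table in the database.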
For example, given the following simplified schema:
col1 (int) | col2 (varchar(4000)) | col3 (varchar(5000))
1          | 4000 characters      | 5000 characters   ***This row is overflowing
2          | 4000 characters      | 100 characters
3          | 150 characters       | 150 characters
4          | 500 characters       | 600 characters
Would col3 be replaced by a 24-byte pointer in every row (1 to 4), or only in row 1?
I ask because if every row gets a pointer, fixing it becomes important; if it's only a few rows, maybe we can take the performance hit.
Also, I've seen many blogs suggesting that nullable columns be moved toward the end of the table so that, if the values are in fact NULL, they don't take up any row space. Is this true? We tend to keep our timestamp and tracking columns at the end because it's easier to visualize. Now I'm wondering whether we should move them further up, since they are never NULL.
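If you have one row in, say, 100 million that overflows, would you move the whole column? No. Only the overflowing value in that row is moved off-row and replaced by a pointer; the other rows are unaffected.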
For reference, a TechNet article from Paul Randal, who is the god of this stuff (my bold):
And MSDN (my bold):
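To check the row-versus-column question against the example schema directly, here is a minimal repro, assuming SQL Server 2005 or later and a scratch database; dbo.OverflowDemo is a hypothetical table name. Only row 1 is wider than 8,060 bytes, so only that row should get a value moved to a ROW_OVERFLOW_DATA page.

```sql
-- Repro of the four-row example from the question.
-- dbo.OverflowDemo is hypothetical. CREATE TABLE succeeds with a warning,
-- because the declared maximum row size exceeds 8,060 bytes.
IF OBJECT_ID('dbo.OverflowDemo') IS NOT NULL
    DROP TABLE dbo.OverflowDemo;

CREATE TABLE dbo.OverflowDemo (
    col1 int           NOT NULL,
    col2 varchar(4000) NOT NULL,
    col3 varchar(5000) NOT NULL
);

INSERT INTO dbo.OverflowDemo (col1, col2, col3)
SELECT 1, REPLICATE('a', 4000), REPLICATE('b', 5000)   -- ~9,000 bytes: too wide
UNION ALL SELECT 2, REPLICATE('a', 4000), REPLICATE('b', 100)
UNION ALL SELECT 3, REPLICATE('a', 150),  REPLICATE('b', 150)
UNION ALL SELECT 4, REPLICATE('a', 500),  REPLICATE('b', 600);

-- IN_ROW_DATA should hold all four rows; ROW_OVERFLOW_DATA should hold
-- a single off-row value, the one belonging to row 1.
SELECT  alloc_unit_type_desc, page_count, record_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(), OBJECT_ID('dbo.OverflowDemo'), NULL, NULL, 'DETAILED')
WHERE   index_level = 0;
```

If the whole column were moved, you would expect four off-row records rather than one.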
As for your NULLable columns, this is false. NULLable columns are stored at the end of the disk structure anyway, regardless of column order in the table definition. See Paul Randal again: Inside the Storage Engine: Anatomy of a record. And some previous answers from me here on SO.
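A quick way to check the column-order question for never-NULL tracking columns is to store the same row in two tables that differ only in declaration order and compare record sizes. This is a sketch assuming SQL Server 2005 or later; the table names, and datetime as the type of the tracking columns, are assumptions. Fixed-length NOT NULL columns always occupy their full width in the fixed-length part of the record, so the two averages should be the same.

```sql
-- Same columns, two declaration orders: tracking columns last vs first.
-- dbo.OrderLast / dbo.OrderFirst are hypothetical demo tables, and datetime
-- is an assumed type for the tracking columns.
IF OBJECT_ID('dbo.OrderLast')  IS NOT NULL DROP TABLE dbo.OrderLast;
IF OBJECT_ID('dbo.OrderFirst') IS NOT NULL DROP TABLE dbo.OrderFirst;

CREATE TABLE dbo.OrderLast (
    id         int          NOT NULL,
    notes      varchar(100) NULL,
    created_at datetime     NOT NULL,
    changed_at datetime     NOT NULL
);
CREATE TABLE dbo.OrderFirst (
    created_at datetime     NOT NULL,
    changed_at datetime     NOT NULL,
    id         int          NOT NULL,
    notes      varchar(100) NULL
);

INSERT INTO dbo.OrderLast  (id, notes, created_at, changed_at)
VALUES (1, NULL, GETDATE(), GETDATE());
INSERT INTO dbo.OrderFirst (id, notes, created_at, changed_at)
VALUES (1, NULL, GETDATE(), GETDATE());

-- Compare the on-disk record size of the two layouts.
SELECT 'OrderLast' AS tablename, avg_record_size_in_bytes
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID('dbo.OrderLast'), NULL, NULL, 'DETAILED')
WHERE  alloc_unit_type_desc = 'IN_ROW_DATA'
UNION ALL
SELECT 'OrderFirst', avg_record_size_in_bytes
FROM   sys.dm_db_index_physical_stats(
           DB_ID(), OBJECT_ID('dbo.OrderFirst'), NULL, NULL, 'DETAILED')
WHERE  alloc_unit_type_desc = 'IN_ROW_DATA';
```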