I have a table that we just enabled FILESTREAM on. We created a new varbinary(max) column and set it to store to a filestream. Then we copied everything from the existing column into the new one so the file data would get pushed out to the file system.
So far so good.
However, we weren’t able to take the DB offline while doing this (uptime SLA), and 2 records out of 7400 came in after the update statement ran but before we renamed the columns. We currently have 2 columns, FileData and FileDataOld, where FileData is the one tied to the filestream.
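For context, the migration described above corresponds roughly to the T-SQL below. This is only a sketch: the temporary column name FileDataNew is hypothetical, and it assumes the database already has a FILESTREAM filegroup and the table already has the unique ROWGUIDCOL column that FILESTREAM requires.

```sql
-- Add the new FILESTREAM-backed column (FileDataNew is a hypothetical name).
ALTER TABLE docslist ADD FileDataNew varbinary(max) FILESTREAM NULL;
GO

-- Copy the existing blobs across; this is what pushes them to the file system.
-- Rows inserted after this statement finishes are NOT copied.
UPDATE docslist SET FileDataNew = FileData;
GO

-- Swap the names so that FileData is the FILESTREAM column.
EXEC sp_rename 'docslist.FileData', 'FileDataOld', 'COLUMN';
EXEC sp_rename 'docslist.FileDataNew', 'FileData', 'COLUMN';
GO
```

The gap between the UPDATE and the renames is where the 2 stray records slipped in.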
The average file size is a little over 2MB. So, I decided to run a very simple select statement to find the records that didn’t go:
select DocumentId, FileName
from docslist
where FileData is null
When I ran this query, the CPU spiked to 80% and sat there for quite a while. Ultimately I killed the select after 2 minutes because that was just insane.
If I run something like:
select DocumentId, FileName from docslist
It returns almost instantly.
However, as soon as I try to query where FileData or FileDataOld is null it spins off into forever land.
When I watch Resource Monitor while querying for ‘FileData is null’, I can see it pulling every byte of every single document off the file system. Which is pretty odd; you’d think that info would be stored within the table itself.
If I query for FileDataOld is null, it looks like it’s trying to load the entire table (16GB) into memory.
How can I improve this? I just need to find the 2 records that came in after the update statement and force those documents to move over.
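One way to see what the engine plans to do without actually paying for the query again is to ask for the estimated plan. A minimal sketch, assuming it is run from SSMS or sqlcmd (GO is a client-side batch separator, not T-SQL):

```sql
-- While SHOWPLAN_XML is ON, statements are compiled but not executed,
-- so this returns the estimated plan instead of reading any blobs.
SET SHOWPLAN_XML ON;
GO

SELECT DocumentId, FileName
FROM docslist
WHERE FileData IS NULL;
GO

SET SHOWPLAN_XML OFF;
GO
```

The returned plan shows which operator evaluates the IS NULL predicate and on which index or heap, which should make it clearer where the blob reads are coming from.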
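Since the plain select on DocumentId and FileName comes back almost instantly, one option is to identify the late rows by key instead of testing the blob columns at all. A minimal sketch, assuming DocumentId is an ever-increasing integer key (for example an identity column) and that the highest DocumentId present when the bulk update ran is known; @CutoffId is a hypothetical placeholder, not something from the original schema:

```sql
-- Hypothetical cutoff: the highest DocumentId that existed when the bulk UPDATE ran.
-- Left as NULL so that nothing matches until a real value is filled in.
DECLARE @CutoffId int = NULL;

-- Check which rows arrived afterwards; no blob column is referenced here.
SELECT DocumentId, FileName
FROM docslist
WHERE DocumentId > @CutoffId;

-- Copy just those rows into the FILESTREAM column.
UPDATE docslist
SET FileData = FileDataOld
WHERE DocumentId > @CutoffId;
```

If DocumentId is not sequential, a creation-date column (if the table has one) could play the same role in the WHERE clause.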
Can’t you do:
On MSDN it says:
Reference here