We work with some very large databases (300 GB to 1 TB). Tables can contain from 10 million to 5 billion records. We do some fairly simple data transformations involving WITH and UNPIVOT statements. The problem is that the database log file and tempdb grow huge and eventually the server stops working.
Now I'm leaning toward the idea that WITH and even UNPIVOT constructs are expensive in terms of resource usage, and that we should consider some simplifications here:
- splitting into several steps with temp tables instead of using WITH
- using UNION instead of UNPIVOT
Does anybody have experience with this?
Sincere thanks to everyone. Now it's pretty obvious to me that the problem was misuse of UNPIVOT. Indeed, CTEs are just like views, so they don't hurt that much unless you use them improperly.
So the root of our problem was that our server (32 GB RAM, 8 CPUs, 2 TB HDD) was simply unable to handle the large number of records that UNPIVOT produced.
Let's say we have HugeTable with fields (F1, F2, F3, F4, F5, F6) and RecordCount = 1,000,000,000.
We use it this way (pseudocode):
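A minimal T-SQL sketch of the kind of query described here might look as follows. It is an illustration under assumptions, not the exact statement we ran: the CTE name, the output column names `FieldName` and `FieldValue`, the `dbo` schema, and the filter `FieldValue > 0` are all hypothetical, and F1 to F6 are assumed to share one data type (UNPIVOT requires that).

```sql
-- Hypothetical sketch: unpivot all six fields first, filter afterwards.
-- Assumes F1..F6 have the same data type; the filter is a placeholder.
WITH Unpivoted AS (
    SELECT u.FieldName, u.FieldValue
    FROM (SELECT F1, F2, F3, F4, F5, F6 FROM dbo.HugeTable) AS src
    UNPIVOT (FieldValue FOR FieldName IN (F1, F2, F3, F4, F5, F6)) AS u
)
SELECT FieldName, FieldValue
FROM Unpivoted
WHERE FieldValue > 0;  -- placeholder filter, applied to the unpivoted rows
```

With this shape, each source row can turn into up to six output rows before the filter and any later joins see them.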
The query plan estimates that our UNPIVOT produces 6,000,000,000 records (six fields times one billion rows) to be processed by our WHERE clause. It gets even worse because in reality we join some additional tables and apply extra filters, so all of this happens 6 billion times. The transaction log and tempdb were still untouched and rather small. I've found no documentation saying that UNPIVOT and joins (hash joins, to be precise) use only RAM for their operations, but from what we experienced I understand that our SQL Server 2008 R2 Enterprise was simply trying to fit that bulk recordset in RAM, and since we didn't have 1 TB of RAM, the operating system was swapping heavily.
The interesting thing here is that it may start very quickly and process about 1,800,000,000 records in the first 6 hours, but then it effectively hangs (it produces about 100K records per 24 hours, which is not acceptable at all).
If we turn it into manual UNION ALL like this:
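A minimal T-SQL sketch of a manual UNION ALL version might look like this. As before, it is an illustration under assumptions rather than the exact statement we ran: the CTE name, the column names `FieldName` and `FieldValue`, the `dbo` schema, and the per-branch filter `> 0` are hypothetical.

```sql
-- Hypothetical sketch: one SELECT per field, each filtered before the rows
-- are combined. The "> 0" predicate is a placeholder for the real filter.
WITH Unpivoted AS (
    SELECT 'F1' AS FieldName, F1 AS FieldValue FROM dbo.HugeTable WHERE F1 > 0
    UNION ALL
    SELECT 'F2', F2 FROM dbo.HugeTable WHERE F2 > 0
    UNION ALL
    SELECT 'F3', F3 FROM dbo.HugeTable WHERE F3 > 0
    UNION ALL
    SELECT 'F4', F4 FROM dbo.HugeTable WHERE F4 > 0
    UNION ALL
    SELECT 'F5', F5 FROM dbo.HugeTable WHERE F5 > 0
    UNION ALL
    SELECT 'F6', F6 FROM dbo.HugeTable WHERE F6 > 0
)
SELECT FieldName, FieldValue
FROM Unpivoted;
```

The trade-off is that the table is read once per branch, but each branch can discard unwanted rows before they reach any later joins.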
the query plan showed that the CTE produced about 2 billion records. So all further joins had to be done against a much smaller recordset than in the first case. The job took less than 10 hours (versus days in the first case).
BTW, we use an SSIS/VS2008 environment for our data loads.