I am supposed to remove whole rows, and parts of XML documents, from a table with an XML column, based on a specific value inside the XML. However, the table contains millions of rows and gets locked while the operation runs. At the current rate the cleanup would take almost a week, and the system is too critical to be taken offline for that long.
Are there any ways to optimize the XPath expressions in this script:
declare @slutdato datetime = '2012-03-01 00:00:00.000'
declare @startdato datetime = '2000-02-01 00:00:00.000'
declare @lev varchar(20) = 'suppliername'
declare @todelete varchar(10) = '~~~~~~~~~~'
CREATE TABLE #ids (selId int NOT NULL PRIMARY KEY)
INSERT into #ids
select id from dbo.proevesvar
WHERE leverandoer = @lev
and proevedato <= @slutdato
and proevedato >= @startdato
begin transaction /* delete whole rows */
delete from dbo.proevesvar
where id in (select selId from #ids)
and ProeveSvarXml.exist('/LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''@todelete'')]') = 1
and Proevesvarxml.exist('/LaboratoryReport/LaboratoryResults/Result[Value!=sql:variable(''@todelete'')]') = 0
commit
begin transaction /* delete single results */
UPDATE dbo.proevesvar SET ProeveSvarXml.modify('delete /LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''@todelete'')]')
where id in (select selId from #ids)
commit
go
The table definition is:
CREATE TABLE [dbo].[ProeveSvar](
[ID] [int] IDENTITY(1,1) NOT NULL,
[CPRnr] [nchar](10) NOT NULL,
[ProeveDato] [datetime] NOT NULL,
[ProeveSvarXml] [xml] NOT NULL,
[Leverandoer] [nvarchar](50) NOT NULL,
[Proevenr] [nvarchar](50) NOT NULL,
[Lokationsnr] [nchar](13) NOT NULL,
[Modtaget] [datetime] NOT NULL,
CONSTRAINT [PK_ProeveSvar] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY],
CONSTRAINT [IX_ProeveSvar_1] UNIQUE NONCLUSTERED
(
[CPRnr] ASC,
[Lokationsnr] ASC,
[Proevenr] ASC,
[ProeveDato] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
The first insert statement is very fast. I believe I can handle the locking by committing 50 rows at a time, so other requests can be handled in between my transactions.
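A minimal sketch of that batching idea, assuming #ids has already been filled as in the script above. The batch size variable, the @batch table variable and the extra exist() check (which skips rows that have nothing to remove) are assumptions added for illustration, not part of the original script:

```sql
-- Sketch only: works through #ids in small batches, one short transaction each.
DECLARE @todelete  varchar(10) = '~~~~~~~~~~';
DECLARE @batchsize int = 50;   -- assumed value, taken from the "50 rows at a time" idea
DECLARE @batch TABLE (selId int NOT NULL PRIMARY KEY);   -- hypothetical helper table

WHILE 1 = 1
BEGIN
    DELETE FROM @batch;

    -- Move the next batch of ids from the work list into @batch.
    DELETE TOP (@batchsize) FROM #ids
    OUTPUT deleted.selId INTO @batch (selId);

    IF @@ROWCOUNT = 0 BREAK;   -- work list is empty

    BEGIN TRANSACTION;

    -- Remove the matching Result elements, but only touch rows that contain one.
    UPDATE dbo.proevesvar
    SET ProeveSvarXml.modify('delete /LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''@todelete'')]')
    WHERE id IN (SELECT selId FROM @batch)
      AND ProeveSvarXml.exist('/LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''@todelete'')]') = 1;

    COMMIT TRANSACTION;
END
```

Each iteration holds its locks only for the duration of one small update, so other requests can get in between batches. The whole-row delete could be batched the same way.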
The total number of rows for this supplier is about 5.5 million and the total rowcount in the table is around 13 million.
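I’ve not really used XPath within SQL Server before, but something which stands out is that you’re doing lots of reads and writes in the same command (in the second statement). If possible, split the reading from the writing: do the XML filtering while you fill the temporary table, and let the delete only look up the ids stored in it.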
This means that the first query will only create the new temporary table, and not write anything back, which will take slightly longer than your original, but the key thing is that your second query will ONLY be deleting records based on what’s in your temporary table.
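A sketch of what that split could look like for the whole-row delete. This is an assumed reconstruction of the idea, not the code originally posted with this answer, and the temp table name #wholerows is hypothetical:

```sql
-- Sketch only: read phase and write phase separated.
DECLARE @slutdato  datetime    = '2012-03-01 00:00:00.000';
DECLARE @startdato datetime    = '2000-02-01 00:00:00.000';
DECLARE @lev       varchar(20) = 'suppliername';
DECLARE @todelete  varchar(10) = '~~~~~~~~~~';

CREATE TABLE #wholerows (selId int NOT NULL PRIMARY KEY);   -- hypothetical name

-- Read phase: evaluate the date and XML predicates once; nothing is written to dbo.proevesvar.
INSERT INTO #wholerows (selId)
SELECT id
FROM dbo.proevesvar
WHERE leverandoer = @lev
  AND proevedato <= @slutdato
  AND proevedato >= @startdato
  AND ProeveSvarXml.exist('/LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''@todelete'')]') = 1
  AND ProeveSvarXml.exist('/LaboratoryReport/LaboratoryResults/Result[Value!=sql:variable(''@todelete'')]') = 0;

-- Write phase: delete purely by primary key, with no XML evaluation.
BEGIN TRANSACTION;
DELETE FROM dbo.proevesvar
WHERE id IN (SELECT selId FROM #wholerows);
COMMIT;
```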
What you’ll probably find is that, because it’s deleting records, it’s constantly updating the indices while it is still reading, which makes the reads slower as well.
I’d also delete/disable any indices/constraints that don’t actually help your query run.
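As a hedged illustration, the nonclustered index from the table definition could be disabled for the cleanup and rebuilt afterwards. Note the assumption baked in here: IX_ProeveSvar_1 backs a UNIQUE constraint, so duplicates are not rejected while it is disabled, and the rebuild on a table this size takes time. That may well be unacceptable on a live system.

```sql
-- Sketch only: disabling also switches off the uniqueness check until the rebuild.
ALTER INDEX IX_ProeveSvar_1 ON dbo.ProeveSvar DISABLE;

-- ... run the cleanup statements here ...

-- Rebuilding re-enables the index and the constraint; it fails if duplicates were inserted meanwhile.
ALTER INDEX IX_ProeveSvar_1 ON dbo.ProeveSvar REBUILD;
```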
Also, you’re creating your clustered primary key on the ID, which isn’t always the best choice, especially if you’re doing lots of date scans.
Can you also look at the estimated execution plan for the top query? It would be interesting to see the order in which it checks the conditions. If it filters on the date first, that’s fine, but if it evaluates the XPath before it checks the date, you might have to separate it into 3 queries, or add a new clustered index on ‘proevedato,id’. That should let the query run the XPath only for records which actually match the date.
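One way to get the estimated plan without executing anything is SET SHOWPLAN_XML, sketched below for a SELECT that combines the date and XML filters. The final batch swaps in a lighter alternative to the clustered index mentioned above: a nonclustered index, which leaves the existing clustered primary key untouched. The index name is hypothetical, and building it on a table this size will itself take time and locks.

```sql
SET SHOWPLAN_XML ON;    -- must be the only statement in its batch
GO
-- Compiled but not executed while SHOWPLAN_XML is ON; the plan is returned as XML.
DECLARE @slutdato  datetime    = '2012-03-01 00:00:00.000';
DECLARE @startdato datetime    = '2000-02-01 00:00:00.000';
DECLARE @lev       varchar(20) = 'suppliername';
DECLARE @todelete  varchar(10) = '~~~~~~~~~~';

SELECT id
FROM dbo.proevesvar
WHERE leverandoer = @lev
  AND proevedato <= @slutdato
  AND proevedato >= @startdato
  AND ProeveSvarXml.exist('/LaboratoryReport/LaboratoryResults/Result[Value=sql:variable(''@todelete'')]') = 1;
GO
SET SHOWPLAN_XML OFF;
GO

-- Assumed alternative to a new clustered index: a nonclustered index on the filter columns.
-- The clustering key (ID) is carried in the index automatically.
CREATE NONCLUSTERED INDEX IX_ProeveSvar_Leverandoer_ProeveDato   -- hypothetical name
    ON dbo.ProeveSvar (Leverandoer, ProeveDato);
GO
```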
Hope this helps.