We have a merge replication setup on SQL Server that goes like this: 1 SQL server at the office, another SQL server traveling around the world. The publisher is the SQL server at the office.
In about 1% of the cases, two of our tables with a column of XML Data type (not bound to a schema) are replicated with rows containing empty XML columns. ( This only happened when data is sent from the “traveling server” back home, but then again, data seems to be changed more often there ). We only have this in prod. environment ( WAN replication ).
Things i have verified:
- The row is replicated, as the last modification date on the row is refreshed but the xml column is empty. Of course it is not empty on the other SQL Server.
- No conflicts are displayed in the replication conflicts UI.
- It is not caused by the size of the data inside the XML Column as some are very small.
- Usually, the problem occurs in batch. ( The xml column of 8-9 consecutive rows will be empty )
- The problem occurs if a row was inserted OR updated. No pattern there.
- The problem seems to occur, but this is pure speculation on my part when the connection is weaker. ( We’ve seen this problem happen more often when the server was far away as compared to when it was close by. )
Sorry if i have confused some things, I am not really a DBA, more of a DEV with knowledge of SQL but since the application using the database keeps getting blamed for the problems ( the XML column must not be empty!! ) I have taken it at heart to try and find the problem instead of just manually patching the data each time ( Whats the use of replication if you have to do that? )
If anyone could help out with this problem, or at least suggest some ways of being able to debug / investigate this it would be greatly appreciated.
I did search alot on google and I did find this: Hot Fix . But we do have the latest service pack and the problem seems a bit different.
fyi: We have a replication setup locally here but the problem never occurs. We will be trying a WAN simulator on it as well to see if that can help.
Thanks
Edit: hot fix is now available for my issue: http://support.microsoft.com/kb/2591902
After logging this issue with Microsoft, we were able to reproduce the problem without a slow link ( Big thanks to the competent escalation engineer at Microsoft ). The repro is a bit different from our scenario, but highlights the timing issue we were getting perfectly.
Create 2 tables – One parent one child (have a PK-FK relationship)
Insert 2 rows in the parent table
Set up replication – configure merge agent to run ON DEMAND
Sync
Once all is replicated:
On the PUBLISHER: delete one row from the parent table
On the SUBSCRIBER: Insert 2 rows of data that references the parentid you deleted above
Insert 5 rows of data that references the parentid that will stay in the table
Sync, Merge agent will fail, Sync again, Merge agent will succeed
Missing XML data on the publisher on the 5 rows.
Seems it is a bug that is in SQL Server 2005/2008 and 2008R2.
It will be addressed in a hot fix in 2008 and up. ( As SQL Server 2005 is no longer being altered )
Cheers.