We have a source XML file that has an address node, and each node is supposed to have a zip_code node beneath in order to validate. We received a file that failed the schema validation because at least one node was missing it’s zip_code (there were several thousand addresses in the file).
We need to find the elements that do not have a zip code, so we can repair the file and send an audit report to the source.
--declare @x xml = bulkcolumn from openrowset(bulk 'x:\file.xml',single_blob) as s
declare @x xml = N'<addresses>
<address><external_address_id>1</external_address_id><zip_code>53207</zip_code></address>
<address><external_address_id>2</external_address_id></address>
</addresses>'
declare @t xml = (
select @x.query('for $a in .//address
return
if ($a/zip_code)
then <external_address_id />
else $a/external_address_id')
)
select x.AddressID.value('.', 'int') AddressID
from @t.nodes('./external_address_id') x(AddressID)
where x.AddressID.value('.', 'int') > 0
GO
Really, it’s the where clause that bugs me. I feel like I’m depending on a cast for a null value to 0, and it works, but I’m not really sure that it should. I tried a few variations with the .exist function, but I couldn’t get the correct result.
If you just want to locate those nodes that are missing their
<zip_code>element, you could use something like this:It uses the built-in
.exist()method in XQuery to check the existence of a subnode inside an XML node.