I’m pulling my hair out on this. I do some manual deserialization using XmlReader – nothing serious, I’ve done that a zillion times. But this is something I can’t figure out.
This is a sample XML file:
<?xml version="1.0" encoding="utf-8"?>
<Theme name="something" version="1.0.0.0">
<Thumbnail length="1102">[some base64 encoded data]
</Thumbnail>
<Backgrounds>
<string>Themes\something\Backgrounds\file1</string>
<string>Themes\something\Backgrounds\file2</string>
<string>Themes\something\Backgrounds\file3</string>
</Backgrounds>
<Stickers>
<string>Themes\something\Stickers\stick1</string>
<string>Themes\something\Stickers\stick1</string>
<string>Themes\something\Stickers\stick1</string>
</Stickers>
<PreviewImages>
<string>Themes\something\Preview\rh_01.jpg</string>
<string>Themes\something\Preview\rh_02.jpg</string>
<string>Themes\something\Preview\rh_03.jpg</string>
</PreviewImages>
</Theme>
This is the deserialization code (a bit simplified):
public void ReadXml(System.Xml.XmlReader reader)
{
    /* Read attributes - not important here */
    while (reader.Read())
    {
        Console.WriteLine("Main: {0} {1}", reader.NodeType, reader.Name);
        switch (reader.Name)
        {
            case Xml.Elements.Thumbnail:
                this._thumbnail = Xml.DeserializeBitmap(reader);
                Console.WriteLine("Inner: {0} {1}", reader.NodeType, reader.Name);
                break;
            case Xml.Elements.Backgrounds:
                this._backgrounds = Xml.DeserializeListOfStrings(reader);
                break;
            case Xml.Elements.Stickers:
                this._stickers = Xml.DeserializeListOfStrings(reader);
                break;
            case Xml.Elements.PreviewImages:
                this._previewImages = Xml.DeserializeListOfStrings(reader);
                break;
        }
        if (reader.NodeType == System.Xml.XmlNodeType.EndElement
            && reader.Name == Xml.Root)
            break;
    }
}
The problem:
After this._thumbnail is deserialized, the reader is positioned on the closing element of the Thumbnail node. Then reader.Read() at the beginning of the while loop is called… and the reader gets positioned on the starting element of a string node. The Backgrounds element is skipped! Why?
This happens when the reader is an XmlTextReader and its WhitespaceHandling property is set to WhitespaceHandling.None or WhitespaceHandling.Significant.
If it is set to WhitespaceHandling.All, everything works as expected: after calling reader.Read(), the reader is positioned on the starting element of the Backgrounds node.
[EDIT] I’ve added two debug lines to the example code.
With WhitespaceHandling.All I get this:
Main: Whitespace
Main: Element Thumbnail
Inner: EndElement Thumbnail
Main: Element Backgrounds
Main: Whitespace
Main: Element Stickers
Main: Whitespace
Main: Element PreviewImages
Main: Whitespace
Main: EndElement Theme
With WhitespaceHandling.Significant I get this:
Main: Element Thumbnail
Inner: EndElement Thumbnail
Main: Element string
Main: Text
Main: EndElement string
Main: Element string
Main: Text
Main: EndElement string
Main: Element string
Main: Text
Main: EndElement string
Main: EndElement Backgrounds
[EDIT 2] Adjusted debug output a bit to be more readable.
As you can see, the debug output for WhitespaceHandling.Significant ends on </Backgrounds>. That’s because my Xml.DeserializeListOfStrings does not yet check whether it’s positioned correctly and “accidentally” reads the document to the end. But that’s outside the scope of this question.
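To look at the effect of the WhitespaceHandling modes in isolation, a trace like the ones above can be produced with a few lines. This is only a minimal sketch: the class WhitespaceHandlingDemo, the helper DumpNodes and the small XML string are made up for illustration and are not part of the code in the question.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class WhitespaceHandlingDemo
{
    // Hypothetical helper (not from the question): returns "NodeType Name"
    // for every node the XmlTextReader reports with the given handling mode.
    public static List<string> DumpNodes(string xml, WhitespaceHandling handling)
    {
        var nodes = new List<string>();
        using (var reader = new XmlTextReader(new StringReader(xml)))
        {
            reader.WhitespaceHandling = handling;
            while (reader.Read())
            {
                // Name is empty for Text and Whitespace nodes, hence the Trim().
                nodes.Add((reader.NodeType + " " + reader.Name).Trim());
            }
        }
        return nodes;
    }

    public static void Main()
    {
        // Illustrative document with newlines and indentation between elements.
        const string xml =
            "<Theme>\n  <Backgrounds>\n    <string>file1</string>\n  </Backgrounds>\n</Theme>";

        var modes = new[]
        {
            WhitespaceHandling.All,
            WhitespaceHandling.Significant,
            WhitespaceHandling.None
        };

        foreach (WhitespaceHandling mode in modes)
        {
            Console.WriteLine("--- {0} ---", mode);
            foreach (string node in DumpNodes(xml, mode))
            {
                Console.WriteLine(node);
            }
        }
    }
}
```

Only WhitespaceHandling.All reports the Whitespace nodes between elements, so any code that calls Read() one time too many will land on a different node depending on the mode.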
The cause of my headache is the XmlReader.ReadElementContentAsBase64 method that I use to deserialize the <Thumbnail> node. I was experimenting with it in a loop:
However, MSDN says that:
It seems that despite reading to the end of the element’s content (I know the data length, so theoretically I can do that), the XmlReader did not consider that I had “consumed” all of the element’s content. That caused the unexpected behaviour described in MSDN.
The XmlReader behaved the same with WhitespaceHandling.All and WhitespaceHandling.Significant. My code worked with WhitespaceHandling.All because after the last call to XmlReader.ReadElementContentAsBase64, the reader was skipping non-significant whitespace. If the source XML file contained no newlines or tabs, my code would fail with WhitespaceHandling.All too.
The solution is to modify the while loop to make one additional call to XmlReader.ReadElementContentAsBase64 after all bytes are read. The downside of this approach is that after that additional call the reader is moved to the node following the EndElement node.
One could also use the XmlTextReader.ReadBase64 method to read the whole element content at once, but I’m forced to use only the XmlReader base class, as my class implements IXmlSerializable, so this method is not available to me.
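The fix described above could look roughly like the following. It is a sketch under assumptions, not my actual Xml.DeserializeBitmap: the class Base64ElementDemo, the helper ReadAllBase64, the buffer size and the sample XML are all invented for illustration. The point is only that ReadElementContentAsBase64 is called until it returns 0.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;

public static class Base64ElementDemo
{
    // Hypothetical helper: reads the Base64 content of the element the reader
    // is positioned on. It keeps calling ReadElementContentAsBase64 until the
    // method returns 0, which tells the reader the content is fully consumed.
    public static byte[] ReadAllBase64(XmlReader reader)
    {
        var buffer = new byte[1024]; // arbitrary chunk size
        using (var result = new MemoryStream())
        {
            int read;
            while ((read = reader.ReadElementContentAsBase64(buffer, 0, buffer.Length)) > 0)
            {
                result.Write(buffer, 0, read);
            }
            // The call that returned 0 has already moved the reader past the
            // end tag, onto the node that follows the element.
            return result.ToArray();
        }
    }

    public static void Main()
    {
        // Illustrative document with no whitespace between nodes;
        // "aGVsbG8=" is the Base64 encoding of the ASCII text "hello".
        const string xml =
            "<Theme><Thumbnail>aGVsbG8=</Thumbnail>" +
            "<Backgrounds><string>file1</string></Backgrounds></Theme>";

        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.ReadToFollowing("Thumbnail");
            byte[] data = ReadAllBase64(reader);

            Console.WriteLine(Encoding.UTF8.GetString(data));
            // Inspect the current node before calling Read() again.
            Console.WriteLine("{0} {1}", reader.NodeType, reader.Name);
        }
    }
}
```

Because the reader has already advanced when the helper returns, an outer while (reader.Read()) loop like the one in the question would step over the next node. The outer loop therefore has to examine the current node first and call Read() only when no helper has moved the reader.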