I wrote a custom XML reader because I needed something that would not read ahead from the source stream. I wanted the ability to have an object read its data from the stream without negatively affecting the stream for the parent object. That way, the stream can be passed down the object tree.
It’s a minimal implementation, meant only to serve the purpose of the project that uses it (right now). It works well enough, except for one method — ReadString. That method is used to read the current element’s content as a string, stopping when the end element is reached. It determines this by counting nesting levels. Meanwhile, it’s reading from the stream, character by character, adding to a StringBuilder for the resulting string.
For a collection element, this can take a long time. I’m sure there is much that can be done to better implement this, so this is where my continuing education begins once again. I could really use some help/guidance. Some notes about methods it calls:
Read – returns the next byte in the stream or -1.
ReadUntilChar – calls Read until the specified character or -1 is reached, appending to a string with StringBuilder.
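For anyone following along, the two helpers can be pictured roughly like this. It is only a sketch of what the notes above describe, not the actual code: the class name MinimalXmlReader and its constructor are made up for illustration, and it assumes single-byte characters and that ReadUntilChar consumes the stop character (which is what the Seek(-2) in ReadString implies).

```csharp
using System.IO;
using System.Text;

// Hypothetical stand-in for the reader described above. Only the names Read,
// ReadUntilChar and m_stream come from the post; everything else is assumed.
public class MinimalXmlReader {
    private readonly Stream m_stream;

    public MinimalXmlReader(Stream stream) {
        m_stream = stream;
    }

    // Returns the next byte in the stream (0-255), or -1 at end of stream.
    public int Read() {
        return m_stream.ReadByte();
    }

    // Consumes bytes up to and including the stop character (or until the end
    // of the stream) and returns everything read before the stop character.
    // Assumes one byte per character.
    public string ReadUntilChar(char stop) {
        StringBuilder sb = new StringBuilder();
        int b;
        while((b = Read()) != -1 && b != stop) {
            sb.Append((char)b);
        }
        return sb.ToString();
    }
}
```

Written this way, every character costs one ReadByte call plus one Append call, which is the per-character overhead that adds up for a large collection element.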
Without further ado, here is my two-legged turtle. Constants have been replaced with the actual values.
public string ReadString() {
    int level = 0;
    long originalPosition = m_stream.Position;
    StringBuilder sb = new StringBuilder();
    // int rather than sbyte: Read() yields a byte value (0-255) or -1,
    // and an sbyte would turn the byte 0xFF into a false end-of-stream.
    int read;
    try {
        // We are already within the element that contains the string.
        // Read until we reach an end element when the level == 0.
        // We want to leave the reader positioned at the end element.
        do {
            sb.Append(ReadUntilChar('<'));
            if((read = Read()) == '/') {
                // End element
                if(level == 0) {
                    // End element for the element in context, the string is complete.
                    // Rewind over the two bytes of the end element ("</") just read.
                    m_stream.Seek(-2, System.IO.SeekOrigin.Current);
                    break;
                } else {
                    // End element for a child element.
                    // Add the two bytes read to the resulting string and continue.
                    sb.Append('<');
                    sb.Append('/');
                    level--;
                }
            } else if(read != -1) {
                // Start element (skipped at end of stream, so that -1 is
                // never appended as a character or counted as a level).
                level++;
                sb.Append('<');
                sb.Append((char)read);
            }
        } while(read != -1);
        return sb.ToString().Trim();
    } catch {
        // Return to the original position that we started at.
        m_stream.Seek(originalPosition - m_stream.Position, System.IO.SeekOrigin.Current);
        throw;
    }
}
Right off the bat, you should be using a profiler for performance optimizations if you haven’t already (I’d recommend SlimTune if you’re on a budget). Without one you’re just taking slightly-educated stabs in the dark.
Once you’ve profiled the parser you should have a good idea of where the ReadString() method is spending all its time, which will make your optimizing much easier.
One suggestion I’d make at the algorithm level is to scan the stream first, and then build the contents out: instead of consuming each character as you see it, mark where you find the <, >, and </ characters. Once you have those positions you can pull the data out of the stream in blocks rather than throwing characters into a StringBuilder one at a time. This will optimize away a significant number of StringBuilder.Append calls, which may increase your performance (this is where profiling would help).
You may find this analysis useful for optimizing string operations, if they prove to be the source of the slowness.
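As a rough illustration of the scan-first idea, here is a sketch that finds the closing </ by scanning the stream in blocks and then pulls the content out in one read. It is not a drop-in replacement: the names BlockScanner, FindEndElement and the blockSize parameter are made up for this example, and it assumes a seekable stream, UTF-8 (or ASCII) content, and the same simple nesting rule as the original (any < not followed by / opens a level, so self-closing tags, comments and CDATA are not handled).

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper; all names here are illustrative assumptions.
public static class BlockScanner {
    // Scans forward in blocks and returns the absolute position of the "</"
    // that closes the current element, or -1 if none is found.
    // The stream position is restored before returning.
    public static long FindEndElement(Stream stream, int blockSize = 4096) {
        long start = stream.Position;
        byte[] block = new byte[blockSize];
        int level = 0;
        bool afterLessThan = false; // true when the previous byte was '<'
        long pos = start;           // absolute position of the byte examined
        int n;
        while((n = stream.Read(block, 0, block.Length)) > 0) {
            for(int i = 0; i < n; i++, pos++) {
                byte b = block[i];
                if(afterLessThan) {
                    afterLessThan = false;
                    if(b != '/') {
                        level++;                 // start element
                    } else if(level == 0) {
                        stream.Position = start; // undo the read-ahead
                        return pos - 1;          // position of the '<'
                    } else {
                        level--;                 // end of a child element
                    }
                } else if(b == '<') {
                    afterLessThan = true;
                }
            }
        }
        stream.Position = start;
        return -1;
    }

    // Reads the current element's content and leaves the stream positioned
    // at the "</" of its end element.
    public static string ReadString(Stream stream, int blockSize = 4096) {
        long start = stream.Position;
        long end = FindEndElement(stream, blockSize);
        if(end < 0) {
            throw new EndOfStreamException("No matching end element found.");
        }
        byte[] content = new byte[end - start];
        int total = 0;
        while(total < content.Length) {
            int n = stream.Read(content, total, content.Length - total);
            if(n == 0) {
                throw new EndOfStreamException();
            }
            total += n;
        }
        return Encoding.UTF8.GetString(content).Trim();
    }
}
```

The scan does read past the end element internally, but it seeks back afterwards, so a parent object that is handed the same stream still sees it positioned at the end element. Whether this actually beats the character-by-character version is something only the profiler can tell you.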
But really, profile.