OK, I’m reading data from a stream using a StreamReader. The data inside the stream is not XML; it could be anything.
As I read from the StreamReader, I write to an output stream using an XmlTextWriter. When all is said and done, the output stream contains the data from the input stream wrapped in an element, which is itself contained in a parent element.
My problem is twofold. Data gets read from the input stream in chunks, and the StreamReader class returns char[]. If the data in the input stream contains a ‘]]>’, it needs to be split across two CDATA sections. First, how do I search for ‘]]>’ in a char array? And second, because I’m reading in chunks, the ‘]]>’ substring could be split across two chunks, so how do I account for this?
I could probably convert the char[] to a string and do a search-and-replace on it. That would solve my first problem. On each read, I could also check whether the last character was a ‘]’, so that on the next read, if the first two characters are ‘]>’, I would start a new CDATA section.
This hardly seems efficient, because converting the char array to a string means spending time copying the data and using twice the memory. Is there a more efficient way, both in speed and in memory?
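Indeed, you would have to keep back the last two characters in a queue instead of spitting them out immediately, because they might be the start of a ‘]]>’ that the next chunk completes. When new input comes in, append it to the queue, search-and-replace over the whole queue, and then output everything except the last two characters, which stay in the queue for the next round. At the end of the input, output whatever is still in the queue.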
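The hold-back idea above can be sketched roughly as follows. This is only an illustration under some assumptions: the class name `CDataSplitter`, the method `Copy`, and the `chunkSize` parameter are made up for the example, and the markup is written as raw text to a plain TextWriter rather than through the XmlTextWriter from the question. It scans the char[] directly, so no string conversion is needed, and it reads each new chunk in behind the (at most two) held-back characters.

```csharp
using System;
using System.IO;

// Hypothetical helper, not part of any library.
static class CDataSplitter
{
    // Copies all text from input to output as CDATA, splitting every "]]>"
    // between "]]" and ">" so that it never appears inside one section.
    public static void Copy(TextReader input, TextWriter output, int chunkSize = 4096)
    {
        // Two extra slots hold the characters carried over from the previous chunk.
        var buf = new char[chunkSize + 2];
        int carry = 0; // number of held-back characters at the start of buf (0..2)
        int read;

        output.Write("<![CDATA[");
        while ((read = input.Read(buf, carry, chunkSize)) > 0)
        {
            int len = carry + read;
            int start = 0; // index of the first character not yet written

            for (int i = 0; i + 2 < len; i++)
            {
                if (buf[i] == ']' && buf[i + 1] == ']' && buf[i + 2] == '>')
                {
                    // Write up to and including "]]", close the section,
                    // and reopen a new one in front of the ">".
                    output.Write(buf, start, i + 2 - start);
                    output.Write("]]><![CDATA[");
                    start = i + 2; // the ">" is still unwritten
                    i += 2;        // skip past the match
                }
            }

            // Hold back up to two unwritten characters: they may begin a "]]>"
            // that only the next chunk completes.
            int keep = Math.Min(2, len - start);
            output.Write(buf, start, len - start - keep);
            Array.Copy(buf, len - keep, buf, 0, keep);
            carry = keep;
        }

        // End of input: flush the held-back characters and close the section.
        output.Write(buf, 0, carry);
        output.Write("]]>");
    }
}
```

For the input `a]]>b` this produces `<![CDATA[a]]]]><![CDATA[>b]]>`, whatever the chunk size. If the output has to go through the XmlTextWriter, the same pieces could presumably be passed to its WriteRaw() overloads instead of a TextWriter.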
Better: don’t bother with a CDATA section at all. They’re only there for the convenience of hand-authoring. If you’re already doing search-and-replace, there’s no reason you shouldn’t just search-and-replace ‘<’, ‘>’ and ‘&’ with their predefined entities, and include those in a normal Text node. Since those are simple single-character replacements, you don’t need to worry about buffering.
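As a rough sketch of that single-character replacement, assuming raw text output to a TextWriter: the class name `XmlTextEscaper` and the method `WriteEscaped` are invented for the example. Because each character is handled on its own, a chunk boundary can never fall in the middle of something that needs escaping, so nothing has to be carried over between reads.

```csharp
using System.IO;

// Hypothetical helper, not part of any library.
static class XmlTextEscaper
{
    // Writes buffer[offset .. offset + count) to output, replacing
    // '<', '>' and '&' with their predefined entities.
    public static void WriteEscaped(TextWriter output, char[] buffer, int offset, int count)
    {
        int start = offset;       // first character not yet written
        int end = offset + count;

        for (int i = offset; i < end; i++)
        {
            string entity;
            switch (buffer[i])
            {
                case '<': entity = "&lt;"; break;
                case '>': entity = "&gt;"; break;
                case '&': entity = "&amp;"; break;
                default: continue; // ordinary character, keep scanning
            }

            // Write the run of ordinary characters before this one, then the entity.
            output.Write(buffer, start, i - start);
            output.Write(entity);
            start = i + 1;
        }

        // Write whatever follows the last escaped character.
        output.Write(buffer, start, end - start);
    }
}
```

Calling this once per chunk with the buffer and the number of characters the StreamReader returned would be enough; `a<b>&c` comes out as `a&lt;b&gt;&amp;c`.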
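But: if you’re using an XmlTextWriter as you say, it’s as simple as calling WriteString() on it for each chunk of incoming text. The writer escapes the special characters for you, so you don’t need to search for anything yourself.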
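A minimal sketch of that loop, with a few assumptions: the element names `parent` and `data` are placeholders for whatever the real document uses, and `ChunkedXmlCopy` is a made-up class name. It also swaps WriteString() for WriteChars(), which takes the char[] range directly and so avoids building a string per chunk; `writer.WriteString(new string(buffer, 0, read))` should give the same output at the cost of the copy.

```csharp
using System.IO;
using System.Xml;

// Hypothetical helper, not part of any library.
static class ChunkedXmlCopy
{
    public static void Copy(TextReader input, XmlTextWriter writer, int chunkSize = 4096)
    {
        var buffer = new char[chunkSize];
        int read;

        writer.WriteStartElement("parent"); // placeholder element names
        writer.WriteStartElement("data");

        while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            // The writer escapes '<', '>' and '&' itself, one buffer at a time.
            writer.WriteChars(buffer, 0, read);
        }

        writer.WriteEndElement(); // </data>
        writer.WriteEndElement(); // </parent>
        writer.Flush();
    }
}
```

One thing to watch for: if the input can contain characters outside the Basic Multilingual Plane, a surrogate pair may be split across two reads, and the writer may reject the half pair, so a real implementation would probably have to hold back a trailing high surrogate until the next chunk arrives.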