I’ve got a simple procedure to strip a string of all characters illegal in XML:
```csharp
string SanitizeXml(string xml)
{
    return string.Concat(xml.ToCharArray().Where(c => IsLegalXmlChar(c)).ToArray());
}
```
It’s nice and terse. But I’m concerned about its performance. The same thing could easily be accomplished using a simple for loop:
```csharp
string SanitizeXml(string xml)
{
    var buffer = new StringBuilder();
    foreach (char c in xml)
    {
        if (IsLegalXmlChar(c))
        {
            buffer.Append(c);
        }
    }
    return buffer.ToString();
}
```
What sticks out to me is that, in the first example, xml is converted to a char[], and Where()’s IEnumerable<char> is converted back to a char[]. I seem to do this a lot with LINQ: switching between arrays and enumerables.
Should I be concerned about this? What kind of performance hit am I going to get, in general, for relying on LINQ extension methods when there is a clear alternative that may be a little more verbose?
Perhaps this is too broad of a question.
Well, you don’t need the first call to ToCharArray() to start with – string implements IEnumerable<char>. However, I agree that a StringBuilder and a loop would probably be more appropriate in this case.

I’m not sure what string.Concat(char[]) does offhand, btw – why aren’t you just using the string constructor which takes a char array? In other words, after these modifications:
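A minimal sketch of the LINQ version with both changes applied (no `ToCharArray()`, and `new string(char[])` instead of `string.Concat`). The question never shows `IsLegalXmlChar`, so the body below is a hypothetical stand-in that only covers the XML 1.0 ranges that fit in a single `char`:

```csharp
using System.Linq;

static class XmlSanitizer
{
    // Hypothetical stand-in for the question's IsLegalXmlChar (not shown in
    // the original): XML 1.0 ranges that fit in one UTF-16 code unit.
    static bool IsLegalXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= '\u0020' && c <= '\uD7FF')
            || (c >= '\uE000' && c <= '\uFFFD');
    }

    // string implements IEnumerable<char>, so Where() applies directly;
    // the string(char[]) constructor builds the result from the filtered array.
    public static string SanitizeXml(string xml)
    {
        return new string(xml.Where(c => IsLegalXmlChar(c)).ToArray());
    }
}
```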
I still prefer a StringBuilder solution, but that could be improved for the common case (where there are few illegal characters) by giving an appropriate capacity to start with:
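As a sketch, this is the loop from the question with the builder pre-sized to the input length, so it should never need to grow when few characters are dropped. `IsLegalXmlChar` is again a hypothetical stand-in for the helper in the question:

```csharp
using System.Text;

static class XmlSanitizer
{
    // Hypothetical stand-in for the question's IsLegalXmlChar.
    static bool IsLegalXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= '\u0020' && c <= '\uD7FF')
            || (c >= '\uE000' && c <= '\uFFFD');
    }

    public static string SanitizeXml(string xml)
    {
        // The output can never be longer than the input, so xml.Length
        // is a safe upper bound for the initial capacity.
        var buffer = new StringBuilder(xml.Length);
        foreach (char c in xml)
        {
            if (IsLegalXmlChar(c))
            {
                buffer.Append(c);
            }
        }
        return buffer.ToString();
    }
}
```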
One alternative I hadn’t thought of before might be an extension method on StringBuilder:
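One way such an extension method could look is sketched below. The name `AppendSequence` comes from the text; the containing class name and the exact signature are assumptions:

```csharp
using System.Collections.Generic;
using System.Text;

static class StringBuilderExtensions
{
    // Not named Append: instance methods win over extension methods, so a
    // call to Append with an IEnumerable<char> argument would bind to the
    // built-in StringBuilder.Append(object) instead.
    public static StringBuilder AppendSequence(this StringBuilder builder,
                                               IEnumerable<char> sequence)
    {
        foreach (char c in sequence)
        {
            builder.Append(c);
        }
        return builder; // returned so that calls can be chained
    }
}
```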
Then you could use it like this:
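A possible usage sketch follows. `AppendSequence` is repeated so the snippet compiles on its own, and `IsLegalXmlChar` is a hypothetical stand-in for the question's helper:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

static class StringBuilderExtensions
{
    public static StringBuilder AppendSequence(this StringBuilder builder,
                                               IEnumerable<char> sequence)
    {
        foreach (char c in sequence)
        {
            builder.Append(c);
        }
        return builder;
    }
}

static class XmlSanitizer
{
    // Hypothetical stand-in for the question's IsLegalXmlChar.
    static bool IsLegalXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= '\u0020' && c <= '\uD7FF')
            || (c >= '\uE000' && c <= '\uFFFD');
    }

    public static string SanitizeXml(string xml)
    {
        // Where() stays lazy here; no intermediate char[] is allocated.
        return new StringBuilder(xml.Length)
            .AppendSequence(xml.Where(IsLegalXmlChar))
            .ToString();
    }
}
```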
(You could have other overloads for AppendSequence to take IEnumerable etc, if you wanted.)
EDIT: Another alternative might be to avoid calling Append so often, using instead the overload which appends a substring. You could then again build an extension method for StringBuilder, something like (completely untested, I’m afraid – I haven’t even tried compiling it):
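A sketch of that idea: track the start of each run of accepted characters and append the whole run with `StringBuilder.Append(string, int, int)`. The method name `AppendWhere` and its signature are assumptions made for illustration:

```csharp
using System;
using System.Text;

static class StringBuilderExtensions
{
    public static StringBuilder AppendWhere(this StringBuilder builder,
                                            string text,
                                            Func<char, bool> predicate)
    {
        int start = 0;           // index where the current accepted run began
        bool lastResult = false; // true while inside an accepted run
        for (int i = 0; i < text.Length; i++)
        {
            if (predicate(text[i]))
            {
                if (!lastResult)
                {
                    start = i;
                    lastResult = true;
                }
            }
            else if (lastResult)
            {
                // The run ended just before i: append it in one call.
                builder.Append(text, start, i - start);
                lastResult = false;
            }
        }
        if (lastResult)
        {
            // Flush a run that extends to the end of the text.
            builder.Append(text, start, text.Length - start);
        }
        return builder;
    }
}
```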
Usage for the example:
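A possible usage sketch, with a run-based `AppendWhere` repeated so the snippet compiles on its own. Both `AppendWhere` and the body of `IsLegalXmlChar` are assumptions, not code from the question:

```csharp
using System;
using System.Text;

static class StringBuilderExtensions
{
    // Appends each run of characters accepted by predicate in a single call.
    public static StringBuilder AppendWhere(this StringBuilder builder,
                                            string text,
                                            Func<char, bool> predicate)
    {
        int start = 0;
        bool lastResult = false;
        for (int i = 0; i < text.Length; i++)
        {
            bool ok = predicate(text[i]);
            if (ok && !lastResult)
            {
                start = i; // a new accepted run begins here
            }
            else if (!ok && lastResult)
            {
                builder.Append(text, start, i - start);
            }
            lastResult = ok;
        }
        if (lastResult)
        {
            builder.Append(text, start, text.Length - start);
        }
        return builder;
    }
}

static class XmlSanitizer
{
    // Hypothetical stand-in for the question's IsLegalXmlChar.
    static bool IsLegalXmlChar(char c)
    {
        return c == '\t' || c == '\n' || c == '\r'
            || (c >= '\u0020' && c <= '\uD7FF')
            || (c >= '\uE000' && c <= '\uFFFD');
    }

    public static string SanitizeXml(string xml)
    {
        return new StringBuilder(xml.Length)
            .AppendWhere(xml, IsLegalXmlChar)
            .ToString();
    }
}
```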
Another alternative would be to change it to be an extension method on String, create the StringBuilder lazily, and if you get to the end with start=0, just return the original string.
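A rough sketch of the lazy variant follows. The method name `Filter` is hypothetical, and instead of checking `start` at the end it checks whether the builder was ever created, which amounts to the same test: no rejected character was seen, so the original string can be returned as-is.

```csharp
using System;
using System.Text;

static class StringFilterExtensions
{
    public static string Filter(this string text, Func<char, bool> predicate)
    {
        StringBuilder builder = null; // created only on the first rejected char
        int start = 0;                // start of the current accepted run
        for (int i = 0; i < text.Length; i++)
        {
            if (predicate(text[i]))
            {
                continue;
            }
            if (builder == null)
            {
                builder = new StringBuilder(text.Length);
            }
            if (i > start)
            {
                builder.Append(text, start, i - start);
            }
            start = i + 1; // the next run can begin after the rejected char
        }
        if (builder == null)
        {
            return text; // nothing was rejected: no allocation, no copy
        }
        if (text.Length > start)
        {
            builder.Append(text, start, text.Length - start);
        }
        return builder.ToString();
    }
}
```

In the common case where the input is already clean, this avoids allocating anything at all.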