    string str1 = "12345ABC...\\...ABC100000"; // Hypothetically huge string of 100,000+ Unicode chars
    str1 = str1.Replace("1", string.Empty);
    str1 = str1.Replace("22", string.Empty);
    str1 = str1.Replace("656", string.Empty);
    str1 = str1.Replace("77ABC", string.Empty);
    // ... this replace anti-pattern might happen with up to 50 consecutive lines of code.
    str1 = str1.Replace("ABCDEFGHIJD", string.Empty);
I have inherited some code that does the same as the snippet above. It takes a huge string and replaces (removes) constant smaller strings from the large string.
I believe this is a very memory-intensive process, given that each replace allocates a new large immutable string, which then sits in memory awaiting death via the GC.
1. What is the fastest way of replacing these values, ignoring memory concerns?
2. What is the most memory efficient way of achieving the same result?
I am hoping that these are the same answer!
Practical solutions that fit somewhere in between these goals are also appreciated.
Assumptions:
- All replacements are constant and known in advance
- Underlying characters do contain some Unicode [non-ASCII] chars
All characters in a .NET string are ‘unicode chars’. Do you mean they’re non-ASCII? That shouldn’t make any odds – unless you run into composition issues, e.g. an ‘e + acute accent’ not being replaced when you try to replace an ‘e acute’.
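To make the composition issue concrete, here is a minimal sketch. The class and method names (`CompositionSketch`, `RemoveNormalized`) and the sample text are made up for illustration, and normalizing to form C before replacing is just one possible way to handle it, assuming the removal tokens are themselves stored in composed form.

```csharp
using System;
using System.Text;

public static class CompositionSketch
{
    // Hypothetical helper: compose the input (Unicode form C) before removing the token.
    // Assumes the token itself is already in composed form.
    public static string RemoveNormalized(string input, string token)
    {
        return input.Normalize(NormalizationForm.FormC).Replace(token, string.Empty);
    }

    public static void Main()
    {
        // "café" with the last letter written as 'e' + U+0301 COMBINING ACUTE ACCENT.
        string decomposed = "caf\u0065\u0301";
        // The same letter as the single precomposed code point U+00E9.
        string precomposed = "\u00E9";

        // string.Replace(string, string) compares ordinally, so nothing is removed here.
        string untouched = decomposed.Replace(precomposed, string.Empty);
        Console.WriteLine(untouched.Length); // 5

        // After composing, the precomposed token is found and removed.
        Console.WriteLine(RemoveNormalized(decomposed, precomposed)); // caf
    }
}
```

Whether normalizing is acceptable depends on the application, since it can change the stored form of the rest of the text as well.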
You could try using a regular expression with Regex.Replace, or StringBuilder.Replace. Here’s sample code doing the same thing with both:
I wouldn’t like to guess which is more efficient – you’d have to benchmark with your specific application. The regex way may be able to do it all in one pass, but that pass will be relatively CPU-intensive compared with each of the many replaces in StringBuilder.
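As a rough sketch of how the two approaches might look side by side: the class name `RemovalSketch`, the `Tokens` list and the sample inputs are assumptions for illustration, not code from the question. The regex is built once as an alternation of the escaped tokens; the StringBuilder version applies the removals one after another.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RemovalSketch
{
    // Hypothetical removal list; the real one would hold the known constant strings.
    private static readonly string[] Tokens = { "1", "22", "656", "77ABC" };

    // Single alternation such as 1|22|656|77ABC, with each token escaped.
    private static readonly Regex Pattern = new Regex(
        string.Join("|", Array.ConvertAll(Tokens, Regex.Escape)),
        RegexOptions.Compiled | RegexOptions.CultureInvariant);

    // One pass over the input; all tokens are tried at each position.
    public static string RemoveWithRegex(string input)
    {
        return Pattern.Replace(input, string.Empty);
    }

    // One pass per token, but inside a single mutable buffer
    // instead of a new string per replace.
    public static string RemoveWithStringBuilder(string input)
    {
        var sb = new StringBuilder(input);
        foreach (string token in Tokens)
        {
            sb.Replace(token, string.Empty);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        Console.WriteLine(RemoveWithRegex("12345ABC"));         // 2345ABC
        Console.WriteLine(RemoveWithStringBuilder("12345ABC")); // 2345ABC

        // The two are not always equivalent: sequential removal can create
        // new matches ("212" -> "22" -> ""), a single regex pass cannot.
        Console.WriteLine(RemoveWithRegex("212"));              // 22
        Console.WriteLine(RemoveWithStringBuilder("212").Length); // 0
    }
}
```

If the existing chain of Replace calls relies on one removal exposing a match for a later one, the StringBuilder version keeps that behaviour and the single-pass regex does not, so it is worth checking which semantics the inherited code actually needs before benchmarking.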