I have to load a large file into memory and I want to find a substring. Which method is faster?
// application initialization
string instring = "which is faster find in string or list..."; // large string +- 150MB
List<string> inlist = new List<string>();
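// Split on spaces to get one entry per word; iterating the string directly yields chars, not words
foreach (string word in instring.Split(' ')) {
    inlist.Add(word);
}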
// button click
if (instring.Contains("find")) {
...
}
or
if (inlist.Contains("find")) {
...
}
I did some measurements; in my case, String search was the fastest.
Single search:
Boyer-Moore search found - elapsed: 00:00:00.0025893
String search found - elapsed: 00:00:00.0026120
List search not found - elapsed: 00:00:00.0026394
Multi search:
Boyer-Moore search found - elapsed: 00:00:00.0027377
Boyer-Moore search found - elapsed: 00:00:00.0028308
Boyer-Moore search found - elapsed: 00:00:00.0029269
Boyer-Moore search found - elapsed: 00:00:00.0030234
Boyer-Moore search found - elapsed: 00:00:00.0031210
String search found - elapsed: 00:00:00.0032474
String search found - elapsed: 00:00:00.0032653
String search found - elapsed: 00:00:00.0032832
String search found - elapsed: 00:00:00.0033015
String search found - elapsed: 00:00:00.0033201
List search not found - elapsed: 00:00:00.0033629
List search not found - elapsed: 00:00:00.0033826
List search not found - elapsed: 00:00:00.0033961
List search not found - elapsed: 00:00:00.0034155
List search not found - elapsed: 00:00:00.0034345
You’re testing radically different things.
For example, suppose you do indeed look for “find”, and you’ve got a file which is:
If you split that into a list of strings, one per word, then “find” doesn’t appear – because it’s only part of the word “finding”. Using string.Contains you will find it, however, as it’s a substring.
You should work out your desired behaviour first, implement it in the simplest, most elegant fashion, then measure performance. If that meets your desired performance, you’re done. If not, you can then try to improve it, measuring at each point and making sure you’ve still got the behaviour you want.
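To make the difference in behaviour concrete, here is a minimal sketch of the two kinds of search. The class name SearchDemo and both method names are hypothetical, chosen only for illustration, and splitting on whitespace is an assumption about what "one per word" means for your file.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper class; the names are illustrative, not from the question.
public static class SearchDemo
{
    // Substring semantics: true if needle occurs anywhere in text,
    // including inside a longer word.
    public static bool ContainsSubstring(string text, string needle)
    {
        return text.Contains(needle);
    }

    // Whole-word semantics: true only if some whitespace-separated
    // word is exactly equal to needle (assumed definition of "word").
    public static bool ContainsWord(string text, string needle)
    {
        var separators = new[] { ' ', '\t', '\r', '\n' };
        var words = new List<string>(
            text.Split(separators, StringSplitOptions.RemoveEmptyEntries));
        return words.Contains(needle);
    }
}
```

With a made-up input such as "we are finding things", ContainsSubstring(text, "find") returns true, while ContainsWord(text, "find") returns false. Timing the two against each other only makes sense once you have decided which of these results you actually want.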