I am trying to find an efficient way to sort an array of strings by a numeric value at the end of each string. I am currently using the static Array.Sort(array, customComparer) method (quicksort) with this custom comparer class, which sorts in descending order:
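using System.Collections.Generic;
using System.Text.RegularExpressions;

// Orders strings by the number at their end, largest first.
class StringComparer : IComparer<string>
{
    public int Compare(string a, string b)
    {
        // Both strings are matched and parsed again on every single comparison.
        Match matchA = Regex.Match(a, @"\d+$");
        Match matchB = Regex.Match(b, @"\d+$");
        long numberA = long.Parse(matchA.Value);
        long numberB = long.Parse(matchB.Value);

        // Reversed CompareTo gives descending order, returns 0 for equal
        // numbers (as IComparer requires), and cannot overflow like
        // subtracting the two longs can.
        return numberB.CompareTo(numberA);
    }
}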
This works very well, but sometimes it takes too long: sorting an array of 100,000 strings takes more than a minute on a 2.4 GHz processor. Is there a more efficient way to accomplish the same thing? For example, implementing a different sorting algorithm, or taking another approach such as using a dictionary and sorting on the value (the value being the numeric part of the string). Any suggestions? Thanks in advance!
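First, you’re needlessly parsing the same string over and over (both matching it with the regular expression and then parsing the match). Array.Sort performs on the order of n log n comparisons, and your comparer re-parses both strings on each one, so with 100,000 strings every string goes through Regex.Match and long.Parse dozens of times instead of once. Instead, encapsulate what you have into a custom type so that you only have to parse once.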
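A minimal sketch of such a type, plus a comparer over it, assuming (as the question's `\d+$` pattern implies) that every string ends in a run of digits that fits in a `long`. `FooString` is the answer's own name; the members `Text` and `Value` and the class `FooStringComparer` are hypothetical. The Code Contracts tooling that enforces `Contract.Requires` is not supported on .NET Core and later, so this sketch validates the input with an ordinary exception in the constructor instead.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical wrapper: a string plus its trailing number, parsed once.
// Assumption: the number is the run of digits at the end of the string
// and fits in a long.
sealed class FooString
{
    // [0-9] rather than \d: in .NET, \d also matches non-ASCII digits,
    // which long.Parse does not accept.
    private static readonly Regex TrailingDigits =
        new Regex(@"[0-9]+$", RegexOptions.Compiled);

    public string Text { get; }
    public long Value { get; }

    public FooString(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        Match match = TrailingDigits.Match(text);
        // Fail fast here rather than inside Compare in the middle of a sort.
        if (!match.Success)
            throw new ArgumentException("String must end with digits.", nameof(text));
        Text = text;
        Value = long.Parse(match.Value);
    }

    public override string ToString() => Text;
}

// Hypothetical comparer: descending order by the cached number,
// with no regex matching or parsing per comparison.
sealed class FooStringComparer : IComparer<FooString>
{
    public int Compare(FooString x, FooString y)
    {
        // Returns 0 for equal numbers and cannot overflow.
        return y.Value.CompareTo(x.Value);
    }
}
```

Usage would look like `FooString[] items = Array.ConvertAll(strings, s => new FooString(s)); Array.Sort(items, new FooStringComparer());`: the regex runs once per string during the conversion, and each comparison is then a single `long` comparison.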
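I’d even add a Contract.Requires to this type’s constructor stating that the input string must satisfy the regular expression.

Second, you have an IComparer<T> that dies on certain values of T (in your case, strings that don’t match the regular expression and can’t be parsed to a long). This is generally a bad idea.

So, make the comparer for FooString (the custom type) compare the already-parsed numbers instead of the raw strings. Now your sorting will be blazingly fast, because you’ve stopped parsing the same string over and over: each string is parsed once up front (n regex matches) rather than twice per comparison (on the order of n log n matches).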
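If you would rather not introduce a wrapper type, a variation on the same parse-once idea (close in spirit to the dictionary approach the question mentions) is to parse every string once into a parallel `long[]` and call the `Array.Sort(keys, items, comparer)` overload, which sorts the keys and applies the same reordering to the items. This is a swapped-in technique, not the FooString comparer described above; the names `TrailingNumberSort` and `SortDescending` are hypothetical, and the sketch assumes every string ends in digits that fit in a `long`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: sorts strings by their trailing number using a
// parallel array of keys, so each string is parsed exactly once.
static class TrailingNumberSort
{
    // [0-9] rather than \d: in .NET, \d also matches non-ASCII digits,
    // which long.Parse does not accept.
    private static readonly Regex TrailingDigits =
        new Regex(@"[0-9]+$", RegexOptions.Compiled);

    // Returns a sorted copy of input, largest number first; input is not modified.
    // Assumption: every string ends in digits that fit in a long; otherwise
    // long.Parse throws (FormatException or OverflowException).
    public static string[] SortDescending(string[] input)
    {
        var keys = new long[input.Length];
        for (int i = 0; i < input.Length; i++)
        {
            keys[i] = long.Parse(TrailingDigits.Match(input[i]).Value);
        }

        var items = (string[])input.Clone();
        // Sorts keys in descending order and moves items[i] along with keys[i].
        Array.Sort(keys, items, Comparer<long>.Create((x, y) => y.CompareTo(x)));
        return items;
    }
}
```

Like the original comparer, `Array.Sort` is not a stable sort, so strings with equal numbers may come out in any relative order.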