I am trying to develop a simple application in C# to count the number of duplicates in a listbox. I need to count the duplicates of each element and add a rank suffix to the three most-duplicated elements. For example, suppose a list has 7 elements called ‘apple’, 6 elements called ‘pear’, 4 elements called ‘peach’ and 3 elements called ‘orange’. After processing, it should display the list as:
apple (7) pear (6) peach (4) orange
Here is an alternative method to using Linq, presented as a timed test to see which performs faster. These are the results I obtained with 1000 iterations:
The Linq method is only about 1.6 times slower in this case. That is not as bad as a lot of the Linq code I have performance tested, but the input was only 1324 words.
Edit #1
That was before adding the sort. With the sort, you can see that it is comparable with the Linq method. Of course, copying the hash to a list and then sorting the list isn’t the most efficient way to do this, and we could improve on it. A couple of ways come to mind, but none of them is simple, and all would require writing a lot of custom code.
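To make the "copy the hash to a list, then sort" step concrete, here is a minimal sketch of that approach. The class and method names (HashCountSketch, CountAndSort) are made up for illustration and are not the original HashMethod code; it assumes the items are plain strings.

```csharp
using System;
using System.Collections.Generic;

public static class HashCountSketch
{
    // Hypothetical helper: counts occurrences with a Dictionary,
    // then copies the pairs to a List and sorts by count, highest first.
    public static List<KeyValuePair<string, int>> CountAndSort(IEnumerable<string> words)
    {
        var counts = new Dictionary<string, int>();
        foreach (var word in words)
        {
            // TryGetValue leaves 'current' at 0 when the word is not present yet.
            counts.TryGetValue(word, out int current);
            counts[word] = current + 1;
        }

        // Copy the hash to a list, then sort the list (the step discussed above).
        var sorted = new List<KeyValuePair<string, int>>(counts);
        sorted.Sort((a, b) => b.Value.CompareTo(a.Value));
        return sorted;
    }
}
```

Note that List.Sort is not a stable sort, so items with equal counts may come out in any order.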
Since we want to use what’s already available and we want the code to be clear, I have to say that Linq is in fact a very good choice. This has changed my opinion of Linq... a little. I’ve seen far too many other comparisons where Linq ends up disastrously slower (thousands of times slower) to give a green light to using Linq anywhere and everywhere, but in this one place it certainly shines.
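For comparison, a Linq version of the count-and-rank step might look like the sketch below. The names (LinqCountSketch, RankedLabels) are hypothetical, and the output format is an assumption taken from the example in the question: the three most-duplicated items get their count in parentheses, and the rest are listed plain.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqCountSketch
{
    // Hypothetical helper: groups equal items, orders the groups by size
    // (largest first) and appends "(count)" to the first three only.
    public static List<string> RankedLabels(IEnumerable<string> items)
    {
        return items
            .GroupBy(item => item)
            .OrderByDescending(group => group.Count())
            .Select((group, index) => index < 3
                ? $"{group.Key} ({group.Count()})"
                : group.Key)
            .ToList();
    }
}
```

With 7 ‘apple’, 6 ‘pear’, 4 ‘peach’ and 3 ‘orange’ items this yields "apple (7)", "pear (6)", "peach (4)", "orange". OrderByDescending is a stable sort, so ties keep the order in which the items first appeared.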
I guess the moral is, as it always has been, test, test, test.
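If you want to run this kind of comparison yourself, a rough timing harness could look like the following. This is only a sketch and not the harness used for the numbers in this answer; TimingSketch and Time are made-up names, and the single warm-up call is my own assumption about how to keep JIT compilation out of the measurement.

```csharp
using System;
using System.Diagnostics;

public static class TimingSketch
{
    // Hypothetical helper: runs 'action' once as a warm-up, then times
    // 'iterations' further calls with a Stopwatch and returns the total.
    public static TimeSpan Time(Action action, int iterations)
    {
        action(); // warm-up call, not included in the measurement

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

You would call it as something like TimingSketch.Time(() => SomeMethod(words), 1000) for each method under test, where SomeMethod stands for whatever you are measuring, and then compare the returned totals.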
Here are the values with the sort added to HashMethod.
Edit #2
A couple of simple optimizations (pre-initializing both the dictionary and the list) make HashMethod noticeably faster.
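As an illustration of what pre-initializing can look like, the sketch below passes an initial capacity to both collections so they do not have to grow while being filled. The names are hypothetical and the capacities are my own guesses (word count as an upper bound for the dictionary, distinct-key count for the list), not necessarily the values used in the timed code.

```csharp
using System;
using System.Collections.Generic;

public static class PresizedCountSketch
{
    // Hypothetical variant of the hash approach with pre-sized collections.
    public static List<KeyValuePair<string, int>> CountAndSort(IReadOnlyList<string> words)
    {
        // words.Count is only an upper bound on the number of distinct keys,
        // so this trades some memory for avoiding resizes during counting.
        var counts = new Dictionary<string, int>(words.Count);
        foreach (var word in words)
        {
            counts.TryGetValue(word, out int current); // 0 when not present yet
            counts[word] = current + 1;
        }

        // The list needs exactly one slot per distinct key.
        var sorted = new List<KeyValuePair<string, int>>(counts.Count);
        sorted.AddRange(counts);
        sorted.Sort((a, b) => b.Value.CompareTo(a.Value));
        return sorted;
    }
}
```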
Edit #3
With a larger word set, they become much more even. In fact, the Linq method seems to edge out HashMethod every time. Here is the United States Constitution (all seven articles and signatures). This may be due to the fact that the declaration contains a lot of repeated words (“He has …”).
Code: