I’m developing a custom email client in C#. One of the obvious requirements is that I don’t download already downloaded messages. This is done by comparing a unique ID string against messages stored in my database.
The database stores emails for multiple users and multiple accounts, so a message's unique ID is not necessarily unique across the whole database.
Currently I have something like this:
List<string> DownloadedUIDs = BLL.EmailsDataSource.ViewEmailUIDs(AccountNo);
foreach (string uid in serveruids) {
    if (DownloadedUIDs.Contains(uid)) continue; // don't download messages we already have
    ...
}
I know that Contains() on a List performs a linear search, which is inefficient here. If there are 5000 emails stored on the server, then 5000 linear searches are made against a list of up to 5000 stored UIDs. In the worst case that is up to 25 million string comparisons just to determine which emails already exist.
Would I see better performance by asking SQL Server to order the unique IDs and then performing a binary search on them, or by storing the unique IDs in a hash table? Or by using some other data structure?
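To make the two options concrete, here is a rough sketch of what I have in mind. The class and method names (UidLookupSketch, NewUidsViaHashSet, NewUidsViaBinarySearch) are made up for illustration and are not part of my actual code. I am also assuming UIDs should be compared as exact ordinal strings. In the binary search version the list is sorted in C# rather than by SQL Server, because List<T>.BinarySearch needs the list ordered by the same comparer it searches with, and the database collation may not match.

```csharp
using System;
using System.Collections.Generic;

public static class UidLookupSketch
{
    // Hash-based option: build the set once, then each lookup is O(1) on average.
    public static List<string> NewUidsViaHashSet(
        IEnumerable<string> serverUids, IEnumerable<string> downloadedUids)
    {
        var seen = new HashSet<string>(downloadedUids, StringComparer.Ordinal);
        var result = new List<string>();
        foreach (string uid in serverUids)
        {
            if (seen.Contains(uid)) continue; // already downloaded
            result.Add(uid);
        }
        return result;
    }

    // Sorted option: sort once, then each lookup is an O(log n) binary search.
    public static List<string> NewUidsViaBinarySearch(
        IEnumerable<string> serverUids, IEnumerable<string> downloadedUids)
    {
        var sorted = new List<string>(downloadedUids);
        sorted.Sort(StringComparer.Ordinal); // must match the comparer used below
        var result = new List<string>();
        foreach (string uid in serverUids)
        {
            // BinarySearch returns a non-negative index when the item is found.
            if (sorted.BinarySearch(uid, StringComparer.Ordinal) >= 0) continue;
            result.Add(uid);
        }
        return result;
    }
}
```

Either method would be called once per account, for example UidLookupSketch.NewUidsViaHashSet(serveruids, DownloadedUIDs), and the download loop would then run over the returned list only.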
Does anyone know of any similar performance comparisons that have been made?
I decided to do some performance testing and these are the results I got (from connecting to the mail server to verifying all 3000 emails had been downloaded):
So it seems, given my data at least, that HashSets are quickest at doing this, though there is little to choose between all 4 optimised methods.
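For anyone who wants to try a comparison like this in isolation, below is a minimal timing sketch using Stopwatch. It is not the harness I used for the numbers above, which also included the mail server round trip. The names (LookupTiming, CountHits, Report) and the synthetic "uid-N" strings are assumptions made up for the example, and it only measures in-memory lookups.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class LookupTiming
{
    // Runs the action once and returns the elapsed wall-clock time.
    public static TimeSpan Time(Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    // Counts how many probe UIDs are found in the stored collection.
    // List<string> and HashSet<string> both implement ICollection<string>.
    public static int CountHits(ICollection<string> stored, IEnumerable<string> probes)
    {
        int hits = 0;
        foreach (string uid in probes)
        {
            if (stored.Contains(uid)) hits++;
        }
        return hits;
    }

    // Builds 'count' synthetic UIDs and prints the lookup time for each container.
    public static void Report(int count)
    {
        var asList = new List<string>();
        for (int i = 0; i < count; i++) asList.Add("uid-" + i);
        var asSet = new HashSet<string>(asList);

        Console.WriteLine("List.Contains:    " + Time(() => CountHits(asList, asList)));
        Console.WriteLine("HashSet.Contains: " + Time(() => CountHits(asSet, asList)));
    }
}
```

Calling LookupTiming.Report(3000) roughly matches the size of my test mailbox. A single run is noisy, so it is worth repeating the call a few times and ignoring the first result, which includes JIT warm-up.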