I am working for a startup that is building an iPhone app, and I would like to ask a few questions about improving an algorithm we use for string matching.
We have a database that holds a huge list of phone numbers, along with the name of the user who owns each number. Let’s say the database looks like this:
name phonenum
hari 1234
abc 3873
….
This database has a large number of rows (around 1 million). When the user opens the app, the app reads the phone numbers from the person’s phone contacts and matches them against the database. We return all the phone numbers that are present in the database. Right now, what we do is very inefficient: we send the phone numbers from the contacts in sets of 20 and match each set against the database. With m contacts and n database rows, that is O(n) work per number, or O(m * n) overall.
I thought of some improvements, such as keeping the database rows sorted by phone number so that we can do a binary search. In addition, we could keep a hash table of some 10,000 phone numbers in an in-memory cache and search that first. Only on a cache miss would we go to the database, which binary search would cover in O(log n) per number.
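To make the idea concrete, here is a rough sketch of the cache-then-binary-search lookup. It is only an illustration under assumptions: the sample rows, the `cache` dictionary, and the `lookup` function are made up, and an in-memory list stands in for the sorted database table.

```python
import bisect

# Hypothetical sample data: (phone number, name) rows kept sorted by phone number.
rows = sorted([("1234", "hari"), ("3873", "abc"), ("5550", "xyz")])
sorted_numbers = [number for number, _ in rows]

# Small in-memory hash table that is checked before the sorted data.
cache = {"1234": "hari"}


def lookup(number):
    """Return the owner's name for a number, or None if it is not stored."""
    # Step 1: hash lookup in the cache, O(1) on average.
    if number in cache:
        return cache[number]
    # Step 2: on a miss, binary search the sorted numbers, O(log n).
    i = bisect.bisect_left(sorted_numbers, number)
    if i < len(sorted_numbers) and sorted_numbers[i] == number:
        name = rows[i][1]
        cache[number] = name  # remember the hit for next time
        return name
    return None


# Match a contact list against the data.
contacts = ["3873", "9999", "1234"]
matches = {}
for contact in contacts:
    owner = lookup(contact)
    if owner is not None:
        matches[contact] = owner
```

With m contacts this costs roughly O(m log n) instead of O(m * n).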
There is also the question of how to send the phone numbers for matching. Do I send them as they are, or as hashed values? Will that matter for performance?
Is there any other way of doing this?
I have explained the whole scenario so that you can better understand my need.
Thanks
If you already have an SQL server database, let it take care of that. Create an index on the phone number column (if you don’t have one already). Send all the numbers in the contact list in one go (no need to split them into sets of 20) and match them against the database. The SQL server probably uses much better indexing than anything you could come up with, so it’s going to be pretty fast.
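A minimal sketch of that approach follows. It assumes Python's built-in `sqlite3` as a stand-in for your actual SQL server. The `users` table, the `idx_users_phonenum` index, and the `match_contacts` function are hypothetical names chosen to mirror the example in the question.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (name TEXT, phonenum TEXT)")
db.executemany(
    "INSERT INTO users (name, phonenum) VALUES (?, ?)",
    [("hari", "1234"), ("abc", "3873")],
)
# The index lets the engine look up each number without scanning every row.
db.execute("CREATE INDEX idx_users_phonenum ON users (phonenum)")


def match_contacts(conn, numbers):
    """Return (name, phonenum) rows for every contact number found in users."""
    if not numbers:
        return []
    # One placeholder per number, so the whole list goes out in one query.
    placeholders = ",".join("?" * len(numbers))
    sql = (
        "SELECT name, phonenum FROM users "
        f"WHERE phonenum IN ({placeholders}) ORDER BY phonenum"
    )
    return conn.execute(sql, list(numbers)).fetchall()


found = match_contacts(db, ["1234", "9999", "3873"])
```

Database engines generally cap the number of parameters in a single statement, so a very large contact list might still need a few big chunks rather than one query.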
Alternatively, you can try to insert the numbers into a temporary table and query against that, but I have no idea whether that would be faster.
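For comparison, the temporary-table variant might look roughly like this. Again `sqlite3` is only a stand-in, and the `contact_numbers` table and `match_via_temp_table` function are hypothetical names. You would have to measure both variants on your own data.

```python
import sqlite3

# In-memory SQLite database standing in for the real server (assumption).
store = sqlite3.connect(":memory:")
store.execute("CREATE TABLE users (name TEXT, phonenum TEXT)")
store.execute("CREATE INDEX idx_users_phonenum ON users (phonenum)")
store.executemany(
    "INSERT INTO users (name, phonenum) VALUES (?, ?)",
    [("hari", "1234"), ("abc", "3873")],
)


def match_via_temp_table(conn, numbers):
    """Load the contact numbers into a temp table, then join it with users."""
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS contact_numbers "
        "(phonenum TEXT PRIMARY KEY)"
    )
    conn.execute("DELETE FROM contact_numbers")  # clear any earlier upload
    # INSERT OR IGNORE skips duplicate numbers in the contact list.
    conn.executemany(
        "INSERT OR IGNORE INTO contact_numbers (phonenum) VALUES (?)",
        [(n,) for n in numbers],
    )
    return conn.execute(
        "SELECT u.name, u.phonenum FROM users AS u "
        "JOIN contact_numbers AS c ON c.phonenum = u.phonenum "
        "ORDER BY u.phonenum"
    ).fetchall()


joined = match_via_temp_table(store, ["3873", "3873", "9999", "1234"])
```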