I have encountered the following interview question from another website:
You are given a bunch of emails in an inbox. You want to send all the
sender addresses to some server. You can send them in batches (each
batch containing a bunch of sender email addresses). The restriction
is that no batch can contain duplicate email address. How would you
write a program to send all the email addresses in batches such that
it takes the minimum number of batches. Analyze the complexity.
The answer I like involves inserting the sender addresses into a binary search tree (which removes the duplicates), then serializing the tree and sending it. This sends just one batch and takes O(n log n) time, provided the tree is kept balanced. Does anyone have a better solution?
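Here is a minimal Python sketch of the BST idea. It assumes the addresses arrive as a list of strings and treats an in-order traversal as the "serialization" step. The names `Node`, `bst_insert`, `in_order` and `one_batch_via_bst` are hypothetical. This is a plain, unbalanced BST. On already-sorted input it degenerates into a linked list and costs O(n^2), so the O(n log n) bound requires a self-balancing tree such as an AVL or red-black tree.

```python
from typing import Iterable, List, Optional


class Node:
    """One node of a plain (unbalanced) binary search tree."""
    __slots__ = ("key", "left", "right")

    def __init__(self, key: str) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def bst_insert(root: Optional[Node], key: str) -> Node:
    """Insert key iteratively; a key that is already present is ignored."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key == node.key:
            return root  # duplicate address: nothing to insert
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def in_order(root: Optional[Node]) -> List[str]:
    """Serialize the tree with an iterative in-order walk (sorted output)."""
    out: List[str] = []
    stack: List[Node] = []
    node = root
    while stack or node is not None:
        while node is not None:  # descend as far left as possible
            stack.append(node)
            node = node.left
        node = stack.pop()
        out.append(node.key)
        node = node.right
    return out


def one_batch_via_bst(senders: Iterable[str]) -> List[str]:
    """Build the tree (dropping duplicates), then serialize it as one batch."""
    root: Optional[Node] = None
    for s in senders:
        root = bst_insert(root, s)
    return in_order(root)
```

Both helpers are iterative rather than recursive, so a degenerate (list-shaped) tree cannot hit Python's recursion limit.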
You can use a hash set: for each address, first check whether it is already in the set; if it is not, insert it into the set and add it to the batch. This is O(n) on average, whereas your current method is O(n log n).
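A sketch of the hash-set version in Python, assuming the same list-of-strings input. The name `one_batch_via_hash` is hypothetical. Python's built-in `set` is a hash table, so each membership test and insertion is O(1) on average.

```python
from typing import Iterable, List


def one_batch_via_hash(senders: Iterable[str]) -> List[str]:
    """Return each distinct sender once, in first-seen order."""
    seen = set()           # hash set of addresses already added to the batch
    batch: List[str] = []
    for s in senders:
        if s not in seen:  # average O(1) lookup
            seen.add(s)
            batch.append(s)
    return batch
```

Unlike the BST version, this keeps addresses in the order they were first seen rather than sorted. Since dicts preserve insertion order (guaranteed from Python 3.7), `list(dict.fromkeys(senders))` is an equivalent one-liner.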
Your current approach is O(n log n) because building the binary search tree takes O(n log n). Like any comparison-based algorithm, it cannot beat the n log n barrier. The hash-based approach, by contrast, takes O(n) on average. Overall it is faster than sorting-based methods, but it may use too much space, so you should consider your data format.
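The two bounds can be written out as follows. This assumes a balanced tree, where inserting into a tree that already holds i - 1 nodes costs O(log i), and a hash table with O(1) expected cost per operation:

```latex
T_{\text{BST}}(n) = \sum_{i=1}^{n} O(\log i) = O(\log n!) = O(n \log n),
\qquad
\mathbb{E}\!\left[T_{\text{hash}}(n)\right] = \sum_{i=1}^{n} O(1) = O(n)
```

Here \log n! = \Theta(n \log n) follows from Stirling's approximation. The hash bound holds only in expectation; a pathological set of collisions can degrade individual operations.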