Given endless stream of numbers (BigInteger) how can the numbers with top N appearances (frequencies) be detected and stored?
The memory is limited (cannot store counter per number).
EDIT:
the frequency value cannot exceed size of long
Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.
Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.
Lost your password? Please enter your email address. You will receive a link and will create a new password via email.
Please briefly explain why you feel this question should be reported.
Please briefly explain why you feel this answer should be reported.
Please briefly explain why you feel this user should be reported.
The top N appearances cannot be determined until you have all the data (or most of it)
You can determine the N appearance so far by counting how often they have appeared and sorting them by count. This could be a problem if you can’t store than many in which case you have to decide what compromises you are will to make to save space.
I assume
longis not large enough. What sort of data are you counting?A trivial example which demonstrates the problem you face.
Say your endless stream of account ids are all different. This means the only way to record the top N is to record them all. Without some short cuts, there is not other possible solution.
Note: it is likely what you really want it a decaying average so that if a user has been seen for some time you want to reduce their weight. You don’t want the top user to be one which is no longer active.