How can I find commonly occurring sequences in a string in Java?
The string is a long sequence of digits and I want to find the most commonly occurring sequences of digits.
I guess that depends on how long the sequences are that you are looking for.
What I would do is use a Guava Multiset: iterate over the string, add every subsequence to the Multiset, and sort the result by number of occurrences.

As for performance: in my test this takes about a second on my machine with unlimited pattern length, and about 25 milliseconds when I limit the pattern length to 12 chars.
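For illustration, here is a minimal sketch of the counting approach described above. It is not the original sample from this answer. To keep it runnable with the JDK alone, it swaps Guava's Multiset for a plain `HashMap<String, Integer>`; the class name `SequenceCounter`, the method names, and the `minLen`/`maxLen` parameters are all hypothetical. Capping `maxLen` is what keeps the amount of work bounded on long inputs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; names are illustrative and not taken from the original answer.
class SequenceCounter {

    // Counts every contiguous subsequence of `digits` whose length is in [minLen, maxLen].
    // Assumes minLen >= 1. A HashMap stands in for Guava's Multiset here.
    static Map<String, Integer> countSubsequences(String digits, int minLen, int maxLen) {
        Map<String, Integer> counts = new HashMap<>();
        for (int start = 0; start < digits.length(); start++) {
            // Do not read past the end of the string.
            int longest = Math.min(maxLen, digits.length() - start);
            for (int len = minLen; len <= longest; len++) {
                counts.merge(digits.substring(start, start + len), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns up to `limit` entries, highest count first; ties are ordered by the sequence text.
    static List<Map.Entry<String, Integer>> mostCommon(Map<String, Integer> counts, int limit) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(limit, entries.size()));
    }
}
```

Calling `SequenceCounter.mostCommon(SequenceCounter.countSubsequences(input, 2, 12), 10)` would give the ten most frequent sequences of 2 to 12 digits. With Guava on the classpath, the map could be replaced by a `HashMultiset<String>` filled via `add`, and `Multisets.copyHighestCountFirst` would take care of the ordering.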