In the past I’ve written code to find common words in a body of text, and I’m curious whether there is a known way to find common phrases in a body of text (in Java).
Does anyone know how to accomplish this without Lucene or NLP? What other tools or solutions are there?
It is difficult to give you an answer without knowing exactly what you want to do. A naive approach would be to split the text at punctuation marks and use a data structure (such as a hash map) to store a counter for every sentence, incrementing that sentence’s counter each time you parse it from the text.
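A minimal sketch of the counting step, assuming "punctuation marks" means the sentence-ending characters `.`, `!` and `?`, and assuming sentences should match regardless of case and extra spaces. The class and method names are just for illustration:

```java
import java.util.HashMap;
import java.util.Map;

public class SentenceCounter {

    // Splits text at sentence-ending punctuation (. ! ?) and counts each sentence.
    // Lowercasing and collapsing whitespace are assumptions, so that
    // "I like tea." and "I LIKE  tea?" count as the same sentence.
    static Map<String, Integer> countSentences(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String raw : text.split("[.!?]+")) {
            String sentence = raw.trim().replaceAll("\\s+", " ").toLowerCase();
            if (!sentence.isEmpty()) {
                // Start at 1 for a new sentence, otherwise add 1 to the existing count.
                counts.merge(sentence, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "I like tea. You like coffee! I like tea. I LIKE  tea?";
        System.out.println(countSentences(text));
    }
}
```

For that input the map ends up with `i like tea` counted 3 times and `you like coffee` once. If you also want commas or semicolons to act as separators, widen the character class in the `split` regex.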
You could then use, for example, a priority queue to keep the sentences ordered by their counters. Remove the maximum element n times to get the n most common sentences, or keep popping sentences until the counter drops below a threshold you choose.
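Here is one way the priority-queue step could look, as a sketch that assumes the counts are already in a `Map<String, Integer>` (sentence to count). `java.util.PriorityQueue` is a min-heap by default, so the comparator is reversed to put the highest count at the head. The method names `topN` and `atLeast` are hypothetical:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

public class TopSentences {

    // Max-heap of (sentence, count) entries: the highest count is polled first.
    static PriorityQueue<Map.Entry<String, Integer>> buildHeap(Map<String, Integer> counts) {
        PriorityQueue<Map.Entry<String, Integer>> heap =
                new PriorityQueue<>((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        heap.addAll(counts.entrySet());
        return heap;
    }

    // Remove the maximum element up to n times to get the n most common sentences.
    static List<String> topN(Map<String, Integer> counts, int n) {
        PriorityQueue<Map.Entry<String, Integer>> heap = buildHeap(counts);
        List<String> result = new ArrayList<>();
        while (result.size() < n && !heap.isEmpty()) {
            result.add(heap.poll().getKey());
        }
        return result;
    }

    // Keep popping while the count at the head is at least minCount.
    static List<String> atLeast(Map<String, Integer> counts, int minCount) {
        PriorityQueue<Map.Entry<String, Integer>> heap = buildHeap(counts);
        List<String> result = new ArrayList<>();
        while (!heap.isEmpty() && heap.peek().getValue() >= minCount) {
            result.add(heap.poll().getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("i like tea", 3);
        counts.put("you like coffee", 1);
        counts.put("it is raining", 2);
        System.out.println(topN(counts, 2));    // [i like tea, it is raining]
        System.out.println(atLeast(counts, 2)); // [i like tea, it is raining]
    }
}
```

Sentences with equal counts come out in no particular order; add a secondary comparison on the key if you need a stable ranking.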
However, if you don’t want exact sentences, you’ll either have to change what you store in the priority queue or use a different algorithm altogether.
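One way to "change what you store" is to count every run of n consecutive words (word n-grams) instead of whole sentences. That is a swapped-in technique rather than part of the answer above, so treat this as a rough sketch. It assumes phrases should not cross sentence boundaries and treats anything other than letters, digits and apostrophes as a word separator:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PhraseCounter {

    // Counts every sequence of n consecutive words (an n-gram) inside each sentence.
    static Map<String, Integer> countNGrams(String text, int n) {
        Map<String, Integer> counts = new HashMap<>();
        // Split into sentences first so phrases never span two sentences.
        for (String sentence : text.toLowerCase().split("[.!?]+")) {
            List<String> words = new ArrayList<>();
            // Anything that is not a letter, digit or apostrophe separates words.
            for (String w : sentence.split("[^\\p{L}\\p{N}']+")) {
                if (!w.isEmpty()) {
                    words.add(w);
                }
            }
            // Slide a window of n words across the sentence.
            for (int i = 0; i + n <= words.size(); i++) {
                String phrase = String.join(" ", words.subList(i, i + n));
                counts.merge(phrase, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "I like green tea. You like green tea, and I like green tea too.";
        System.out.println(countNGrams(text, 3));
    }
}
```

Here `like green tea` is counted 3 times and `i like green` twice. The resulting map can be ranked with a priority queue exactly as for whole sentences; running it for several values of n (say 2 to 4) gives you common phrases of different lengths.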
Hope this at least helps!