I’d rather not have to fire up LingPipe if possible, which leaves me wondering: are there any quick, easy ways in Java to extract all the bigrams and trigrams from a string of text?
thanks
The easiest way is always to use an existing library. You can take a look at the SimMetrics library, or use Lucene's NGramTokenizer (note that it produces character n-grams; for word n-grams, Lucene provides ShingleFilter). You can also implement the algorithm yourself: first find all the words in the text (for example with StringTokenizer), then generate the n-grams you need.
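If you go the do-it-yourself route, something like the following might be enough. It is only a rough sketch using plain JDK classes: the class name `NGrams` and the method `ngrams` are made up for this example, and it assumes that words are separated by whitespace and that punctuation can stay attached to the words.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class NGrams {

    // Hypothetical helper: returns every run of n consecutive words in text.
    // Assumes whitespace-separated words; punctuation is not stripped.
    public static List<String> ngrams(String text, int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }

        // Step 1: collect the words (StringTokenizer splits on whitespace by default).
        List<String> words = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            words.add(tokenizer.nextToken());
        }

        // Step 2: slide a window of n words over the list.
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= words.size(); i++) {
            result.add(String.join(" ", words.subList(i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "the quick brown fox";
        System.out.println(ngrams(text, 2)); // bigrams
        System.out.println(ngrams(text, 3)); // trigrams
    }
}
```

Calling `ngrams(text, 2)` gives the bigrams and `ngrams(text, 3)` the trigrams; a text with fewer than n words simply yields an empty list. If you need lowercasing or punctuation removal, you would have to normalise the text before tokenizing.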