I have to remove common words (is, are, am, was, etc.) from a text file. What is an efficient way of doing this in Java?
You will have to read the file in, skipping the words you want to remove, and then write the file back out again.
Because that means rewriting the whole file, you may prefer to leave the file alone and just skip the words you want to ignore each time you read it. Which is better depends on your use case.
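As a rough sketch of the skip-on-read option, the filter below uses only the JDK. The class name `StopWordFilter`, its method names, and the stop-word list are made up for illustration, so adjust them to your data:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StopWordFilter {
    // Hypothetical stop-word list; a HashSet gives constant-time lookups.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("is", "are", "am", "was"));

    // Splits one line on whitespace and keeps the words that are not stop words.
    static List<String> keptWords(String line) {
        return Arrays.stream(line.trim().split("\\s+"))
                .filter(word -> !word.isEmpty())
                .filter(word -> !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT)))
                .collect(Collectors.toList());
    }

    // Reads the file lazily, line by line, and never modifies it on disk.
    static List<String> readKeptWords(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            return lines.flatMap(line -> keptWords(line).stream())
                    .collect(Collectors.toList());
        }
    }
}
```

The comparison is case-insensitive but does not strip punctuation, so a token such as "was," would be kept as written.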
To actually remove the words on a line-by-line basis (which may not be the way you want to do it anyway), you could do this (using Google Guava):
Running this code will output:
I.e., “a” and “for” are omitted.
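If you would rather not pull in a library, a comparable line-by-line filter can be sketched with only the JDK. This is not the Guava-based code this answer refers to; the class and method names are hypothetical, and the stop-word set is assumed to contain just the two words mentioned:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class LineWordRemover {
    // Assumed stop words, matching the two named in the answer.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("a", "for"));

    // Rebuilds a line from the words that are not stop words.
    // Splitting on whitespace and re-joining with one space collapses
    // any run of spaces or tabs into a single space.
    static String removeStopWords(String line) {
        List<String> kept = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty() && !STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
                kept.add(word);
            }
        }
        return String.join(" ", kept);
    }

    // Reads every line of the input, filters it, and writes the result out.
    static void rewrite(Path input, Path output) throws IOException {
        List<String> result = new ArrayList<>();
        for (String line : Files.readAllLines(input, StandardCharsets.UTF_8)) {
            result.add(removeStopWords(line));
        }
        Files.write(output, result, StandardCharsets.UTF_8);
    }
}
```

`rewrite` loads the whole file into memory, which is fine for small files. For large ones you would probably stream with a `BufferedReader` and `BufferedWriter` instead.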
Note that this makes for simple code, but it will change the whitespace formatting in your file. If a line has doubled-up spaces, tabs, etc., each run is collapsed to a single space by this code. This is just an example of how you might do it; depending on your requirements, there will probably be better ways.
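If the whitespace matters, one possible alternative is to delete the words in place with a regular expression instead of splitting and re-joining. The sketch below is only an illustration: the class name and the word list are assumptions, and it removes each matched word together with at most one following space or tab, leaving all other whitespace untouched:

```java
import java.util.regex.Pattern;

class WhitespacePreservingRemover {
    // Hypothetical stop words. \b restricts matches to whole words, and the
    // optional [ \t] consumes a single space or tab after the word.
    static final Pattern STOP_WORDS =
            Pattern.compile("\\b(?:a|for)\\b[ \\t]?", Pattern.CASE_INSENSITIVE);

    // Deletes each stop word (and one trailing space or tab) from the line.
    static String removeStopWords(String line) {
        return STOP_WORDS.matcher(line).replaceAll("");
    }
}
```

Because `\b` treats an apostrophe as a word boundary, the "a" in something like "a's" would also be removed, so the pattern may need tightening for real text.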