I have some HTML documents, and for each one I need to return the number of words it contains. The count should include only the actual text (so no HTML tags such as html, br, etc.).
Any ideas how to do this? Naturally, I would prefer to re-use some code.
Thanks,
Assaf
Strip out the HTML tags to get the text content; you can reuse Jsoup for that.
Read the file line by line, hold a Map<String, Integer> wordToCountMap, and update the map as you read through.
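
A minimal sketch of the counting step, using only the standard library. It assumes the tags have already been stripped, for example with Jsoup's `Jsoup.parse(html).text()`. The class name `WordCounter`, its method names, and the definition of a word as a whitespace-delimited token are assumptions made for illustration, not something the question specifies.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: operates on plain text with the HTML tags already removed,
// e.g. String text = org.jsoup.Jsoup.parse(html).text();
final class WordCounter {

    // Returns the number of whitespace-delimited tokens in the text.
    static int countWords(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return 0; // split() on an empty string would yield one empty token
        }
        return trimmed.split("\\s+").length;
    }

    // Builds the word -> occurrence count map (case-insensitive).
    static Map<String, Integer> wordFrequencies(String text) {
        Map<String, Integer> wordToCountMap = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            // Lower-case the token and drop leading/trailing ASCII punctuation.
            String word = token.toLowerCase(Locale.ROOT)
                               .replaceAll("^\\p{Punct}+|\\p{Punct}+$", "");
            if (!word.isEmpty()) {
                wordToCountMap.merge(word, 1, Integer::sum);
            }
        }
        return wordToCountMap;
    }
}
```

If only the total is needed, `WordCounter.countWords(text)` is enough; the map is useful when you also want per-word counts.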