I am looking for a simple, lightweight Java library that parses HTML. I have looked a lot and there are many options out there, but I cannot find anything simple. I would really like something like pyquery in Python, but for Java. My requirements are: fast, easy to use, and lightweight.
What do I need it for? Not sure if this matters, but I need to index parts of HTML documents, so I am hoping to be able to quickly select part of a document and then parse just that part.
I have used HTMLParser in the past and wasn't very happy with it. I found TagSoup and jsoup, and I really like jsoup. I haven't used it extensively yet, but you can do something like:
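For example, here is a minimal sketch of that pyquery-style selection, assuming jsoup is on the classpath. The HTML string, the `div.article p` selector and the `selectText` helper are made up for illustration; `Jsoup.parse`, `select` and `text` are standard jsoup calls.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSelectExample {

    // Hypothetical helper: parse the HTML, select elements with a CSS selector
    // (similar to pyquery/jQuery), and return the text of each matched element.
    static List<String> selectText(String html, String cssSelector) {
        Document doc = Jsoup.parse(html);
        List<String> texts = new ArrayList<>();
        for (Element el : doc.select(cssSelector)) {
            texts.add(el.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<div id=\"nav\"><a href=\"/home\">Home</a></div>"
                + "<div class=\"article\"><h1>Title</h1>"
                + "<p>First paragraph.</p><p>Second paragraph.</p></div>"
                + "</body></html>";

        // Only the paragraphs inside the article div; the navigation markup is skipped
        for (String text : selectText(html, "div.article p")) {
            System.out.println(text);
        }

        // A selected element can be queried again, like a sub-document
        Element article = Jsoup.parse(html).select("div.article").first();
        if (article != null) {
            System.out.println(article.select("h1").text());
        }
    }
}
```

The selector syntax is what makes it feel close to pyquery: you pick out the part of the page you want to index first, then work only with that fragment.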