I use the Jericho parser in my application to get a lighter version of a web page by extracting some parts of it. So, for instance, when I get this code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN/" "http://www.w3.org/TR/html4/loose.dtd"><html> <head> </head> <body> <b> <span class="articletitletext">Happy New Year!</span></b> <br> <span class="postedstamp">Posted By <script language="JavaScript" type="text/javascript"> <!-- document.write('<a href=" mailto:chris.wyman@verizon.net">'); // --> </script>Chris</a> on January 1, 2012</span><br> <br> <span id="intelliTXT">
From all of us here at TheForce.net, we wish you and your family a safe and Happy New Year. May the Force be with you in 2012!
</span></body> </html>
I’d like to parse it once again using the Jericho parser, but when I run
ArrayList<Element> centerElems=(ArrayList<Element>) pageSource.getAllElements(HTMLElementName.CENTER);
I get this exception:
01-01 10:46:37.518: ERROR/AndroidRuntime(648): java.lang.RuntimeException: Unable to start activity ComponentInfo{net.test.theforce/net.test.theforce.NewsListActivity}: java.lang.RuntimeException: java.lang.ClassCastException: java.util.Collections$EmptyList
and the application crashes…so, what’s wrong with the lighter page?
It looks to me like the Jericho parser can parse the HTML you gave it. The error you’re getting arises because you’ve made an incorrect assumption about what the getAllElements() method returns.
I admit I could only find the Javadoc for the zero-argument overload of this method, as opposed to the one-argument overload that you’re using, so I’ll have to assume that both methods return the same type, List<Element>. In your example, there are no center elements in the HTML, so the getAllElements() method should return an empty List<Element>. It doesn’t have to return an ArrayList<Element> here; any implementation of List<Element> will do. In this case, it chooses to return a Collections.emptyList(). This isn’t an ArrayList<Element>, and you get a ClassCastException because you cannot cast this to an ArrayList<Element>.
As far as I can see, you have two options:
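Firstly, you might not need the returned list to be an ArrayList<Element>. It might be sufficient to use List<Element> instead. In this case, you should replace the line from your question with one that declares centerElems as a List<Element> and drops the cast to ArrayList<Element>.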
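As a rough sketch of the first option: the variable is declared with the interface type, so no cast is needed and an empty result causes no trouble. To keep the example runnable without Jericho on the classpath, findCenterElements() is a hypothetical stand-in for pageSource.getAllElements(HTMLElementName.CENTER), and String stands in for Element. Both are my assumptions, not part of the library.

```java
import java.util.Collections;
import java.util.List;

public class ListInterfaceSketch {

    // Hypothetical stand-in for pageSource.getAllElements(HTMLElementName.CENTER).
    // Like the call in the question, it hands back an immutable empty list
    // (not an ArrayList) when nothing matches. String stands in for Element.
    static List<String> findCenterElements() {
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Declared as the interface type, so no cast is required.
        List<String> centerElems = findCenterElements();

        System.out.println("Found " + centerElems.size() + " center elements");

        // Iterating over an empty list is safe; the loop body simply never runs.
        for (String elem : centerElems) {
            System.out.println(elem);
        }
    }
}
```

In your own code, and assuming the one-argument overload really does return List<Element>, the declaration would presumably become List<Element> centerElems = pageSource.getAllElements(HTMLElementName.CENTER);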
Secondly, if you really do need the list to be an ArrayList<Element>, then you can create an ArrayList<Element> from the results, for example by passing the returned list to the ArrayList constructor.
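A minimal sketch of the second option, under the same assumptions as before: findCenterElements() is a hypothetical stand-in for the Jericho call, String stands in for Element, and toArrayList() is a helper name I made up for illustration. It first reproduces the failing cast, then copies the result into a new ArrayList using the standard java.util.ArrayList(Collection) constructor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ArrayListCopySketch {

    // Hypothetical stand-in for pageSource.getAllElements(HTMLElementName.CENTER);
    // it returns an immutable empty list when nothing matches.
    static List<String> findCenterElements() {
        return Collections.emptyList();
    }

    // Copies any List into a new, mutable ArrayList (illustrative helper).
    static ArrayList<String> toArrayList(List<String> source) {
        return new ArrayList<String>(source);
    }

    public static void main(String[] args) {
        List<String> result = findCenterElements();

        // The cast from the question compiles, but fails at runtime because
        // the object's actual class is not ArrayList.
        try {
            ArrayList<String> broken = (ArrayList<String>) result;
            System.out.println("Cast succeeded: " + broken);
        } catch (ClassCastException e) {
            System.out.println("Cast failed: " + e.getClass().getSimpleName());
        }

        // Copying works for any List implementation, and the copy is mutable.
        ArrayList<String> centerElems = toArrayList(result);
        centerElems.add("added later");
        System.out.println(centerElems);
    }
}
```

With Jericho, the equivalent line would presumably be ArrayList<Element> centerElems = new ArrayList<Element>(pageSource.getAllElements(HTMLElementName.CENTER)); The copy costs one extra allocation, so it is only worth doing if you genuinely need ArrayList-specific behaviour or want to modify the list afterwards.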