I am working on an Android application that parses a website, but I can’t seem to get Jsoup to work.
I am trying to parse this HTML:
My code at the moment is:
Document doc = null;
try {
    doc = Jsoup.connect("URL").get();
    Elements tds = doc.select("table.tr>td");
    for (Element td : tds) {
        String tdText = td.text();
        System.out.println(tdText);
    }
} catch (IOException e) {
    e.printStackTrace();
}
At the moment it does not print anything, but if I print ‘doc’ it returns the whole website.
I am trying to extract the following information:
Drower, E. S. (Ethel Stefana), Lady, b. 1879, With or without the  .
But I can’t seem to get it to work.
Thanks for your help!
You got the selector wrong: it picks td children of a table element with class tr, while you probably want td cells in tr rows in a table. I believe you could get at them just by using "td" as the selector.

However, that’s a bit too generic, since it’s going to pick every cell in the table. If the cell you need is always the third cell in the rows of that table, you can refine the selector to pick only those: "td:eq(2)". You should really get the hang of JSoup selectors, and experiment a little bit to see how far you can restrict the data extracted from the document to just the elements you really need.

To obtain the text after the <script> element in the fourth cell you could use something along the lines of the following snippet:

because, from a little experiment of mine, it seems that JavaScript code inside <script> tags is skipped when asking for the text of an element that contains one of those.

You would use a for loop rather than first, though, since there are as many fourth cells as there are rows in your document, and you have a lot of them.
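To make the selector points concrete, here is a minimal sketch that runs against a small inline table rather than your real page. The markup in SAMPLE_HTML, the class name CellExtractor and the helper cellTexts are all made up for illustration; I am only assuming that your page has the author in the third cell and a <script> followed by text in the fourth. It loops over every match instead of calling first().

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class CellExtractor {

    // Made-up sample markup (an assumption): the real page's structure may differ.
    static final String SAMPLE_HTML =
          "<table>"
        + "<tr><td>1</td><td>x</td><td>Drower, E. S.</td>"
        + "<td><script>var a = 1;</script>First title</td></tr>"
        + "<tr><td>2</td><td>y</td><td>Another author</td>"
        + "<td><script>var b = 2;</script>Second title</td></tr>"
        + "</table>";

    // Hypothetical helper: collects the text of every element matched by the selector.
    static List<String> cellTexts(Document doc, String selector) {
        List<String> texts = new ArrayList<>();
        for (Element td : doc.select(selector)) {
            // text() returns the visible text; the contents of <script> are not included.
            texts.add(td.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);

        // The original selector: td children of a <table class="tr">, so no matches here.
        System.out.println(cellTexts(doc, "table.tr>td"));

        // Third cell of each row (:eq uses a zero-based sibling index).
        System.out.println(cellTexts(doc, "td:eq(2)"));

        // Fourth cell of each row; only the text after the <script> element remains.
        System.out.println(cellTexts(doc, "td:eq(3)"));
    }
}
```

With your own document you would replace Jsoup.parse(SAMPLE_HTML) with the Jsoup.connect(...).get() call you already have, and probably tighten the selector further once you know which table the rows live in.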