I'm using Jsoup to parse a webpage like this and turn it into two String arrays: one for the text of each item (to be displayed in a ListActivity) and one for the links. Some of these text values contain special characters that Jsoup has trouble parsing. At first I used:
Document doc = Jsoup.connect(URL).get();
maintable = doc.select(".kader").first();
to get the element for the table with the content. In another thread here someone said it would work using Jsoup.parse(html), so I changed it to this:
Document doc = Jsoup.connect(URL).get();
Document DOC = Jsoup.parse(doc.html());
if (doc.select(".kader") != null) {
    maintable = DOC.select(".kader").first();
}
However, this did not seem to work either, so I left it as something to solve later (here, perhaps); it is not my main problem.
To get a String array of all the links displayed in the main content, I use this method:
public String[] getTranslationLinks() {
    String[] items = new String[alllinks.size()];
    Element tempelement;
    for (int i = 0; i < items.length; i++) {
        tempelement = alllinks.get(i);
        items[i] = tempelement.attr("abs:href");
    }
    return items;
}
The debugger shows that tempelement contains the proper element, but for some reason .attr("abs:href") doesn't return the link as requested. For instance, tempelement would contain:
<a href="./vertaling.php?id=6518" target="_top" title="">Hoofdstuk 3, tekst A: Herakles de slaaf</a>
but .attr("abs:href") returns "".
Does anyone know a way to solve these problems?
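One thing that may be worth checking here: as far as I know, the abs: prefix resolves the relative href against the document's base URI, and Jsoup returns an empty string when no absolute URL can be formed. A document built with Jsoup.parse(html) and no base URI argument would fall into that case. The sketch below only illustrates that resolution step with java.net.URI; it is not Jsoup's implementation, and the class name AbsHrefSketch, the helper absUrl, and the example.com base URL are made up for the illustration.

```java
import java.net.URI;

class AbsHrefSketch {
    // Hypothetical helper: resolve href against baseUri and return ""
    // when the result is not an absolute URL (the behaviour assumed above).
    static String absUrl(String baseUri, String href) {
        if (href == null || href.isEmpty()) {
            return "";
        }
        try {
            URI resolved = URI.create(baseUri).resolve(href);
            return resolved.isAbsolute() ? resolved.toString() : "";
        } catch (IllegalArgumentException e) {
            // Malformed base or href: no absolute URL can be built.
            return "";
        }
    }

    public static void main(String[] args) {
        String href = "./vertaling.php?id=6518";

        // With a base URI (placeholder host), the relative link becomes absolute.
        System.out.println(absUrl("http://example.com/latijn/index.php", href));
        // prints http://example.com/latijn/vertaling.php?id=6518

        // With an empty base URI, nothing absolute can be formed.
        System.out.println("[" + absUrl("", href) + "]");
        // prints []
    }
}
```

If this is the cause, passing the page URL as the base URI when re-parsing, for example Jsoup.parse(doc.html(), URL), or selecting from the original doc instead of DOC, would be the first things to try.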
Your best bet is to create a small, compilable, runnable piece of code that demonstrates your problem: an SSCCE. For instance, when I created an SSCCE based on my interpretation of your problem, it seemed to work. This was the code:
And this was the output: