When I scrape a site using jsoup, I am getting extra values that I do not want to receive.
I only want to receive his name, not his team and position. Currently the scrape also returns the position and team.
Page Source:
<td class="playertableData">5</td><td class="playertablePlayerName" id="playername_515" style=""><a href="" class="flexpop" content="tabs#ppc" instance="_ppc" fpopHeight="357px" fpopWidth="490px" tab="null" leagueId="0" playerId="515" teamId="-2147483648" cache="true">Derrick Rose</a>, Chi PG<a href="" class="flexpop" content="tabs#ppc"
My Code:
while (tdIter.hasNext()) {
    int tdCount = 1;
    Element tdEl = tdIter.next();
    name = tdEl.getElementsByClass("playertablePlayerName")
            .text();
    Elements tdsEls = tdEl.select("td.playertableData");
    Iterator<Element> columnIt = tdsEls.iterator();
    namelist.add(name);
}
OUTPUT:
name: Derrick Rose, Chi PG
You are doing it wrong. With the line
name = tdEl.getElementsByClass("playertablePlayerName").text();
you get the complete text of the <td> with class="playertablePlayerName", which includes the text of the anchor tag and the plain text outside any tag. That means you get
Derrick Rose, Chi PG
which is your output. To solve this issue, your selection must also include a condition for the anchor tag, so that only the text inside the <a> is read.
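A minimal sketch of the selector approach, assuming jsoup is on the classpath. The HTML below is a trimmed copy of the page source from the question (attributes shortened, the cut-off second <a> left out), and the class name PlayerNameSelector and the helpers wholeCellText and extractName are hypothetical names chosen for illustration. It contrasts the text of the whole cell with the text of only the anchor, using the CSS child combinator > that jsoup's select() supports.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class PlayerNameSelector {

    // Trimmed copy of the cells from the question's page source. The <td>s are
    // wrapped in <table><tr> because jsoup's HTML parser drops a <td> that
    // appears outside a table.
    static final String SAMPLE_HTML = "<table><tr>"
            + "<td class=\"playertableData\">5</td>"
            + "<td class=\"playertablePlayerName\" id=\"playername_515\">"
            + "<a href=\"\" class=\"flexpop\" playerId=\"515\">Derrick Rose</a>, Chi PG"
            + "</td></tr></table>";

    // What the question's code effectively does: text() of the whole cell,
    // which also includes the plain text node after the anchor.
    static String wholeCellText(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("td.playertablePlayerName").text();
    }

    // Narrowed selector: only <a> elements that are direct children of the
    // name cell. first() is used so that, if the cell holds more than one
    // anchor, their texts are not joined together.
    static String extractName(String html) {
        Document doc = Jsoup.parse(html);
        Element link = doc.select("td.playertablePlayerName > a").first();
        return link == null ? null : link.text();
    }

    public static void main(String[] args) {
        System.out.println(wholeCellText(SAMPLE_HTML)); // Derrick Rose, Chi PG
        System.out.println(extractName(SAMPLE_HTML));   // Derrick Rose
    }
}
```

In the question's loop, the same idea would replace the getElementsByClass(...).text() call with tdEl.select("td.playertablePlayerName > a").first(), followed by a null check before calling text(). Taking first() matters for the real page: the cell in the page source continues with a second <a> after ", Chi PG", and calling text() on the whole Elements list would join the text of every matching anchor with spaces.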
Alternatively, you can traverse the children of the <td> you already have. Once you reach the correct tag (the <a>), call text() on that element instead of on the whole cell.
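The traversal variant can be sketched as follows, again assuming jsoup is on the classpath and using a trimmed, hypothetical copy of the cell from the page source. PlayerNameTraversal and extractName are illustrative names only. The point it shows is that Element.children() returns only child elements, so the plain text ", Chi PG" that sits between tags is never visited.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class PlayerNameTraversal {

    // Trimmed copy of the name cell; wrapped in <table><tr> so the HTML
    // parser keeps the <td>.
    static final String SAMPLE_HTML = "<table><tr>"
            + "<td class=\"playertablePlayerName\" id=\"playername_515\">"
            + "<a href=\"\" class=\"flexpop\">Derrick Rose</a>, Chi PG"
            + "</td></tr></table>";

    // Finds the name cell, then walks its child elements and returns the text
    // of the first <a>. Text nodes are not elements, so children() skips them.
    static String extractName(String html) {
        Element td = Jsoup.parse(html).getElementsByClass("playertablePlayerName").first();
        if (td == null) {
            return null; // no name cell in this HTML
        }
        for (Element child : td.children()) {
            if (child.tagName().equals("a")) {
                return child.text();
            }
        }
        return null; // cell has no anchor
    }

    public static void main(String[] args) {
        System.out.println(extractName(SAMPLE_HTML)); // Derrick Rose
    }
}
```

Returning on the first <a> found has the same effect as first() in a selector: any later anchors in the cell are ignored.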
Feel free to ask if you have any doubt.