I am trying to get this code to print its results in this format:
title: Ben 10 Ultimate Alien
comment: taseen_shafquattaseen_shafquat : is there go na a season 4 for
this series
title: Akira
comment: dragon3476dragon3476 : one of my most fav animations
excellent bit o work and about my 300th watch , i still got the
orginal poster from when it came out + dvd and vid and even the
t-shirt so yeah i couldn’t say anything bad about such a great
animation 5/5
But I get this instead:
title: Ben 10 Ultimate Alien
title: taseen_shafquattaseen_shafquat : is there go na a season 4 for
this series
title: Akira
title: dragon3476dragon3476 : one of my most fav animations excellent
bit o work and about my 300th watch , i still got the orginal poster
from when it came out + dvd and vid and even the t-shirt so yeah i
couldn’t say anything bad about such a great animation 5/5
Code
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.select.Elements;
import java.io.*;
import java.util.List;

public class WebScraper {
    public static void main(String[] args) throws Exception {
        String url = "http://www.1channel.ch/latest_comments.php";
        Document doc = Jsoup.connect(url).get();
        for (Element E : doc.select("div.latest_comments > a, div.latest_comments > p")) {
            System.out.print("title: " + E.getElementsByTag("a").text());
            System.out.println(E.getElementsByTag("p").text());
            // System.out.println(T);
            System.out.print("\n");
            try {
                PrintWriter out = new PrintWriter(new BufferedWriter(new FileWriter("/Users/samualdoku/Desktop/Twitter/scraped.txt", true)));
                out.println(E.text());
                out.close();
            } catch (IOException e) {
            }
        }
    }
}
And this is the HTML I am trying to scrape. I think the problem lies with the anchor tag inside the span, which contains the commenter's username. I called getElementsByTag("a") for the title because the title is within an anchor tag, but that call also matches the username link inside each comment. How do I skip the anchor inside the span, so that "title:" is not printed in front of the usernames?
<div class="latest_comments com_class_tv">
    <a href="/tv-2733767-Dallas/season-1-episode-3">Dallas</a>
    ( 6 minutes ago )
    <p>
        <span class="latest_comments_poster">
            <a href="/profile/jowar">jowar</a>
            :
        </span>
        i just started watchin...eeing as its 34nyrs old
    </p>
</div>
Try this
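One possible approach is to keep the same selector but pick the label from each matched element's own tag name, instead of calling getElementsByTag("a") on it. getElementsByTag also searches descendants, so on a p element it finds the username link inside the span, which is why comments come out labelled "title:". What follows is only a minimal sketch under assumptions: the HTML is inlined from the snippet in the question so it runs offline, and the names CommentScraper, SAMPLE_HTML and format are hypothetical, chosen for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

class CommentScraper {

    // Assumption: a copy of the HTML from the question, inlined so no network is needed.
    static final String SAMPLE_HTML =
          "<div class=\"latest_comments com_class_tv\">"
        + "<a href=\"/tv-2733767-Dallas/season-1-episode-3\">Dallas</a>"
        + " ( 6 minutes ago ) "
        + "<p><span class=\"latest_comments_poster\">"
        + "<a href=\"/profile/jowar\">jowar</a> : </span>"
        + " i just started watchin...eeing as its 34nyrs old</p>"
        + "</div>";

    // Hypothetical helper: returns one "title: ..." or "comment: ..." line per match.
    static List<String> format(Document doc) {
        List<String> lines = new ArrayList<>();
        // Same selector as in the question: only direct children of the div.
        for (Element e : doc.select("div.latest_comments > a, div.latest_comments > p")) {
            // Branch on the element's own tag, so the nested username
            // link inside the <p> is never treated as a title.
            if (e.tagName().equals("a")) {
                lines.add("title: " + e.text());
            } else {
                lines.add("comment: " + e.text());
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Document doc = Jsoup.parse(SAMPLE_HTML);
        for (String line : format(doc)) {
            System.out.println(line);
        }
    }
}
```

For the live page, Jsoup.parse(SAMPLE_HTML) would be replaced with Jsoup.connect(url).get() as in the question. The sample should give a "title: Dallas" line followed by a "comment:" line that begins with the poster's name.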