I know how to parse RSS feeds, but how can one read the articles? Do I have to scrape the website, or is there an alternative for parsing the article in Java?
Thanks in advance
Edit:
I decided to use jsoup.
Well, the RSS feed (presumably) contains URLs that link to the articles, so it (presumably) boils down to what you mean by “read”.
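As a starting point, here is a minimal sketch of pulling those article URLs out of a feed with the JDK's built-in DOM parser. It assumes an RSS 2.0 feed where each `<item>` has a plain-text `<link>` child (Atom feeds put the URL in an `href` attribute instead), and the class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper: assumes RSS 2.0 (<item><link>URL</link></item>).
class RssLinks {
    /** Returns the text of the first <link> inside every <item>, in document order. */
    static List<String> extractLinks(String rssXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Feeds come from other people's servers, so refuse DOCTYPEs (blocks XXE tricks).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));

            List<String> links = new ArrayList<>();
            NodeList items = doc.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                Element item = (Element) items.item(i);
                NodeList linkNodes = item.getElementsByTagName("link");
                if (linkNodes.getLength() > 0) {
                    // Trim because feeds often wrap the URL in whitespace.
                    links.add(linkNodes.item(0).getTextContent().trim());
                }
            }
            return links;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse RSS feed", e);
        }
    }
}
```

The channel-level `<link>` is skipped because only links nested inside an `<item>` are collected.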
If you simply need to fetch them, then use URL.openStream() (or URLConnection.getInputStream()) or some other HTTP client library.

If you want to display the news article pages for the end user to read, then you just need to open the URL in the native browser (for example with java.awt.Desktop.browse(URI)).
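For the "just fetch it" case, a minimal sketch using only the JDK might look like the following. The class name is hypothetical, and decoding the body as UTF-8 is an assumption; real pages can declare another charset in the Content-Type header or a meta tag.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: downloads an article URL and returns the raw HTML.
class ArticleFetcher {
    static String fetch(String articleUrl) {
        // URI.create(...).toURL() avoids the deprecated URL(String) constructor.
        try (InputStream in = URI.create(articleUrl).toURL().openStream()) {
            // Assumes UTF-8; a robust client would honour the server's declared charset.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException("Could not fetch " + articleUrl, e);
        }
    }
}
```

If you need timeouts, custom headers or redirect control, java.net.http.HttpClient (Java 11+) is the more flexible built-in option.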
If you want to extract the article text, then yes, you do need to parse the HTML, either using a proper HTML parser or (blech!) using kludgey text-pattern matching that ignores the HTML structure.
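To illustrate the "proper HTML parser" route without extra dependencies, here is a rough sketch using the HTML parser that ships with the JDK (javax.swing.text.html.parser, which is lenient but only knows HTML 3.2). Treating the text of the `<p>` elements as "the article" is an assumption made for the example; real sites usually need a site-specific selector. With jsoup you would typically write something like `Jsoup.parse(html).select("p")` instead.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: keeps only text that appears inside <p> elements.
class ArticleText {
    /** Returns the text of each <p>, one paragraph per line. */
    static String paragraphs(String html) {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            private int depth = 0; // > 0 while the parser is inside a <p>

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (tag == HTML.Tag.P) depth++;
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (tag == HTML.Tag.P && depth > 0) {
                    depth--;
                    out.append('\n'); // paragraph boundary
                }
            }

            @Override
            public void handleText(char[] data, int pos) {
                if (depth > 0) out.append(data); // ignore navigation, headings, etc.
            }
        };
        try {
            // true = ignore any charset declared inside the document
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString().trim();
    }
}
```

Because the parser walks the document structure, menus in `<div>`s and the `<h1>` title are dropped without any regex guesswork.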