I am building a news reader and I have an option for users to

Asked: June 4, 2026

I am building a news reader, and I have an option for users to share articles from blogs, websites, etc. by entering a link to the page. For now, I use two methods to determine the content of the page:

1. I try to extract the RSS feed link from the page the user entered and then match that URL against the feed's items to get the right one.
2. If the site doesn't contain a feed, the feed is malformed, or the entered address differs from the item link in the RSS (which happens in about 50% of cases, if not more), I try to find Open Graph (og:) meta tags. That works well, but only bigger sites have them; smaller sites and blogs often use the same meta description for the whole website.
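
The two steps above could be sketched roughly as follows. This is a minimal illustration using only Python's standard library, not the poster's .NET/HtmlAgilityPack code. Assumptions: feeds are RSS 2.0 (`<item>` with a `<link>` child; Atom uses namespaced `<entry>` elements and would need extra handling), and the URL normalization rules (ignore the scheme, a leading `www.`, trailing slashes, the fragment and `utm_*` tracking parameters) are a hypothetical attempt to reduce mismatches between the entered address and the feed item link. `HeadScanner`, `normalize_url` and `find_feed_item` are illustrative names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, parse_qsl, urlencode
import xml.etree.ElementTree as ET

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class HeadScanner(HTMLParser):
    """Collects feed URLs (<link rel="alternate">) and og:* meta tags from a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []
        self.og = {}

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attribute values may be None.
        a = {name: (value or "") for name, value in attrs}
        rels = a.get("rel", "").lower().split()
        is_feed = (tag == "link" and "alternate" in rels
                   and a.get("type", "").lower() in FEED_TYPES and a.get("href"))
        if is_feed:
            # Resolve relative hrefs such as "/feed" against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))
        elif tag == "meta":
            key = a.get("property") or a.get("name") or ""
            if key.startswith("og:"):
                self.og[key] = a.get("content", "")


def normalize_url(url):
    """Hypothetical rules: drop scheme, 'www.', trailing slash, fragment and utm_* params."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parts.path.rstrip("/") or "/"
    query = sorted((k, v) for k, v in parse_qsl(parts.query) if not k.startswith("utm_"))
    return host + path + ("?" + urlencode(query) if query else "")


def find_feed_item(feed_xml, page_url):
    """Returns title/description of the RSS 2.0 <item> whose <link> matches page_url, else None."""
    target = normalize_url(page_url)
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):  # RSS 2.0 only; Atom feeds use <entry>
        link = (item.findtext("link") or "").strip()
        if link and normalize_url(link) == target:
            return {"title": item.findtext("title"),
                    "description": item.findtext("description")}
    return None
```

In use, you would feed the page's HTML to `HeadScanner`, download each URL in `scanner.feeds`, and call `find_feed_item` on it; when that returns `None`, fall back to `scanner.og`. Network fetching is left out to keep the sketch self-contained.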

I am wondering how Google, for example, does it. When a website doesn't contain a meta description, Google somehow determines on its own what the page's content is for its search results.

I am using HtmlAgilityPack to extract data from pages and my own methods to convert the HTML into clean text.
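
HtmlAgilityPack is a .NET library, so the following is only a language-neutral illustration, written with Python's standard-library `html.parser`, of one common way to turn HTML into text: skip elements that never contain visible text, decode entities, collapse source whitespace, and start new lines only at block-level tags. The tag sets and the names `TextCleaner` and `html_to_text` are assumptions chosen for the example.

```python
import re
from html.parser import HTMLParser

# Elements whose content is never visible article text.
SKIP_TAGS = {"script", "style", "noscript", "template"}
# Elements that start a new line in the plain-text output.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6",
              "article", "section", "blockquote"}


class TextCleaner(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes &amp; and friends in text
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.parts.append("\n")

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse source whitespace; line breaks come only from block tags.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_text(html):
    cleaner = TextCleaner()
    cleaner.feed(html)
    cleaner.close()
    lines = (line.strip() for line in "".join(cleaner.parts).split("\n"))
    return "\n".join(line for line in lines if line)
```

On its own this still returns navigation and sidebar text, which is the problem described next; it only produces clean input for a later content-selection step.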

Can someone explain the logic or the best approach to this? If I try to extract text directly from the top of the page, I usually end up with content from the sidebar, navigation, etc.
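
One widely used family of approaches avoids reading the page top to bottom and instead scores blocks of text: article paragraphs tend to be long and contain few links, while navigation, sidebars and "related posts" lists are short or dominated by link text. The sketch below illustrates that idea with two shallow features, word count and link density (linked words divided by all words). It is a simplified heuristic, not how Google works (which the question leaves open) and not Boilerpipe's actual classifier; the block tags and the thresholds `min_words=10` and `max_link_density=0.33` are hypothetical starting values to tune.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript", "template"}
# Tags treated as boundaries between text blocks.
BLOCK_TAGS = {"p", "div", "li", "td", "th", "ul", "ol", "table", "section", "article",
              "header", "footer", "nav", "aside", "h1", "h2", "h3", "h4", "h5", "h6", "br"}


class BlockSplitter(HTMLParser):
    """Splits a page into text blocks and counts how many words in each sit inside <a>."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # (text, word_count, linked_word_count)
        self.words = []
        self.linked = 0
        self.link_depth = 0
        self.skip_depth = 0

    def _flush(self):
        if self.words:
            self.blocks.append((" ".join(self.words), len(self.words), self.linked))
        self.words, self.linked = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag == "a":
            self.link_depth += 1
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.link_depth = max(0, self.link_depth - 1)
        elif tag in BLOCK_TAGS:
            self._flush()

    def handle_data(self, data):
        if self.skip_depth:
            return
        words = data.split()
        self.words.extend(words)
        if self.link_depth:
            self.linked += len(words)

    def close(self):
        super().close()
        self._flush()  # emit any text after the last block boundary


def extract_main_text(html, min_words=10, max_link_density=0.33):
    """Keeps blocks that are long enough and not dominated by link text."""
    splitter = BlockSplitter()
    splitter.feed(html)
    splitter.close()
    kept = [text for text, count, linked in splitter.blocks
            if count >= min_words and linked / count <= max_link_density]
    return "\n\n".join(kept)
```

Because each block is scored on its own, every qualifying paragraph is kept even when the article is split across several containers, but a long, link-free footer or comment would also survive. Taking neighbouring blocks' scores into account is one possible refinement.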

1 Answer

Editorial Team · Answer 1 · 2026-06-04T17:37:56+00:00

I ended up using Boilerpipe, which is written in Java, and imported it using IKVM. It works well for pages that are formatted correctly, but it still has trouble with some pages where the content is scattered.