Question

Asked: May 12, 2026

I’m writing a special crawler-like application that needs to retrieve the main content of various pages. Just to clarify: I need the real “meat” of the page (provided there is one, naturally).

I have tried various approaches:

- Many pages have RSS feeds, so I can read the feed and get the page-specific content.
- Many pages use “content” meta tags.
- In a lot of cases, the object presented in the middle of the screen is the main “content” of the page.
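A minimal sketch of the first two approaches, using only Python's standard `html.parser`: it collects feed URLs that a page advertises through `<link rel="alternate">` tags, plus every `<meta>` tag's name/content pair. It assumes the “content” meta tags mentioned above are description-style tags such as `<meta name="description">`; `HeadHintParser`, `page_hints` and `FEED_TYPES` are hypothetical names, and downloading the page or the feed is left out.

```python
from html.parser import HTMLParser

# Hypothetical list of MIME types treated as feeds.
FEED_TYPES = ("application/rss+xml", "application/atom+xml")


class HeadHintParser(HTMLParser):
    """Collects advertised feed URLs and <meta> name/content pairs."""

    def __init__(self):
        super().__init__()
        self.feeds = []   # href values of <link rel="alternate" type="...rss/atom...">
        self.meta = {}    # lower-cased meta name or property -> content

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names; values may be None.
        a = {name: (value or "") for name, value in attrs}
        if tag == "link":
            rels = a.get("rel", "").lower().split()
            if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES:
                self.feeds.append(a.get("href", ""))
        elif tag == "meta":
            key = (a.get("name") or a.get("property") or "").lower()
            if key and "content" in a:
                # Keep the first occurrence if the same key appears twice.
                self.meta.setdefault(key, a["content"])


def page_hints(html):
    """Return (feed_urls, meta_tags) for an HTML document given as a string."""
    parser = HeadHintParser()
    parser.feed(html)
    parser.close()
    return parser.feeds, parser.meta
```

Feed `href` values are often relative, so in practice you would resolve them against the page URL (for example with `urllib.parse.urljoin`) before fetching the feed, and then look up `meta.get("description")` as one candidate for the page's main content.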

However, these methods don’t always work, and I’ve noticed that Facebook does a mighty fine job of exactly this (when you attach a link, it shows you the content it found on the linked page).
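Facebook's link previews rely heavily on Open Graph (`og:`) meta tags such as `og:title`, `og:description` and `og:image`, which many publishers add so that their links render well when shared. A sketch of reading those tags, falling back to the page's `<title>` when no `og:title` is present, is below; it is a rough approximation of the idea rather than Facebook's implementation, and `OpenGraphParser` and `link_preview` are hypothetical names.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects og:* meta properties and the <title> text of a page."""

    def __init__(self):
        super().__init__()
        self.og = {}            # "title", "description", "image", ... -> content
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            prop = (a.get("property") or "").lower()
            if prop.startswith("og:") and a.get("content") is not None:
                # Keep the first value when a property repeats (e.g. several og:image tags).
                self.og.setdefault(prop[3:], a["content"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def link_preview(html):
    """Return a preview dict, preferring Open Graph values over the <title>."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("title") or parser.title.strip(),
        "description": parser.og.get("description", ""),
        "image": parser.og.get("image", ""),
    }
```

Pages without Open Graph tags still need one of the other approaches, so this works best as the first check in a chain rather than on its own.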

So, do you have any tips on an approach I’ve overlooked?

Thanks!

1 Answer

Editorial Team · Answer 1 · May 12, 2026, 7:53 am

There really is no standard way for web pages to mark “this is the meat”. Most sites don’t even want one, because it would make stealing their core business easier. So you really have to write a framework that uses per-page rules to locate the content you want.
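A minimal sketch of such a framework, assuming each rule can be expressed as “the text inside the first element with this tag and this exact attribute value”. The hostnames in `RULES`, the helpers `ElementText`, `element_text` and `extract_main_content`, and the optional `fallback` hook (where a generic heuristic, such as one based on the page's meta tags, could plug in) are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class ElementText(HTMLParser):
    """Collects the text inside the first <tag attr="value"> element."""

    def __init__(self, tag, attr, value):
        super().__init__()
        self.tag, self.attr, self.value = tag, attr, value
        self.depth = 0      # nesting depth of `tag` while inside the target element
        self.done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag != self.tag:
            return
        if self.depth:
            self.depth += 1                       # nested element with the same tag
        elif dict(attrs).get(self.attr) == self.value:
            self.depth = 1                        # found the target element (exact match)

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.done = True                  # ignore any later matches

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def element_text(html, tag, attr, value):
    parser = ElementText(tag, attr, value)
    parser.feed(html)
    parser.close()
    # Join text fragments with spaces and collapse runs of whitespace.
    return " ".join(" ".join(parser.parts).split())


# Hypothetical per-site rules: hostname -> (tag, attribute, value) of the content element.
RULES = {
    "news.example.com": ("div", "id", "article-body"),
    "blog.example.org": ("article", "class", "post"),
}


def extract_main_content(url, html, fallback=None):
    host = (urlparse(url).hostname or "").lower()
    rule = RULES.get(host)
    if rule is not None:
        text = element_text(html, *rule)
        if text:
            return text
    # No rule for this site (or the rule matched nothing): use a generic heuristic if given.
    return fallback(html) if fallback else ""
```

Real sites usually need richer rules (several candidate elements, multi-valued `class` attributes, stripping ads and navigation), so a production version would more likely use a full HTML library; the point here is only the per-site lookup with a fallback.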
