I want to make something like readability, which extracts only the article text from

Asked: May 27, 2026

I want to make something like Readability, which extracts only the article text from any page and removes everything else.

I am using file_get_contents to get a webpage and this works fine.

After I get that, how can I extract out just the main article text using PHP?

Are there any plugins or is there a way to do it?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T22:56:48+00:00

There are many libraries that help you parse HTML, and more than a few questions on Stack Overflow that cover them (such as this one), but parsing is not your biggest problem.

The hard part is determining which element actually holds the main article. You could pick the element that has the most <p> tags as children, but there’s no reason I can’t make a CMS that doesn’t use <p> tags at all, so that heuristic will not work on every page.
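As a rough illustration of the "most <p> children" heuristic, here is a minimal sketch using PHP's built-in DOM extension. The function name extractMainText is hypothetical, and the approach is only the naive heuristic described above, not how Readability itself works. It will return the wrong element, or null, on pages that do not mark up their article with <p> tags.

```php
<?php
// Hypothetical helper: returns the text of the element that has the most
// direct <p> children, or null when the page contains no <p> elements.
function extractMainText(string $html): ?string
{
    $doc = new DOMDocument();

    // Real-world markup is often malformed; collect libxml warnings
    // internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $counts = []; // object id of parent element => number of <p> children
    $parents = []; // object id => parent element
    foreach ($doc->getElementsByTagName('p') as $p) {
        $parent = $p->parentNode;
        if (!$parent instanceof DOMElement) {
            continue;
        }
        $id = spl_object_id($parent);
        $counts[$id] = ($counts[$id] ?? 0) + 1;
        $parents[$id] = $parent;
    }

    if ($counts === []) {
        return null;
    }

    // Pick the parent with the highest <p> count.
    arsort($counts);
    $best = $parents[array_key_first($counts)];

    // Keep only the text of that parent's direct <p> children.
    $paragraphs = [];
    foreach ($best->childNodes as $child) {
        if ($child instanceof DOMElement && $child->tagName === 'p') {
            $paragraphs[] = trim($child->textContent);
        }
    }

    return implode("\n\n", $paragraphs);
}
```

You would pass it the string you already get from file_get_contents. Note that loadHTML guesses the character encoding from the markup, so pages without a charset declaration may need converting first.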
