I’ve seen this question, which is very nice and informative. However, it doesn’t

Question

Asked: June 7, 2026

I’ve seen this question, which is very nice and informative. However, it doesn’t deal with a rather common scenario.

Say I need to scrape a multitude of websites (or even several pages in the same domain), but the author of a site didn’t take much care with the markup, so it is seriously malformed HTML "that kinda works". I still need to extract information from it.

How do I do it in this case? Ideally without going í͞ń̡͢͡s̶̢̛á̢̕͘ń̵͢҉e̶̸̢̛.

Is it possible? Do I have to resort to RegExp?

1 Answer

Editorial Team · Answer 1 · 2026-06-07T20:05:28+00:00

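You need a DOM parser. PHP has one built in (the DOM extension, DOMDocument), and its HTML loader does not require well-formed markup. There are also some alternatives (and more; just search for them). You can even run the “garbled HTML” through HTML Purifier first if you want.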
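As a rough illustration of the DOM parser route, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The sample markup, the "item" class, and the XPath query are made up for the example; a real page needs its own selectors, and how a given piece of broken markup gets repaired depends on the underlying libxml version.

```php
<?php
// Hypothetical malformed input: unquoted attributes, unclosed <b> and <p>,
// and a stray closing </div>.
$html = '<html><body><div class=item><p>First <b>title</p></div>'
      . '<div class=item><p>Second title</div></div></body>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);   // the HTML loader tolerates markup that is not well-formed

// Discard the collected parse errors and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Query the repaired tree with XPath (the query is an assumption for this sample).
$xpath  = new DOMXPath($doc);
$titles = [];
foreach ($xpath->query('//div[@class="item"]//p') as $p) {
    $titles[] = trim($p->textContent);
}

print_r($titles);
```

The point of the design is that the parser repairs the tree first, so the extraction step works on proper nodes rather than on raw text, which is what makes a regular expression unnecessary here.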
