Possible Duplicate:
How to parse and process HTML with PHP?
I’m trying to scrape a page with PHP using file_get_contents().
This page has some JSON wrapped in a bit of HTML. I’d like to strip out this HTML to be able to use json_decode() on the scraped string so I can deal with the JSON separately.
Is there any clean way to do that? A quick search didn’t really lead to anything.
Thanks
parsing/stripping HTML content is always a tricky one because (common?) solutions via regex might crash if the HTML markup is malformed and are painful slow btw. I would suggest using this little HTML DOM parser class:
http://simplehtmldom.sourceforge.net/
edited & added from subcomment:
Okay this is a bad one because the inline javascript is not properly wrapped with CDATA-Tags. Otherwise something like this might work: