I am trying to parse HTML with Simple HTML DOM and remove the page menu and footer. (In this example I chose http://codex.buddypress.org/developer-docs/the-bp-global/, but later it may be other URLs.) But my code returns Fatal error: Call to a member function find() on a non-object. What is wrong? Thanks.
require('simple_html_dom.php');
$webch = curl_init();
curl_setopt($webch, CURLOPT_URL, "http://codex.buddypress.org/developer-docs/the-bp-global/");
curl_setopt($webch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($webch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
$htmls = curl_exec($webch);
curl_close($webch);
$html = str_get_html($htmls);
$html = preg_replace('#<div(.*?)id="(.*?)head(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)class="(.*?)head(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)id="(.*?)menu(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)class="(.*?)menu(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)id="(.*?)foot(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)class="(.*?)foot(.*?)"(.*?)>.*</div>#is', '', $html);
foreach($html->find('a') as $element){
echo $element.'<hr />';
}
str_get_html looks like a function from an HTML DOM parser. It returns an object, not a string, yet your code goes on to treat that object as a string. preg_replace expects a string as input and returns a string, which is then assigned to $html.

Your problem is that you then call $html->find, which means you expect $html to be an object like the one returned by str_get_html. It is not, because you just overwrote it with the string returned by preg_replace.

What you probably want is one of these two things:

1. Process the raw string (with preg_replace) before calling $html = str_get_html($htmls);. After that statement it is no longer a string, and any string processing you do on it will be useless and wrong.
2. Process the HTML through the parser itself, using $html->find('div.menu', 0)->class = ''; for example. (Without the index argument, find returns an array of elements rather than a single element.)

I would recommend the second option (if it is what you want), because processing HTML with regular expressions is not a good idea.
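
As a rough illustration of the parser-based approach, here is a minimal sketch. It assumes simple_html_dom.php from the question is available. The helper name strip_divs, the sample markup, and the list of words to match are made up for this example, and the inline string stands in for the cURL response. It blanks each matching div through the parser and then parses the result again, because assigning outertext only changes the serialized output, not the tree that find() searches.

```php
<?php
// Assumption: simple_html_dom.php (the library used in the question) is on the include path.
require_once 'simple_html_dom.php';

/**
 * Hypothetical helper: parse $markup, blank every <div> whose id or class
 * contains one of $words, and return a freshly parsed DOM object (or false).
 */
function strip_divs($markup, array $words)
{
    $dom = str_get_html($markup);
    if (!$dom) {
        // str_get_html() can return false (for example on empty input),
        // which would also trigger "find() on a non-object".
        return false;
    }

    foreach ($words as $word) {
        foreach (array('id', 'class') as $attr) {
            // [attr*=value] matches elements whose attribute contains the value.
            foreach ($dom->find("div[{$attr}*={$word}]") as $div) {
                $div->outertext = ''; // drop the div from the serialized output
            }
        }
    }

    // Re-parse the cleaned markup so later find() calls no longer see the removed divs.
    $clean = str_get_html($dom->save());
    $dom->clear();

    return $clean;
}

// Made-up sample markup; in the question this would be the string returned by curl_exec().
$sample = '<div id="site-header"><a href="/">Home</a></div>'
        . '<div class="content"><a href="/docs">Docs</a></div>'
        . '<div class="footer-menu"><a href="/about">About</a></div>';

$html = strip_divs($sample, array('head', 'menu', 'foot'));

if ($html) {
    foreach ($html->find('a') as $element) {
        echo $element . '<hr />';
    }
}
```

The point of the design is that $html stays an object from start to finish, so find() is always called on a parser object and never on a string. The false check is worth keeping in real code, since a failed download leaves nothing to parse.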