I need to parse many HTML files using PHP.
    foreach ($url_array as $url) {
        $file = file_get_contents($url);
        parse_html($file);
    }
For some files (for example, when the file is very large), parse_html() takes a very long time to run, or it may have a memory leak.
I want to monitor parse_html(). If its running time exceeds a given limit, the script should disregard the current URL and continue with the next one.
Most of the time my code runs fine, but some URLs cannot be parsed. There is no error output, so I suspect a memory leak.
This cannot be done as easily as you might think. Since you are running on a single thread, you cannot perform any checks while that thread is busy. If the thread is blocked, it is blocked.
You need to create some sort of multi-threaded environment in which one worker thread executes parse_html() (to increase speed and take advantage of multi-core processors, you could even spawn several worker threads) and another thread checks on the workers and kills them if they take too long.
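
As a rough illustration of the worker/watchdog idea, here is a minimal sketch that uses a forked child process instead of a thread. It assumes PHP is running from the command line on a Unix-like system with the pcntl and posix extensions loaded. The helper name run_with_timeout() and its parameters are hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: runs $task in a forked child process and kills the
// child if it has not finished within $timeoutSeconds.
// Returns true if the child exited normally with status 0, false otherwise.
function run_with_timeout(callable $task, int $timeoutSeconds): bool
{
    $pid = pcntl_fork();
    if ($pid === -1) {
        throw new RuntimeException('Could not fork a worker process');
    }

    if ($pid === 0) {
        // Child (worker): do the work, then exit so it never returns
        // into the caller's loop.
        $task();
        exit(0);
    }

    // Parent (watchdog): poll the worker without blocking until the deadline.
    $deadline = microtime(true) + $timeoutSeconds;
    while (microtime(true) < $deadline) {
        $result = pcntl_waitpid($pid, $status, WNOHANG);
        if ($result === $pid) {
            // Worker finished in time.
            return pcntl_wifexited($status) && pcntl_wexitstatus($status) === 0;
        }
        if ($result === -1) {
            // waitpid itself failed; treat this as a failed run.
            return false;
        }
        usleep(50000); // check again in 50 ms
    }

    // Deadline passed: kill the worker and reap it so no zombie is left.
    posix_kill($pid, SIGKILL);
    pcntl_waitpid($pid, $status);
    return false;
}
```

Inside the original loop, the body would become something like run_with_timeout(function () use ($url) { parse_html(file_get_contents($url)); }, 30), where the 30-second limit is only an example value. Because the worker is a separate process, anything parse_html() produces is not visible to the parent, so the worker would have to store its results itself (for example in a file or a database). A memory leak in the worker also disappears when the worker exits or is killed.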