I’ve just started a small library which will need to screen scrape from various URLs and search for specified strings. To improve performance, I want to cache the contents of the retrieved page (for the duration of the request, so in-memory).
I’ve currently got this:
class Scraper {
    private $CI;
    private $Cache;

    function __construct() {
        $this->CI =& get_instance();
        $Cache = array();
    }

    public function GetPage($Url) {
        if(!isset($Cache[$Url])) {
            dump("Retrieving");
            $Cache[$Url] = "DATA";//file_get_contents($Url);
        }
        return $Cache[$Url];
    }

    public function FindString($Url, $String) {
        $Contents = $this->GetPage($Url);
        $Ret = (strpos(strtolower($Contents), strtolower($String)) !== false);
        return $Ret;
    }
}
NB: To improve performance while debugging, I’m just dumping “DATA” into the cache rather than fetching the page.
Now, I’ve got a loop which repeatedly calls FindString() with the same URL.
I’d expect the first call to print out “Retrieving” and, after that, to see nothing else. In fact, I see “Retrieving” printed on every call.
I suspect I’ve got a scoping issue somewhere – either the library itself isn’t a singleton so each call to FindString reaches a unique instance – or the Cache variable is being reinitialised somehow.
Can someone please suggest next steps for debugging?
(dump() just formats stuff nicely for me)
You are missing $this in all places where you access the instance variable $Cache. The code should be:
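A minimal sketch of the corrected class, under a few assumptions: the CodeIgniter get_instance() line is left out so the example runs on its own, echo stands in for your dump() helper, and the $Fetches counter and example.com URL are hypothetical additions that only make the caching visible.

```php
<?php
// Sketch of the corrected class. get_instance() is omitted so this runs
// standalone, and echo replaces the asker's dump() helper.
class Scraper {
    private $Cache;

    // Hypothetical counter, added only to make cache hits observable.
    public $Fetches = 0;

    function __construct() {
        // $this->Cache is the instance property; a bare $Cache would be
        // a local variable that disappears when the constructor returns.
        $this->Cache = array();
    }

    public function GetPage($Url) {
        if (!isset($this->Cache[$Url])) {
            echo "Retrieving\n";
            $this->Fetches++;
            $this->Cache[$Url] = "DATA"; // file_get_contents($Url);
        }
        return $this->Cache[$Url];
    }

    public function FindString($Url, $String) {
        $Contents = $this->GetPage($Url);
        return strpos(strtolower($Contents), strtolower($String)) !== false;
    }
}

// Two lookups against the same (hypothetical) URL on one instance.
$Scraper = new Scraper();
$First = $Scraper->FindString("http://example.com/", "data");
$Second = $Scraper->FindString("http://example.com/", "missing");
```

In your version, $Cache inside each method is a new local variable on every call, so isset() is always false and the page is "retrieved" each time. With $this->Cache the array lives on the object, so repeated calls on the same instance should only retrieve once per URL.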