I went to a PHP job interview and was asked to write a piece of code that detects whether visitors are bots crawling through the website to steal content.
So I wrote a few lines of code that detect whether the site is being refreshed or visited too quickly or too often, using a session variable to store the timestamp of the last visit.
I was told that session variables can be manipulated through cookies etc., so I am wondering whether there is an application variable I can use to store the timestamp information against visitor IPs, e.g. $_SERVER['REMOTE_ADDR']?
I know that I can write the data to a file, but that's not very good for a high-traffic website.
Regards
James
Just to be clear, clients can't edit session variables to their liking, because the session data is kept on the server. They can, however, delete or change the PHPSESSID cookie, which simply gives them a different session, so a bot that discards the cookie starts with a fresh session on every request. Superglobals (e.g. $_SERVER) are not persistent, so any changes you make to them will not carry over to the next page load.
The best way to go about detecting crawlers is to store the IP address, user agent and timestamp of every page load in a database. The overhead is minuscule.
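As a rough illustration of the database approach, here is a minimal sketch using PDO. The table name page_hits, the helper names createHitsTable() and recordHitAndCheck(), and the limit of 10 hits per 10 seconds are all made up for the example, and it assumes the pdo_sqlite extension is available. The in-memory SQLite database is only there so the snippet runs on its own; a real site would point PDO at a persistent database instead.

```php
<?php
// Hypothetical schema: one row per page load (IP, user agent, Unix timestamp).
function createHitsTable(PDO $db): void
{
    $db->exec(
        'CREATE TABLE IF NOT EXISTS page_hits (
            ip TEXT NOT NULL,
            user_agent TEXT NOT NULL,
            hit_at INTEGER NOT NULL
        )'
    );
    // Index so the per-IP count below stays cheap as the table grows.
    $db->exec('CREATE INDEX IF NOT EXISTS idx_hits_ip_time ON page_hits (ip, hit_at)');
}

// Hypothetical helper: records this page load, then returns true if the IP
// has made more than $maxHits requests within the last $windowSeconds.
function recordHitAndCheck(
    PDO $db,
    string $ip,
    string $userAgent,
    int $now,
    int $maxHits = 10,
    int $windowSeconds = 10
): bool {
    $insert = $db->prepare('INSERT INTO page_hits (ip, user_agent, hit_at) VALUES (?, ?, ?)');
    $insert->execute([$ip, $userAgent, $now]);

    $count = $db->prepare('SELECT COUNT(*) FROM page_hits WHERE ip = ? AND hit_at > ?');
    $count->execute([$ip, $now - $windowSeconds]);

    return (int) $count->fetchColumn() > $maxHits;
}

// Example wiring at the top of a page. The in-memory database is an
// assumption for demonstration only and is discarded when the script ends.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
createHitsTable($db);

$ip = $_SERVER['REMOTE_ADDR'] ?? '127.0.0.1';
$userAgent = $_SERVER['HTTP_USER_AGENT'] ?? '';

if (recordHitAndCheck($db, $ip, $userAgent, time())) {
    http_response_code(429); // Too Many Requests
    exit('Too many requests');
}
```

Because the count is keyed on the IP address rather than on a cookie, dropping PHPSESSID doesn't reset it. The thresholds are placeholders and would need tuning against real traffic.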