I need to write a web crawler, and I want it to crawl using known user agents. For example, the crawler should act as an iPhone to crawl the mobile version of a website, then crawl again with a desktop Mozilla user agent, and so on.
That way, I'll be able to crawl every “type” of site (mobile and PC). However, I also want to set my crawler's own user agent, so that webmasters can see in their stats that a crawler visited their whole website, not real users.
So my question is: does anyone know how to send a mobile user agent and a crawler user agent at the same time in PHP? Is it even possible?
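Please refer to RFC 1945 (section 10.15) for how a User-Agent header should be formed: the value is one or more product tokens (a name with an optional /version) and parenthesized comments, separated by spaces.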
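As a rough illustration of that shape, here is a minimal PHP sketch that joins product tokens and comments into a header value. The helper name `buildUserAgent`, the `MyCrawler/1.0` product, and the example.com URL are all made up for this example; none of them come from a library or from the RFC.

```php
<?php
// Hypothetical helper (not a built-in): joins the parts of a User-Agent value.
// Each part is either a product token ("Name/1.0") or a comment ("(text)").
function buildUserAgent(array $parts): string
{
    // Trim each part, drop empty ones, and separate the rest with single spaces.
    $parts = array_filter(array_map('trim', $parts), 'strlen');
    return implode(' ', $parts);
}

// Placeholder product and info URL, used only for illustration.
$agent = buildUserAgent(['MyCrawler/1.0', '(+http://example.com/bot.html)']);
```

The helper does not validate the tokens; it only shows that the header is a space-separated sequence of products and comments.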
So what you put there is more or less up to you. You could pose as Googlebot-Mobile,
or pose as an iPhone and append your own crawler identifier to the string.
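A minimal sketch of the second approach, assuming PHP's HTTP stream wrapper is available (`allow_url_fopen` enabled). The browser strings are only illustrative examples of a mobile and a desktop User-Agent, and `MyCrawler/1.0`, its info URL, and the function names `crawlerUserAgent` and `fetchAs` are placeholders invented here.

```php
<?php
// Illustrative browser strings; swap in whichever devices you want to emulate.
const BASE_AGENTS = [
    'mobile'  => 'Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 Mobile/15E148 Safari/604.1',
    'desktop' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/115.0',
];

// Placeholder crawler token, appended so the visit is identifiable as a bot in logs.
const CRAWLER_TOKEN = 'MyCrawler/1.0 (+http://example.com/bot.html)';

// Combine the emulated browser string with the crawler's own product token.
function crawlerUserAgent(string $profile): string
{
    return BASE_AGENTS[$profile] . ' ' . CRAWLER_TOKEN;
}

// Fetch a URL with the combined User-Agent; returns the body, or false on failure.
function fetchAs(string $url, string $profile)
{
    $context = stream_context_create([
        'http' => [
            'user_agent' => crawlerUserAgent($profile),
            'timeout'    => 10,
        ],
    ]);
    return file_get_contents($url, false, $context);
}
```

Calling `fetchAs($url, 'mobile')` and then `fetchAs($url, 'desktop')` would request the same page under both profiles. With cURL, the same string could be passed through `CURLOPT_USERAGENT` instead. Whether a given site serves its mobile version for such a combined string depends on how that site sniffs the header.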