I am building an RSS feed discovery service that scrapes a page URL and finds the <link> tags in the page header. The problem is that some URLs take a very long time to serve the page source, so my code often gets stuck at file_get_contents($url).
Is there a way to do this with a predefined timeout? For example, if 10 seconds have passed and no content has been served, simply drop that URL and move on to the next one.
I was thinking of using the maxLen parameter to get only part of the source (<head>..</head>), but I’m not sure whether this would really stop once that many bytes have been received or would still require the full page to load. The other issue is that I have no idea what value to set here, because every page has different content in the head tag, so sizes vary.
I’ve only just been reading about this, so this is theory only right now, but...
This is the function definition, notice the resource context part:
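The context is the third argument to file_get_contents(). Here is a minimal, untested-against-your-URLs sketch of passing one that carries a timeout. The helper name fetch_with_timeout() and the 10-second default are my own assumptions, not anything from the manual. As far as I know the 'timeout' option of the http wrapper limits how long each read may stall rather than the total download time, so a server that trickles data slowly could still take longer overall.

```php
<?php
// Hypothetical helper (the name and defaults are assumptions for illustration).
// Returns the body as a string, or false if the request fails or times out.
function fetch_with_timeout(string $url, float $seconds = 10.0)
{
    // The 'http' options also apply to https:// URLs.
    $context = stream_context_create([
        'http' => [
            'timeout' => $seconds, // seconds, may be fractional
        ],
    ]);

    // Second argument: use_include_path (not wanted here).
    // Third argument: the context created above.
    // The @ suppresses the warning PHP raises on failure; we rely on the
    // false return value instead.
    return @file_get_contents($url, false, $context);
}
```

You would then loop over your URLs and skip any for which fetch_with_timeout($url) === false.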
If you pass the result of stream_context_create() as that context argument, with the timeout value in its options array, it just might work.
Or you could create the stream and set its timeout directly:
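A rough sketch of the direct approach, using fopen(), stream_set_timeout() and the timed_out flag from stream_get_meta_data(). The helper name fetch_head(), the 64 KB cap and the idea of stopping at the closing head tag are my own assumptions, added because you only need the header. One caveat: for http:// URLs fopen() connects and reads the response headers before it returns, so the timeout set here only covers the body reads that follow; the connection phase still falls under default_socket_timeout or a context timeout.

```php
<?php
// Hypothetical helper (name, byte cap and the </head> check are assumptions).
// Reads until </head> is seen, $maxBytes is reached, or the stream ends.
// Returns the data read so far, or false on failure or timeout.
function fetch_head(string $url, int $seconds = 10, int $maxBytes = 65536)
{
    $handle = @fopen($url, 'r');
    if ($handle === false) {
        return false; // could not open the URL at all
    }

    // Per-read timeout for this stream, in whole seconds.
    stream_set_timeout($handle, $seconds);

    $data = '';
    while (!feof($handle) && strlen($data) < $maxBytes) {
        $chunk = fread($handle, min(8192, $maxBytes - strlen($data)));
        $meta  = stream_get_meta_data($handle);

        // timed_out is set when a read gave up after $seconds.
        if ($chunk === false || !empty($meta['timed_out'])) {
            fclose($handle);
            return false;
        }

        $data .= $chunk;

        // Stop early once the whole head section has arrived.
        if (stripos($data, '</head>') !== false) {
            break;
        }
    }

    fclose($handle);
    return $data;
}
```

Because the loop closes the stream as soon as the closing head tag shows up, you do not have to guess a byte count in advance; the cap is only a safety net for pages that never close their head.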
Hope you have some success with it.