I am building an RSS feed discovery service by scraping a page URL and

Question

Asked: June 6, 2026

I am building an RSS feed discovery service by scraping a page URL and finding the <link> tags in the page header. The problem is that some URLs take a really long time to serve the page source, so my code very often gets stuck at file_get_contents($url).

Is there a way to do this with a predefined timeout, for example if 10 seconds have passed and there is still no content served then simply drop that URL and move to the next one?

I was thinking of using the maxlen parameter to fetch only part of the source (<head>..</head>), but I’m not sure whether this would really stop once that many bytes have been received or would still require the full page load. The other issue is that I have no idea what value to set here, because every page has different content in the head tag, so sizes vary.

1 Answer

Editorial Team · Answer 1 · 2026-06-06T05:00:51+00:00

I’ve just been reading about this, so this is theory only right now, but:

This is the function definition; notice the resource $context parameter:

string file_get_contents ( string $filename [, bool $use_include_path = false [, **resource $context** [, int $offset = -1 [, int $maxlen ]]]] )

If you pass file_get_contents() the result of stream_context_create(), with the timeout value set in its options array (for HTTP URLs that is the 'timeout' key under 'http', in seconds), it just might work.

$opts = array('http' => array('timeout' => 10)); // stop waiting on a read after 10 seconds
$context = stream_context_create($opts);
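
To make that concrete, here is a minimal sketch of how the context could be passed through to file_get_contents(). The helper name fetch_with_timeout() and the 10-second default are my own assumptions, not something from the manual. As far as I understand it, the 'timeout' option limits how long each read may wait, so a server that trickles data slowly could still take longer than 10 seconds overall.

```php
<?php
// Hypothetical helper (name and defaults are assumptions for illustration):
// fetch a URL, returning null instead of hanging when the server stalls.
function fetch_with_timeout(string $url, float $seconds = 10.0): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout' => $seconds, // read timeout in seconds (float)
        ],
    ]);

    // @ silences the warning PHP raises on failure; the call returns false instead.
    $html = @file_get_contents($url, false, $context);

    return $html === false ? null : $html;
}
```

In the discovery loop you would then skip any URL for which the helper returns null and move on to the next one.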

Or you could open the stream yourself (for example with fopen()) and set its timeout directly with stream_set_timeout():

http://www.php.net/manual/en/function.stream-set-timeout.php
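
A rough sketch of that second approach, which also sidesteps the maxlen guessing from the question: read the stream in chunks and stop as soon as </head> shows up. The function name read_head(), the 2-second per-read timeout, the 10-second overall deadline and the 256 KB cap are all assumptions I picked for illustration, so adjust them to taste.

```php
<?php
// Hypothetical helper (name and limits are assumptions for illustration):
// read from an open stream until </head> appears, a byte cap is reached,
// or an overall deadline passes. Returns null on timeout or read error.
function read_head($stream, float $deadlineSeconds = 10.0, int $maxBytes = 262144): ?string
{
    stream_set_timeout($stream, 2); // each individual read waits at most 2 seconds
    $start = microtime(true);
    $buffer = '';

    while (!feof($stream) && strlen($buffer) < $maxBytes) {
        if (microtime(true) - $start > $deadlineSeconds) {
            return null; // overall deadline exceeded
        }

        $chunk = fread($stream, 8192);
        $meta = stream_get_meta_data($stream);
        if ($chunk === false || $meta['timed_out']) {
            return null; // read failed or the per-read timeout fired
        }

        $buffer .= $chunk;
        $end = stripos($buffer, '</head>');
        if ($end !== false) {
            // Keep everything up to and including the closing head tag.
            return substr($buffer, 0, $end + strlen('</head>'));
        }
    }

    return $buffer; // no </head> found before EOF or the byte cap
}
```

Usage would be something like $stream = @fopen($url, 'r'); followed by read_head($stream) and fclose($stream) when fopen() succeeds. Note that fopen() on an HTTP URL itself waits for the response headers, so it probably still needs a context timeout (or default_socket_timeout) to avoid hanging at that step.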

Hope you have some success with it.
