I’m using a ‘rolling’ cURL multi implementation (like this SO post, based on this cURL code). It works fine to process thousands of URLs using up to 100 requests at the same time, with 5 instances of the script running as daemons (yeah, I know, this should be written in C or something).
Here’s the problem: after processing ~200,000 urls (across the 5 instances) curl_multi_exec() seems to break for all instances of the script. I’ve tried shutting down the scripts, then restarting, and the same thing happens (not after 200,000 urls, but right on restart), the script hangs calling curl_multi_exec().
I put the script into ‘single’ mode, processing one regular cURL handle at time, and that works fine (but it’s not quite the speed I need). My logging leads me to suspect that it may have hit a patch of slow/problematic connections (since every so often it seems to process on URL then hang again), but that would mean my CURLOPT_TIMEOUT is being ignored for the individual handles. Or maybe it’s just something with running that many requests through cURL.
Anyone heard of anything like this?
Sample code (again based on this):
//some logging shows it hangs right here, only looping a time or two
//so the hang seems to be in the curl call
while(($execrun =
curl_multi_exec($master, $running)) == CURLM_CALL_MULTI_PERFORM);
//code to check for error or process whatever returned
I have CURLOPT_TIMEOUT set to 120, but in the cases where curl_multi_exec() finally returns some data, it’s after 10 minutes of waiting.
I have a bunch of testing/checking yet to do, but thought maybe this might ring a bell with someone.
After much testing, I believe I’ve found what is causing this particular problem. I’m not saying the other answer is incorrect, just in this case not the issue I am having.
From what I can tell,
curl_multi_exec()does not return until all DNS (failure or success) is resolved. If there are a bunch of urls with bad domainscurl_multi_exec()doesn’t return for at least:Here’s someone else who has discovered this:
Also, testing shows that cURL (at least my version) does follow the
CURLOPT_CONNECTTIMEOUTsetting. Of course the first step of a multi cycle may still take a long time, since cURL waits for every url to resolve or timeout.