I’m trying to automatically download content at regular intervals from a site that requires users to log in. The content I want to download is a small .js file (<10 KB).
As the site will display the desired data only when I’m logged in, I’m unable to simply use functions such as urlwrite (in MATLAB) to download the data.
I’m not sure whether the libcurl library in PHP would be able to solve the problem easily.
As suggested in the answer to this similar question (Fetching data from a site requiring POST data?), I’ve tried to use the Zend_Http_Client, but haven’t been able to get it to work.
In summary, I’d like help on automatically downloading URL content from a site requiring user log-in (and presumably submission of cookies).
In addition to this, I’d appreciate advice on which software is best for automated download of such data at regular time intervals.
(If you do require the exact URL I am trying to download from to test a solution, please leave a comment below.)
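It depends on the type of login the site uses. If it uses HTTP authentication, set the curl option CURLOPT_HTTPAUTH together with CURLOPT_USERPWD for the credentials (see curl_setopt, http://php.net/manual/en/function.curl-setopt.php). Otherwise, as already mentioned, use CURLOPT_COOKIEJAR (the file the cookies are written to) and possibly CURLOPT_COOKIEFILE (the file they are read back from).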
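To make the two login types concrete, here is a minimal sketch using only the Python standard library rather than PHP's curl functions. It is an illustration under assumptions, not a drop-in solution: the function names, the cookie file path, and the form field names passed in `form_fields` are all hypothetical, and the real field names have to be read from the site's login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_cookie_opener(cookie_path):
    """Build an opener that stores cookies in a file (the cookie-jar approach)."""
    jar = http.cookiejar.MozillaCookieJar(cookie_path)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def make_basic_auth_opener(base_url, user, password):
    """Build an opener for sites that use HTTP authentication, not a login form."""
    passwords = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    passwords.add_password(None, base_url, user, password)
    return urllib.request.build_opener(urllib.request.HTTPBasicAuthHandler(passwords))


def login_and_fetch(opener, jar, login_url, form_fields, data_url):
    """POST the login form, save the session cookies, then download data_url."""
    body = urllib.parse.urlencode(form_fields).encode("ascii")
    opener.open(login_url, data=body, timeout=30).close()
    # Session cookies are usually marked "discard", so keep them explicitly.
    jar.save(ignore_discard=True)
    with opener.open(data_url, timeout=30) as response:
        return response.read()
```

With a form-based login you would call `make_cookie_opener("cookies.txt")` and pass the result to `login_and_fetch`; with HTTP authentication the opener from `make_basic_auth_opener` can fetch the data URL directly, with no separate login step.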
Another option is the standalone utility wget. Its FAQ contains a nice explanation of both login methods: http://wget.addictivecode.org/FrequentlyAskedQuestions#password-protected
If this is your first time using curl: don’t forget to set CURLOPT_RETURNTRANSFER to true (if false, the content is sent straight to the output instead of being returned) and CURLOPT_HEADER to false to get the content without headers.
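For the "regular time intervals" part of the question, one possibility is a small loop around whatever download function you end up with. The sketch below assumes a hypothetical `task` callable that performs one download; `run_periodically` and its parameters are illustrative names, not part of any library.

```python
import time


def run_periodically(task, interval_seconds, max_runs=None, sleep=time.sleep):
    """Call task() every interval_seconds; stop after max_runs if it is given."""
    runs = 0
    while max_runs is None or runs < max_runs:
        try:
            task()
        except Exception as error:
            # A single failed download should not end the whole schedule.
            print(f"download failed: {error}")
        runs += 1
        if max_runs is None or runs < max_runs:
            sleep(interval_seconds)
    return runs
```

A scheduler such as cron (or the Windows Task Scheduler) that starts the download script at fixed times is a common alternative and avoids keeping a process running.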