I am trying to parse a page that contains some links. Following one of these links redirects to a file download.
For example, <a href="http://example.com/file.php">Download</a> redirects to http://example.com/1.pdf.
I don’t want to download the file; I just want to get the file link (in this case http://example.com/1.pdf).
I am trying this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_RETURNTRANSFER, FALSE); // Return in string
curl_setopt($ch, CURLOPT_URL, $url);
curl_exec($ch);
var_dump(curl_getinfo($ch));
But it gives me the file contents.
Does anyone have any idea how to do this?
==EDIT==
Thank you guys. I solved it like this:
$ch = curl_init();
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, TRUE);
curl_setopt($ch, CURLINFO_HEADER_OUT, TRUE);
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_HEADER, TRUE);
curl_setopt($ch, CURLOPT_NOBODY, TRUE);
curl_exec($ch);
$info = curl_getinfo($ch);
Now $info contains the header information, and I can get the link from it.
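Building on that solution, a minimal sketch: with CURLOPT_FOLLOWLOCATION on, curl_getinfo() called with CURLINFO_EFFECTIVE_URL reports the last URL cURL actually requested, so the link can be read without digging through headers. The function name final_url() is hypothetical, and the sketch assumes the server answers HEAD requests (which is what CURLOPT_NOBODY sends); some servers handle HEAD differently from GET.

```php
<?php
// Hypothetical helper: follow redirects using HEAD requests and return the final URL.
function final_url(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow Location headers
    curl_setopt($ch, CURLOPT_NOBODY, true);         // HEAD request: the file body is never downloaded
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the (empty) response instead of printing it
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);        // guard against redirect loops
    $ok = curl_exec($ch);                           // false on failure, '' on success here

    // CURLINFO_EFFECTIVE_URL is the last URL cURL fetched after following redirects.
    // The handle is released when $ch goes out of scope.
    return $ok === false ? null : curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
}
```

In the question's scenario, final_url('http://example.com/file.php') should give back http://example.com/1.pdf, assuming file.php answers with a redirect to that file.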
The reason the output is being sent to the screen is that you’re telling cURL to do so. If you want to store the response in a variable, then the following line:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, FALSE);
should read:
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
Then actually retrieve the returned output from curl_exec like so:
$output = curl_exec($ch);
Once you have the returned HTML content from the remote page in the $output variable, you can use DOMDocument or regex (but preferably DOM) to parse out any information you want.
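A minimal sketch of the DOM route, assuming $output holds the HTML returned by curl_exec() with CURLOPT_RETURNTRANSFER set to TRUE: DOMDocument::loadHTML() parses the page and getElementsByTagName('a') walks the anchors. The function name extract_links() is hypothetical; libxml_use_internal_errors(true) only keeps warnings about sloppy real-world markup off the screen.

```php
<?php
// Hypothetical helper: return the href of every <a> element in an HTML string.
function extract_links(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect malformed-HTML warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();                        // discard the collected warnings
    libxml_use_internal_errors($previous);        // restore the caller's setting

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {           // skip anchors without a link target
            $links[] = $a->getAttribute('href');
        }
    }
    return $links;
}
```

Each href in extract_links($output) can then be requested on its own to find out where it redirects.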
As for the redirect itself: I can’t tell, because the question is vaguely worded, whether there is actually a Location header redirect happening. If there is, you’ll want to do as @heiko suggests: prevent cURL from following the redirect and retrieve the headers instead. Then you can easily parse the contents of the Location header.
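A minimal sketch of that approach, assuming a single redirect hop: CURLOPT_FOLLOWLOCATION is off so cURL stops at the first response, CURLOPT_HEADER together with CURLOPT_NOBODY returns only the headers, and a regular expression pulls out the Location value. The names redirect_target() and location_from_headers() are hypothetical. A Location value may be relative (for example /1.pdf), in which case it still has to be resolved against the original URL. PHP's cURL extension also exposes the target directly through curl_getinfo($ch, CURLINFO_REDIRECT_URL), which avoids the manual parsing.

```php
<?php
// Hypothetical helper: pull the Location value out of a raw header block.
// Returns the last Location header found (normally there is only one), or null.
function location_from_headers(string $headers): ?string
{
    if (preg_match_all('/^Location:[ \t]*(\S+)/mi', $headers, $m)) {
        return end($m[1]);
    }
    return null;
}

// Hypothetical helper: fetch only the headers of $url without following the redirect.
function redirect_target(string $url): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first response
    curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in the result
    curl_setopt($ch, CURLOPT_NOBODY, true);          // HEAD request: no file body
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // return the headers instead of printing them
    $headers = curl_exec($ch);                       // false on failure
    return is_string($headers) ? location_from_headers($headers) : null;
}
```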