I have a URL grabber set up and it was working fine. It grabs the URL of a document from a redirect script in the response, such as:
```html
<script type='text/javascript' language='JavaScript'>
document.location.href = 'http\x3a\x2f\x2fcms.example.com\x2fd\x2fd\x2fworkspace\x2fSpacesStore\x2f61d96949-b8fb-43f1-adaf-0233368984e0\x2fFinancial\x2520Agility\x2520Report.pdf\x3fguest\x3dtrue'
</script>
```
Here is my grabber script.
```php
<?php
set_time_limit(0);
$target_url = $_POST['to'];
$html = file_get_contents($target_url);
$pattern = "/document.location.href = '([^']*)'/";
preg_match($pattern, $html, $matches, PREG_OFFSET_CAPTURE, 3);
$raw_url = $matches[1][0];
$eval_url = '$url = "'.$raw_url.'";';
eval($eval_url);
echo $url;
```
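The `eval()` line works because PHP expands `\xNN` escape sequences inside a double-quoted string, so `\x3a` becomes `:` and `\x2f` becomes `/`. Running `eval()` on text fetched from a remote page is risky, though, since a stray `"` in that text would end the string literal early. Here is a minimal sketch of the same decoding without `eval()`, assuming every escape has exactly two hex digits as in the sample above; `decode_hex_escapes()` is a hypothetical helper name, and the `fn` arrow function needs PHP 7.4 or later:

```php
<?php
// Sketch: decode JavaScript-style \xNN escapes (e.g. \x3a -> ':') without eval().
// decode_hex_escapes() is a hypothetical helper, not part of PHP.
function decode_hex_escapes(string $s): string
{
    // In a single-quoted PHP string '\\\\' becomes '\\', which the regex
    // reads as one literal backslash; the group captures the two hex digits.
    return preg_replace_callback(
        '/\\\\x([0-9a-fA-F]{2})/',
        fn(array $m): string => chr(hexdec($m[1])),
        $s
    );
}

// Single quotes keep the backslashes literal, as they are in the fetched HTML.
$raw_url = 'http\x3a\x2f\x2fcms.example.com\x2fd\x2fd\x2fworkspace\x2fSpacesStore\x2f61d96949-b8fb-43f1-adaf-0233368984e0\x2fFinancial\x2520Agility\x2520Report.pdf\x3fguest\x3dtrue';
$url = decode_hex_escapes($raw_url);
echo $url, "\n";
// prints http://cms.example.com/d/d/workspace/SpacesStore/61d96949-b8fb-43f1-adaf-0233368984e0/Financial%20Agility%20Report.pdf?guest=true
```

`\x2520` decodes to `%20` because only the first two hex digits belong to the escape; the remaining `%20` is ordinary URL encoding, which `rawurldecode()` would turn into a space if a plain filename is wanted.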
We had to add a parameter to our document management system, so each document URL now needs `?guest=true` at the end. Since we did this, my grabber returns the full URL, with `?guest=true` appended to the filename. So I tried to make it grab the URL only up to `?guest=true`, with this code:
```php
<?php
set_time_limit(0);
$target_url = $_POST['to'];
$html = file_get_contents($target_url);
$pattern = "/document.location.href = '([^']*)\x3fguest\x3dtrue'/";
preg_match($pattern, $html, $matches, PREG_OFFSET_CAPTURE, 3);
$raw_url = $matches[1][0];
$eval_url = '$url = "'.$raw_url.'";';
eval($eval_url);
echo $url;
```
Why isn't it returning the URL up to the `?guest=true` part? In other words, why doesn't this work, and what's the fix?
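To see what the second pattern actually looks like by the time it reaches `preg_match()`, a small check like the following can help. `$html` here is just the script line from the question, held in a nowdoc so its backslashes stay literal, as they are in the real page:

```php
<?php
// Inside double quotes, PHP itself expands \x3f to '?' and \x3d to '='.
$pattern = "/document.location.href = '([^']*)\x3fguest\x3dtrue'/";
echo $pattern, "\n";
// prints /document.location.href = '([^']*)?guest=true'/

// The page holds the literal characters \x3fguest\x3dtrue (backslash, x, 3, f, ...).
// A nowdoc (<<<'HTML') keeps those backslashes exactly as written.
$html = <<<'HTML'
document.location.href = 'http\x3a\x2f\x2fcms.example.com\x2fd\x2fd\x2fworkspace\x2fSpacesStore\x2f61d96949-b8fb-43f1-adaf-0233368984e0\x2fFinancial\x2520Agility\x2520Report.pdf\x3fguest\x3dtrue'
HTML;

// $html contains no "guest=true'", and the '?' now only makes the group optional,
// so the pattern finds nothing.
var_dump(preg_match($pattern, $html)); // int(0)
```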
This is the solution. You'll get the match directly, not in a capture group.
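The problem with your regex was that you did not escape certain characters in the string (`.` and `\`) that you wanted to match literally. Because the pattern is a double-quoted PHP string, `\x3f` and `\x3d` are turned into `?` and `=` before `preg_match()` ever sees them, so the regex looks for `guest=true` (with the `?` acting as a quantifier) instead of the literal text `\x3fguest\x3dtrue` that is in the page. Furthermore, you do not need `PREG_OFFSET_CAPTURE` or an offset of `3`. I guess you copied these values from the example on this page.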
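Here is a minimal sketch of one pattern along these lines. It is illustrative, and the exact regex may differ from the original solution. A lookbehind and a lookahead keep everything except the URL out of the match, so the result is in `$matches[0]` with no capture group, and no `PREG_OFFSET_CAPTURE` or offset is used. It also swaps the `eval()` step for PHP's built-in `stripcslashes()`, which decodes `\xNN` escapes. In the real script, `$html` would come from `file_get_contents($target_url)` as in the question:

```php
<?php
// Sample input copied from the question; a nowdoc keeps the backslashes literal.
$html = <<<'HTML'
<script type='text/javascript' language='JavaScript'>
document.location.href = 'http\x3a\x2f\x2fcms.example.com\x2fd\x2fd\x2fworkspace\x2fSpacesStore\x2f61d96949-b8fb-43f1-adaf-0233368984e0\x2fFinancial\x2520Agility\x2520Report.pdf\x3fguest\x3dtrue'
</script>
HTML;

// Lookbehind (?<=...) and lookahead (?=...) are zero-width, so only the URL
// itself is matched and it lands in $matches[0]. The dots are escaped, and
// '\\\\' in this single-quoted string reaches the regex as '\\', i.e. one
// literal backslash.
$pattern = '/(?<=document\.location\.href = \')[^\']*?(?=\\\\x3fguest\\\\x3dtrue\')/';

$url = '';
if (preg_match($pattern, $html, $matches) === 1) {
    // stripcslashes() decodes \xNN escapes (\x3a -> ':'), replacing the eval() step.
    $url = stripcslashes($matches[0]);
}
echo $url, "\n";
// prints http://cms.example.com/d/d/workspace/SpacesStore/61d96949-b8fb-43f1-adaf-0233368984e0/Financial%20Agility%20Report.pdf
```

`stripcslashes()` also interprets other C-style escapes such as `\n`, which is harmless for a URL like this one but worth knowing if the captured text could contain other backslash sequences.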
This answer has been edited to reflect updates to the question.