I have a URL grabber set up and it was working fine. It grabs the


Asked: June 11, 2026


I have a URL grabber set up, and it was working fine. It grabs the URL of a document from a JavaScript redirect in the response, such as:

<script type='text/javascript' language='JavaScript'>
document.location.href = 'http\x3a\x2f\x2fcms.example.com\x2fd\x2fd\x2fworkspace\x2fSpacesStore\x2f61d96949-b8fb-43f1-adaf-0233368984e0\x2fFinancial\x2520Agility\x2520Report.pdf\x3fguest\x3dtrue'
</script>

Here is my grabber script.

<?php

set_time_limit(0);
$target_url = $_POST['to'];
$html =file_get_contents($target_url);

$pattern = "/document.location.href = '([^']*)'/";
preg_match($pattern, $html, $matches, PREG_OFFSET_CAPTURE, 3);

$raw_url = $matches[1][0];
$eval_url = '$url = "'.$raw_url.'";';

eval($eval_url);
echo $url;

We had to add a parameter to our document management system, so every document URL now needs ?guest=true at the end. Since then, my grabber returns the full URL, so ?guest=true ends up appended to the filename. So I tried to make it grab the URL only up to ?guest=true, with this code:

<?php

set_time_limit(0);

$target_url = $_POST['to'];
$html =file_get_contents($target_url);

$pattern = "/document.location.href = '([^']*)\x3fguest\x3dtrue'/";

preg_match($pattern, $html, $matches, PREG_OFFSET_CAPTURE, 3);

$raw_url = $matches[1][0];
$eval_url = '$url = "'.$raw_url.'";';

eval($eval_url);
echo $url;

Why isn’t it returning the URL up to the ?guest=true part? In other words, why doesn’t this work, and what’s the fix?

1 Answer

Editorial Team · Answer 1 · 2026-06-11T14:00:43+00:00

Here is a fixed version. Because the pattern uses a lookbehind and a lookahead, the URL is the whole match ($matches[0]) rather than a capture group.

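<?php

set_time_limit(0);

$target_url = $_POST['to'];
$html = file_get_contents($target_url);

// In a single-quoted string, \\\\ reaches PCRE as \\, which matches one literal backslash.
$pattern = '/(?<=document\.location\.href = \').*?(?=\\\\x3fguest\\\\x3dtrue)/';

if (preg_match($pattern, $html, $matches)) {
    // The lookarounds are not part of the match, so the URL is $matches[0].
    $raw_url = $matches[0];
    // Decode the \xNN escapes by evaluating the text as a PHP string literal.
    $eval_url = '$url = "'.$raw_url.'";';
    eval($eval_url);
    echo $url;
}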
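If you would rather keep the capture group from your original script, the same escaping fix works there too. This is a minimal sketch, with the sample page from the question hard-coded instead of fetched with file_get_contents(); it prints the still-encoded URL up to \x3fguest\x3dtrue.

```php
<?php
// Sample page body from the question, hard-coded instead of fetched.
// The nowdoc (<<<'HTML') keeps the \x.. sequences as literal backslash text,
// exactly as they appear in the real response.
$html = <<<'HTML'
<script type='text/javascript' language='JavaScript'>
document.location.href = 'http\x3a\x2f\x2fcms.example.com\x2fd\x2fd\x2fworkspace\x2fSpacesStore\x2f61d96949-b8fb-43f1-adaf-0233368984e0\x2fFinancial\x2520Agility\x2520Report.pdf\x3fguest\x3dtrue'
</script>
HTML;

// Same fix, but with a capture group instead of lookarounds:
// the lazy ([^']*?) stops right before the literal text \x3fguest\x3dtrue.
$pattern = '/document\.location\.href = \'([^\']*?)\\\\x3fguest\\\\x3dtrue/';

if (preg_match($pattern, $html, $matches)) {
    // Without PREG_OFFSET_CAPTURE, $matches[1] is a plain string.
    echo $matches[1], "\n";
}
```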


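The problem with your regex was that you did not escape certain characters (. and \) that you wanted to match literally. The backslash is what breaks the match: in a double-quoted PHP string, \x3f and \x3d are escape sequences, so PHP converts them to ? and = before preg_match ever sees the pattern. Your regex therefore looks for the literal text guest=true (and the ? becomes a quantifier on the preceding group), but the page source contains \x3fguest\x3dtrue, so nothing matches. Writing the pattern in single quotes with \\\\ gives the regex engine \\, which matches one literal backslash. An unescaped . matches any character, including a dot, so escaping it only makes the pattern stricter. Furthermore, you do not need PREG_OFFSET_CAPTURE or the offset of 3: the flag turns every match into a [text, offset] pair (which is why you needed $matches[1][0]), and the offset just starts the search at byte 3. I guess you copied these values from the example on the PHP manual’s preg_match page.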
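A quick way to see the escaping problem is to print both pattern strings and run them against one line of page source. This is a minimal sketch: the source line is hard-coded rather than fetched, and the shortened example.com URL in $html is made up for illustration.

```php
<?php
// The pattern from the question, in a double-quoted string:
// PHP replaces \x3f with "?" and \x3d with "=" before PCRE sees it.
$broken = "/document.location.href = '([^']*)\x3fguest\x3dtrue'/";

// The corrected pattern in a single-quoted string: each \\\\ reaches
// PCRE as \\, which matches one literal backslash.
$fixed = '/(?<=document\.location\.href = \').*?(?=\\\\x3fguest\\\\x3dtrue)/';

echo $broken, "\n"; // /document.location.href = '([^']*)?guest=true'/
echo $fixed, "\n";  // /(?<=document\.location\.href = ').*?(?=\\x3fguest\\x3dtrue)/

// Made-up line as it would appear in the page source (single quotes keep the backslashes).
$html = 'document.location.href = \'http\x3a\x2f\x2fexample.com\x2fa.pdf\x3fguest\x3dtrue\'';

var_dump(preg_match($broken, $html)); // int(0): "guest=true" never occurs literally
var_dump(preg_match($fixed, $html));  // int(1)
```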

Here’s an explanation of the regex pattern:

# (?<=document\.location\.href = ').*?(?=\\x3fguest\\x3dtrue)
# 
# Assert that the regex below can be matched, with the match ending at this position (positive lookbehind) «(?<=document\.location\.href = ')»
#    Match the characters “document” literally «document»
#    Match the character “.” literally «\.»
#    Match the characters “location” literally «location»
#    Match the character “.” literally «\.»
#    Match the characters “href = '” literally «href = '»
# Match any single character that is not a line break character «.*?»
#    Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
# Assert that the regex below can be matched, starting at this position (positive lookahead) «(?=\\x3fguest\\x3dtrue)»
#    Match the character “\” literally «\\»
#    Match the characters “x3fguest” literally «x3fguest»
#    Match the character “\” literally «\\»
#    Match the characters “x3dtrue” literally «x3dtrue»
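Separately, the eval() step runs text taken from the fetched page as PHP code: a matched URL containing a double quote could close the string literal and inject arbitrary code. A safer option, swapped in here in place of eval(), is PHP’s stripcslashes(), which decodes \xNN escapes as ordinary string processing. This is a minimal sketch assuming the page always uses the same redirect format; grab_doc_url is a hypothetical helper name, and the HTML is the sample from the question, hard-coded.

```php
<?php
// Hypothetical helper: returns the decoded URL without ?guest=true,
// or null when the page has no matching redirect.
function grab_doc_url(string $html): ?string
{
    // Same lookaround pattern as in the answer.
    $pattern = '/(?<=document\.location\.href = \').*?(?=\\\\x3fguest\\\\x3dtrue)/';

    if (!preg_match($pattern, $html, $matches)) {
        return null;
    }

    // stripcslashes() decodes C-style escapes such as \x3a (":") and \x2f ("/")
    // as plain string processing; nothing from the page is executed.
    return stripcslashes($matches[0]);
}

// Sample body from the question; the nowdoc keeps the \x.. escapes literal.
$html = <<<'HTML'
<script type='text/javascript' language='JavaScript'>
document.location.href = 'http\x3a\x2f\x2fcms.example.com\x2fd\x2fd\x2fworkspace\x2fSpacesStore\x2f61d96949-b8fb-43f1-adaf-0233368984e0\x2fFinancial\x2520Agility\x2520Report.pdf\x3fguest\x3dtrue'
</script>
HTML;

echo grab_doc_url($html), "\n";
// http://cms.example.com/d/d/workspace/SpacesStore/61d96949-b8fb-43f1-adaf-0233368984e0/Financial%20Agility%20Report.pdf
```

Note that \x2520 decodes to %20, the same result eval() would give, so the URL stays percent-encoded.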

This answer has been edited to reflect updates to the question.
