I’ve been breaking my head with a hammer to figure this out but here goes. I’m currently scraping some pages that I get from various sources, and the URLs often have Google Analytics crap attached to the end of them, in this fashion:
&utm_medium=something&utm_source=other
And I’m trying to get rid of those from a URL. Since these are appended at the end of a URL, I do this:
$pattern = "^utm_source.*^";
$interUrl = preg_replace($pattern, '', $url);
utm_source is a required portion of the URL for Google Analytics. Here’s where my problem shows up. For some reason, I can’t get the pattern to match an ampersand like so: “^\&utm_source.*^”. Without the ampersand (and its escape), I get matches. So I thought “no biggie, I’ll just do a substr” like so:
$finalUrl = substr($interUrl, 0, strlen($interUrl) - 1);
But nothing happens. I increased the -1 to -3 or even -4 but nothing got cut off, not even characters after the ampersand. I’ve also tried str_replace and even rtrim but none could filter out the ampersand. This is frustrating since I am left with the wrong URL. Not only that, when I try to curl the page, I get a 404, while if I go to that site via my browser, I get redirected to the right page.
Any ideas on why this is happening?
ANSWER
While all the answers were nice and technical, I kept trying shit with the regex until I figured something out. The URLs were, for some reason (probably my retrieval method), being encoded so I ended up tweaking the regex like so:
$pattern = "/&amp;utm_source.*/";
and it works! Thanks for everyone’s help!
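An alternative to putting the entity in the pattern is to decode the URL first and then strip the tracking tail. The snippet below is only a sketch of that idea: the function name stripTrackingTail and the example URL are made up for illustration, and it assumes (as in the question) that the utm_ parameters are the last thing in the URL.

```php
<?php
// Hypothetical helper, not the fix used above: decode first, then strip.
function stripTrackingTail(string $url): string
{
    // Turn "&amp;" back into a plain "&" (also handles &quot;, &lt;, &gt;).
    $decoded = htmlspecialchars_decode($url);

    // Drop everything from the first utm_ parameter to the end of the URL.
    // Assumption: tracking parameters are appended last, as in the question.
    return preg_replace('/[?&]utm_\w+=.*$/', '', $decoded) ?? $decoded;
}

// Made-up example URL, encoded the way it ended up in the database.
$clean = stripTrackingTail(
    'http://example.com/page?id=3&amp;utm_medium=something&amp;utm_source=other'
);
echo $clean, PHP_EOL; // http://example.com/page?id=3
```

Decoding first means the same pattern works whether or not the URL arrived encoded, at the cost of also decoding any other entities in the string.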
Why didn’t I catch it earlier? I’m running my app on Laravel, and whenever I use the logging system, it seems to write an actual ampersand instead of &amp;, so it seemed like all was well.
At one point, I went to check the database to see what was happening and noticed that my URLs were ending with &amp; instead of with & (which is how it showed up in my view).
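This probably also explains why substr and rtrim looked like they did nothing. A small sketch with a made-up URL, assuming the stored string really ended in the five-character entity rather than a single ampersand:

```php
<?php
// Made-up value: a URL whose tail is the encoded entity, not a bare "&".
$stored = 'http://example.com/page?id=3&amp;';

// What renders as one "&" in a browser is five characters in the string.
$tailLength = strlen('&amp;');          // 5

// Cutting one character only removes the ";" and leaves "&amp" behind.
$minusOne = substr($stored, 0, -1);

// Cutting five characters removes the whole entity.
$minusFive = substr($stored, 0, -5);

// rtrim sees ";" as the last character, so trimming "&" changes nothing.
$trimmed = rtrim($stored, '&');

var_dump($minusOne, $minusFive, $trimmed === $stored);
```

Using var_dump on the raw string (rather than looking at it in a rendered view) shows the real length and makes this kind of mismatch easier to spot.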
Thanks everyone!