I’ve got strings that contain a tracking parameter that I want to remove. Regular expressions seemed to be the best solution, but I can’t figure out a regular expression that will work.
Example URLs:
- http://example.com?tracking=foo
- http://example.com/bar.html?tracking=foo
- http://example.com?tracking=foo&param=baz
- http://example.com/bar.php?param=baz&tracking=foo
tracking=foo should be removed, where foo can be pretty much anything except &. URLs without tracking shouldn’t be touched.
The best shot I got working is /(http:\/\/[^?]*?.*)tracking=[^&]*&?(.*?["|\'])/i, but it matches too much with the [^&]* part, eliminating everything after the link if there isn’t a second parameter on the URL after the tracking string.
This is how I’m using it at the moment. $html contains the whole HTML for the page to be output, and I want to remove the tracking from all URLs within it:
$html = preg_replace($pattern, '$1$2', $html);
So the minimum the $html would contain would be something like this:
<body>
<a href="[one of the examples above]">Some Link</a>
</body>
You should do this by parsing the URL, using parse_url and parse_str. It makes things much easier than using a regular expression.
Now you just have to rebuild the string using the parts in $url_parts and the rest of the params in $params. You can do this with http_build_query.
Try something like this, although I haven’t tested it so it will need some modifications:
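A minimal sketch of the parse-and-rebuild approach described above. The function name remove_tracking_param and its $name argument are hypothetical, chosen for illustration; $url_parts and $params follow the names used in the text. It assumes the URLs are well-formed enough for parse_url to handle.

```php
<?php
// Hypothetical helper: removes one query parameter (default "tracking") from a URL.
function remove_tracking_param(string $url, string $name = 'tracking'): string
{
    $url_parts = parse_url($url);
    if ($url_parts === false || !isset($url_parts['query'])) {
        return $url; // malformed URL or no query string: leave untouched
    }

    parse_str($url_parts['query'], $params);
    if (!array_key_exists($name, $params)) {
        return $url; // nothing to remove: leave untouched
    }
    unset($params[$name]);

    // Rebuild the URL from the parts that parse_url returned.
    $rebuilt = '';
    if (isset($url_parts['scheme'])) {
        $rebuilt .= $url_parts['scheme'] . ':';
    }
    if (isset($url_parts['host'])) {
        $rebuilt .= '//';
        if (isset($url_parts['user'])) {
            $rebuilt .= $url_parts['user'];
            if (isset($url_parts['pass'])) {
                $rebuilt .= ':' . $url_parts['pass'];
            }
            $rebuilt .= '@';
        }
        $rebuilt .= $url_parts['host'];
        if (isset($url_parts['port'])) {
            $rebuilt .= ':' . $url_parts['port'];
        }
    }
    $rebuilt .= $url_parts['path'] ?? '';

    // Explicit '&' separator so the result does not depend on php.ini settings.
    $query = http_build_query($params, '', '&');
    if ($query !== '') {
        $rebuilt .= '?' . $query;
    }
    if (isset($url_parts['fragment'])) {
        $rebuilt .= '#' . $url_parts['fragment'];
    }
    return $rebuilt;
}

echo remove_tracking_param('http://example.com/bar.php?param=baz&tracking=foo'), "\n";
```

Note that parse_str followed by http_build_query re-encodes the remaining parameters, so their escaping may differ slightly from the original even though the values are the same. parse_str also converts dots and spaces in parameter names to underscores.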
For your specific use-case, I would use PHP’s DOMDocument class to parse the HTML, then grab all of the URLs from that, then use the above to remove the tracking parameter. However, if you must use a regular expression, you can use a generic regular expression to find just URLs, then apply the above to each URL you find using preg_replace_callback.
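The DOMDocument route could look roughly like the sketch below. The function names strip_tracking and strip_tracking_from_html are hypothetical, and the sketch assumes the tracked URLs only appear in the href attribute of a elements. The small URL helper is included so the block runs on its own; it only swaps out the query string rather than rebuilding the whole URL.

```php
<?php
// Hypothetical helper: drops the "tracking" parameter from a single URL.
function strip_tracking(string $url): string
{
    $query = parse_url($url, PHP_URL_QUERY);
    if ($query === null || $query === false) {
        return $url; // no query string (or malformed URL): leave untouched
    }
    parse_str($query, $params);
    if (!array_key_exists('tracking', $params)) {
        return $url;
    }
    unset($params['tracking']);

    $base     = strstr($url, '?', true);            // everything before the first "?"
    $fragment = parse_url($url, PHP_URL_FRAGMENT);  // null when there is none
    $newQuery = http_build_query($params, '', '&');

    return $base
        . ($newQuery !== '' ? '?' . $newQuery : '')
        . (is_string($fragment) ? '#' . $fragment : '');
}

// Hypothetical wrapper: rewrites every <a href="..."> in an HTML string.
function strip_tracking_from_html(string $html): string
{
    $doc = new DOMDocument();

    // Real-world markup is rarely valid; collect libxml warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($doc->getElementsByTagName('a') as $link) {
        if ($link->hasAttribute('href')) {
            $link->setAttribute('href', strip_tracking($link->getAttribute('href')));
        }
    }
    return $doc->saveHTML();
}

$html = '<body><a href="http://example.com/bar.php?param=baz&amp;tracking=foo">Some Link</a></body>';
echo strip_tracking_from_html($html);
```

Keep in mind that saveHTML() serializes the whole parsed document, so the output can gain a doctype and wrapper elements that were not in the input, and the remaining markup may be normalized.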