I want to extract all the URLs from an XML file, excluding the tracking code

Question

Editorial Team

Asked: June 8, 2026 (2026-06-08T03:47:35+00:00)

I want to extract all the URLs from an XML file, excluding the tracking code in each URL.

Here’s an example of a URL; they all follow the same format:

http://www.domain.com.au/category/pXXXXXX?uni_id=XXXXXX&cid=1_demo_1

So the only thing that changes between the URLs is XXXXXX, which is a numerical value.

The end result I want is:

http://www.domain.com.au/category/pXXXXXX

I have tried to use preg_replace in the code below, but it ended up replacing the whole URL with a random (I think) number:

$data = preg_replace('/http\:\/\/www\.domain\.com.au\/[^\?]+([^.]+)/','',$data);

Editorial Team · Answer 1 · 2026-06-08T03:47:37+00:00

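Match every URL in the XML with preg_match_all() (preg_match() stops after the first match):

// [^\s<] also stops at "<", so a closing XML tag is not swallowed into the URL
preg_match_all("(http://[^\s<]+|ftp://[^\s<]+)", $input, $matches);

Then use preg_replace() on each match, matching only the part of the string that needs to be removed (everything from the "?" onward). preg_replace() returns the new string rather than changing $value, so keep the result:

$urls = array();
foreach($matches[0] as $value)
{
    // $matches[0] holds the full matches; strip the query string from each one
    $urls[] = preg_replace("(\?[^\s]+)","",$value);
}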
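
The two steps above can be combined into a small helper. The following is only a sketch under a few assumptions: the function names extract_clean_urls() and strip_tracking() and the sample XML are made up for illustration, the URLs sit in element text or attribute values, and everything after the first "?" is tracking data that can be thrown away. The second function shows the alternative of cleaning the URLs in place with a single preg_replace() and a capture group, which is closer to what the question originally attempted.

```php
<?php
// Hypothetical helper: returns every http/https/ftp URL found in $xml,
// with its query string removed.
function extract_clean_urls(string $xml): array
{
    // Stop at whitespace, quotes, or angle brackets so that a closing tag
    // or attribute quote is not swallowed into the URL.
    preg_match_all('~(?:https?|ftp)://[^\s"\'<>]+~', $xml, $matches);

    $urls = array();
    foreach ($matches[0] as $url) {
        // Drop everything from the first "?" onward (the tracking parameters).
        $urls[] = preg_replace('~\?.*$~', '', $url);
    }
    return $urls;
}

// Hypothetical helper: removes the query strings in place and returns
// the modified XML.
function strip_tracking(string $xml): string
{
    // Group 1 captures the part before "?" and is written back as $1;
    // only the query string itself is discarded.
    return preg_replace('~((?:https?|ftp)://[^\s"\'<>?]+)\?[^\s"\'<>]*~', '$1', $xml);
}

// Sample input invented for this example.
$xml = '<items><link>http://www.domain.com.au/category/p123456?uni_id=123456&amp;cid=1_demo_1</link></items>';

print_r(extract_clean_urls($xml));
echo strip_tracking($xml), "\n";
```

For the sample input, both helpers reduce the URL to http://www.domain.com.au/category/p123456. If the file is well-formed XML, parsing it with an XML parser and cleaning only the relevant element values may be more robust than running a regular expression over the raw text.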
