I have a string of text that contains HTML with all different types of links (relative, absolute, root-relative). I need a regex that can be executed by PHP’s preg_replace to replace all relative links with root-relative links, without touching any of the other links. I have the root path already.
Replaced links:
<tag ... href="path/to_file.ext" ... > ---> <tag ... href="/basepath/path/to_file.ext" ... >
<tag ... href="path/to_file.ext" ... /> ---> <tag ... href="/basepath/path/to_file.ext" ... />
Untouched links:
<tag ... href="/any/path" ... >
<tag ... href="/any/path" ... />
<tag ... href="protocol://domain.com/any/path" ... >
<tag ... href="protocol://domain.com/any/path" ... />
If you just want to change the base URI, you can try the BASE element, e.g. <base href="/basepath/">. But note that changing the base URI affects all relative URIs, not just relative URI paths.
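Otherwise, if you really want to use a regular expression, consider that a relative path of the kind you want must be of the type path-noscheme (see RFC 3986): its first segment is non-empty and contains no colon.
So the beginning of the URI must match one or more unreserved, sub-delims, "@", or percent-encoded characters, followed by either "/" or the end of the string.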
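As a rough sketch, that rule could be written as a PCRE pattern like the one below. The pattern is my reading of the RFC 3986 grammar quoted in the comments, and the names RELATIVE_PATH_START and isRelativePath are made up for illustration.

```php
<?php
// RFC 3986:
//   path-noscheme = segment-nz-nc *( "/" segment )
//   segment-nz-nc = 1*( unreserved / pct-encoded / sub-delims / "@" )
// The first segment must be non-empty and must not contain ":",
// and it must be followed by "/" or the end of the string.
// "#" is used as the delimiter because it cannot occur in a path.
// RELATIVE_PATH_START and isRelativePath() are hypothetical names.
const RELATIVE_PATH_START =
    '#^(?:[a-zA-Z0-9._~!$&\'()*+,;=@-]|%[0-9a-fA-F]{2})+(?:$|/)#D';

function isRelativePath(string $uri): bool
{
    return preg_match(RELATIVE_PATH_START, $uri) === 1;
}

var_dump(isRelativePath('path/to_file.ext'));               // bool(true)
var_dump(isRelativePath('/any/path'));                      // bool(false)
var_dump(isRelativePath('protocol://domain.com/any/path')); // bool(false)
```

Note that this also matches dot segments such as ../file.ext, and that it rejects fragment-only (#top) and query-only (?a=b) references, which is probably what you want here.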
But please use a proper HTML parser to parse the HTML and build a DOM from it. Then you can query the DOM to get the href attributes and test each value against the pattern above.
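A minimal sketch of that approach using PHP's DOM extension might look like this. The function name rootRelativeHrefs and the $basePath parameter are hypothetical, and the pattern is the same assumed reading of path-noscheme as described above.

```php
<?php
// rootRelativeHrefs() and $basePath are hypothetical names for this sketch.
function rootRelativeHrefs(string $html, string $basePath): string
{
    // Start of a path-noscheme reference: a non-empty first segment
    // without ":", followed by "/" or the end of the string.
    $pattern = '#^(?:[a-zA-Z0-9._~!$&\'()*+,;=@-]|%[0-9a-fA-F]{2})+(?:$|/)#D';

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them,
    // since real-world HTML is rarely clean.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Visit every element that carries an href attribute.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//*[@href]') as $element) {
        $href = $element->getAttribute('href');
        if (preg_match($pattern, $href) === 1) {
            $element->setAttribute('href', rtrim($basePath, '/') . '/' . $href);
        }
    }

    return (string) $doc->saveHTML();
}

echo rootRelativeHrefs('<a href="path/to_file.ext">x</a> <a href="/any/path">y</a>', '/basepath');
```

Keep in mind that loadHTML() wraps a fragment in a doctype and html/body elements and makes its own assumptions about the input encoding, so you may need to post-process the output if you only want the original fragment back.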