I am having trouble writing a Python regex that does not match a certain pattern when it occurs inside href tags.
My aim is to replace all occurrences of DSS[a-z]{2}[0-9]{2} with an href link as shown below, but without replacing the same pattern where it already occurs inside href tags.
Present Regex:
replaced = re.sub("[^http://*/s](DSS[a-z]{2}[0-9]{2})", "<a href=\"http://test.com=\\1\">\\1</a>", input)
I need to add this new regex to my existing one using an OR operator.
EDIT:
I am trying to use regex just for a simple operation. I want to replace occurrences of the pattern anywhere in the HTML using a regex, except where they occur within <a></a>.
The answer to any question having regexp and HTML in the same sentence is here.
In Python, the best HTML parser is indeed Beautiful Soup.
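To make the parser idea concrete, here is a rough sketch. It deliberately swaps in the standard library's html.parser instead of Beautiful Soup so that it runs without installing anything; the approach is the same (walk the document, only touch text that is not inside an <a> element). The names AnchorAwareLinker and link_outside_anchors are made up for this example, and the link format is copied from the regex in the question.

```python
import re
from html.parser import HTMLParser

CODE = re.compile(r'DSS[a-z]{2}[0-9]{2}')


def _to_link(match):
    # Link format taken from the question's replacement string.
    code = match.group()
    return '<a href="http://test.com=%s">%s</a>' % (code, code)


class AnchorAwareLinker(HTMLParser):
    """Re-emits the HTML, linking codes only in text outside <a>...</a>."""

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.a_depth = 0  # > 0 while we are inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self.a_depth += 1
        self.out.append(self.get_starttag_text())  # original tag text, attributes included

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag == 'a' and self.a_depth:
            self.a_depth -= 1
        self.out.append('</%s>' % tag)

    def handle_data(self, data):
        if not self.a_depth:
            data = CODE.sub(_to_link, data)
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append('&%s;' % name)

    def handle_charref(self, name):
        self.out.append('&#%s;' % name)

    def handle_comment(self, data):
        self.out.append('<!--%s-->' % data)

    def handle_decl(self, decl):
        self.out.append('<!%s>' % decl)


def link_outside_anchors(html):
    parser = AnchorAwareLinker()
    parser.feed(html)
    parser.close()  # flush any text still buffered
    return ''.join(parser.out)


sample = '<p>DSSab12 <a href="http://test.com=DSSxy34">DSSxy34</a></p>'
print(link_outside_anchors(sample))
```

Because attribute values never reach handle_data, the href is left alone without any special casing. This sketch does not try to skip <script> or <style> contents; that would need an extra check.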
If you want to persist with regexp, you can try a negative lookbehind to avoid anything preceded by a ". At your own risk.
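A minimal sketch of that lookbehind, assuming the only thing you need to skip is a code that sits directly after a double quote (for example href="DSSxy34"). The function name link_codes is made up for the example; the replacement string is the one from the question.

```python
import re

# (?<!") is a negative lookbehind: the match is rejected when the
# character immediately before it is a double quote.
PATTERN = re.compile(r'(?<!")(DSS[a-z]{2}[0-9]{2})')
LINK = r'<a href="http://test.com=\1">\1</a>'


def link_codes(html):
    return PATTERN.sub(LINK, html)


sample = 'See DSSab12 and <a href="DSSxy34">the other one</a>.'
print(link_codes(sample))
```

This is where the risk comes in: the lookbehind only inspects one preceding character, so a code used as link text (preceded by >) or one that appears later in the URL (as in href="http://test.com=DSSxy34", where it is preceded by =) is still replaced. Running the substitution twice on its own output will therefore mangle the links it created.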