Possible Duplicate: Strip html from strings in python RegEx match open tags except XHTML

Question

Editorial Team

Asked: May 27, 2026

Possible Duplicate:
Strip html from strings in python
RegEx match open tags except XHTML self-contained tags

I have a regex pattern in my Python module that removes HTML tags from a given string.

It doesn't work for the following case.

Input string:

string=<li class="
      tal
    "><h3><a href="/aclk?sa=l&amp;ai=CoS4y-Wz0TrnqC8y0rAfysK2DB46PiJECzoK8_yKPwd4FCAAQAigCUL7Kz4P9_____wFg5erjg5gOoAH0m_XuA8gBAakCoqvilYNWVD6qBB1P0Dm6CNzrf62IC36fDvUIh77EpeheIRdH_YEaPw&amp;sig=AOD64_2z9xPK8vOxUCpIGTjBcc2Lg-GAeA&amp;adurl=http://www.policybazaar.com/creditcards/creditcard-india.aspx%3Futm_source%3Dgoogle%26utm_medium%3Dppc%26utm_term%3DCreditcard_delhi_only%26utm_campaign%3Dcredit_card" id="pa2">Compare <b>Credit Cards</b> | PolicyBazaar.com</a></h3>Get Best <b>Credit Card</b> For Free, Now U Have a Choice, Choose wisely!<br /><cite>www.policybazaar.com/<b>credit</b>-<b>Cards</b></cite></li>

regex pattern:

 In [64]: p = re.compile(r'<.*?>')
 In [65]: text = p.sub('', str(string))
 In [66]: text
 Out[66]: '<li class="\n          tal\n        ">Compare Credit Cards | PolicyBazaar.comGet Best Credit Card For Free, Now U Have a Choice, Choose wisely!www.policybazaar.com/credit-Cards'

The result still contains the <li> tag. How can it be removed regardless of the class name and of how the tag is laid out in the string?

1 Answer

Editorial Team · Answer 1 · 2026-05-27T17:57:01+00:00

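In that case you should use the re.DOTALL flag. By default, . does not match a newline, so <.*?> cannot match the opening <li> tag, whose class attribute spans several lines. With the flag set, . matches newlines as well: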

p = re.compile(r'<.*?>',re.DOTALL)

should work.
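To see the difference the flag makes, here is a minimal sketch in Python 3. The `html` value is a shortened stand-in for the input in the question (an assumption made for readability, not the full string), and the variable names are illustrative only.

```python
import re

# Shortened stand-in for the question's input: the opening <li> tag
# spans three lines because of the newlines inside the class attribute.
html = (
    '<li class="\n      tal\n    ">'
    '<h3>Compare <b>Credit Cards</b></h3>'
    'Get Best <b>Credit Card</b><br /></li>'
)

# Without DOTALL, "." stops at a newline, so the multi-line <li> tag is skipped.
default_pattern = re.compile(r'<.*?>')
# With DOTALL, "." also matches newlines, so the multi-line tag is matched too.
dotall_pattern = re.compile(r'<.*?>', re.DOTALL)

without_flag = default_pattern.sub('', html)
with_flag = dotall_pattern.sub('', html)

print(repr(without_flag))
print(repr(with_flag))
```

Only the tag that contains newlines survives the first substitution; every single-line tag is removed in both cases.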

But… you should not use regexes for HTML parsing, see this: https://stackoverflow.com/a/1732454/11621
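As one possible alternative to a regex, the standard library's html.parser module can collect the text nodes and discard the tags. This is only a sketch, assuming Python 3; `TextExtractor` and `strip_tags` are hypothetical names chosen for illustration, and the sample input is again a shortened stand-in for the string in the question.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: keeps text nodes, ignores all tags."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.parts = []

    def handle_data(self, data):
        # Called for the text between tags; tags themselves are never stored.
        self.parts.append(data)

    def get_text(self):
        return ''.join(self.parts)


def strip_tags(html):
    """Return the text content of an HTML fragment (illustrative name)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.get_text()


sample = (
    '<li class="\n      tal\n    ">'
    '<h3><a href="/x?a=1&amp;b=2" id="pa2">Compare <b>Credit Cards</b>'
    ' | PolicyBazaar.com</a></h3>'
    'Get Best <b>Credit Card</b><br /></li>'
)
print(strip_tags(sample))
```

Because the parser tokenizes the markup rather than pattern-matching it, newlines or > characters inside quoted attribute values do not need special handling.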

HTH.
