I’ve seen a number of questions about removing HTML tags from strings, but I’m

Asked: June 3, 2026 (2026-06-03T17:49:29+00:00)

I’ve seen a number of questions about removing HTML tags from strings, but I’m still a bit unclear on how my specific case should be handled.

I’ve seen that many posts advise against using regular expressions to handle HTML, but I suspect my case may justify a careful exception to this rule.

I’m trying to parse PDF files, and I’ve managed to convert each page of my sample PDF file into a string of UTF-32 text. Wherever an image appears, an HTML-style tag containing the image’s name and location is inserted (the image itself is saved elsewhere).

In a separate portion of my app, I need to get rid of these image tags. Because we’re only dealing with image tags, I suspect the use of a regex may be warranted.

My question is twofold:

1. Should I use a regex to remove these tags, or should I still use an HTML parsing module such as BeautifulSoup?
2. Which regex or BeautifulSoup construct should I use? In other words, how should I code this?

For clarity, the tags are structured as <img src="/path/to/file"/>

Thanks!

1 Answer

Editorial Team · Answer 1 · 2026-06-03T17:49:36+00:00

In your case I think a regular expression is acceptable, because you only need to strip tags in one known, machine-generated form. Something like this should work:

import re

def remove_html_tags(data):
    # Remove anything that looks like a tag: '<', as few characters as possible, then '>'
    p = re.compile(r'<.*?>')
    return p.sub('', data)

I found that snippet here (http://love-python.blogspot.com/2008/07/strip-html-tags-using-python.html)
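One caveat worth checking before relying on the general pattern: it removes everything from a "<" to the next ">", whether or not that span is really a tag. The sketch below reuses the same pattern on made-up sample strings (they are illustrative assumptions, not taken from the question's PDF). It also shows that, without re.DOTALL, "." does not match a newline, so a tag split across lines is left in place.

```python
import re

# Same pattern as in the answer: '<', then as few characters as possible, then '>'.
# Without re.DOTALL, '.' does not match '\n', so a tag split across lines survives.
TAG_RE = re.compile(r'<.*?>')


def remove_html_tags(data):
    return TAG_RE.sub('', data)


# Made-up sample inputs, for illustration only.
print(repr(remove_html_tags('see <img src="/path/to/file"/> here')))  # 'see  here'
print(repr(remove_html_tags('if a<b and c>d')))  # 'if ad' (not a tag, but still removed)
```

If the extracted text can legitimately contain "<" characters (comparisons, arrows), the img-only pattern in the edit is the safer choice.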

Edit: here is a version that removes only tags of the form <img ... />:

import re

def remove_img_tags(data):
    # Remove only tags that start with '<img' and end at the first '/>'
    p = re.compile(r'<img.*?/>')
    return p.sub('', data)
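A possible tightening, offered as a sketch under stated assumptions rather than a required change: because ".*?" can cross a ">" while searching for "/>", the pattern above can swallow ordinary text if an <img ...> without the closing slash ever precedes a later "/>". The hypothetical pattern <img\b[^>]*> stops at the first ">" and removes img tags with or without the slash, which goes slightly beyond the self-closing form in the question. It assumes attribute values (such as the image path) never contain ">". The function names and the sample string are illustrative only.

```python
import re

# Hypothetical tighter pattern: [^>]* cannot run past the end of the tag
# (an optional trailing '/' is absorbed by it), and \b rejects names that
# merely start with "img", such as <imgx/>.
IMG_TAG_RE = re.compile(r'<img\b[^>]*>')


def remove_img_tags_strict(data):
    return IMG_TAG_RE.sub('', data)


def remove_img_tags_lazy(data):
    # The answer's original pattern, kept for comparison.
    return re.sub(r'<img.*?/>', '', data)


# Made-up sample: one img tag without the slash, an unrelated <br/>, one normal tag.
text = '<img src="a.png"> caption <br/> more <img src="/path/to/file"/>'
print(repr(remove_img_tags_lazy(text)))    # ' more ' (caption and <br/> lost)
print(repr(remove_img_tags_strict(text)))  # ' caption <br/> more '
```

For input that really only contains <img src="..."/> tags, both functions give the same result; the difference only shows up on unexpected input.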
