I’m using BeautifulSoup to escape all of the HTML tags (except for a set

Question


Asked: May 28, 2026


I’m using BeautifulSoup to escape all of the HTML tags in an arbitrary piece of text, except for a set of pre-approved tags such as a. However, I only want it to escape the tags if they are actual valid HTML tags. If something looks like a tag but isn’t one, BeautifulSoup adds extra HTML to close it off, which I don’t want.

Example: If someone enters the text <integer>, my code ends up spitting out &lt;integer&gt;&lt;/integer&gt; (an escaped tag plus an escaped closing tag that was never typed) instead of just &lt;integer&gt;.

Here’s the code (value is the HTML string and VALID_TAGS is just a list of acceptable tag names).

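import cgi
import BeautifulSoup  # BeautifulSoup 3 (Python 2 era API)

# The function name is illustrative; VALID_TAGS is defined elsewhere.
def escape_invalid_tags(value):
  soup = BeautifulSoup.BeautifulSoup(
    value, convertEntities=BeautifulSoup.BeautifulSoup.HTML_ENTITIES)
  # Loop through all the tags. Escape any tag that is not in VALID_TAGS.
  for tag in soup.findAll():
    if tag.name not in VALID_TAGS:
      # str(tag) serializes the parsed element, including any closing tag the parser added.
      tag.replaceWith(cgi.escape(str(tag)))
  return soup.renderContents()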

Thanks in advance.


1 Answer

Editorial Team · Answer 1 · 2026-05-28T13:24:50+00:00

I figured this out using html5lib, with this answer as a starting point. Here’s a version of what I ended up with. It does the same thing as the BeautifulSoup code I started with above, except that it works properly for the <integer> case I described:

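import html5lib
from html5lib import sanitizer, serializer, treebuilders, treewalkers

# The function name is illustrative. This relies on the older html5lib API,
# where the sanitizer is passed to the parser as its tokenizer.
def sanitize_html(value):
  p = html5lib.HTMLParser(tokenizer=sanitizer.HTMLSanitizer,
                          tree=treebuilders.getTreeBuilder("dom"))
  dom_tree = p.parseFragment(value)
  stream = treewalkers.getTreeWalker("dom")(dom_tree)
  s = serializer.htmlserializer.HTMLSerializer(quote_attr_values=True)
  return s.render(stream)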

Thanks to everyone who helped.
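
For anyone who cannot install html5lib, the same idea can be roughly sketched with only the Python 3 standard library. This is not the code from the answer above. It swaps in html.parser, which reports tags as events and never builds a tree, so it has no reason to invent a closing tag. The names TagEscaper and escape_disallowed_tags and the contents of VALID_TAGS are assumptions made for illustration. Unlike the html5lib sanitizer, this sketch passes the attributes of approved tags through untouched and silently drops comments and declarations, so treat it as a demonstration of the escaping behaviour rather than a security filter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list; the original post only says VALID_TAGS is a list of names.
VALID_TAGS = {"a", "b", "i"}


class TagEscaper(HTMLParser):
    """Re-emit the input, escaping every tag whose name is not allow-listed."""

    def __init__(self, valid_tags):
        # convert_charrefs=False reports entities such as &amp; as separate events.
        super().__init__(convert_charrefs=False)
        self.valid_tags = valid_tags
        self.parts = []

    def _emit_tag(self, tag, raw):
        # Approved tags are kept verbatim; everything else is escaped as text.
        self.parts.append(raw if tag in self.valid_tags else escape(raw))

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as it was written.
        self._emit_tag(tag, self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, self.get_starttag_text())

    def handle_endtag(self, tag):
        # Only end tags present in the input reach this handler.
        self._emit_tag(tag, "</%s>" % tag)

    def handle_data(self, data):
        self.parts.append(escape(data, quote=False))

    def handle_entityref(self, name):
        self.parts.append("&%s;" % name)

    def handle_charref(self, name):
        self.parts.append("&#%s;" % name)


def escape_disallowed_tags(value, valid_tags=VALID_TAGS):
    parser = TagEscaper(valid_tags)
    parser.feed(value)
    parser.close()  # flush anything the parser is still buffering
    return "".join(parser.parts)
```

With this sketch, escape_disallowed_tags("<integer>") returns &lt;integer&gt; with no closing tag appended, while an approved tag such as <a href="x">link</a> comes back unchanged.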
