How to find all words except the ones in tags using RE module?

Question


Asked: June 14, 2026 (2026-06-14T02:32:02+00:00)


I know how to search for something, but how do I do the opposite? That is, I write a pattern for what to search for, but what I actually want is every word except the tags themselves and everything inside them.

So far I managed this:

import re
with open(filename, 'r') as f:  # filename is the path to the input file
    data = re.findall(r"<.+?>", f.read())

That finds everything inside <> tags, but how can I make it find every word except what is inside those tags?
I tried using ^ at the start of the pattern inside [], but then symbols such as . are treated literally, without their special meaning.
I also managed to solve this by splitting the string on '''\= <>"''', then checking the whole string for words that are inside <> tags (like align, right, td, etc.) and appending the words that are not inside <> tags to another list. But that is a rather ugly solution.

Is there a simple way to search for every word except the tags themselves and anything inside <>?
So, let's say the string is 'hello 123 <b>Bold</b> <p>end</p>';
re.findall would then return:

['hello', '123', 'Bold', 'end']


1 Answer

Editorial Team · Answer 1 · 2026-06-14T02:32:04+00:00


Something like re.compile(r'<[^>]+>').sub('', string).split() should do the trick.
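Spelled out, that one-liner could look like the sketch below. It assumes the tags are well formed and that no attribute value contains a literal ">". The names TAG_RE and words_outside_tags are made up for illustration. One small deviation from the one-liner: each tag is replaced with a space rather than an empty string, so words separated only by a tag (as in 'a<br>b') do not get glued together.

```python
import re

# Matches one tag: "<", then one or more characters that are not ">", then ">".
# Assumption: no ">" appears inside an attribute value.
TAG_RE = re.compile(r'<[^>]+>')

def words_outside_tags(text):
    """Return the whitespace-separated words of text with all tags stripped."""
    # Replace each tag with a space so neighbouring words stay separate,
    # then split on runs of whitespace.
    return TAG_RE.sub(' ', text).split()

print(words_outside_tags('hello 123 <b>Bold</b> <p>end</p>'))
# ['hello', '123', 'Bold', 'end']
```

The negated class [^>] is what the ^-inside-[] attempt from the question was reaching for: inside brackets, ^ means "any character except these", and . is just a literal dot.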

You might want to read this post about processing context-free languages using regular expressions.
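If the input is real HTML rather than simple tag-like markup, one alternative to a regex is the standard library's html.parser module. This is a different technique from the one-liner above, not a variant of it. The sketch below is only one way to do it; the class name TextCollector and the function words_from_html are hypothetical.

```python
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collects the words found in text nodes, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.words = []

    def handle_data(self, data):
        # Called for text between tags; split it into words.
        self.words.extend(data.split())

def words_from_html(markup):
    parser = TextCollector()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return parser.words

print(words_from_html('hello 123 <b>Bold</b> <p>end</p>'))
# ['hello', '123', 'Bold', 'end']
```

A parser tends to cope better with markup that a single regex can misjudge, such as a ">" inside a quoted attribute value, at the cost of a few more lines.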
