I’m using python 2.7 and BeautifulSoup . I need to find an acronym such

Question

Asked: June 13, 2026 (2026-06-13T14:28:15+00:00)

I’m using Python 2.7 and BeautifulSoup.
I need to find an acronym such as abc or a.b.c. and avoid false positives like qweabcrty. The pattern can be at the beginning of the string or at the end, and it can have a space, quote, double quote, hyphen (and so on) right before and after it, but not an alphanumeric character.

I came up with this regex:

[^\w]?a\.?b\.?c\.?[^\w]?

That works for:

abc
a.b.c.
blah (abc)
abc-blah
blah-abc
blah abc blah
blah-abc-blah

But it also matches this (and I don’t want it to):

qweabcrty

If I remove the ? after both [^\w], it no longer finds cases 1, 2, 4 and 5, because it then expects a character before and after.

Long story short, how can I specify this:
abc can be anywhere in the string, BUT IF there is a character before and/or after it, that character must not be alphanumeric.

The Python code looks like:

import re
from bs4 import BeautifulSoup, SoupStrainer

html = """
<html>
 <a>abc</a>
 <a>a.b.c.</a>
 <a>blah (abc)</a>
 <a>abc-blah</a>
 <a>blah-abc</a>
 <a>blah abc blah</a>
 <a>blah-abc-blah</a>
 <a>qweabcrty</a>
</html>"""

links = BeautifulSoup(html, "lxml", parse_only=SoupStrainer(["a"]))

tags = links.find_all("a", text = re.compile("[^\w]?a\.?b\.?c\.?[^\w]?", re.I))
print tags

1 Answer

Editorial Team · Answer 1 · 2026-06-13T14:28:16+00:00

Try using the word boundary (\b) metacharacter:

html = """
<html>
 <a>abc</a>
 <a>a.b.c.</a>
 <a>blah (abc)</a>
 <a>abc-blah</a>
 <a>blah-abc</a>
 <a>blah abc blah</a>
 <a>blah-abc-blah</a>
 <a>qweabcrty</a>
</html>"""

import re
print re.sub(r'\b(abc|a\.b\.c)\b', '@@@', html)

prints

<html>
 <a>@@@</a>
 <a>@@@.</a>
 <a>blah (@@@)</a>
 <a>@@@-blah</a>
 <a>blah-@@@</a>
 <a>blah @@@ blah</a>
 <a>blah-@@@-blah</a>
 <a>qweabcrty</a>
</html>
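
If the trailing dot of a.b.c. should be part of the match, or if you want to keep the optional dots from the question's pattern, lookarounds are an alternative to \b. The following is a minimal sketch, not a definitive solution: it assumes that "alphanumeric" can be approximated by \w (which also counts the underscore as a word character), and the names loose, strict, samples and matched are made up for illustration. It uses only the re module and a single-argument print(), so it should run on both Python 2.7 and Python 3.

```python
import re

# Pattern from the question: both [^\w]? groups are optional, so they may
# match nothing at all, and the pattern still matches inside a longer word.
loose = re.compile(r"[^\w]?a\.?b\.?c\.?[^\w]?", re.I)

# Sketch: the lookbehind (?<!\w) and lookahead (?!\w) only assert that no
# word character sits next to the acronym. They consume nothing, so the
# start and end of the string pass as well.
strict = re.compile(r"(?<!\w)a\.?b\.?c\.?(?!\w)", re.I)

# The same strings as the <a> tags in the question.
samples = ["abc", "a.b.c.", "blah (abc)", "abc-blah", "blah-abc",
           "blah abc blah", "blah-abc-blah", "qweabcrty"]

# Keep only the strings where the strict pattern finds the acronym.
matched = [s for s in samples if strict.search(s)]
print(matched)
```

A compiled pattern like strict can be passed to find_all in place of the re.compile(...) call in the question's code.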
