Need to pull strings between href attribute tags in Python using the re module.

Question

Editorial Team

Asked: May 27, 2026 (2026-05-27T18:34:06+00:00)

I need to pull the string between the opening and closing tags of a link (an a element with an href attribute) in Python using the re module.

I’ve tried numerous patterns such as:

patFinderLink = re.compile('\>"(CVE.*)"\<\/a>')

Example: I need to pull what is between the tags (in this case “CVE-2010-3718”) from:

<pre>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3718.html">CVE-2010-3718</a>
</pre>

What am I doing wrong here? Any advice is greatly appreciated. Thank you in advance.

Sun

1 Answer

Editorial Team · Answer 1 · 2026-05-27T18:34:07+00:00

I am surprised no one suggested using BeautifulSoup.

Here is how I would do it:

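# Updated for Python 3 and BeautifulSoup 4 (assumes: pip install beautifulsoup4)
from bs4 import BeautifulSoup
import re

hello = """
<pre>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3718.html">CVE-2010-3718</a>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3710.html">CVE-2010-3718</a>
<a href="https://www.redhat.com/security/data/cve/CVE-2010-3700.html">CVE-2010-3718</a>
</pre>
"""

# Raw string with an escaped dot, so ".html" matches a literal period
target = re.compile(r"CVE-\d+-\d+\.html")
commentSoup = BeautifulSoup(hello, "html.parser")
# Select every tag whose href attribute contains the pattern
atags = commentSoup.find_all(href=target)
for a in atags:
    # Pull the matching file name out of the href value
    match = re.findall(target, a['href'])[0]
    print(match)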

Result:

CVE-2010-3718.html
CVE-2010-3710.html
CVE-2010-3700.html
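
Note that the code above returns the file name from the href, while the question asked for the text between the tags. The pattern in the question finds nothing because it expects literal double quotes around the CVE text, and the markup has none. Here is a minimal sketch using only the re module, assuming the link text always looks like CVE-YYYY-NNNN and the markup is as regular as in the example. The names html and link_text_pattern are made up for illustration:

```python
import re

# Sample markup copied from the question
html = '<a href="https://www.redhat.com/security/data/cve/CVE-2010-3718.html">CVE-2010-3718</a>'

# Capture the link text: whatever sits between the end of the
# opening <a ...> tag and the closing </a>. No quote characters
# are expected around the text.
link_text_pattern = re.compile(r'<a\s[^>]*href="[^"]*"[^>]*>(CVE-\d+-\d+)</a>')

# findall returns the contents of the single capture group for each match
link_texts = link_text_pattern.findall(html)
print(link_texts)
```

For markup that is less regular than this (single-quoted attributes, nested tags inside the link, line breaks), a parser such as BeautifulSoup is likely to be the safer choice.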
