I’m trying to extract part of a URL from HTML code using regular expressions. I don’t know much about regex, so I’m a bit confused about why it isn’t working. This is the case:
<a href="cotizacion-valor/abengoa/" style="text-decoration:none;color:#006699;">ABG.MC</a>
And I’m trying to get “abengoa” using this regex:
".*cotizacion-valor\/(/w+)\/.*"
I’m using Python, so the code is:
regex_companies = ".*cotizacion-valor\/(/w+)\/.*"
match_companies = re.findall(regex_companies, content_web)
What is wrong with my regex? Thanks
EDIT: One more question:
How can I get only the first match for each company? This href repeats throughout the document, often with the same company name but sometimes with a different one, so I still have to search the whole document.
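Your use of /w is incorrect. You have to use \w instead of /w: \w matches any word character (letter, digit or underscore), while /w only matches a literal slash followed by the letter "w".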
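Here is a minimal sketch of the corrected pattern that also covers the EDIT. It assumes content_web is the page HTML as a string. The sample below is made up from the tag in the question plus a hypothetical second company ("acciona"). The leading and trailing .* are dropped: with findall they are not needed, and because .* is greedy and stops at newlines, they would capture only the last company on each line. Escaping / is also unnecessary in Python regexes.

```python
import re

# Hypothetical stand-in for content_web: the question's tag repeated, plus
# an invented second company, to mimic a page where hrefs repeat.
content_web = (
    '<a href="cotizacion-valor/abengoa/" style="text-decoration:none;color:#006699;">ABG.MC</a>\n'
    '<a href="cotizacion-valor/acciona/" style="text-decoration:none;color:#006699;">ANA.MC</a>\n'
    '<a href="cotizacion-valor/abengoa/" style="text-decoration:none;color:#006699;">ABG.MC</a>\n'
)

# \w+ captures one or more word characters between the two slashes.
regex_companies = re.compile(r"cotizacion-valor/(\w+)/")

# findall returns the captured group for every match, duplicates included.
all_matches = regex_companies.findall(content_web)

# re.search stops at the first match in the whole document.
first = regex_companies.search(content_web)
first_company = first.group(1) if first else None

# Keep every company but only its first occurrence, preserving order
# (dict keys keep insertion order in Python 3.7+).
unique_companies = list(dict.fromkeys(all_matches))

print(all_matches)
print(first_company)
print(unique_companies)
```

Use search if you truly want a single result, and the dict.fromkeys step if you want each company once. Note that \w+ will not match names containing a hyphen (e.g. a hypothetical "banco-santander"); if such slugs occur, ([^/]+) is a looser alternative.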