I’m building an app in Python, and I need to get the URLs of all the links on a webpage. I already have a function that uses urllib to download the HTML file from the web and turn it into a list of strings with readlines().
Currently I have this code that uses regex (I’m not very good at it) to search for links in every line:
    for line in lines:
        result = re.match('/href="(.*)"/iU', line)
        print result
This is not working: it only prints ‘None’ for every line in the file, but I’m sure there are at least three links in the file I’m opening.
Can someone give me a hint on this?
Thanks in advance
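For reference, the pattern in the question fails for two reasons. First, re.match only matches at the very start of the string, so a line that does not begin with the pattern always gives None. Second, the /.../iU wrapper is PHP/Perl-style syntax; in Python the slashes and the trailing iU are treated as literal characters to match, and flags are passed separately (for example re.IGNORECASE). A minimal sketch of a working regex version for Python 3 follows; the sample lines and the name HREF_RE are made up for illustration, and a regex like this is only a rough approximation of real HTML parsing.

```python
import re

# Hypothetical sample lines standing in for the readlines() output.
lines = [
    '<p>See <a href="http://example.com/a">A</a> and\n',
    "<A HREF='http://example.com/b'>B</A></p>\n",
    '<p>no links here</p>\n',
]

# Non-greedy capture of a quoted href value. The backreference \1 makes the
# closing quote match whichever quote character opened the value.
HREF_RE = re.compile(r'href\s*=\s*(["\'])(.*?)\1', re.IGNORECASE)

links = []
for line in lines:
    # findall scans the whole line (unlike re.match) and returns one
    # (quote, url) tuple per link found.
    for _quote, url in HREF_RE.findall(line):
        links.append(url)

print(links)
```

Using findall instead of match also means several links on the same line are all picked up.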
Well, just for completeness, I will add here what I found to be the best answer. I found it in the book Dive Into Python, by Mark Pilgrim.
Here is the code to list all URLs from a webpage:
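A minimal sketch of the parser-based approach for Python 3, assuming the standard-library html.parser module rather than the book's own listing. The class name LinkLister and the sample HTML are illustrative only.

```python
from html.parser import HTMLParser


class LinkLister(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.urls.append(value)


# Hypothetical page content; with a list from readlines(), "".join(lines)
# could be fed instead.
html = (
    '<html><body>'
    '<a href="http://example.com/one">one</a>'
    "<A HREF='/two'>two</A>"
    '<a name="anchor">no href</a>'
    '</body></html>'
)

parser = LinkLister()
parser.feed(html)
parser.close()
print(parser.urls)
```

Because the parser works on tags rather than lines, it does not matter whether a link is split across lines or how the attribute is quoted.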
Thanks for all the replies.