
Asked: May 11, 2026


I’m building an app in Python, and I need to get the URLs of all the links on a webpage. I already have a function that uses urllib to download the HTML file from the web and turn it into a list of strings with readlines().

Currently I have this code that uses regex (I’m not very good at it) to search for links in every line:

for line in lines:
    result = re.match ('/href='(.*)'/iU', line)
    print result

This is not working: it prints ‘None’ for every line in the file, but I’m sure there are at least three links in the file I’m opening.

Can someone give me a hint on this?

Thanks in advance
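
A note on why the loop above prints None: re.match only tests for a match at the start of each line, and Python patterns do not take /.../iU delimiters (flags are passed as a separate argument). What follows is a minimal regex-based sketch for Python 3, assuming every href value is quoted. The helper name find_hrefs is made up for illustration, and a regex like this will miss unquoted or unusual markup.

```python
import re

# Matches href="..." or href='...' anywhere in a line, case-insensitively.
# The non-greedy (.*?) stops at the first closing quote of the same kind.
HREF_RE = re.compile(r"""href\s*=\s*(["'])(.*?)\1""", re.IGNORECASE)


def find_hrefs(lines):
    """Return every quoted href value found in an iterable of HTML lines."""
    urls = []
    for line in lines:
        # finditer scans the whole line; re.match would only test position 0.
        for m in HREF_RE.finditer(line):
            urls.append(m.group(2))
    return urls


if __name__ == "__main__":
    sample = [
        '<p>See <a href="http://example.com/a">A</a> and',
        "<A HREF='/b'>B</A></p>",
    ]
    for url in find_hrefs(sample):
        print(url)
```

Because it works line by line, this also misses any tag whose href attribute is split across two lines, which is one reason a real HTML parser tends to be the safer choice.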


1 Answer

score 0 · Answer 1 · 2026-05-11T11:50:46+00:00

Well, just for completeness, I will add here what I found to be the best answer. I found it in the book Dive Into Python, by Mark Pilgrim.

Here is the code to list all the URLs on a webpage:

# urllister.py
from sgmllib import SGMLParser

class URLLister(SGMLParser):
    def reset(self):
        SGMLParser.reset(self)
        self.urls = []

    def start_a(self, attrs):
        # Called for every <a> tag; attrs is a list of (name, value) pairs.
        href = [v for k, v in attrs if k=='href']
        if href:
            self.urls.extend(href)

# Usage, from a separate script or the interactive shell (Python 2):
import urllib, urllister
usock = urllib.urlopen('http://diveintopython.net/')
parser = urllister.URLLister()
parser.feed(usock.read())
usock.close()
parser.close()
for url in parser.urls: print url
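
Since sgmllib was removed in Python 3, the listing above only runs on Python 2. A rough Python 3 counterpart can be sketched with the standard library's html.parser module. This is an adaptation, not the book's code, and the helper names list_urls and list_urls_from are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class URLLister(HTMLParser):
    """Collect the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag and attribute
        # names arrive lowercased.
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href" and v)


def list_urls(html_text):
    """Return the href of every link in a string of HTML."""
    parser = URLLister()
    parser.feed(html_text)
    parser.close()
    return parser.urls


def list_urls_from(url):
    """Download a page and return its links (assumes UTF-8 content)."""
    with urlopen(url) as usock:
        page = usock.read().decode("utf-8", errors="replace")
    return list_urls(page)
```

Calling list_urls_from('http://diveintopython.net/') would fetch the page used above and return its links, assuming the site is reachable and the page decodes as UTF-8.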

Thanks for all the replies.
