I’m new to Python and am playing around with making a very basic web crawler. For instance, I have written a simple function that loads a page showing the high scores for an online game. I can get the HTML source of the page, but I need to extract specific numbers from it. The page URL looks like this:
http://hiscore.runescape.com/hiscorepersonal.ws?user1=bigdrizzle13
where ‘bigdrizzle13’ is the unique part of the link. The numbers on that page need to be extracted and returned. Essentially, I want to build a program where all I have to do is type in ‘bigdrizzle13’ and it outputs those numbers.
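For reference, the page-loading step described above might look roughly like the sketch below. The function names `build_hiscore_url` and `fetch_hiscore_html` are made up for illustration, the URL pattern is taken from the link above (it may no longer be live), and the UTF-8 decoding is an assumption about the page's encoding.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# URL pattern copied from the link in the question; the site may have changed since.
BASE_URL = "http://hiscore.runescape.com/hiscorepersonal.ws"


def build_hiscore_url(username):
    """Return the hiscore page URL for one username (hypothetical helper)."""
    # urlencode escapes characters that are not safe in a query string.
    return BASE_URL + "?" + urlencode({"user1": username})


def fetch_hiscore_html(username):
    """Download the page and return its HTML as text (requires network access)."""
    with urlopen(build_hiscore_url(username)) as response:
        # Assumption: the page is UTF-8; undecodable bytes are replaced.
        return response.read().decode("utf-8", errors="replace")


print(build_hiscore_url("bigdrizzle13"))
```

Only the URL is built when this runs; `fetch_hiscore_html` is there to show where the actual download would happen.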
As another poster mentioned, BeautifulSoup is a wonderful tool for this job.
Here’s the entire, ostentatiously commented program. It could use a lot more error handling, but as long as you enter a valid username, it will pull all the scores from the corresponding web page.
I tried to comment as well as I could. If you’re fresh to BeautifulSoup, I highly recommend working through my example with the BeautifulSoup documentation handy.
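If you just want a feel for the BeautifulSoup part first, here is a minimal sketch of the general idea. It is not the program from this answer: the `SAMPLE_HTML` table, its column order (skill, rank, level, XP), the numbers in it, and the `parse_scores` helper are all made up for illustration, so the real page's markup will almost certainly need different selectors. It assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the downloaded page; the real markup will differ.
SAMPLE_HTML = """
<table>
  <tr><th>Skill</th><th>Rank</th><th>Level</th><th>XP</th></tr>
  <tr><td>Attack</td><td>12,345</td><td>70</td><td>737,627</td></tr>
  <tr><td>Defence</td><td>23,456</td><td>65</td><td>449,428</td></tr>
</table>
"""


def to_int(text):
    """Convert a string such as '12,345' to the integer 12345."""
    return int(text.replace(",", ""))


def parse_scores(html):
    """Return {skill: {'rank': ..., 'level': ..., 'xp': ...}} from a score table."""
    soup = BeautifulSoup(html, "html.parser")
    scores = {}
    for row in soup.find_all("tr"):
        # Header rows use <th>, so they produce no <td> cells and are skipped.
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != 4:
            continue
        skill, rank, level, xp = cells
        scores[skill] = {"rank": to_int(rank), "level": to_int(level), "xp": to_int(xp)}
    return scores


for skill, stats in parse_scores(SAMPLE_HTML).items():
    print(skill, stats["rank"], stats["level"], stats["xp"])
```

Against a real page, the usual approach is to look at the page source, find whatever tag or attribute wraps the score rows, and narrow the `find_all` call to that.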
The whole program…
And here’s a test run.
Voila 🙂