I’m trying to scrape some content off another site and I’m not sure why


Asked: May 19, 2026


I’m trying to scrape some content off another site and I’m not sure why BeautifulSoup is producing this output. It finds only a blank space inside the matched div, but the real HTML contains a large amount of markup. I apologize if this is something stupid on my part. I’m new to Python.

Here’s my code:

import sys
import os
import mechanize
import re
from BeautifulSoup import BeautifulSoup

def scrape_trails(BASE_URL, data):
    #Get the trail names
    soup = BeautifulSoup(data)
    sitesDiv = soup.findAll("div", attrs={"id" : "sitesDiv"})
    print sitesDiv


def main():
    BASE_URL = "http://www.dnr.state.mn.us/skiing/skipass/list.html"
    br = mechanize.Browser()
    data = br.open(BASE_URL).get_data()
    links = scrape_trails(BASE_URL, data)


if __name__ == '__main__':
    main()

If you follow that URL you can see the sitesDiv contains a lot of markup. I’m not sure if I’m doing something wrong or if this is just malformed markup that the script can’t handle. Thanks!


1 Answer

Editorial Team · Answer 1 · 2026-05-19T05:30:28+00:00

The problem is that the HTML served from that URL contains an effectively empty div with the id sitesDiv; its only content is a non-breaking space:

<div id="sitesDiv">&nbsp;</div>

A script on the page fills in the div after the page has loaded. Your Python code doesn’t execute JavaScript, so the div is never populated and is still empty when BeautifulSoup parses it.
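To make this concrete, here is a minimal sketch of what a non-JavaScript parser sees. It uses only the Python 3 standard library rather than BeautifulSoup, and the names DivTextCollector and div_text are made up for illustration. The HTML string is reduced to the div quoted above, not the full page.

```python
from html.parser import HTMLParser

# Reduced stand-in for the served page: only the div quoted in the answer.
STATIC_HTML = '<html><body><div id="sitesDiv">&nbsp;</div></body></html>'


class DivTextCollector(HTMLParser):
    """Collects the text found inside the div with a given id (hypothetical helper)."""

    def __init__(self, target_id):
        # convert_charrefs defaults to True, so &nbsp; arrives as U+00A0.
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target div
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1  # nested div inside the target
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1   # entering the target div

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def div_text(html, target_id):
    """Return the concatenated text inside the div with id == target_id."""
    parser = DivTextCollector(target_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


text = div_text(STATIC_HTML, "sitesDiv")
print(repr(text))          # a single non-breaking space
print(repr(text.strip()))  # nothing left once whitespace is stripped
```

Any parser that works on the raw response will report the same thing, because the markup you see in the browser is only added later by the script.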

The good news is that the data you’re looking for is served to the page as JSON from this URL: http://maps.dnr.state.mn.us/cgi-bin/mapserv54?map=/usr/local/mapserver/apps/prk/ski_pass/sites.map&mode=nquery&qformat=geojson . So you can skip BeautifulSoup altogether and read and parse the JSON directly to get the info you want.
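A rough sketch of that approach in Python 3, using only the standard library. Several things here are assumptions: the response is taken to be a standard GeoJSON FeatureCollection (suggested by qformat=geojson but not verified), the body is assumed to be UTF-8, the helper names fetch_text and feature_properties are made up, and the SAMPLE data with its "name" key is invented purely to show the shape. Inspect the real response to find the actual property names.

```python
import json
from urllib.request import urlopen

# URL taken from the answer above.
GEOJSON_URL = ("http://maps.dnr.state.mn.us/cgi-bin/mapserv54"
               "?map=/usr/local/mapserver/apps/prk/ski_pass/sites.map"
               "&mode=nquery&qformat=geojson")


def fetch_text(url, timeout=30):
    """Download a URL and return its body as text (assumes UTF-8)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def feature_properties(geojson_text):
    """Return the 'properties' dict of every feature in a FeatureCollection."""
    data = json.loads(geojson_text)
    return [feature.get("properties") or {}
            for feature in data.get("features", [])]


# Hypothetical sample, NOT real data from the server; the "name" key is a guess.
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-93.1, 45.0]},
   "properties": {"name": "Example Trail A"}},
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-94.2, 46.3]},
   "properties": {"name": "Example Trail B"}}
]}
"""

for props in feature_properties(SAMPLE):
    print(props)

# Against the live endpoint (needs network access):
#     for props in feature_properties(fetch_text(GEOJSON_URL)):
#         print(props)
```

Keeping the download and the parsing in separate functions lets you test the parsing on a saved copy of the response without hitting the server each time.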
