I’m fetching and parsing a medium-large quantity of webpages. I noticed my script was

Question

Asked: June 4, 2026

I’m fetching and parsing a fairly large number of webpages. I noticed my script was spontaneously ending with a Python session restart. So far it only seems to happen when I try to make soup out of the nasa.gov page, for example:

import urllib2
from bs4 import BeautifulSoup

page=urllib2.urlopen('http://www.nasa.gov')
soup=BeautifulSoup(page)

=====================================RESTART=======================================

Does anyone know why this might be occurring and whether there’s any way I can avoid it? It doesn’t throw an exception or anything; the session just restarts. This happens on two different machines, although I’d be interested to hear if others can’t reproduce it (I’m using Python 2.7.2, Enthought Distribution).

EDIT/UPDATE:

I’ve just tried to substitute lxml for BeautifulSoup, but it causes the same spontaneous restart. i.e.

from lxml import html
page=html.parse('http://www.nasa.gov')

============================== RESTART =================================

As soon as Python opens and tries to parse the page the session restarts. Interestingly, reading the page and printing it to the console works fine.


1 Answer

Editorial Team · Answer 1 · Added an answer on June 4, 2026 at 1:23 am

The DOCTYPE declaration on that page is malformed. Try replacing it with a standard one before parsing:

page=urllib2.urlopen('http://www.nasa.gov/').read().replace("<!DOCTYPE \"xmlns:xsl='http://www.w3.org/1999/XSL/Transform'\">", "<!DOCTYPE html>")

soup=BeautifulSoup(page)
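
If the exact DOCTYPE string on the page changes, the literal replace above will silently stop matching. A more tolerant variant might look like the sketch below. It is only an illustration: the helper name normalize_doctype is hypothetical, it assumes the declaration contains no '>' character (true for the string quoted above), and it operates on text strings, so bytes returned by a Python 3 fetch would need decoding first. Whether it avoids the restart depends on the DOCTYPE really being the trigger.

```python
import re

# Matches the first <!DOCTYPE ...> declaration, case-insensitively.
# Assumption: the declaration itself contains no '>' character.
_DOCTYPE_RE = re.compile(r"<!DOCTYPE[^>]*>", re.IGNORECASE)


def normalize_doctype(markup):
    """Replace the first DOCTYPE declaration with a plain HTML5 one.

    Hypothetical helper for illustration; markup without a DOCTYPE
    is returned unchanged.
    """
    return _DOCTYPE_RE.sub("<!DOCTYPE html>", markup, count=1)


# Sample input using the malformed declaration quoted in the answer.
sample = (
    "<!DOCTYPE \"xmlns:xsl='http://www.w3.org/1999/XSL/Transform'\">\n"
    "<html><body>hi</body></html>"
)
cleaned = normalize_doctype(sample)
```

The cleaned string can then be passed to BeautifulSoup in place of the raw page contents.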
