I have started to learn how to scrape information from websites using urllib and


Asked: June 13, 2026


I have started to learn how to scrape information from websites using urllib and BeautifulSoup. I want to grab all the text from this page (the URL is in the code) and put it into a text file.

import urllib
from bs4 import BeautifulSoup as Soup
base_url = "http://www.galactanet.com/oneoff/theegg_mod.html"



url = (base_url)
soup = Soup(urllib.urlopen(url))

print(soup.get_text())

When I run this, it grabs the text, but it outputs it with spaces between all the letters and still shows some HTML. I'm not sure why.

i   n   '   >      Y   u   p   .       B   u   t       d   o   n      t       f   e   e

Like that. Any ideas?

Also, what would I do to put this text into a text file?

(Using beautifulsoup4, running Ubuntu 12.04 and Python 2.7)

Thank you 🙂


1 Answer

Editorial Team · Answer 1 · 2026-06-13T05:29:24+00:00

I had some trouble with the encoding, so I changed your code slightly, then added the piece that writes the results to a file:

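import urllib
from bs4 import BeautifulSoup as Soup

base_url = "http://www.galactanet.com/oneoff/theegg_mod.html"

# Python 2.7: fetch the page and let BeautifulSoup detect its encoding.
content = urllib.urlopen(base_url)
soup = Soup(content)
# print soup.original_encoding  # uncomment to see the detected encoding

# get_text() returns unicode; encode it to bytes before writing.
theegg_text = soup.get_text().encode("windows-1252")

# The with block closes the file even if the write fails.
with open("somefile.txt", "w") as f:
    f.write(theegg_text)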
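
The encoding step can also be tried without a network call. This is only a rough sketch, not part of the answer's code. It assumes the spaced-out letters come from bytes being decoded with the wrong codec, which is one common cause of that symptom but is not confirmed for this page. The `save_text` helper and the sample string are hypothetical names used for illustration. It uses only the standard library and `io.open`, so it should run on both Python 2.7 and Python 3.

```python
import io
import os
import tempfile


def save_text(text, path, encoding="utf-8"):
    """Write unicode text to path using an explicit encoding (hypothetical helper)."""
    with io.open(path, "w", encoding=encoding) as handle:
        handle.write(text)


# Sample text standing in for the page content (an assumption, not fetched).
sample = u"Yup. But don\u2019t feel"

# Two-bytes-per-character data read with a one-byte codec: every other
# character becomes a NUL, which many terminals display as a gap.
raw = sample[:3].encode("utf-16-le")
misread = raw.decode("latin-1")

# Decoding with the matching codec restores the text.
recovered = raw.decode("utf-16-le")

# Write the text with an explicit encoding, then read it back the same way.
out_path = os.path.join(tempfile.mkdtemp(), "somefile.txt")
save_text(sample, out_path, encoding="windows-1252")

with io.open(out_path, encoding="windows-1252") as handle:
    round_trip = handle.read()
```

Passing the encoding to `io.open` keeps the choice of codec in one visible place, instead of relying on the platform default when the file is written or read.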
