I have started learning how to scrape information from websites using urllib and BeautifulSoup. I want to grab all the text from the page referenced in the code below and put it into a text file.
import urllib
from bs4 import BeautifulSoup as Soup
base_url = "http://www.galactanet.com/oneoff/theegg_mod.html"
url = (base_url)
soup = Soup(urllib.urlopen(url))
print(soup.get_text())
When I run this, it grabs the text, but it prints it with spaces between all the letters and still shows some HTML. I'm not sure why.
i n ' > Y u p . B u t d o n t f e e
Like that. Any ideas?
Also, how would I write this output to a text file?
(Using beautifulsoup4, running Ubuntu 12.04 and Python 2.7)
Thank you 🙂
I had some trouble with the encoding, so I changed your code slightly, then added the piece to print the results to a file:
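A minimal sketch of this kind of fix follows. It assumes the page is served as UTF-16, which would explain the "spaces" between letters (they would be the extra bytes of each two-byte character), but that encoding is a guess and not confirmed for this page. The helper names (`html_to_text`, `save_text`, `scrape_to_file`) and the output file name `theegg.txt` are made up for illustration.

```python
# -*- coding: utf-8 -*-
import io

try:
    from urllib2 import urlopen          # Python 2.7, as in the question
except ImportError:
    from urllib.request import urlopen   # Python 3

URL = "http://www.galactanet.com/oneoff/theegg_mod.html"
ASSUMED_ENCODING = "utf-16"  # assumption: not confirmed for this page
OUTPUT_PATH = "theegg.txt"   # hypothetical output file name


def html_to_text(raw_bytes, encoding=ASSUMED_ENCODING):
    """Parse raw HTML bytes with an explicit encoding and return the text."""
    # Imported here so save_text() can be used without bs4 installed.
    from bs4 import BeautifulSoup
    # from_encoding tells BeautifulSoup how to decode the bytes
    # instead of letting it guess.
    soup = BeautifulSoup(raw_bytes, "html.parser", from_encoding=encoding)
    return soup.get_text()


def save_text(text, path=OUTPUT_PATH):
    """Write unicode text to a UTF-8 encoded file."""
    with io.open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def scrape_to_file(url=URL, path=OUTPUT_PATH):
    """Download the page, extract its text, and save it to a file."""
    raw_bytes = urlopen(url).read()
    save_text(html_to_text(raw_bytes), path)
```

Calling `scrape_to_file()` would download the page and write the extracted text to `theegg.txt` in the current directory. If the text still looks wrong, try a different value for `ASSUMED_ENCODING`.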