I have a Python 2 script that scrapes data from a Reddit thread, but I currently have Python 3 installed.
I have no Python knowledge, but from googling it looks like the (initial?) problem is urllib2.
Would this be easy to convert to Python 3, or would it be better to install Python 2?
Code here:
import urllib2
import json

# grab data
raw_data = urllib2.urlopen('http://www.reddit.com/r/books/comments/cy3gy/what_books_are_you_reading_right_now/.json').read()
thread_data = json.loads(raw_data)[1]
toplevel_comments = thread_data['data']['children']

# extract books, upvotes and downvotes
votes = dict()
for book in toplevel_comments:
    try:
        book_name = book['data']['body']
        book_upvotes = book['data']['ups']
        book_downvotes = book['data']['downs']
        votes[book_name] = (book_upvotes, book_downvotes)
    except KeyError:
        break

# create a list of (book, votes) pairs sorted by upvotes, highest first
votes_by_up = reversed(sorted(votes.items(), key = lambda t: t[1][0]))

# print
for item in votes_by_up:
    book_name, votes = item
    book_upvotes, book_downvotes = votes
    print(book_name + ' -- ' + str(book_upvotes) + ' upvotes, ' +
          str(book_downvotes) + ' downvotes')
The 2to3 utility that comes with Python will do most of the work for you. Call it on your file with the -w argument to convert the file automatically.
After that, you will need to convert raw_data from a byte string to a character string. Use decode right before your json.loads line:
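A minimal sketch of the decode step in Python 3, under a few assumptions: the response is taken to be UTF-8 encoded, and the helper names fetch_raw and parse_comments are hypothetical, introduced only to keep the download separate from the parsing. urllib.request is where urlopen lives in Python 3.

```python
import json
import urllib.request  # Python 3 home of urlopen (was urllib2 in Python 2)

THREAD_URL = ('http://www.reddit.com/r/books/comments/cy3gy/'
              'what_books_are_you_reading_right_now/.json')


def fetch_raw(url):
    """Download the page; in Python 3, read() returns bytes, not str."""
    return urllib.request.urlopen(url).read()


def parse_comments(raw_data):
    """Decode the raw bytes, parse the JSON, return the top-level comments."""
    # Assumption: the server sends UTF-8, the usual encoding for JSON.
    text = raw_data.decode('utf-8')
    thread_data = json.loads(text)[1]
    return thread_data['data']['children']
```

With these in place, toplevel_comments = parse_comments(fetch_raw(THREAD_URL)) would stand in for the three "grab data" lines of the original script, and the rest of the script can stay as it is. On Python 3.6 and later, json.loads also accepts bytes directly, so the explicit decode is mainly needed on older Python 3 versions.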