I’m using the following code on ScraperWiki to search Twitter for a specific hashtag.
It’s working great and is picking out any postcode provided in the tweet (or returning false if none is available). This is achieved with the line data['location'] = scraperwiki.geo.extract_gb_postcode(result['text']).
However, I’m only interested in tweets that include a postcode, because they’re going to be added to a Google Map at a later stage.
What would be the easiest way to do this? I’m relatively au fait with PHP, but Python’s a completely new area for me.
Thanks in advance for your help.
Best wishes,
Martin
import scraperwiki
import simplejson
import urllib2

QUERY = 'enter_hashtag_here'
RESULTS_PER_PAGE = '100'
NUM_PAGES = 10

for page in range(1, NUM_PAGES+1):
    base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \
         % (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)
    try:
        results_json = simplejson.loads(scraperwiki.scrape(base_url))
        for result in results_json['results']:
            #print result
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])
            data['from_user'] = result['from_user']
            data['created_at'] = result['created_at']
            print data['from_user'], data['text']
            scraperwiki.sqlite.save(["id"], data)
    except:
        print 'Oh dear, failed to scrape %s' % base_url
        break
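A minimal sketch of the filtering step the question asks about, kept free of ScraperWiki so it can be tried anywhere. keep_tweets_with_postcodes is a hypothetical helper, and its extract_postcode argument stands in for scraperwiki.geo.extract_gb_postcode, assumed to return a falsy value (such as False or None) when a tweet contains no postcode.

```python
def keep_tweets_with_postcodes(results, extract_postcode):
    """Build one record per tweet that contains a postcode; skip the rest.

    Hypothetical helper. `extract_postcode` is assumed to return a postcode
    string, or a falsy value (False/None/'') when the text has none.
    """
    kept = []
    for result in results:
        location = extract_postcode(result['text'])
        if not location:
            # No postcode found: nothing to put on the map, so skip it.
            continue
        kept.append({
            'id': result['id'],
            'text': result['text'],
            'location': location,
            'from_user': result['from_user'],
            'created_at': result['created_at'],
        })
    return kept
```

Inside the original loop the same idea is a small change: only call scraperwiki.sqlite.save(["id"], data) when data['location'] is truthy, i.e. put if data['location']: in front of the save. As in PHP, an empty string, False and None all count as false in an if test.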
Do you just want this? I tried it on the free ScraperWiki test page and it seems to do what you want. If you’re looking for something more complicated, let me know.
Outputs:
I’ve refined it a bit so it’s pickier than the ScraperWiki check for extracting GB postcodes, which lets through quite a few false positives. Basically, I took the accepted answer from here and added some negative lookbehind/lookahead to filter out a few more. It looks like the ScraperWiki check applies the regex without any negative lookbehind/lookahead. Hope that helps a bit.
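For illustration, here is one way to wrap a postcode regex in negative lookbehind/lookahead. The pattern is deliberately simplified (an assumption, not the one from the linked answer): it only checks the general shape of a UK postcode, not every rule, and extract_gb_postcode below is a hypothetical local replacement, not the ScraperWiki function.

```python
import re

# Simplified UK postcode shape (assumption, not a full validator):
# outward code such as "SW1A", "M1" or "EC2V", an optional space,
# then the inward code (one digit followed by two letters).
GB_POSTCODE = re.compile(
    r'(?<![A-Z0-9])'               # negative lookbehind: no letter/digit before
    r'([A-Z]{1,2}[0-9][A-Z0-9]?)'  # outward code
    r' ?'                          # optional single space
    r'([0-9][A-Z]{2})'             # inward code
    r'(?![A-Z0-9])',               # negative lookahead: no letter/digit after
    re.IGNORECASE,
)


def extract_gb_postcode(text):
    """Hypothetical helper: return the first postcode-shaped string in
    `text`, upper-cased with a single space, or False if there is none."""
    match = GB_POSTCODE.search(text)
    if not match:
        return False
    return '%s %s' % (match.group(1).upper(), match.group(2).upper())
```

The lookarounds are what reject inputs such as "ABCM1 1AE" or "SW1A 1AAB", where the postcode-like run is glued to other letters or digits; without them, a plain search would still find a match inside those tokens.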