Editorial Team
Asked: June 3, 2026

I’m using the following code on ScraperWiki to search Twitter for a specific hashtag.

It’s working great and is picking out any postcode provided in the tweet (or returning false if none is available). This is achieved with the line data['location'] = scraperwiki.geo.extract_gb_postcode(result['text']).
But I’m only interested in tweets which include postcode information (this is because they’re going to be added to a Google Map at a later stage).
What would be the easiest way to do this? I’m relatively au fait with PHP, but Python’s a completely new area for me.
Thanks in advance for your help.
Best wishes,
Martin

import scraperwiki
import simplejson
import urllib2

QUERY = 'enter_hashtag_here'
RESULTS_PER_PAGE = '100'
NUM_PAGES = 10

for page in range(1, NUM_PAGES+1):
    base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \
         % (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)
    try:
        results_json = simplejson.loads(scraperwiki.scrape(base_url))
        for result in results_json['results']:
            #print result
            data = {}
            data['id'] = result['id']
            data['text'] = result['text']
            data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])
            data['from_user'] = result['from_user']
            data['created_at'] = result['created_at']
            print data['from_user'], data['text']
            scraperwiki.sqlite.save(["id"], data)
    except:
        print 'Oh dear, failed to scrape %s' % base_url
        break
1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 4:32 am

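    Is this all you need? I tried it on the free ScraperWiki test page and it seems to do what you want: the only change is the if data['location']: check, which skips any tweet where no postcode was found. If you’re looking for something more complicated, let me know.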

    import scraperwiki
    import simplejson
    import urllib2
    
    QUERY = 'meetup'
    RESULTS_PER_PAGE = '100'
    NUM_PAGES = 10
    
    for page in range(1, NUM_PAGES+1):
        base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \
             % (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)
        try:
            results_json = simplejson.loads(scraperwiki.scrape(base_url))
            for result in results_json['results']:
                #print result
                data = {}
                data['id'] = result['id']
                data['text'] = result['text']
                data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])
                data['from_user'] = result['from_user']
                data['created_at'] = result['created_at']
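                # extract_gb_postcode returns False when no postcode is found,
                # so only tweets that contain a postcode are printed and saved
                if data['location']:
                    print data['location'], data['from_user']
                    scraperwiki.sqlite.save(["id"], data)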
        except:
            print 'Oh dear, failed to scrape %s' % base_url
            break
    

    Outputs:

    P93JX VSDC
    FV36RL Bootstrappers
    Ci76fP Eli_Regalado
    UN56fn JasonPalmer1971
    iQ3H6zR GNOTP
    Qr04eB fcnewtech
    sE79dW melindaveee
    ud08GT MariaPanlilio
    c9B8EE akibantech
    ay26th Thepinkleash
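
The filtering step can also be tried away from ScraperWiki. What follows is only a minimal sketch, written for Python 3 so it runs locally. It assumes a hypothetical stand-in, fake_extract_postcode, which (like scraperwiki.geo.extract_gb_postcode as described in the question) returns False when no postcode is found. The simplified pattern and the sample tweets are made up for illustration and are not the ScraperWiki implementation.

```python
import re

# Hypothetical stand-in for scraperwiki.geo.extract_gb_postcode:
# returns the matched text, or False when nothing is found.
# The pattern is deliberately simplified and is NOT the ScraperWiki one.
_SIMPLE_POSTCODE = re.compile(r'\b[A-Z]{1,2}[0-9][0-9A-Z]? ?[0-9][A-Z]{2}\b', re.I)

def fake_extract_postcode(text):
    match = _SIMPLE_POSTCODE.search(text)
    return match.group(0) if match else False

def with_postcodes(results):
    """Keep only results whose text contains a postcode."""
    kept = []
    for result in results:
        data = {
            'id': result['id'],
            'text': result['text'],
            'location': fake_extract_postcode(result['text']),
            'from_user': result['from_user'],
        }
        if data['location']:  # False means no postcode, so skip the tweet
            kept.append(data)
    return kept

# Made-up sample data shaped like the 'results' list used above
sample = [
    {'id': 1, 'text': 'Meet at SW1A 1AA tonight', 'from_user': 'alice'},
    {'id': 2, 'text': 'No location in this one', 'from_user': 'bob'},
]

for row in with_postcodes(sample):
    print(row['location'], row['from_user'])
```

Only the first sample tweet survives the filter. In the scraper itself the same truthiness check sits directly in front of the save call, so rows without a postcode never reach the datastore.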
    

    I’ve refined it a bit so it’s pickier than the ScraperWiki check for extracting GB postcodes, which lets through quite a few false positives. Basically I took the accepted answer from here, and added some negative lookbehind/lookahead to filter out a few more. It looks like the ScraperWiki check does the regex without the negative lookbehind/lookahead. Hope that helps a bit.

    import scraperwiki
    import simplejson
    import urllib2
    import re
    
    QUERY = 'sw4'
    RESULTS_PER_PAGE = '100'
    NUM_PAGES = 10
    
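    # Stricter UK postcode pattern: the negative lookbehind/lookahead reject
    # matches that are embedded in a longer run of letters or digits
    postcode_match = re.compile('(?<![0-9A-Z])([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]? {0,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)(?![0-9A-Z])', re.I)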
    
    for page in range(1, NUM_PAGES+1):
        base_url = 'http://search.twitter.com/search.json?q=%s&rpp=%s&page=%s' \
             % (urllib2.quote(QUERY), RESULTS_PER_PAGE, page)
        try:
            results_json = simplejson.loads(scraperwiki.scrape(base_url))
            for result in results_json['results']:
                #print result
                data = {}
                data['id'] = result['id']
                data['text'] = result['text']
                data['location'] = scraperwiki.geo.extract_gb_postcode(result['text'])
                data['from_user'] = result['from_user']
                data['created_at'] = result['created_at']
                if data['location'] and postcode_match.search(data['text']):
                    print data['location'], data['text']
                    scraperwiki.sqlite.save(["id"], data)
        except:
            print 'Oh dear, failed to scrape %s' % base_url
            break
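
To see what the lookbehind/lookahead add, the pattern can be compared with and without them on a couple of strings. This is a rough sketch in Python 3. The core pattern is copied from the code above, but the "loose" variant is only an assumption about how a check without lookarounds would behave, and the sample strings and the find_postcode helper are invented for the comparison.

```python
import re

# Core pattern copied from the answer above, split over two lines
CORE = ('([A-PR-UWYZ0-9][A-HK-Y0-9][AEHMNPRTVXY0-9]?[ABEHMNPRVWXY0-9]?'
        ' {0,2}[0-9][ABD-HJLN-UW-Z]{2}|GIR 0AA)')

# Assumed "loose" variant: the same core with no lookarounds
loose = re.compile(CORE, re.I)
# Strict variant: identical to postcode_match in the answer
strict = re.compile('(?<![0-9A-Z])' + CORE + '(?![0-9A-Z])', re.I)

def find_postcode(pattern, text):
    """Return the first postcode-like match, or None (hypothetical helper)."""
    match = pattern.search(text)
    return match.group(1) if match else None

# Made-up examples: a plausible postcode, and a longer token
# that merely contains something postcode-shaped
samples = ['Gig at SW4 7AA tonight', 'ref AB12 3DEF']
for text in samples:
    print(repr(text), find_postcode(loose, text), find_postcode(strict, text))
```

Both variants accept the first string. For the second, the loose variant picks a postcode-shaped fragment out of the longer token, while the strict one rejects it because a letter follows the match.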
    