Editorial Team
Asked: June 18, 2026 (2026-06-18T00:32:55+00:00)

Here is my problem. I have a sample text file in which I store text crawled from various HTML pages. The text describes events along with their times and locations. I want to fetch the coordinates of these locations, but I have no idea how to do that in Python. I am using NLTK to recognize named entities in the sample text. Here is the code:

import nltk

with open('sample.txt', 'r') as f:
    sample = f.read()

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
# ne_chunk_sents is the NLTK 3 name for the older batch_ne_chunk
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)

#print(chunked_sentences)
#print(tokenized_sentences)
#print(tagged_sentences)

def extract_entity_names(t):
    entity_names = []

    # Only subtrees have a label; leaves are plain (word, tag) tuples.
    # label() is the NLTK 3 replacement for the older .node attribute.
    if hasattr(t, 'label') and t.label():
        if t.label() == 'NE':
            entity_names.append(' '.join(child[0] for child in t))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))

    return entity_names

entity_names = []
for tree in chunked_sentences:
    # Print results per sentence
    # print(extract_entity_names(tree))

    entity_names.extend(extract_entity_names(tree))

# Print all entity names
#print(entity_names)

# Print unique entity names
print(set(entity_names))

The sample file looks something like this:

La bohème at Covent Garden

When: 18 Jan 2013 (various dates) , 7.30pm Where: Covent Garden,
London, John Copley’s perennially popular Royal Opera production of
Puccini’s La bohème is revived for the first of two times this season,
aptly over the Christmas period. Sir Mark Elder conducts Rolando
Villazón as Rodolfo and Maija Kovalevska as Mimì. Mimì meets poet
Rodolfo (Dmytro Popov sings the role on 5 and 18 January) one cold
Christmas Eve in Paris’ Latin Quarter. Fumbling around in the dark
after her candle has gone out, they fall in love. Rodolfo lives with
three other lads: philosopher Colline (Nahuel di Pierro/Jihoon Kim on
18 January), musician Schaunard (David Bizic) and painter Marcello
(Audun Iversen), who loves Musetta (Stefania Dovhan). Both couples
break up and the opera ends in tragedy as Rodolfo finds Mimì dying of
consumption in a freezing garret.

I want to fetch the coordinates for Covent Garden, London from this text. How can I do it?

1 Answer

  1. Editorial Team
    Added an answer on June 18, 2026 at 12:32 am

    You really have two questions:

    1. How to extract location text (or potential location text).
    2. How to get location (latitude, longitude) by calling a Geocoding service with location text.

    I can help with the second question. (But see edit below for some help with your first question.)

    With the old Google Maps API (which was still working when this answer was written), you could get the geocoding down to one line (one ugly line):

    import urllib.parse
    import urllib.request

    def geocode(address):
        return tuple(float(s) for s in urllib.request.urlopen('http://maps.google.com/maps/geo?' + urllib.parse.urlencode({'output': 'csv', 'q': address})).readline().decode('utf-8').split(',')[2:])
    

    Check out the Google Maps API Geocoding Documentation.

    Here’s a more readable version plus some wrapper code (when calling it from the command line, remember to enclose the address in quotes):

    import sys
    import urllib.parse
    import urllib.request

    # Legacy Google Maps geocoding endpoint referred to above
    googleGeocodeUrl = 'http://maps.google.com/maps/geo?'

    def geocode(address):
        parms = {
            'output': 'csv',
            'q': address}

        url = googleGeocodeUrl + urllib.parse.urlencode(parms)
        resp = urllib.request.urlopen(url)
        # The response body is bytes in Python 3; the first line holds the result
        line = resp.readline().decode('utf-8')
        status, accuracy, latitude, longitude = line.strip().split(',')
        # Return numbers rather than strings, matching the one-line version
        return float(latitude), float(longitude)

    def main():
        if 1 < len(sys.argv):
            address = sys.argv[1]
        else:
            address = '1600 Amphitheatre Parkway, Mountain View, CA 94043, USA'

        coordinates = geocode(address)
        print(coordinates)

    if __name__ == '__main__':
        main()
    

    It’s simple to parse the CSV format, but the XML format has better error reporting.
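To make the CSV parsing step concrete, here is a minimal sketch that works on a response line without touching the network. The field order (status, accuracy, latitude, longitude) comes from the code above. The function name `parse_geocode_csv`, the sample line and its coordinate values, and the assumption that a status of 200 means success are illustrative and not taken from the original answer.

```python
def parse_geocode_csv(line):
    """Parse one 'status,accuracy,latitude,longitude' line into two floats."""
    fields = line.strip().split(",")
    if len(fields) != 4:
        raise ValueError("expected 4 comma-separated fields, got %d" % len(fields))
    status, _accuracy, latitude, longitude = fields
    # Assumption: a status of 200 signals success; anything else is an error.
    if status != "200":
        raise ValueError("geocoder returned status %s" % status)
    return float(latitude), float(longitude)


# Hypothetical response line with made-up values, for illustration only.
print(parse_geocode_csv("200,8,51.5117,-0.1240"))
```

Checking the status before unpacking the coordinates gives the CSV path at least a basic form of error reporting.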

    Edit – Help with your first question

    I looked into NLTK. It’s not trivial, but I can recommend the Natural Language Toolkit documentation, Chapter 7 (Extracting Information from Text), specifically section 7.5, Named Entity Recognition. At the end of that section, they point out:

    NLTK provides a classifier that has already been trained to recognize named entities, accessed with the function nltk.ne_chunk(). If we set the parameter binary=True, then named entities are just tagged as NE; otherwise, the classifier adds category labels such as PERSON, ORGANIZATION, and GPE.

    You’re specifying True, but you probably want the category labels, so:

    # ne_chunk_sents is the NLTK 3 name for the older batch_ne_chunk
    chunked_sentences = nltk.ne_chunk_sents(tagged_sentences)
    

    This provides category labels (named entity type), which seemed promising. But after trying this on your text and a few simple phrases with location, it’s clear more rules are needed. Read the documentation for more info.
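As one example of such an extra rule, the sample listing marks its venue with a literal "Where:" label, so a regular expression can pull out the location text without any named entity recognition. This is only a sketch under a strong assumption: the venue is taken to be exactly the first two comma-separated parts after "Where:" (place, then city), which holds for the sample but may not hold for other listings. The names `WHERE_PATTERN` and `extract_where` are hypothetical.

```python
import re

# Assumption: the venue follows a literal "Where:" label and consists of
# exactly two comma-separated parts (place, city), as in the sample listing.
WHERE_PATTERN = re.compile(r"Where:\s*([^,]+),\s*([^,]+),")


def extract_where(text):
    """Return 'place, city' from the first 'Where:' field, or None if absent."""
    match = WHERE_PATTERN.search(text)
    if match is None:
        return None
    # Collapse line breaks and repeated spaces inside each part.
    parts = [" ".join(part.split()) for part in match.groups()]
    return ", ".join(parts)


sample = (
    "When: 18 Jan 2013 (various dates) , 7.30pm Where: Covent Garden,\n"
    "London, John Copley's perennially popular Royal Opera production"
)
print(extract_where(sample))  # prints: Covent Garden, London
```

The returned string can then be passed as the address to a geocoding call.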
