The Archive Base

Editorial Team
Asked: May 26, 2026

This is a follow-up of my question. I am using nltk to parse out persons, organizations, and their relationships. Using this example, I was able to create chunks of persons and organizations; however, I am getting an error from the nltk.sem.extract_rels call:

AttributeError: 'Tree' object has no attribute 'text'

Here is the complete code:

import nltk
import re
#billgatesbio from http://www.reuters.com/finance/stocks/officerProfile?symbol=MSFT.O&officerId=28066
with open('billgatesbio.txt', 'r') as f:
    sample = f.read()

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.batch_ne_chunk(tagged_sentences)

# tried plain ne_chunk instead of batch_ne_chunk as given in the book
#chunked_sentences = [nltk.ne_chunk(sentence) for sentence in tagged_sentences]

# pattern to find <person> served as <title> in <org>
IN = re.compile(r'.+\s+as\s+')
for doc in chunked_sentences:
    for rel in nltk.sem.extract_rels('ORG', 'PERSON', doc,corpus='ieer', pattern=IN):
        print nltk.sem.show_raw_rtuple(rel)

This example is very similar to the one given in the book, but the book's example uses prepared ‘parsed docs,’ which appear out of nowhere, and I don’t know where to find their object type. I also searched through the Git repositories. Any help is appreciated.

My ultimate goal is to extract persons, organizations, titles (dates) for some companies; then create network maps of persons and organizations.

1 Answer

  1. Editorial Team
    Added an answer on May 26, 2026 at 10:21 am

    It looks like, to be a “Parsed Doc,” an object needs to have a headline member and a text member, both of which are lists of tokens in which some of the tokens are marked up as trees. For example, this (hacky) example works:

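    import nltk
    import re

    # Match fillers containing the word "in", unless a word ending in "ing" follows it
    IN = re.compile(r'.*\bin\b(?!\b.+ing)')

    # Bare stand-in for a parsed document: only .headline and .text are set
    class Doc:
        pass

    doc = Doc()
    doc.headline = ['foo']
    doc.text = [nltk.Tree('ORGANIZATION', ['WHYY']), 'in', nltk.Tree('LOCATION', ['Philadelphia']), '.', 'Ms.', nltk.Tree('PERSON', ['Gross']), ',']

    for rel in nltk.sem.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
        # NLTK 3 calls this helper rtuple; older releases named it show_raw_rtuple
        print(nltk.sem.relextract.rtuple(rel))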
    

    When run, this produces the output:

    [ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']
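
Each relation yielded by extract_rels is a dictionary, so its parts can be read individually instead of being printed as one string. The sketch below rebuilds the same hand-made document and collects (subject, filler, object) triples; the key names `subjtext`, `filler`, and `objtext` are the ones used by NLTK 3's relextract module, and the `edges` name is only illustrative.

```python
import re

import nltk

# Same hand-built stand-in document as in the example above
class Doc:
    pass

doc = Doc()
doc.headline = ['foo']
doc.text = [nltk.Tree('ORGANIZATION', ['WHYY']), 'in',
            nltk.Tree('LOCATION', ['Philadelphia']), '.', 'Ms.',
            nltk.Tree('PERSON', ['Gross']), ',']

IN = re.compile(r'.*\bin\b(?!\b.+ing)')

# Collect (subject, filler, object) triples rather than formatted strings
edges = [
    (rel['subjtext'], rel['filler'], rel['objtext'])
    for rel in nltk.sem.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN)
]
print(edges)
```

Triples in this shape could later be fed into whatever graph structure is used for the person and organization network.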
    

    Obviously you wouldn’t actually code it like this, but it provides a working example of the data format expected by extract_rels. You just need to work out the preprocessing steps that get your data into that format.
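
One possible way to do that preprocessing, sketched under some assumptions. The chunked sentence is built by hand to stand in for nltk.ne_chunk output (so no models need to be downloaded), and `as_ieer_doc` is a hypothetical helper name, not part of NLTK. Because the leaves here are (word, tag) pairs, the filler text that the pattern is matched against contains tags such as `as/IN`. The last call assumes NLTK 3 behaviour, where `corpus='ace'` appears to read a chunk tree directly.

```python
import re
from types import SimpleNamespace

import nltk
from nltk.sem import relextract


def as_ieer_doc(chunked_sentence, headline=()):
    """Hypothetical helper: expose a chunk tree through .text and .headline."""
    # Iterating a Tree yields its top-level children: (word, tag) pairs
    # for ordinary tokens and sub-Trees for named entities.
    return SimpleNamespace(text=list(chunked_sentence), headline=list(headline))


# Hand-built stand-in for one sentence of nltk.ne_chunk output
sentence = nltk.Tree('S', [
    nltk.Tree('PERSON', [('Gates', 'NNP')]),
    ('served', 'VBD'), ('as', 'IN'), ('chairman', 'NN'), ('of', 'IN'),
    nltk.Tree('ORGANIZATION', [('Microsoft', 'NNP')]),
    ('in', 'IN'),
    nltk.Tree('GPE', [('Redmond', 'NNP')]),
    ('.', '.'),
])

# The filler between the two entities must contain the word "as"
AS = re.compile(r'.*\bas\b')

rels = relextract.extract_rels(
    'PERSON', 'ORGANIZATION', as_ieer_doc(sentence), corpus='ieer', pattern=AS)
for rel in rels:
    print(rel['subjtext'], '|', rel['filler'], '|', rel['objtext'])

# Assumption: with corpus='ace', NLTK 3 walks the tree itself, so no wrapper is needed
rels_ace = relextract.extract_rels(
    'PERSON', 'ORGANIZATION', sentence, corpus='ace', pattern=AS)
```

For real data, each tree from the chunker would be wrapped (or passed directly) the same way, one sentence at a time.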

© 2021 The Archive Base. All Rights Reserved