This is a follow-up of my question. I am using nltk to parse

Asked: May 26, 2026 (2026-05-26T10:21:55+00:00)

This is a follow-up of my question. I am using nltk to parse out persons, organizations, and their relationships. Using this example, I was able to create chunks of persons and organizations; however, I am getting an error from the nltk.sem.extract_rels call:

AttributeError: 'Tree' object has no attribute 'text'

Here is the complete code:

import nltk
import re
#billgatesbio from http://www.reuters.com/finance/stocks/officerProfile?symbol=MSFT.O&officerId=28066
with open('billgatesbio.txt', 'r') as f:
    sample = f.read()

sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
chunked_sentences = nltk.batch_ne_chunk(tagged_sentences)

# tried plain ne_chunk instead of batch_ne_chunk as given in the book
#chunked_sentences = [nltk.ne_chunk(sentence) for sentence in tagged_sentences]

# pattern to find <person> served as <title> in <org>
IN = re.compile(r'.+\s+as\s+')
for doc in chunked_sentences:
    for rel in nltk.sem.extract_rels('ORG', 'PERSON', doc,corpus='ieer', pattern=IN):
        print nltk.sem.show_raw_rtuple(rel)

This example is very similar to the one given in the book, but the book's example uses prepared ‘parsed docs’, which appear out of nowhere, and I don’t know where to find their object type. I scoured through the git libraries as well. Any help is appreciated.

My ultimate goal is to extract persons, organizations, titles (dates) for some companies; then create network maps of persons and organizations.


1 Answer

Editorial Team · Answer 1 · 2026-05-26T10:21:56+00:00

It looks like, to count as a “parsed doc”, an object needs a headline member and a text member, both of which are lists of tokens in which the named entities are marked up as trees. For example, this (hacky) example works:

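import re

import nltk

# Matches a filler that contains the word "in", unless it is followed by a
# gerund (as in "success in supervising").
IN = re.compile(r'.*\bin\b(?!\b.+ing)')

# Bare container for the attributes that extract_rels reads with corpus='ieer'.
class doc():
    pass

# Both attributes are token lists; named entities appear as Tree nodes.
doc.headline = ['foo']
doc.text = [nltk.Tree('ORGANIZATION', ['WHYY']), 'in', nltk.Tree('LOCATION', ['Philadelphia']), '.', 'Ms.', nltk.Tree('PERSON', ['Gross']), ',']

for rel in nltk.sem.extract_rels('ORG', 'LOC', doc, corpus='ieer', pattern=IN):
    # In NLTK 3 this function is named rtuple.
    print(nltk.sem.relextract.show_raw_rtuple(rel))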

When run, this prints:

[ORG: 'WHYY'] 'in' [LOC: 'Philadelphia']

Obviously you wouldn’t actually code it like this, but it provides a working example of the data format expected by extract_rels. You just need to work out preprocessing steps that massage your data into that format.
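One possible shape for that preprocessing step is sketched below. The helper names strip_tags and make_ieer_doc are hypothetical and not part of nltk. The sketch assumes each chunked sentence is a sequence that mixes (word, tag) tuples with entity subtrees, which is what nltk.ne_chunk returns for a POS-tagged sentence. It replaces each tuple with its bare word, leaves the subtrees untouched, and wraps the result in an object with text and headline attributes. It uses only the standard library, so a plain list stands in for an entity subtree in the sample data.

```python
from types import SimpleNamespace


def strip_tags(chunked_sentence):
    """Replace (word, tag) tuples with the bare word; keep everything else.

    Entity subtrees are not tuples, so they pass through unchanged.
    """
    return [item[0] if isinstance(item, tuple) else item
            for item in chunked_sentence]


def make_ieer_doc(chunked_sentence, headline=()):
    """Wrap one chunked sentence in an object exposing .text and .headline."""
    return SimpleNamespace(text=strip_tags(chunked_sentence),
                           headline=list(headline))


# Placeholder for an entity subtree such as nltk.Tree('PERSON', ['Gates']).
# A plain list is used here only so the sketch runs without nltk installed.
entity = ['Gates']

sample_sentence = [('served', 'VBD'), ('as', 'IN'), entity]
sample_doc = make_ieer_doc(sample_sentence)

print(sample_doc.text)
print(sample_doc.headline)
```

With real nltk trees in place of the placeholder, an object built this way should be usable as the doc argument of extract_rels with corpus='ieer', in the same way as the hand-built doc above.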
