I’m trying to extract proper nouns from a tagged file. The problem is that the code I’m using sometimes raises this error:
Traceback (most recent call last):
  File "E:\pt\paragraph", line 35, in <module>
    sen1= noun(mylist[s])
  File "E:\pt\paragraph", line 5, in noun
    word, tag = word.split('/')
ValueError: too many values to unpack
The code works fine for some texts, but for others it gives the error.
The code:
def noun(words):
    nouns = []
    for word in words.split():
        word, tag = word.split('/')
        if (tag.lower() == 'np'):
            nouns.append(word)
    return nouns

def splitParagraph(paragraph):
    import re
    paragraphs = paragraph.split('\n\n')
    return paragraphs

if __name__ == '__main__':
    import nltk
    rp = open("t3.txt", 'r')
    text = rp.read()
    mylist = []
    para = splitParagraph(text)
    for s in para:
        mylist.append(s)
    for s in range(len(mylist)-1):
        sen1 = noun(mylist[s])
        sen2 = noun(mylist[s+1])
The text I’m currently trying works if I remove the 1st paragraph; otherwise it gives the error.
Sample of the text:
A/at good/jj man/nn-hl departs/vbz-hl ./. Goodbye/uh-hl ,/,-hl Mr./np-hl Sam/np-hl./. Sam/np Rayburn/np was/bedz a/at good/jj man/nn ,/, a/at good/jj American/np ,/, and/cc ,/, third/od ,/, a/at good/jj Democrat/np ./. He/pps was/bedz all/abn of/in these/dts rolled/vbn into/in one/cd sturdy/jj figure/nn ;/. ;/. Mr./np Speaker/nn-tl ,/, Mr./np Sam/np ,/, and/cc Mr./np Democrat/np ,/, at/in one/cd and/cc the/at same/ap time/nn ./.
The/at House/nn-tl was/bedz his/pp$ habitat/nn and/cc there/rb he/pps flourished/vbd ,/, first/rb as/cs a/at young/jj representative/nn ,/, then/rb as/cs a/at forceful/jj committee/nn chairman/nn ,/, and/cc finally/rb in/in the/at post/nn for/in which/wdt he/pps seemed/vbd intended/vbn from/in birth/nn ,/, Speaker/nn-tl of/in-tl the/at-tl House/nn-tl ,/, and/cc second/od most/ql powerful/jj man/nn in/in Washington/np ./.
If I remove the 1st paragraph (A/at good/jj man/nn-hl departs…), the code works. How can I solve this problem?
Thanks in advance.
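Your “word” contains more than one “/”. In the first paragraph, for example, the token Sam/np-hl./. has two slashes, so split('/') returns three pieces.
So unpacking it into (word, tag) will not work. You’ll have to figure out how you want to handle the case where a word/tag token has more than one “/”.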
I just realized you can use the “maxsplit” option of the string split method if you only want to split on the first “/”.
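Here is a minimal sketch of that idea, reusing the noun function from the question. Passing 1 as the second argument to split is the maxsplit option, so each token yields at most two pieces. Skipping tokens that contain no “/” at all is my own assumption about what you would want, not something the question specifies.

```python
def noun(words):
    """Return the words whose tag is exactly 'np' in a string of word/tag tokens."""
    nouns = []
    for token in words.split():
        # maxsplit=1: split only on the first '/', so at most two pieces come back
        parts = token.split('/', 1)
        if len(parts) != 2:
            # Assumption: a token with no '/' has no tag, so it is skipped
            continue
        word, tag = parts
        if tag.lower() == 'np':
            nouns.append(word)
    return nouns


# Part of the first paragraph from the question, including the token with two slashes
sample = "Goodbye/uh-hl ,/,-hl Mr./np-hl Sam/np-hl./. Sam/np Rayburn/np was/bedz"
print(noun(sample))  # ['Sam', 'Rayburn']
```

Note that Sam/np-hl./. no longer raises an error, but its tag comes out as np-hl./. and so it is not counted as a proper noun. Tags such as np-hl are not matched either, because the comparison is against exactly 'np', as in the original code. If you would rather split on the last “/”, token.rsplit('/', 1) is the equivalent call; which one fits depends on how your file is tagged.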