I am trying to parse the result output from a natural language parser (Stanford parser)

Asked: May 27, 2026

I am trying to parse the output of a natural language parser (the Stanford parser).
Some of the results look like this:

dep(Company-1, rent-5')
conj_or(rent-5, share-10)
amod(information-12, personal-11)
prep_about(rent-5, you-14)
amod(companies-20, non-affiliated-19)
aux(provide-23, to-22)
xcomp(you-14, provide-23)
dobj(provide-23, products-24)
aux(requested-29, 've-28)

The results I am trying to get are:

['dep', 'Company', 'rent']
['conj_or', 'rent', 'share']
['amod', 'information', 'personal']
...
['amod', 'companies', 'non-affiliated']
...
['aux', 'requested', "'ve"]

First I tried to extract these elements directly, but failed.
Then I realized a regular expression was probably the right way forward.

However, I am totally unfamiliar with regex. After some exploration, I came up with:

import re
m = re.search(r'(?<=())\w+', line)
m2 = re.search(r'(?<=-)\d', line)

and got stuck.

The first one correctly gets the first element, e.g. 'dep', 'amod', 'conj_or', but I have not fully figured out why it works.

The second line tries to get the second element, e.g. 'Company', 'rent', 'information', but I can only get the number after the word. I cannot figure out how to check what comes after the match (lookahead) rather than what comes before it (lookbehind).

BTW, I also cannot figure out how to handle special cases such as 'non-affiliated' and "'ve".

Could anyone give some hints or help? It would be highly appreciated.

1 Answer

Editorial Team · Answer 1 · 2026-05-27T01:25:06+00:00

It is difficult to give an optimal answer without knowing the full range of possible outputs; however, here’s a possible solution:

>>> [re.findall(r'[A-Za-z_\'-]+[^-\d\(\)\']', line) for line in s.split('\n')]
[['dep', 'Company', 'rent'], 
 ['conj_or', 'rent', 'share'], 
 ['amod', 'information', 'personal'], 
 ['prep_about', 'rent', 'you'], 
 ['amod', 'companies', 'non-affiliated'], 
 ['aux', 'provide', 'to'], 
 ['xcomp', 'you', 'provide'], 
 ['dobj', 'provide', 'products'], 
 ['aux', 'requested', "'ve"]]
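For anyone who wants to run this outside the interactive prompt, here is a minimal self-contained sketch of the same one-liner. It assumes s holds exactly the nine sample lines from the question; the names pattern and rows are only for illustration.

```python
import re

# Sample output copied from the question (assumed to be the whole input).
s = """dep(Company-1, rent-5')
conj_or(rent-5, share-10)
amod(information-12, personal-11)
prep_about(rent-5, you-14)
amod(companies-20, non-affiliated-19)
aux(provide-23, to-22)
xcomp(you-14, provide-23)
dobj(provide-23, products-24)
aux(requested-29, 've-28)"""

# Same pattern as the answer: a run of letters, '_', "'" or '-' whose
# final character is not '-', a digit, a parenthesis or an apostrophe.
pattern = re.compile(r'[A-Za-z_\'-]+[^-\d\(\)\']')

rows = [pattern.findall(line) for line in s.split('\n')]
for row in rows:
    print(row)
```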

It works by finding all the runs of contiguous characters in the line that are either letters ([A-Za-z] represents the range from capital A to Z and from small a to z) or one of the characters “_”, “'” and “-”.

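Furthermore, it enforces the rule that the last character of a match must not be a hyphen, a digit, a parenthesis or an apostrophe ([^...] is the syntax for “any single character except those listed”; replace “...” with the list of characters). Because the first part is greedy, the regex engine backtracks until the final character satisfies this rule, so “Company-1” yields “Company” and “'ve-28” yields “'ve”.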
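A small sketch of what the trailing [^...] class contributes, comparing the pattern with and without it on one line from the question. The last input, det(book-3, a-2), is a made-up line (not from the question) that illustrates a caveat: since each match needs at least two characters, one-letter words such as “a” are silently dropped.

```python
import re

line = "amod(companies-20, non-affiliated-19)"

# Without the trailing class, the greedy run also swallows the hyphen
# that separates each word from its index.
loose = re.findall(r"[A-Za-z_'-]+", line)

# With it, the engine backtracks until the last character of the match is
# not '-', a digit, a parenthesis or an apostrophe, trimming that hyphen.
strict = re.findall(r"[A-Za-z_'-]+[^-\d()']", line)

print(loose)   # ['amod', 'companies-', 'non-affiliated-']
print(strict)  # ['amod', 'companies', 'non-affiliated']

# Caveat: every match needs at least two characters, so one-letter words
# are lost. This input line is made up for illustration.
short = re.findall(r"[A-Za-z_'-]+[^-\d()']", "det(book-3, a-2)")
print(short)   # ['det', 'book']
```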

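The backslash \ escapes characters such as “(” and “)” that would otherwise be parsed by the regex engine as instructions. Inside [...] they are already literal, so those particular escapes are optional; the \' is needed only so that the apostrophe does not end the single-quoted Python string.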
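A quick check of the escaping point, as a sketch on three of the sample lines: the pattern exactly as written in the answer and a double-quoted version without the optional backslashes find the same words. The names escaped, unescaped and lines are only for illustration.

```python
import re

# The pattern exactly as written in the answer (single-quoted raw string).
escaped = r'[A-Za-z_\'-]+[^-\d\(\)\']'
# The same classes without the optional escapes, in a double-quoted raw string.
unescaped = r"[A-Za-z_'-]+[^-\d()']"

lines = [
    "dep(Company-1, rent-5')",
    "amod(companies-20, non-affiliated-19)",
    "aux(requested-29, 've-28)",
]

for line in lines:
    a = re.findall(escaped, line)
    b = re.findall(unescaped, line)
    assert a == b, (line, a, b)  # both spellings agree on every line
    print(a)
```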

Finally, s in the one-liner is the example string you gave in the question.
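On the lookahead question: in your first attempt, (?<=()) is a lookbehind containing an empty group, which always succeeds, so that pattern behaves like plain \w+ and re.search returns the first run of word characters, i.e. the relation name. Your second pattern, (?<=-)\d, matches a single digit preceded by a hyphen, which is why it returns the index rather than the word. A lookahead (?=...) checks what follows the match instead. The sketch below is an alternative technique, not part of the pattern above: the helper name parse_dependency is hypothetical, and it assumes every line has the shape rel(word-N, word-M), where an index may be followed by an apostrophe as in rent-5'. Unlike a pattern that requires two characters per match, it also keeps one-letter words.

```python
import re

# Relation name: word characters followed (lookahead) by '('.
RELATION = re.compile(r"\w+(?=\()")
# Token: a lazy run of characters other than whitespace, '(' or ',',
# accepted only if followed (lookahead) by '-<digits>', an optional "'",
# and then ',' or ')'.
TOKEN = re.compile(r"[^\s(,]+?(?=-\d+'?[,)])")


def parse_dependency(line):
    """Hypothetical helper: 'rel(gov-N, dep-M)' -> [rel, gov, dep]."""
    m = RELATION.match(line)
    if m is None:
        raise ValueError(f"unexpected line: {line!r}")
    return [m.group()] + TOKEN.findall(line)


for line in ["dep(Company-1, rent-5')",
             "amod(companies-20, non-affiliated-19)",
             "aux(requested-29, 've-28)",
             "det(book-3, a-2)"]:  # last line is made up, not from the question
    print(parse_dependency(line))
```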

HTH!
