I have a string of syntactically parsed text: s = ‘ROOT (S (VP (VP


Asked: June 4, 2026 (2026-06-04T03:31:35+00:00)


I have a string of syntactically parsed text:

 s = 'ROOT (S (VP (VP (VB the) (SBAR (S (NP (DT same) (NN lecturer)) (VP (VBZ says)'

I’d like to match ‘the same’ against s. It’s key that ‘the’ and ‘same’ match only when they are separated by nothing but syntactic markup (e.g., (, NP, S). So ‘the same’ should NOT find a match in s2:

 s2 = 'ROOT (S (VP (VP (VB the) (SBAR (S (NP (DT lecturer) (NN same)) (VP (VBZ says)'

I’ve tried a double negative lookahead assertion to no avail:

 >>> import re
 >>> rx = r'the(?![a-z]*)same(?![a-z]*)'
 >>> re.findall(rx, s)
 []

The idea is to match ‘the’ when not followed by lowercase characters, and then match ‘same’ when not followed by lowercase characters.

Does anyone have a better approach?


1 Answer

Editorial Team · Answer 1 · 2026-06-04T03:31:36+00:00


So you want a match only when none of the characters between ‘the’ and ‘same’ are lowercase letters. Here is how you can write that as a regex:

the[^a-z]*same

Note that you might want to add word boundaries as well, so that you don’t match something like ‘foothe ... samebar’. That would look like this:

\bthe\b[^a-z]*\bsame\b
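
To check the pattern against the two strings from the question, here is a minimal sketch using Python's standard re module. The variable names (pattern, original, match_s, match_s2) are only illustrative and are not from the original post. It also shows why the lookahead attempt returns nothing: [a-z]* can always match the empty string, so the negative lookahead (?![a-z]*) fails at every position.

```python
import re

# The two parsed strings from the question.
s = "ROOT (S (VP (VP (VB the) (SBAR (S (NP (DT same) (NN lecturer)) (VP (VBZ says)"
s2 = "ROOT (S (VP (VP (VB the) (SBAR (S (NP (DT lecturer) (NN same)) (VP (VBZ says)"

# 'the', then any run of characters that are not lowercase letters, then 'same'.
# The \b anchors keep 'the' and 'same' from matching inside longer words.
pattern = re.compile(r"\bthe\b[^a-z]*\bsame\b")

match_s = pattern.search(s)
match_s2 = pattern.search(s2)

print(match_s.group(0) if match_s else None)  # the) (SBAR (S (NP (DT same
print(match_s2)                               # None ('lecturer' sits in between)

# The attempt from the question. (?![a-z]*) asserts that [a-z]* does NOT match,
# but [a-z]* always matches (at least the empty string), so the assertion can
# never succeed and the whole pattern never matches anything.
original = re.compile(r"the(?![a-z]*)same(?![a-z]*)")
print(original.findall(s))          # []
print(original.findall("thesame"))  # []
```

If the parse labels could ever contain lowercase letters, the character class [^a-z] would have to be adjusted; that case is not covered by this sketch.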
