I am trying to extract all occurrences of tagged words from a string using

Question

0

Asked: May 26, 20262026-05-26T04:27:04+00:00 2026-05-26T04:27:04+00:00

I am trying to extract all occurrences of tagged words from a string using

0

I am trying to extract all occurrences of tagged words from a string using regex in Python 2.7.2. Or simply, I want to extract every piece of text inside the [p][/p] tags.
Here is my attempt:

regex = ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(pattern, line)

Printing person produces ['President [P]', '[/P]', '[P] Bill Gates [/P]']

What is the correct regex to get: ['[P] Barack Obama [/P]', '[P] Bill Gates [/p]']
or ['Barrack Obama', 'Bill Gates'].

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-26T04:27:05+00:00

import re
regex = ur"\[P\] (.+?) \[/P\]+?"
line = "President [P] Barack Obama [/P] met Microsoft founder [P] Bill Gates [/P], yesterday."
person = re.findall(regex, line)
print(person)

yields

['Barack Obama', 'Bill Gates']

The regex ur"[\u005B1P\u005D.+?\u005B\u002FP\u005D]+?" is exactly the same
unicode as u'[[1P].+?[/P]]+?' except harder to read.

The first bracketed group [[1P] tells re that any of the characters in the list ['[', '1', 'P'] should match, and similarly with the second bracketed group [/P]].That’s not what you want at all. So,

Remove the outer enclosing square brackets. (Also remove the
stray 1 in front of P.)
To protect the literal brackets in [P], escape the brackets with a
backslash: \[P\].
To return only the words inside the tags, place grouping parentheses
around .+?.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I am trying to extract all occurrences of tagged words from a string using

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply