The programmer who wrote the following line probably uses a python package called regex

Question

Asked: June 10, 2026

The programmer who wrote the following line probably uses a python package called regex.

UNIT = regex.compile("(?:{A}(?:'{A})?)++|-+|\S".format(A='\p{Word_Break=ALetter}'))

Can someone help explain what A='\p{Word_Break=ALetter}' and -+ mean?

1 Answer

Editorial Team · Answer 1 · 2026-06-10T17:41:48+00:00

The \p{property=value} syntax matches characters by their Unicode codepoint properties, and is documented on the package index page you linked to:

Unicode codepoint properties, including scripts and blocks
\p{property=value}; \P{property=value}; \p{value} ; \P{value}

The entry matches any Unicode character whose codepoint has a Word_Break property with the value ALetter (there are currently 24941 matches in the Unicode codepoint database; see the Word Boundaries chapter of the Unicode Text Segmentation specification for details).
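As a rough illustration of what the property selects, the sketch below runs it over a short mixed string. It assumes the third-party regex package is installed (for example via pip install regex); the sample text and variable names are made up for this example.

```python
import regex  # third-party package, not the stdlib re module

# Match runs of characters whose Word_Break property is ALetter.
aletter_run = regex.compile(r"\p{Word_Break=ALetter}+")

sample = "Hello мир 42"  # hypothetical sample text
runs = aletter_run.findall(sample)
print(runs)  # ['Hello', 'мир']
```

The digits are skipped because their Word_Break value is Numeric rather than ALetter.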

The example you gave also uses standard Python string formatting to interpolate a partial expression into the regular expression being compiled. The "{A}" part is just a placeholder that the .format(A='...') call fills in. The end result is:

"(?:\p{Word_Break=ALetter}(?:'\p{Word_Break=ALetter})?)++|-+|\S"

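The interpolation step can be checked with plain Python, without compiling anything. This is a minimal sketch: the variable names are illustrative, and raw strings are used here (an assumption, the original line does not use them) so that the backslashes are passed through untouched.

```python
# The partial expression that gets substituted for every {A} placeholder.
A = r"\p{Word_Break=ALetter}"

# The template from the question, written as a raw string.
template = r"(?:{A}(?:'{A})?)++|-+|\S"

# str.format replaces each {A}; braces inside the substituted value are
# not re-parsed, so \p{...} comes through unchanged.
pattern = template.format(A=A)
print(pattern)
```

The printed string is the expanded pattern, which is what regex.compile actually receives.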
The -+ sequence simply matches one or more - dashes, exactly as it would in an expression for Python's re module; there is nothing special about it.
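Since this part behaves the same in the standard library, a quick check with the re module is enough. The sample string is made up for the example.

```python
import re

# One or more consecutive hyphens form a single match.
dashes = re.compile(r"-+")

found = dashes.findall("a-b--c---d")
print(found)  # ['-', '--', '---']
```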

Now, the ++ before that is more interesting. It's a possessive quantifier: once the repeated group has matched as much as it can, the regex matcher will not backtrack into it to try shorter matches. It's a performance optimization, one that helps prevent catastrophic backtracking issues.
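The difference from an ordinary greedy quantifier is easiest to see on a tiny pattern. The sketch below is an illustration only, not the pattern from the question; it assumes either the regex package is installed or, as a fallback, Python 3.11 or later, where the standard re module also accepts ++.

```python
try:
    import regex as rx  # the third-party package from the question
except ImportError:
    import re as rx  # assumption: Python 3.11+, which also supports ++

text = "aaaa"

# Greedy: a+ first grabs all four letters, then gives one back so the
# trailing "a" in the pattern can match.
greedy = rx.fullmatch(r"a+a", text)

# Possessive: a++ grabs all four letters and refuses to give any back,
# so the trailing "a" has nothing left to match.
possessive = rx.fullmatch(r"a++a", text)

print(greedy is not None)      # True
print(possessive is not None)  # False
```

Refusing to give characters back is what saves the matcher from retrying every possible split of a long run of letters when a later part of the pattern fails.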
