Using re.sub in Python 2.7, the following example uses a simple backreference:
re.sub('-{1,2}', r'\g<0> ', 'pro----gram-files')
It outputs the following string as expected:
'pro-- -- gram- files'
I would expect the following example to be identical, but it is not:
def dashrepl(matchobj):
    return r'\g<0> '

re.sub('-{1,2}', dashrepl, 'pro----gram-files')
This gives the following unexpected output:
'pro\\g<0> \\g<0> gram\\g<0> files'
Why do the two examples give different output? Did I miss something in the documentation that explains this? Is there any particular reason that this behavior is preferable to what I expected? Is there a way to use backreferences in a replacement function?
There are simpler ways to achieve your goal, so you may as well use them.
As you have already seen, your replacement function gets a match object as its argument.
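This object has, among others, a method group(), which can be used instead of the backreference: group(0) returns the whole match, the same text that \g<0> refers to, so appending a space to it gives exactly the result you expected.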
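A minimal sketch of that approach, reusing the dashrepl name and the input from the question:

```python
import re

def dashrepl(matchobj):
    # group(0) is the entire matched text, i.e. what \g<0> stands for
    # in a string replacement
    return matchobj.group(0) + ' '

result = re.sub('-{1,2}', dashrepl, 'pro----gram-files')
# result == 'pro-- -- gram- files'
```

The string returned by the function is inserted literally, so no escape processing is needed here.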
But you are completely right, the docs are a bit confusing on this point: they describe the repl argument separately for the case where it is a string and for the case where it is a function.
You could read this as if the processing of backslash escapes also applied to “the replacement string” returned by the function.
But since this processing is described only for the case where “it is a string”, the intended meaning becomes clear, though it is not obvious at first glance.
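As for using backreference syntax inside a replacement function: match objects also have an expand() method, which performs the same backslash and group substitution on a template that re.sub performs on a string repl. A sketch, where dashrepl_expand is just a name chosen here for illustration:

```python
import re

def dashrepl_expand(matchobj):
    # expand() substitutes \g<0> (and \1, \g<name>, ...) using this match
    return matchobj.expand(r'\g<0> ')

expanded = re.sub('-{1,2}', dashrepl_expand, 'pro----gram-files')
# expanded == 'pro-- -- gram- files'
```

This is mainly useful when the template is chosen at run time; for a fixed template, passing the string directly as repl is simpler.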