Let’s say I have the following multi-line string:
# Section
## Subsection
## Subsection
# Section
## Subsection
### Subsubsection
### Subsubsection
# Section
## Subsection
and I want it to become:
# 1 Section
## 1.1 Subsection
## 1.2 Subsection
# 2 Section
## 2.1 Subsection
### 2.1.1 Subsubsection
### 2.1.2 Subsubsection
# 3 Section
## 3.1 Subsection
In Python, using the re module, is it possible to run a substitution on the string which would:
- Match the beginning of each line based on the number of #'s
- Keep track of past matches of commonly-numbered groups of #'s
- Insert this counter when appropriate into the line
…assuming that any of these ‘counters’ are always non-zero?
This problem is testing the limits of my regex knowledge. I already know I can just iterate over the lines and increment/insert some variables, so that’s not the solution I want. I’m simply curious whether this kind of functionality exists solely within regular expressions, as I know that some sort of counting already exists (e.g., the number of substitutions to make).
« Ok, sure, but what if the ‘variable manipulation’ is being done in a callback function of re.sub, can it be done then? I guess a simplified form of my question is: “Can one use regular expressions to substitute differently based on previous matches?” »
It sounds like we need a generator function as a callback; unfortunately, re.sub() doesn’t accept a generator function as a callback.
So we must use some trick:
« Default parameter values are evaluated when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that that same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. »
http://docs.python.org/reference/compound_stmts.html#function
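To make the quoted behaviour concrete, here is a tiny sketch. The function name remember and its parameter seen are made up for illustration; the point is only that the default list is built once and then shared by every call.

```python
def remember(item, seen=[]):
    # 'seen' is created once, when 'def' is executed, not on each call,
    # so items appended here are still present on the next call.
    seen.append(item)
    return list(seen)  # return a copy so callers cannot alter the state

first = remember('a')
second = remember('b')
print(first)   # ['a']
print(second)  # ['a', 'b']
```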
But here, it IS my plain intent.
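A minimal sketch of the trick follows. It is a reconstruction under assumptions, not necessarily the exact code of this answer: the names HEADING, repl and nb are illustrative, and a heading is assumed to be any run of # at the start of a line. The mutable default nb holds one counter per heading level and is shared by all the calls that re.sub() makes to the callback.

```python
import re

# Assumption: a heading marker is any run of '#' at the start of a line.
HEADING = re.compile(r'^#+', re.MULTILINE)

def repl(match, nb=[]):
    # nb is evaluated once, at definition time, so it keeps the
    # per-level counters alive between successive callback calls.
    level = len(match.group())
    if level > len(nb):
        # Going deeper: start each new level at 1.
        nb.extend([1] * (level - len(nb)))
    else:
        # Same level or shallower: drop deeper counters, bump this one.
        del nb[level:]
        nb[-1] += 1
    return match.group() + ' ' + '.'.join(map(str, nb))

text = "\n".join([
    "# Section",
    "## Subsection",
    "## Subsection",
    "# Section",
    "## Subsection",
    "### Subsubsection",
    "### Subsubsection",
    "# Section",
    "## Subsection",
])

numbered = HEADING.sub(repl, text)
print(numbered)
```

Because the counters live in the default argument, they are not reset between separate re.sub() calls; a second run would continue from the last numbers unless the list is emptied first (for instance with repl.__defaults__[0].clear()).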
Result:
EDIT 1 : I corrected else nb[:] = nb[0:len(match.group())] to else: only
EDIT 2 : the code can be condensed to
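One possible condensed form, again only a sketch under assumptions: the names repl_short and nb are illustrative, and headings are assumed never to skip a level (the ‘counters are always non-zero’ condition from the question). A single slice assignment both truncates the deeper counters and sets the counter for the current level.

```python
import re

def repl_short(match, nb=[]):
    # nb is the shared mutable default holding one counter per level.
    level = len(match.group())
    # The right-hand side is evaluated first, so nb[level - 1] is the old
    # value; the slice assignment then discards all deeper counters.
    nb[level - 1:] = [nb[level - 1] + 1 if level <= len(nb) else 1]
    return match.group() + ' ' + '.'.join(map(str, nb))

sample = "# Section\n## Subsection\n## Subsection\n# Section\n## Subsection"
short_result = re.sub(r'^#+', repl_short, sample, flags=re.MULTILINE)
print(short_result)
```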