I want to build a regular expression from a template and a set of predefined blocks, using string.Template for the substitution.
For example:
- template:
  /data/${year}_${month}_${day}/${year}${month}${day}_${type}_${id}.dat
- blocks:
  - day:
    (?P<day>\d{2})
  - month:
    (?P<month>\d{2})
  - year:
    (?P<year>\d{4})
  - type:
    (?P<typechar>[BDPCLNIYSQJ])
  - id:
    (?P<id>\d{8})
>>> string.Template(template).safe_substitute(blocks)
/data/(?P<year>\d{4})_(?P<month>\d{2})_(?P<day>\d{2})/(?P<year>\d{4})(?P<month>\d{2})(?P<day>\d{2})_(?P<typechar>[BDPCLNIYSQJ])_(?P<id>\d{8}).dat
The problem is the duplicate named groups: re does not accept the same group name twice in one regular expression.
I’m looking for either a way to correct the template (before or after the substitution), a way to trick re into accepting the duplicates, or a completely new approach to the problem.
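The failure described above can be reproduced with a short sketch. The two-group pattern here is a cut-down stand-in for the full substituted template, not the original string:

```python
import re

# Cut-down pattern that repeats the group name "year", as the
# substituted template does.
pattern = r"(?P<year>\d{4})_(?P<year>\d{4})"

try:
    re.compile(pattern)
    compiled = True
except re.error as exc:
    # re refuses to define the same group name twice.
    compiled = False
    print("re.error:", exc)
```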
After following a friend’s advice, I found a way to achieve the desired result.
The idea is to modify the template string to eliminate duplicate variables before substituting the regex blocks. Strictly speaking, the duplicates are not removed but replaced with a backreference to the first occurrence, using the (?P=name) syntax. This forces the content to be identical everywhere that block is used.
I will assume the regex group name is the same as the template block name. This is not true in the question example (the type block uses the group name typechar), but it can be changed without any problem.
To transform the duplicates I use a function which returns the transformed template without duplicates, enforcing that all occurrences of the same block have the same content.
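A minimal sketch of such a function, assuming the group name equals the template variable name and that the template only uses the braced ${name} form. The names `PLACEHOLDER` and `dedupe_template` are hypothetical and chosen for illustration:

```python
import re

# Matches ${name} placeholders. Assumption: only the braced form is used
# (no bare $name and no $$ escapes), as in the question's template.
PLACEHOLDER = re.compile(r"\$\{(\w+)\}")


def dedupe_template(template):
    """Keep the first ${name}; turn later ones into (?P=name) backreferences."""
    seen = set()

    def replace(match):
        name = match.group(1)
        if name in seen:
            # Repeated variable: refer back to the first occurrence.
            return "(?P=%s)" % name
        seen.add(name)
        # First occurrence: leave the placeholder for string.Template.
        return match.group(0)

    return PLACEHOLDER.sub(replace, template)


template = "/data/${year}_${month}_${day}/${year}${month}${day}_${type}_${id}.dat"
deduped = dedupe_template(template)
print(deduped)
```

Only the first occurrence of each variable remains a placeholder, so the later substitution defines each named group exactly once.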
Afterwards it’s just a matter of substituting the blocks into the transformed template.
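The substitution step might look like the sketch below. It starts from an already transformed template, renames the type group from typechar to type to satisfy the assumption above, and uses a made-up file path for the match:

```python
import re
import string

# Template after the duplicates were turned into backreferences.
deduped = "/data/${year}_${month}_${day}/(?P=year)(?P=month)(?P=day)_${type}_${id}.dat"

# Blocks from the question; the "type" group is renamed to match its key.
blocks = {
    "day": r"(?P<day>\d{2})",
    "month": r"(?P<month>\d{2})",
    "year": r"(?P<year>\d{4})",
    "type": r"(?P<type>[BDPCLNIYSQJ])",
    "id": r"(?P<id>\d{8})",
}

pattern = string.Template(deduped).safe_substitute(blocks)
regex = re.compile(pattern)

# Hypothetical paths, for illustration only.
match = regex.match("/data/2011_03_15/20110315_B_00000042.dat")
mismatch = regex.match("/data/2011_03_15/20110316_B_00000042.dat")

print(match.groupdict())
print(mismatch)  # the day differs between directory and file name
```

Note that the literal text of the template is not escaped, so the `.` in `.dat` matches any character unless it is written as `\.` in the template.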
Not mentioned in the question, but the same original template can also be used to reconstruct the string we want to match with the regex, which is the original aim of this code.
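As a sketch of that reverse direction, the untransformed template can be filled with concrete values. The values below are invented for illustration:

```python
import string

template = "/data/${year}_${month}_${day}/${year}${month}${day}_${type}_${id}.dat"

# Hypothetical values, for illustration only.
values = {"year": "2011", "month": "03", "day": "15", "type": "B", "id": "00000042"}

# substitute() raises KeyError if a variable is missing, which is
# useful here because a partially filled path would be wrong.
path = string.Template(template).substitute(values)
print(path)
```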