My goal here is to create a very simple template language. At the moment, I’m working on replacing a variable with a value, like this:
This input:
The Web
Should produce this output:
The Web This Is A Test Variable
I’ve got it working. But looking at my code, I’m running multiple identical regexes on the same strings — that just offends my sense of efficiency. There’s got to be a better, more Pythonic way. (It’s the two ‘while’ loops that really offend.)
This does pass the unit tests, so if this is silly premature optimization, tell me; I’m willing to let this go. There may be dozens of these variable definitions and uses in a document, but not hundreds. But I suspect there are obvious (to other people) ways of improving this, and I’m curious what the StackOverflow crowd will come up with.
# Written for Python 2; Python 3 only accepts re.LOCALE with bytes patterns.
import re

def stripMatchedQuotes(item):
    # Strip one pair of single quotes, then one pair of double quotes.
    MatchedSingleQuotes = re.compile(r"'(.*)'", re.LOCALE)
    MatchedDoubleQuotes = re.compile(r'"(.*)"', re.LOCALE)
    item = MatchedSingleQuotes.sub(r'\1', item, 1)
    item = MatchedDoubleQuotes.sub(r'\1', item, 1)
    return item

def processVariables(item):
    VariableDefinition = re.compile(r'<%(.*?)=(.*?)%>', re.LOCALE)
    VariableUse = re.compile(r'<%(.*?)%>', re.LOCALE)
    Variables = {}

    # First pass: record each definition and remove its tag from the text.
    while VariableDefinition.search(item):
        VarName, VarDef = VariableDefinition.search(item).groups()
        VarName = stripMatchedQuotes(VarName).upper().strip()
        VarDef = stripMatchedQuotes(VarDef.strip())
        Variables[VarName] = VarDef
        item = VariableDefinition.sub('', item, 1)

    # Second pass: replace each remaining tag with the stored value.
    while VariableUse.search(item):
        VarName = stripMatchedQuotes(VariableUse.search(item).group(1).upper()).strip()
        item = VariableUse.sub(Variables[VarName], item, 1)

    return item
The first thing that may improve things is to move the re.compile calls outside the functions. Compiled patterns are cached, but every call still pays for the lookup that checks whether the pattern has already been compiled.
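As a rough sketch of that first step (Python 3, so the re.LOCALE flag is dropped; the upper-case constant names and the snake_case function name are just my choices for illustration), the quote-stripping helper could look something like this:

```python
import re

# Compiled once at import time instead of on every call.
MATCHED_SINGLE_QUOTES = re.compile(r"'(.*)'")
MATCHED_DOUBLE_QUOTES = re.compile(r'"(.*)"')


def strip_matched_quotes(item):
    """Strip one pair of single quotes, then one pair of double quotes."""
    item = MATCHED_SINGLE_QUOTES.sub(r"\1", item, count=1)
    item = MATCHED_DOUBLE_QUOTES.sub(r"\1", item, count=1)
    return item
```

The behaviour is meant to be the same as the original helper; only the place where the patterns get compiled changes.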
Another possibility is to use a single regex as below:
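One way such a single regex might look, as a sketch rather than a drop-in replacement: capture the opening quote and use a backreference to require the same closing quote. The names here are hypothetical, and re.LOCALE is left out so that it runs on Python 3.

```python
import re

# Group 1 captures the opening quote; \1 requires the same character to close it.
MATCHED_QUOTES = re.compile(r"""(['"])(.*)\1""")


def strip_matched_quotes(item):
    """Strip one pair of matching single or double quotes."""
    return MATCHED_QUOTES.sub(r"\2", item, count=1)
```

Note that this strips only one pair of quotes per call, whereas the two-regex version strips a single-quoted pair and then a double-quoted pair, so the two can differ on nested quotes.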
Finally, you can combine this into the regex in processVariables. Taking Torsten Marek’s suggestion to use a function for re.sub, this improves and simplifies things dramatically.
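A minimal sketch of how that combination could work, assuming Python 3 and a made-up template in which the variable is called TITLE (the helper names are mine too). The optional quotes are matched inside the tag patterns themselves, and re.sub is given a function instead of a replacement string, so each pattern walks the text once and the while loops go away:

```python
import re

# An optional opening quote is captured and must be closed by the same character.
VARIABLE_DEFINITION = re.compile(r"""<%(['"]?)(.*?)\1=(['"]?)(.*?)\3%>""")
VARIABLE_USE = re.compile(r"""<%(['"]?)(.*?)\1%>""")


def process_variables(item):
    variables = {}

    def record_definition(match):
        # Store the value under an upper-cased name and drop the tag from the text.
        variables[match.group(2).strip().upper()] = match.group(4)
        return ""

    def substitute_use(match):
        # Look the name up case-insensitively; an undefined name raises KeyError.
        return variables[match.group(2).strip().upper()]

    item = VARIABLE_DEFINITION.sub(record_definition, item)
    return VARIABLE_USE.sub(substitute_use, item)


# Hypothetical template; the variable name TITLE is assumed for illustration.
template = '<%"TITLE"="This Is A Test Variable"%>The Web <%"title"%>'
print(process_variables(template))  # The Web This Is A Test Variable
```

Like the original patterns, these use `.` freely, so a use tag that appears before a definition on the same line can get swallowed into the definition match; tightening the character classes would be the next step if that matters.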
Here are my timings for 100000 runs:
[Edit] Add missing non-greedy specifier
[Edit2] Added .upper() calls so case insensitive like original version