I'm having a problem getting scanString to work in a case where parseString gives the correct result.
This sequence works:
import pyparsing as pyp
# Revision: one or two letters, optionally followed by one or two digits.
alpha_rev = pyp.Word(pyp.alphas, max=2)
num_rev = pyp.Word('123456789', max=2)
# Leading space(s), matched explicitly and dropped from the results.
space = pyp.White(ws=" ").suppress()
revisionExpr = (
    pyp.StringStart().leaveWhitespace() +
    space +
    pyp.Combine(alpha_rev +
                pyp.Optional(num_rev)("rev"))
)
rev_string = ' K WI, This is the title'
for match_str, start, end in (
        revisionExpr.scanString(rev_string, maxMatches=1)):
    print(match_str)
['K']
Sometimes there is a “Rev” or “Rev.” before the revision; this fails:
revisionExpr = (
    pyp.StringStart().leaveWhitespace() +
    space +
    pyp.Combine(alpha_rev +
                pyp.Optional(num_rev)("rev"))
    |
    pyp.CaselessLiteral("Rev") + pyp.Optional('.') +
    pyp.Combine(alpha_rev +
                pyp.Optional(num_rev)("rev"))
)
for match_str, start, end in (
        revisionExpr.scanString(rev_string, maxMatches=1)):
    print(match_str)
print(match_str)
NameError: name 'match_str' is not defined
Why is “|” causing the match to fail? Note that parseString works with both the first and the second example:
revisionTokens = revisionExpr.parseString(rev_string)
If I extract the second part of the last example (after the “|”) into a form like the first example, it works if I add “Rev.” in front of the “K” in rev_string. Unfortunately, the leading whitespace in the first expression is necessary to uniquely identify the revision string; otherwise, in this example, “WI” would match.
I’m trying to use scanString instead of parseString because it returns the starting and ending positions of the match, which helps with some later processing.
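The problem is the way your grammar elements are grouped around the “or” operator (“|”). In Python, “+” binds more tightly than “|”, so each side of the “|” is a complete “+” chain: StringStart and the leading space belong only to the first alternative, and the “Rev” alternative is a separate expression that is not anchored to them. Here is your grammar broken down a bit more: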
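A sketch of that breakdown, with the two sides of the “|” pulled out into intermediate names. The names anchored_rev and rev_prefixed are made up for illustration; the other definitions are copied from your question. Given Python's operator precedence, this should build the same structure as your original expression:

```python
import pyparsing as pyp

alpha_rev = pyp.Word(pyp.alphas, max=2)
num_rev = pyp.Word('123456789', max=2)
space = pyp.White(ws=" ").suppress()

# Left-hand side of "|": the whole anchored "+" chain.
anchored_rev = (
    pyp.StringStart().leaveWhitespace() +
    space +
    pyp.Combine(alpha_rev + pyp.Optional(num_rev)("rev"))
)

# Right-hand side of "|": "Rev" + optional "." + revision,
# with no StringStart and no leading space in front of it.
rev_prefixed = (
    pyp.CaselessLiteral("Rev") + pyp.Optional('.') +
    pyp.Combine(alpha_rev + pyp.Optional(num_rev)("rev"))
)

# "a + b + c | d + e + f" groups as "(a + b + c) | (d + e + f)".
revisionExpr = anchored_rev | rev_prefixed

# The top-level object is a MatchFirst holding the two chains.
print(type(revisionExpr).__name__)
print(len(revisionExpr.exprs))
```

Inspecting the top-level type and its exprs list like this is a quick way to check how an expression was actually grouped.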
As you can see, this isn’t quite what you wanted: it will look for either the anchored revision on its own, or the text “Rev” followed by a revision with no anchoring to the start of the string at all. A fixed version of the expression is below:
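One way to write that fix, as a sketch rather than the only possible form: put the alternatives in their own parentheses so that StringStart and the leading space apply to both. The name rev and the helper first_match are made up for this example. The “Rev” alternative is listed first because “|” takes the first alternative that matches.

```python
import pyparsing as pyp

alpha_rev = pyp.Word(pyp.alphas, max=2)
num_rev = pyp.Word('123456789', max=2)
space = pyp.White(ws=" ").suppress()

# The revision itself (rev is a name invented for this sketch).
rev = pyp.Combine(alpha_rev + pyp.Optional(num_rev)("rev"))

revisionExpr = (
    pyp.StringStart().leaveWhitespace() +
    space +
    # Parentheses keep both alternatives behind the same anchor.
    (pyp.CaselessLiteral("Rev") + pyp.Optional('.') + rev
     | rev)
)

def first_match(text):
    """Return (tokens, start, end) for the first scanString match, or None."""
    for tokens, start, end in revisionExpr.scanString(text):
        return list(tokens), start, end
    return None

print(first_match(' K WI, This is the title'))
print(first_match(' Rev. K WI, This is the title'))
```

In both cases the last token should be the revision, and start and end give you the positions you wanted from scanString.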
However, you can make your grammar a bit more concise:
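A minimal sketch of that more concise form, assuming the prefix is always “Rev” with an optional “.” after it. The name rev_prefix and the results dictionary are made up for this example; everything else is taken from your question.

```python
import pyparsing as pyp

alpha_rev = pyp.Word(pyp.alphas, max=2)
num_rev = pyp.Word('123456789', max=2)
space = pyp.White(ws=" ").suppress()

# Optional "Rev" or "Rev." prefix (rev_prefix is a name invented here).
rev_prefix = pyp.Optional(pyp.CaselessLiteral("Rev") + pyp.Optional('.'))

# One "+" chain, no "|": anchor, space, optional prefix, revision.
revisionExpr = (
    pyp.StringStart().leaveWhitespace() +
    space +
    rev_prefix +
    pyp.Combine(alpha_rev + pyp.Optional(num_rev)("rev"))
)

results = {}
for rev_string in (' K WI, This is the title',
                   ' Rev. K WI, This is the title'):
    for tokens, start, end in revisionExpr.scanString(rev_string):
        results[rev_string] = (list(tokens), start, end)
        print(list(tokens), start, end)
        break  # only the first match is of interest
```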
In this version, only the “Rev.” text is marked as optional, in the position where it is expected, rather than giving the parser the choice between a bare revision and “Rev.” followed by a revision. This avoids the “|” operator, and the grouping issues that come with it, altogether.
Don’t forget that pyparsing uses operator overloading to provide nicer syntax. If that syntax causes confusion (as in this scenario), you may be better off using the long-form class constructors, such as “pyp.MatchFirst([a, b])”, which is what “a | b” builds (“pyp.Or([a, b])” corresponds to “a ^ b”).
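For comparison, a small sketch with the operator form next to the long-form constructors. The expressions a and b are placeholders invented for this example; note that the constructors take a list of expressions.

```python
import pyparsing as pyp

a = pyp.Literal("Rev")
b = pyp.Word(pyp.alphas, max=2)

# Operator form: "|" builds a MatchFirst (the first alternative that matches wins).
short_form = a | b
# Long form: the same thing spelled out, with the alternatives in a list.
long_form = pyp.MatchFirst([a, b])

# "+" builds an And; the long form makes the sequence boundaries explicit.
sequence = pyp.And([a, pyp.Optional('.'), b])

print(type(short_form).__name__, type(long_form).__name__)
print(list(sequence.parseString("Rev. K")))
```

With the long form, the grouping is whatever the brackets say, so there is no operator precedence to keep in mind.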