I’m trying to clean up some code by removing leading or trailing whitespace characters with PyParsing. Removing leading whitespace was easy, because I could use the `FollowedBy` class, which matches a string without including it in the match (a lookahead). Now I need the same thing for whitespace that follows my identifying string.
Here is a small example:
```python
from pyparsing import *
insource = """
annotation (Documentation(info="
<html>
<b>FOO</b>
</html>
"));
"""
# Working replacement: strips the whitespace in front of <html>
HTMLStartref = OneOrMore(White(' \t\n')) + FollowedBy(CaselessLiteral('<html>'))
# Not working, because there is no "LeadBy" class:
# HTMLEndref = LeadBy(CaselessLiteral('</html>')) + OneOrMore(White(' \t\n')) + FollowedBy('"')
out = Suppress(HTMLStartref).transformString(insource)
# Would strip the whitespace after </html>, if HTMLEndref could be defined:
# out2 = Suppress(HTMLEndref).transformString(out)
```
As output I get:
```
>>> print(out)
annotation (Documentation(info="<html>
<b>FOO</b>
</html>
"));
```
but I should get:
```
>>> print(out2)
annotation (Documentation(info="<html>
<b>FOO</b>
</html>"));
```
I looked through the documentation but could not find a “LeadBy” counterpart to `FollowedBy`, or any other way to achieve this.
What you are asking for is something like “lookbehind”, that is, a match that succeeds only if it is preceded by a particular pattern. I don’t have an explicit class for that at the moment, but for what you want to do you can still transform left to right: match the leading part as well and leave it in the output unsuppressed, and suppress only the whitespace.
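A minimal sketch of that idea, written for Python 3 and assuming a pyparsing release that still provides the camelCase `transformString` (the names `html_start` and `html_end` are illustrative, not from the original post). `</html>` is matched as an ordinary, unsuppressed token, so `transformString` writes it back into the output, and only the whitespace run after it is wrapped in `Suppress`:

```python
from pyparsing import CaselessLiteral, FollowedBy, OneOrMore, Suppress, White

insource = """
annotation (Documentation(info="
<html>
<b>FOO</b>
</html>
"));
"""

# Leading whitespace: the lookahead approach from the question.
html_start = OneOrMore(White(" \t\n")) + FollowedBy(CaselessLiteral("<html>"))
out = Suppress(html_start).transformString(insource)

# Trailing whitespace: no lookbehind needed. '</html>' is part of the match
# but not suppressed, so it is written back; only the whitespace between it
# and the closing quote is suppressed, i.e. "</html>\n" becomes "</html>".
html_end = (
    CaselessLiteral("</html>")
    + Suppress(OneOrMore(White(" \t\n")))
    + FollowedBy('"')
)
out2 = html_end.transformString(out)

print(out2)
```

One caveat of this sketch: `CaselessLiteral` returns the string it was defined with, not the text it matched, so an input `</HTML>` would be written back as `</html>`.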
Here are a couple of ways to address your problem: