I’m trying to clean out some code by removing either leading or trailing white

Question

Asked: June 8, 2026 (2026-06-08T21:26:49+00:00)

I’m trying to clean up some code by removing leading or trailing whitespace characters using PyParsing. Removing leading whitespace was easy, because I could use the FollowedBy class, which matches an expression without consuming it. Now I need the same thing for whitespace that comes after my identifying string.

Here is a small example:

from pyparsing import *

insource = """
annotation (Documentation(info="  
  <html>  
<b>FOO</b>
</html>  
 "));
"""
# Working replacement:
HTMLStartref = OneOrMore(White(' \t\n')) + (FollowedBy(CaselessLiteral('<html>')))

## Not working because of non-existing "LeadBy" 
# HTMLEndref = LeadBy(CaselessLiteral('</html>')) + OneOrMore(White(' \t\n')) + FollowedBy('"')

out = Suppress(HTMLStartref).transformString(insource)
# Desired second step; cannot run until HTMLEndref can be defined:
# out2 = Suppress(HTMLEndref).transformString(out)

As output one gets:

>>> print(out)
annotation (Documentation(info="<html>
<b>FOO</b>
</html>
 "));

and should get:

>>> print(out2)
annotation (Documentation(info="<html>
<b>FOO</b>
</html>"));

I looked at the documentation but could not find a “LeadBy” equivalent to FollowedBy, or any other way to achieve this.

1 Answer

Editorial Team · Answer 1 · 2026-06-08T21:26:50+00:00

What you are asking for is something like “lookbehind”: match only if the text is preceded by a particular pattern. I don’t have an explicit class for that at the moment, but for what you want to do you can still transform left to right: include the leading part in the match without suppressing it, and suppress only the whitespace.

Here are a couple of ways to address your problem:

# define expressions to match leading and trailing
# html tags, and just suppress the leading or trailing whitespace
opener = White().suppress() + Literal("<html>")
closer = Literal("</html>") + White().suppress()

# define a single expression to match either opener
# or closer - have to add leaveWhitespace() call so that
# we catch the leading whitespace in opener
either = opener|closer
either.leaveWhitespace()

print(either.transformString(insource))


# alternative, if you know what the tag will look like:
# match 'info=<some double quoted string>', and use a parse
# action to extract the contents within the quoted string,
# call strip() to remove leading and trailing whitespace,
# and then restore the original '"' characters (which are
# auto-stripped by the QuotedString class by default)
infovalue = QuotedString('"', multiline=True)
infovalue.setParseAction(lambda t: '"' + t[0].strip() + '"')
infoattr = "info=" + infovalue

print(infoattr.transformString(insource))
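
For comparison, here is a minimal sketch of the same lookahead and lookbehind idea using Python’s standard re module instead of pyparsing. This is a swapped-in technique, not a pyparsing feature. The names leading and trailing are illustrative, and the sample input is written with explicit \n escapes so that the trailing spaces survive copy and paste.

```python
import re

# Same sample text as in the question, written with explicit escapes.
insource = (
    '\nannotation (Documentation(info="  \n'
    '  <html>  \n'
    '<b>FOO</b>\n'
    '</html>  \n'
    ' "));\n'
)

# Whitespace that is followed by <html>; the lookahead (?=...) checks
# for the tag without consuming it, much like FollowedBy.
leading = re.compile(r'\s+(?=<html>)', re.IGNORECASE)

# Whitespace that is preceded by </html> and followed by a double quote;
# the lookbehind (?<=...) plays the role of the hypothetical "LeadBy".
trailing = re.compile(r'(?<=</html>)\s+(?=")', re.IGNORECASE)

out = leading.sub('', insource)
out2 = trailing.sub('', out)
print(out2)
```

Only the whitespace directly before <html> and directly after </html> is removed; the spaces after the opening <html> tag are left alone, as in the first pyparsing approach above. Note that Python’s re module requires lookbehind patterns to have a fixed width, which a literal tag such as </html> satisfies.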
