Overview So, I’m in the middle of refactoring a project, and I’m separating out


Asked: June 1, 2026


Overview

So, I’m in the middle of refactoring a project, and I’m separating out a bunch of parsing code. The code I’m concerned with uses pyparsing.

I have a very poor understanding of pyparsing, even after spending a lot of time reading through the official documentation. I’m having trouble because (1) pyparsing takes a (deliberately) unorthodox approach to parsing, and (2) I’m working on code I didn’t write, with poor comments, and a non-elementary set of existing grammars.

(I can’t get in touch with the original author, either.)

Failing Test

I’m using PyVows to test my code. One of my tests is as follows (I think this is clear even if you’re unfamiliar with PyVows; let me know if it isn’t):

def test_multiline_command_ends(self, topic):
    output = parsed_input('multiline command ends\n\n', topic)
    expect(output).to_equal(
r'''['multiline', 'command ends', '\n', '\n']
- args: command ends
- multiline_command: multiline
- statement: ['multiline', 'command ends', '\n', '\n']
  - args: command ends
  - multiline_command: multiline
  - terminator: ['\n', '\n']
- terminator: ['\n', '\n']''')

But when I run the test, I get the following in the terminal:

Failed Test Results

Expected topic("['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n  - args: command ends\n  - command: multiline") 
      to equal "['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n  - args: command ends\n  - multiline_command: multiline\n  - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"

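Note:

The expected output (the second one) has doubled backslashes. This is normal: the expected string is written as a raw string, so it contains literal backslash-n sequences, and those backslashes are escaped again when the failure message displays the string. The test ran without issue before this piece of refactoring began.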
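The doubled backslashes can be reproduced with the standard library alone. In a raw string, `\n` stays two characters (a backslash and an `n`), while in a normal string it is one newline character; `repr()`, which appears to be how the failure message shows both strings, escapes the literal backslash. The variable names below are only for illustration.

```python
# Raw string: backslash-n is kept as two literal characters.
expected = r"['\n', '\n']"
# Normal string: \n becomes a single newline character.
actual = "['\n', '\n']"

print(len(expected), len(actual))  # 12 10
print(repr(expected))  # literal backslashes show up doubled
print(repr(actual))    # real newlines show up as a single \n escape
```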

Expected Behavior

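The first line of output should match the second, but it doesn’t. Specifically, it’s not including the two newline characters in that first list object, both terminator entries are missing, and multiline is reported under command instead of multiline_command.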

So I’m getting this:

"['multiline', 'command ends']\n- args: command ends\n- command: multiline\n- statement: ['multiline', 'command ends']\n  - args: command ends\n  - command: multiline"

When I should be getting this:

"['multiline', 'command ends', '\\n', '\\n']\n- args: command ends\n- multiline_command: multiline\n- statement: ['multiline', 'command ends', '\\n', '\\n']\n  - args: command ends\n  - multiline_command: multiline\n  - terminator: ['\\n', '\\n']\n- terminator: ['\\n', '\\n']"

Earlier in the code, there is also this statement:

pyparsing.ParserElement.setDefaultWhitespaceChars(' \t')

…Which I think should prevent exactly this kind of error. But I’m not sure.
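A minimal sketch for testing that assumption in isolation, assuming pyparsing is installed. The grammar is a hypothetical, stripped-down stand-in for blankln_termination_parser, not the project's code. With newlines removed from the default whitespace, `2 * LineEnd()` can match the blank line and both `'\n'` tokens stay in the results. One caveat worth checking in the real project: expressions you create read the default whitespace when they are constructed, so the call only helps if it runs before the grammar is built. The camelCase names match the question; pyparsing 3 also provides snake_case equivalents such as `set_default_whitespace_chars`.

```python
import pyparsing as pp

# Must run BEFORE the grammar is built: each expression copies the
# default whitespace set when it is created.
pp.ParserElement.setDefaultWhitespaceChars(" \t")

# Hypothetical, stripped-down stand-in for blankln_termination_parser.
command = pp.Word(pp.alphas)("multiline_command")
blank_line = (2 * pp.LineEnd())("terminator")
args = pp.SkipTo(blank_line).setParseAction(lambda t: t[0].strip())("args")
statement = (command + args + blank_line)("statement")

result = statement.parseString("multiline command ends\n\n")
print(result.asList())  # the two '\n' tokens are kept
print(result.dump())

# Restore pyparsing's usual default (space, newline, tab, carriage return).
pp.ParserElement.setDefaultWhitespaceChars(" \n\t\r")
```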

Even if the problem can’t be identified with certainty, simply narrowing down where the problem is would be a HUGE help.

Please let me know how I might take a step or two towards fixing this.

Edit: So, uh, I should post the parser code for this, shouldn’t I? (Thanks for the tip, @andrew cooke !)

Parser code

Here’s the __init__ for my parser object.

I know it’s a nightmare. That’s why I’m refactoring the project. ☺

def __init__(self, Cmd_object=None, *args, **kwargs):
        #   @NOTE
        #   This is one of the biggest pain points of the existing code.
        #   To aid in readability, I CAPITALIZED all variables that are
        #   not set on `self`.
        #
        #   That means that CAPITALIZED variables aren't
        #   used outside of this method.
        #
        #   Doing this has allowed me to more easily read what
        #   variables become a part of other variables during the
        #   building-up of the various parsers.
        #
        #   I realize capitalizing these variables is unorthodox
        #   and potentially anti-convention.  But after reaching out
        #   to the project's creator several times over roughly 5
        #   months, I'm still working on this project alone...
        #   And without help, this is the only way I can move forward.
        #
        #   I have a very poor understanding of the parser's
        #   control flow when the user types a command and hits ENTER,
        #   and until the author (or another pyparsing expert)
        #   explains what's happening to me, I have to do silly
        #   things like this. :-|
        #
        #   Of course, if the impossible happens and this code
        #   gets cleaned up, then the variables will be restored to
        #   proper capitalization.
        #
        #   —Zearin
        #   http://github.com/zearin/
        #   2012 Mar 26

        if Cmd_object is not None:
            self.Cmd_object = Cmd_object
        else:
            raise Exception('Cmd_object must be provided to Parser.__init__().')

        #   @FIXME
        #       Refactor methods into this class later
        preparse    = self.Cmd_object.preparse
        postparse   = self.Cmd_object.postparse

        self._allow_blank_lines  =  False

        self.abbrev              =  True       # Recognize abbreviated commands
        self.case_insensitive    =  True       # Commands recognized regardless of case
        # make sure your terminators are not in legal_chars!
        self.legal_chars         =  u'!#$%.:?@_' + PYP.alphanums + PYP.alphas8bit
        self.multiln_commands    =  kwargs.get('multiln_commands', [])   # read the same key that is checked
        self.no_special_parse    =  {'ed','edit','exit','set'}
        self.redirector          =  '>'         # for sending output to file
        self.reserved_words      =  []
        self.shortcuts           =  { '?' : 'help' ,
                                      '!' : 'shell',
                                      '@' : 'load' ,
                                      '@@': '_relative_load'
                                    }
#         self._init_grammars()
#         
#     def _init_grammars(self):
        #   @FIXME
        #       Add Docstring

        #   ----------------------------
        #   Tell PYP how to parse
        #   file input from '< filename'
        #   ----------------------------
        FILENAME    = PYP.Word(self.legal_chars + '/\\')
        INPUT_MARK  = PYP.Literal('<')
        INPUT_MARK.setParseAction(lambda x: '')
        INPUT_FROM  = FILENAME('INPUT_FROM')
        INPUT_FROM.setParseAction( self.Cmd_object.replace_with_file_contents )
        #   ----------------------------

        #OUTPUT_PARSER = (PYP.Literal('>>') | (PYP.WordStart() + '>') | PYP.Regex('[^=]>'))('output')
        OUTPUT_PARSER           =  (PYP.Literal(   2 * self.redirector) | \
                                   (PYP.WordStart()  + self.redirector) | \
                                    PYP.Regex('[^=]' + self.redirector))('output')

        PIPE                    =   PYP.Keyword('|', identChars='|')

        STRING_END              =   PYP.stringEnd ^ '\nEOF'

        TERMINATORS             =  [';']
        TERMINATOR_PARSER       =   PYP.Or([
                                        (hasattr(t, 'parseString') and t)
                                        or 
                                        PYP.Literal(t) for t in TERMINATORS
                                    ])('terminator')

        self.comment_grammars    =  PYP.Or([  PYP.pythonStyleComment,
                                              PYP.cStyleComment ])
        self.comment_grammars.ignore(PYP.quotedString)
        self.comment_grammars.setParseAction(lambda x: '')
        self.comment_grammars.addParseAction(lambda x: '')

        self.comment_in_progress =  '/*' + PYP.SkipTo(PYP.stringEnd ^ '*/')

        #   QuickRef: Pyparsing Operators
        #   ----------------------------
        #   ~   creates NotAny using the expression after the operator
        #
        #   +   creates And using the expressions before and after the operator
        #
        #   |   creates MatchFirst (first left-to-right match) using the
        #       expressions before and after the operator
        #
        #   ^   creates Or (longest match) using the expressions before and
        #       after the operator
        #
        #   &   creates Each using the expressions before and after the operator
        #
        #   *   creates And by multiplying the expression by the integer operand;
        #       if expression is multiplied by a 2-tuple, creates an And of
        #       (min,max) expressions (similar to "{min,max}" form in
        #       regular expressions); if min is None, interpret as (0,max);
        #       if max is None, interpret as expr*min + ZeroOrMore(expr)
        #
        #   -   like + but with no backup and retry of alternatives
        #
        #   ==  matching expression to string; returns True if the string
        #       matches the given expression
        #
        #   <<  inserts the expression following the operator as the body of the
        #       Forward expression before the operator
        #   ----------------------------


        DO_NOT_PARSE            =   self.comment_grammars       |   \
                                    self.comment_in_progress    |   \
                                    PYP.quotedString

        #   moved here from class-level variable
        self.URLRE              =   re.compile('(https?://[-\\w\\./]+)')

        self.keywords           =   self.reserved_words + [fname[3:] for fname in dir( self.Cmd_object ) if fname.startswith('do_')]

        #   not to be confused with `multiln_parser` (below)
        self.multiln_command  =   PYP.Or([
                                        PYP.Keyword(c, caseless=self.case_insensitive)
                                        for c in self.multiln_commands
                                    ])('multiline_command')

        ONELN_COMMAND           =   (   ~self.multiln_command +
                                        PYP.Word(self.legal_chars)
                                    )('command')


        #self.multiln_command.setDebug(True)


        #   Configure according to `allow_blank_lines` setting
        if self._allow_blank_lines:
            self.blankln_termination_parser = PYP.NoMatch()  # an instance, so it can be combined with |
        else:
            BLANKLN_TERMINATOR  = (2 * PYP.lineEnd)('terminator')
            #BLANKLN_TERMINATOR('terminator')
            self.blankln_termination_parser = (
                                                (self.multiln_command ^ ONELN_COMMAND)
                                                + PYP.SkipTo(
                                                    BLANKLN_TERMINATOR,
                                                    ignore=DO_NOT_PARSE
                                                ).setParseAction(lambda x: x[0].strip())('args')
                                                + BLANKLN_TERMINATOR
                                              )('statement')

        #   CASE SENSITIVITY for
        #   ONELN_COMMAND and self.multiln_command
        if self.case_insensitive:
            #   Set parsers to account for case insensitivity (if appropriate)
            self.multiln_command.setParseAction(lambda x: x[0].lower())
            ONELN_COMMAND.setParseAction(lambda x: x[0].lower())


        self.save_parser        = ( PYP.Optional(PYP.Word(PYP.nums)^'*')('idx')
                                  + PYP.Optional(PYP.Word(self.legal_chars + '/\\'))('fname')
                                  + PYP.stringEnd)

        AFTER_ELEMENTS          =   PYP.Optional(PIPE +
                                                    PYP.SkipTo(
                                                        OUTPUT_PARSER ^ STRING_END,
                                                        ignore=DO_NOT_PARSE
                                                    )('pipeTo')
                                                ) + \
                                    PYP.Optional(OUTPUT_PARSER +
                                                 PYP.SkipTo(
                                                     STRING_END,
                                                     ignore=DO_NOT_PARSE
                                                 ).setParseAction(lambda x: x[0].strip())('outputTo')
                                            )

        self.multiln_parser = (((self.multiln_command ^ ONELN_COMMAND)
                                +   PYP.SkipTo(
                                        TERMINATOR_PARSER,
                                        ignore=DO_NOT_PARSE
                                    ).setParseAction(lambda x: x[0].strip())('args')
                                +   TERMINATOR_PARSER)('statement')
                                +   PYP.SkipTo(
                                        OUTPUT_PARSER ^ PIPE ^ STRING_END,
                                        ignore=DO_NOT_PARSE
                                    ).setParseAction(lambda x: x[0].strip())('suffix')
                                + AFTER_ELEMENTS
                             )

        #self.multiln_parser.setDebug(True)

        self.multiln_parser.ignore(self.comment_in_progress)

        self.singleln_parser  = (
                                    (   ONELN_COMMAND + PYP.SkipTo(
                                        TERMINATOR_PARSER
                                        ^ STRING_END
                                        ^ PIPE
                                        ^ OUTPUT_PARSER,
                                        ignore=DO_NOT_PARSE
                                    ).setParseAction(lambda x:x[0].strip())('args'))('statement')
                                + PYP.Optional(TERMINATOR_PARSER)
                                + AFTER_ELEMENTS)
        #self.multiln_parser  = self.multiln_parser('multiln_parser')
        #self.singleln_parser = self.singleln_parser('singleln_parser')

        self.prefix_parser       =  PYP.Empty()

        self.parser = self.prefix_parser + (STRING_END                      |
                                            self.multiln_parser             |
                                            self.singleln_parser            |
                                            self.blankln_termination_parser |
                                            self.multiln_command            +
                                            PYP.SkipTo(
                                                STRING_END,
                                                ignore=DO_NOT_PARSE)
                                            )

        self.parser.ignore(self.comment_grammars)

        # a not-entirely-satisfactory way of distinguishing
        # '<' as in "import from" from
        # '<' as in "lesser than"
        self.input_parser = INPUT_MARK                + \
                            PYP.Optional(INPUT_FROM)  + \
                            PYP.Optional('>')         + \
                            PYP.Optional(FILENAME)    + \
                            (PYP.stringEnd | '|')

        self.input_parser.ignore(self.comment_in_progress)
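The QuickRef in the parser code lists `|` (MatchFirst) and `^` (Or, longest match), and the grammar uses both, e.g. `self.multiln_command ^ ONELN_COMMAND`. A tiny sketch, separate from the project's code and assuming pyparsing is installed, showing how the two operators differ on the same input:

```python
import pyparsing as pp

short = pp.Literal("multi")
longer = pp.Literal("multiline")

first_match = short | longer    # MatchFirst: the first alternative that matches wins
longest_match = short ^ longer  # Or: every alternative is tried, the longest match wins

print(first_match.parseString("multiline").asList())    # ['multi']
print(longest_match.parseString("multiline").asList())  # ['multiline']
```

When one alternative is a prefix of another, `|` depends on ordering, so either list the longer alternative first or use `^` at the cost of trying every alternative.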


1 Answer

Editorial Team · Answer 1 · 2026-06-01T20:21:31+00:00

I fixed it!

Pyparsing was not at fault!

I was. ☹

By separating out the parsing code into a different object, I created the problem. Originally, an attribute used to “update itself” based on the contents of a second attribute. Since this all used to be contained in one “god class”, it worked fine.

Once the code lived in a separate object, the first attribute was set once at instantiation and no longer “updated itself” when the second attribute it depended on changed.
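A minimal, hypothetical illustration of that failure mode; EagerParser, LazyParser and pattern are made-up names, not the project's. A value derived once in `__init__` goes stale when its source attribute changes later, while a property recomputes it on every read.

```python
class EagerParser:
    """Derives `pattern` once, at construction time."""

    def __init__(self, commands=None):
        self.commands = list(commands or [])
        self.pattern = "|".join(self.commands)  # computed once, never refreshed


class LazyParser:
    """Derives `pattern` from the current `commands` every time it is read."""

    def __init__(self, commands=None):
        self.commands = list(commands or [])

    @property
    def pattern(self):
        return "|".join(self.commands)


eager = EagerParser()
lazy = LazyParser()
# e.g. the application configures its commands after creating the parser
eager.commands.append("multiline")
lazy.commands.append("multiline")

print(repr(eager.pattern))  # '' (stale)
print(repr(lazy.pattern))   # 'multiline'
```

Applied to the parser, the lazy approach would mean rebuilding the grammar from the current command list instead of once in `__init__`; for a large pyparsing grammar, caching the built grammar and invalidating it when the list changes is a reasonable middle ground.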

Specifics

The attribute multiln_command (not to be confused with multiln_commands—aargh, what confusing naming!) was a pyparsing grammar definition. The multiln_command attribute should have updated its grammar if multiln_commands ever changed.

Although I knew these two attributes had similar names but very different purposes, the similarity definitely made it harder to track the problem down. I have now renamed multiln_command to multiln_grammar.
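A hypothetical, cut-down reproduction of the symptom from the question, assuming pyparsing is installed and assuming only that the multiline-command list was empty when the grammar was built (build_command_grammar and LEGAL_CHARS are made-up names). An `Or` with no alternatives can never match, so `~multiline` always succeeds and the word is picked up by the one-line rule under the name `command`, which is what the failing output showed. Building the grammar from the current list brings back the `multiline_command` name.

```python
import pyparsing as pp

LEGAL_CHARS = pp.alphanums + "_"  # hypothetical stand-in for self.legal_chars


def build_command_grammar(multiline_commands):
    """Hypothetical helper: build the command grammar from the *current* list."""
    multiline = pp.Or(
        [pp.Keyword(c, caseless=True) for c in multiline_commands]
    )("multiline_command")
    one_line = (~multiline + pp.Word(LEGAL_CHARS))("command")
    return multiline ^ one_line


# Built while the list was still empty: Or([]) has no alternatives and never
# matches, so ~multiline always succeeds and the one-line rule takes the word.
stale = build_command_grammar([]).parseString("multiline command ends")
print(stale.dump())

# Rebuilt after the list gained "multiline": the multiline rule matches again.
fresh = build_command_grammar(["multiline"]).parseString("multiline command ends")
print(fresh.dump())
```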

However! ☺

I am grateful for @Paul McGuire’s awesome answer, and I hope it saves me (and others) some grief in the future. Although I feel a bit foolish that I caused the problem (and misdiagnosed it as a pyparsing issue), I’m happy some good (in the form of Paul’s advice) came of asking this question.

Happy parsing, everybody. 🙂
