I’m using pyparsing to parse documents containing text in which the line ends vary

Asked: May 26, 2026

I’m using pyparsing to parse documents in which the same text can be broken across lines at different points. I need to write a parser expression that matches the text regardless of where the line breaks fall. The following does NOT work:

from __future__ import print_function
from pyparsing import *

string_1 = """The quick brown 
fox jumps over the lazy dog.
"""

string_2 = """The quick brown fox jumps
over the lazy dog.
"""

my_expr = Literal(string_1)
print(my_expr.searchString(string_1))
print(my_expr.searchString(string_2))

This results in the following being displayed on the console:

[['The quick brown \nfox jumps over the lazy dog.\n']]
[]

Since line breaks are included in ParserElement.DEFAULT_WHITE_CHARS, I don’t understand why both strings do not match my expression. How do I create a parser element which DOES match text regardless of where the line breaks occur?

1 Answer

Editorial Team · Answer 1 · 2026-05-26T23:32:27+00:00

Your question is a good example of why I discourage people from defining literals with embedded whitespace, because this defeats pyparsing’s built-in whitespace skipping. Pyparsing skips over whitespace between expressions. In your case, you are specifying only a single expression, a Literal comprising an entire string of words, including whitespace between them.
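As a minimal sketch of that difference, the snippet below compares a single Literal containing a space with two Literals joined by +. The sample text and the names single, split, single_hits and split_hits are made up for illustration and are not part of the question.

```python
from pyparsing import Literal

# Hypothetical sample: the two words are separated by a newline and spaces.
text = "quick\n   brown"

# One Literal with an embedded space: the input must contain exactly "quick brown".
single = Literal("quick brown")

# Two Literals joined with +: pyparsing skips whitespace between them.
split = Literal("quick") + Literal("brown")

single_hits = single.searchString(text).asList()
split_hits = split.searchString(text).asList()

print(single_hits)  # []
print(split_hits)   # [['quick', 'brown']]
```

The single Literal fails because the character after "quick" in the input is a newline, not the space the literal asks for; no whitespace skipping happens inside a Literal.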

You can get whitespace skipped by breaking your string up into separate Literals (adding a string to a pyparsing expression automatically constructs a Literal from that string):

from pyparsing import *

my_expr = Literal("The") + "quick" + "brown" + "fox" + "jumps" + "over" + "the" + "lazy" + "dog"

string_1 = """The quick brown 
fox jumps over the lazy dog.
"""

string_2 = """The quick brown fox jumps
over the lazy dog.
"""

for test in (string_1, string_2):
    print('-' * 40)
    print(test)
    print(my_expr.parseString(test))
    print()

If you don’t like typing all those separate quoted strings, you can have Python split the string up for you, map them to Literals, and feed the whole list to make up a pyparsing And:

my_expr = And(map(Literal, "The quick brown fox jumps over the lazy dog".split()))

If you want to preserve the original whitespace, wrap your expression in originalTextFor:

my_expr = originalTextFor(my_expr)
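
Putting the pieces together, here is a rough sketch that runs the wrapped expression against both strings from the question with searchString. The names words and matches are introduced here for illustration, and a list comprehension stands in for map; treat it as one way to wire this up rather than the only one.

```python
from pyparsing import And, Literal, originalTextFor

# Build one Literal per word so whitespace (including newlines) is skipped between them.
words = "The quick brown fox jumps over the lazy dog".split()
my_expr = originalTextFor(And([Literal(word) for word in words]))

string_1 = "The quick brown \nfox jumps over the lazy dog.\n"
string_2 = "The quick brown fox jumps\nover the lazy dog.\n"

# Each match is returned as the original slice of the input, line breaks included.
matches = [my_expr.searchString(s).asList() for s in (string_1, string_2)]

for found in matches:
    print(found)
```

Both strings now produce one match each, and each match keeps the line break exactly where it appeared in its input. The trailing period is not part of the expression, so it is not included in the matched text.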
