I’m using pyparsing to parse documents in which the same text appears with line breaks in different places. I need to write a parser expression that matches the text regardless of where the line breaks fall. The following does NOT work:
from __future__ import print_function
from pyparsing import *
string_1 = """The quick brown
fox jumps over the lazy dog.
"""
string_2 = """The quick brown fox jumps
over the lazy dog.
"""
my_expr = Literal(string_1)
print(my_expr.searchString(string_1))
print(my_expr.searchString(string_2))
This results in the following being displayed on the console:
[['The quick brown \nfox jumps over the lazy dog.\n']]
[]
Since line breaks are included in ParserElement.DEFAULT_WHITE_CHARS, I don’t understand why both strings do not match my expression. How do I create a parser element which DOES match text regardless of where the line breaks occur?
Your question is a good example of why I discourage people from defining literals with embedded whitespace, because this defeats pyparsing’s built-in whitespace skipping. Pyparsing skips over whitespace between expressions. In your case, you are specifying only a single expression, a Literal comprising an entire string of words, including whitespace between them.
You can get whitespace skipped by breaking your string up into separate Literals (adding a string to a pyparsing expression automatically constructs a Literal from that string):
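A minimal sketch of that approach, using the sentence from the question. Only the first word needs an explicit Literal; the variable names are carried over from the question, and the one-word-per-Literal split is just one way to break the string up.

```python
from pyparsing import Literal

string_1 = """The quick brown
fox jumps over the lazy dog.
"""
string_2 = """The quick brown fox jumps
over the lazy dog.
"""

# One Literal per word. Adding a plain string to a pyparsing expression
# wraps that string in a Literal, so only the first word is spelled out.
my_expr = (Literal("The") + "quick" + "brown" + "fox" + "jumps"
           + "over" + "the" + "lazy" + "dog.")

# Whitespace (including newlines) is skipped between the Literals,
# so the position of the line break no longer matters.
print(my_expr.searchString(string_1))
print(my_expr.searchString(string_2))
```

With this expression, both searches should find one match, with each word returned as a separate token.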
If you don’t like typing all those separate quoted strings, you can have Python split the string up for you, map them to Literals, and feed the whole list to make up a pyparsing And:
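Something along these lines should work. This sketch uses a list comprehension rather than map to build the list of Literals, which is purely a stylistic choice; the strings are the ones from the question.

```python
from pyparsing import And, Literal

string_1 = """The quick brown
fox jumps over the lazy dog.
"""
string_2 = """The quick brown fox jumps
over the lazy dog.
"""

# str.split() with no arguments splits on any run of whitespace,
# newlines included, so the line break in string_1 disappears here.
words = string_1.split()

# Build one Literal per word and require them all, in order.
my_expr = And([Literal(word) for word in words])

print(my_expr.searchString(string_1))
print(my_expr.searchString(string_2))
```

Because the expression is built from the words alone, it makes no difference which of the two strings you split to create it.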
If you want to preserve the original whitespace, wrap your expression in originalTextFor:
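A sketch of the wrapped version, assuming the same word-by-word expression as before (built here with a list comprehension over the split string from the question):

```python
from pyparsing import And, Literal, originalTextFor

string_1 = """The quick brown
fox jumps over the lazy dog.
"""
string_2 = """The quick brown fox jumps
over the lazy dog.
"""

# Match word by word, skipping whitespace between the words...
words_expr = And([Literal(word) for word in string_1.split()])

# ...but return the matched slice of the input as a single token,
# with its original spacing and line breaks intact.
my_expr = originalTextFor(words_expr)

print(my_expr.searchString(string_1))
print(my_expr.searchString(string_2))
```

Each search should now return a single string per match instead of nine separate words, and that string keeps the line break wherever it occurred in the input.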