The Archive Base

Editorial Team
Asked: May 30, 2026


I have a text file that looks similar to this:

section header 1:
some words can be anything
more words could be anything at all
etc etc lala

some other header:
as before could be anything
hey isnt this fun

I am trying to construct a grammar with pyparsing that produces the following list structure when I ask for the parsed results as a list (i.e. the following should be printed when iterating through the parsed.asList() elements):

['section header 1:',[['some words can be anything'],['more words could be anything at all'],['etc etc lala']]]
['some other header:',[['as before could be anything'],['hey isnt this fun']]]

The header names are all known beforehand, and individual headers may or may not appear. If they do appear, there is always at least one line of content.

The problem is that I am having trouble getting the parser to recognise where 'section header 1:' ends and 'some other header:' begins. I end up with a parsed.asList() that looks like this:

['section header 1:',[['some words can be anything'],['more words could be anything at all'],['etc etc lala'],['some other header'],['as before could be anything'],['hey isnt this fun']]]

(i.e. 'section header 1:' is recognised correctly, but everything following it gets added to section header 1, including further header lines.)

I've tried various things, including playing with leaveWhitespace() and LineEnd() in various ways, but I can't figure it out.

The base parser I am hacking about with is (contrived example – in reality this is a class definition etc..).

from pyparsing import Group, Literal, OneOrMore, Word, ZeroOrMore, printables

header_1_line = Literal('section header 1:')
text_line = Group(OneOrMore(Word(printables)))
header_1_block = Group(header_1_line + Group(OneOrMore(text_line)))

header_2_line = Literal('some other header:')
header_2_block = Group(header_2_line + Group(OneOrMore(text_line)))

overall_structure = ZeroOrMore(header_1_block | header_2_block)

and is being called with

parsed=overall_structure.parseFile()

Cheers, Matt.

1 Answer

  1. Editorial Team
    Added an answer on May 30, 2026 at 8:21 am

    Matt –

    Welcome to pyparsing! You have fallen into one of the most common pitfalls in working with pyparsing, and that is that people are smarter than computers. When you look at your input text, you can easily see which text can be headers and which text can’t be. Unfortunately, pyparsing is not so intuitive, so you have to tell it explicitly what can and can’t be text.

    When you look at your sample text, you are not accepting just any line of text as possible text within a section. How do you know that ‘some other header:’ is not valid as text? Because you know that this string matches one of the known header strings. But in your current code, you have told pyparsing that any collection of Word(printables) is valid text, even if that collection is a valid section header.
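A small sketch may make this concrete. It reuses the text_line definition from the question; the names two_lines and swallowed are only illustrative. Because pyparsing skips newlines as whitespace by default, a single text_line keeps matching words across line breaks and takes the header line with it:

```python
from pyparsing import Group, OneOrMore, Word, printables

# Same definition as text_line in the question.
text_line = Group(OneOrMore(Word(printables)))

# Newlines count as skippable whitespace by default, so one text_line
# runs across the line break and absorbs the header line as well.
two_lines = "hey isnt this fun\nsome other header:"
swallowed = text_line.parseString(two_lines).asList()
print(swallowed)
# [['hey', 'isnt', 'this', 'fun', 'some', 'other', 'header:']]
```

Nothing in this expression distinguishes a header from ordinary words, which is why the first section ends up containing the rest of the file.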

    To fix this, you have to add some explicit lookahead to your parser. Pyparsing offers two lookahead constructs: NotAny (negative lookahead) and FollowedBy (positive lookahead). NotAny can be abbreviated using the ‘~’ operator, so we can write this pseudocode expression for text:

    text = ~any_section_header + everything_up_to_the_end_of_the_line
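As a minimal sketch of that pseudocode in isolation, assuming the two header strings from the question: the helper is_text_line is a hypothetical name used only to show which lines the lookahead accepts.

```python
from pyparsing import Literal, ParseException, restOfLine

# Known headers, taken from the question's sample input.
any_header = Literal("section header 1:") | Literal("some other header:")

# A text line is "not a header", followed by whatever remains on the line.
text = ~any_header + restOfLine


def is_text_line(line):
    """Return True if line parses as text, False if the lookahead rejects it."""
    try:
        text.parseString(line)
        return True
    except ParseException:
        return False


print(is_text_line("as before could be anything"))  # True
print(is_text_line("some other header:"))           # False
```

The negative lookahead consumes no input. It only vetoes the match when a header starts at the current position, so restOfLine still sees the whole line.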
    

    Here is a complete parser using negative lookahead to make sure you read each section, breaking on section headings:

    from pprint import pprint

    from pyparsing import ParserElement, LineEnd, Literal, restOfLine, ZeroOrMore, Group, StringEnd
    
    test = """
    section header 1:
     some words can be anything
     more words could be anything at all
     etc etc lala 
    
    some other header:
     as before could be anything
     hey isnt this fun
    """
    # skip only spaces and tabs as whitespace, so that line ends stay significant;
    # set this before creating the expressions below
    ParserElement.setDefaultWhitespaceChars(" \t")
    NL = LineEnd().suppress()
    END = StringEnd()
    
    header_1 = Literal('section header 1:')
    header_2 = Literal('some other header:')
    any_header = (header_1 | header_2)
    # text isn't just anything! don't accept header line, and stop at the end of the input string
    text = Group(~any_header + ~END + restOfLine)
    
    overall_structure = ZeroOrMore(Group(any_header +
                                         Group(ZeroOrMore(text))))
    # line ends are skipped between expressions instead of being matched explicitly
    overall_structure.ignore(NL)
    
    pprint(overall_structure.parseString(test).asList())
    
    

    In my first attempt, I forgot to also look for the end of string, so my restOfLine expression looped forever. By adding a second lookahead for the string end, my program terminates successfully. Exercise left for you: instead of enumerating all possible headers, define a header line as any line that ends with a ‘:’.
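One possible sketch of that exercise follows. It is an assumption-laden variant, not the parser above: it keeps pyparsing's default whitespace (so newlines are skipped automatically), uses a Regex for both headers and text lines, and treats any line whose last non-blank character is ‘:’ as a header. The names sample, section, parser and result are illustrative.

```python
import re

from pyparsing import Group, OneOrMore, Regex, ZeroOrMore

sample = """\
section header 1:
some words can be anything
more words could be anything at all
etc etc lala

some other header:
as before could be anything
hey isnt this fun
"""

# Assumed rule: a header is a line whose last non-blank character is ':'.
# The lookahead (?=[ \t]*$) checks for the end of the line without consuming it.
header = Regex(r"[^\n]*:(?=[ \t]*$)", re.MULTILINE)

# A text line is any non-empty line that is not a header. Requiring at least
# one character means the expression cannot match emptily at end of input.
text = Group(~header + Regex(r"[^\n]+"))

section = Group(header + Group(OneOrMore(text)))
parser = ZeroOrMore(section)

result = parser.parseString(sample).asList()
for entry in result:
    print(entry)
```

Under this rule a content line that happens to end with a colon would also start a new section, so enumerating the known headers, as in the parser above, remains the stricter choice when the header names are fixed.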

    Good luck with your pyparsing efforts,
    — Paul
