The Archive Base

Editorial Team
Asked: May 16, 2026

I am parsing a relatively simple text, where each line describes a game unit.


I am parsing a relatively simple text, where each line describes a game unit. I have little knowledge of parsing techniques, so I used the following ad hoc solution:

import collections
import re


class ParserError(Exception):
    """Raised when part of a unit description cannot be parsed."""


class Unit:
    # rules is an ordered dictionary of tagged regexes, applied in the given order;
    # the group named V corresponds to the value (if any) for that particular tag
    rules = (
        ('Level', r'Lv\. (?P<V>\d+)'),
        ('DPS', r'DPS: (?P<V>\d+)'),
        ('Type', r'(?P<V>Tank|Infantry|Artillery)'),
        # the XXX will be expanded into a list of valid traits
        # note: (XXX| )* wouldn't work; it will match the first space it finds,
        # and stop at that if it's in front of something other than a trait
        ('Traits', r'(?P<V>(XXX)(XXX| )*)'),
        # flavor text, if any, ends with a dot
        ('FlavorText', r'(?P<V>.*\."?$)'),
        )
    rules = collections.OrderedDict(rules)
    traits = '|'.join(['All-Terrain', 'Armored', 'Anti-Aircraft', 'Motorized'])
    rules['Traits'] = re.sub('XXX', traits, rules['Traits'])

    # rename the generic group V to the tag name, then compile each rule
    for x in rules:
        rules[x] = re.sub('<V>', '<'+x+'>', rules[x])
        rules[x] = re.compile(rules[x])

    def __init__(self, data):
        # data looks like this:
        # Lv. 5 Tank DPS: 55 Motorized Armored
        for field, regex in Unit.rules.items():
            data = regex.sub(self.parse, data, 1)
        # only the separating whitespace should be left at this point
        if data.strip():
            raise ParserError('Could not parse part of the input: ' + data)

    def parse(self, m):
        if len(m.groupdict()) != 1:
            raise Exception('Expected a single named group')
        field, value = m.groupdict().popitem()
        setattr(self, field, value)
        return ''

It works fine, but I feel I have reached the limit of what regexes can do. Specifically, in the case of Traits, the value ends up being a string that I need to split and convert into a list at a later point: e.g., obj.Traits would be set to 'Motorized Armored' in this code, but a later function changes it to ('Motorized', 'Armored').
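For reference, the later conversion step described above can be as small as a whitespace split. This is only a minimal sketch, assuming trait names never contain spaces; the helper name split_traits is hypothetical and not part of the original code.

```python
def split_traits(raw):
    """Turn a space-separated trait string into a tuple of trait names."""
    # str.split() with no argument collapses runs of whitespace and
    # ignores any leading/trailing spaces captured by the regex.
    return tuple(raw.split())


print(split_traits('Motorized Armored'))  # ('Motorized', 'Armored')
```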

I’m thinking of converting this code to use either EBNF or pyparsing grammar or something like that. My goals are:

  • make this code neater and less error-prone
  • avoid the ugly treatment of the case with a list of values (where I need to do a replacement inside the regex first, and later post-process the result to convert a string into a list)

What would be your suggestions about what to use, and how to rewrite the code?

P.S. I skipped some parts of the code to avoid clutter; if I introduced any errors in the process, sorry – the original code does work 🙂

1 Answer

  1. Editorial Team
    Added an answer on May 16, 2026 at 5:17 pm

    I started to write up a coaching guide for pyparsing, but looking at your rules, they translate pretty easily into pyparsing elements themselves, without dealing with EBNF, so I just cooked up a quick sample:

    from pyparsing import Word, nums, oneOf, Group, OneOrMore, Regex, Optional
    
    integer = Word(nums)
    level = "Lv." + integer("Level")
    dps = "DPS:" + integer("DPS")
    type_ = oneOf("Tank Infantry Artillery")("Type")
    traits = Group(OneOrMore(oneOf("All-Terrain Armored Anti-Aircraft Motorized")))("Traits")
    flavortext = Regex(r".*\.$")("FlavorText")
    
    rule = (Optional(level) & Optional(dps) & Optional(type_) & 
            Optional(traits) & Optional(flavortext))
    

    I included the Regex example so you could see how a regular expression could be dropped in to an existing pyparsing grammar. The composition of rule using ‘&’ operators means that the individual items could be found in any order (so the grammar takes care of the iterating over all the rules, instead of you doing it in your own code). Pyparsing uses operator overloading to build up complex parsers from simple ones: ‘+’ for sequence, ‘|’ and ‘^’ for alternatives (first-match or longest-match), and so on.
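To make the operator descriptions above concrete, here is a small sketch of '+', '&', '|' and '^' in isolation. It assumes the third-party pyparsing package is installed; the "Anti" literal is a made-up alternative used only to show the difference between first-match and longest-match, not one of the real traits.

```python
from pyparsing import Literal, Word, nums, oneOf, Optional

integer = Word(nums)
level = "Lv." + integer("Level")      # '+' : elements must appear in sequence
dps = "DPS:" + integer("DPS")
type_ = oneOf("Tank Infantry Artillery")("Type")

# '&' : every part may appear, in any order
unit = Optional(level) & Optional(dps) & Optional(type_)

a = unit.parseString("Lv. 5 Tank DPS: 55")
b = unit.parseString("DPS: 55 Lv. 5 Tank")
print(a.Level, a.DPS, a.Type)  # 5 55 Tank
print(b.Level, b.DPS, b.Type)  # 5 55 Tank

# '|' takes the first alternative that matches, '^' takes the longest match
first = Literal("Anti") | Literal("Anti-Aircraft")
longest = Literal("Anti") ^ Literal("Anti-Aircraft")
print(first.parseString("Anti-Aircraft")[0])    # Anti
print(longest.parseString("Anti-Aircraft")[0])  # Anti-Aircraft
```

Note that oneOf already guards against this kind of prefix masking among its own alternatives, which is one reason the sample grammar uses it for the trait names.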

    Here is how the parsed results would look – note that I added results names, just as you used named groups in your regexen:

    data = "Lv. 5 Tank DPS: 55 Motorized Armored"
    
    parsed_data = rule.parseString(data)
    print(parsed_data.dump())
    print(parsed_data.DPS)
    print(parsed_data.Type)
    print(' '.join(parsed_data.Traits))
    

    prints:

    ['Lv.', '5', 'Tank', 'DPS:', '55', ['Motorized', 'Armored']]
    - DPS: 55
    - Level: 5
    - Traits: ['Motorized', 'Armored']
    - Type: Tank
    55
    Tank
    Motorized Armored
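Since the question also asked about avoiding post-processing, the following sketch shows one way the parsed results might be turned into plain Python values. It assumes pyparsing is installed and repeats the grammar above without the flavor-text rule; the parse action that converts digits to int is an addition for illustration, not part of the sample above.

```python
from pyparsing import Word, nums, oneOf, Group, OneOrMore, Optional

# Convert matched digits to int while parsing (illustrative addition).
integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))
level = "Lv." + integer("Level")
dps = "DPS:" + integer("DPS")
type_ = oneOf("Tank Infantry Artillery")("Type")
traits = Group(OneOrMore(oneOf("All-Terrain Armored Anti-Aircraft Motorized")))("Traits")

rule = Optional(level) & Optional(dps) & Optional(type_) & Optional(traits)

parsed = rule.parseString("Lv. 5 Tank DPS: 55 Motorized Armored")

# asDict() returns the named results as an ordinary dict;
# the grouped traits come back as a list, so no string splitting is needed.
unit = parsed.asDict()
print(unit["Level"], unit["DPS"], unit["Type"])  # 5 55 Tank
print(unit["Traits"])                            # ['Motorized', 'Armored']
```

Because the traits are collected by Group at parse time, the list arrives ready to use, which addresses the second goal in the question.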
    

    Please stop by the wiki and see the other examples. You can install pyparsing with pip or easy_install, but if you download the source distribution from SourceForge, there is a lot of additional documentation.
