Considering the following: import re sequence = ‘FFFFFF{7}FFFFFF’ patterns = [ (‘([0-9a-fA-F]+)’, ‘Sequence’), (‘(\\([0-9a-fA-F]+\\|[0-9a-fA-F]+\\))’,

Asked: June 19, 2026 (2026-06-19T03:03:13+00:00)

Considering the following:

import re
sequence = 'FFFFFF{7}FFFFFF'
patterns = [ ('([0-9a-fA-F]+)', 'Sequence'),
    ('(\\([0-9a-fA-F]+\\|[0-9a-fA-F]+\\))', 'Option'),
    ('({[0-9a-fA-F]+})', 'Range'),
    ('(\\[[0-9a-fA-F]+:([0-9a-fA-F]+|\*)\\])', 'Slice'),
    ('(\\?\\?)+', 'Byte_value_Wildcard'),
    ('(\\*)+', 'Byte_length_wildcard') ]
fragment_counter = 0
fragment_dict= {}
fragments_list = []
while sequence:
    found = False
    for pattern, name in patterns:
        m = re.match (pattern,sequence)
        if m:
            fragment_counter+=1
            m = m.groups () [0]
            fragment_dict["index"]=fragment_counter
            fragment_dict["fragment_type"]=name
            fragment_dict["value"]=m
            print fragment_dict
            fragments_list.append(fragment_dict)
            sequence = sequence [len (m):]
            found = True
            break
    if not found: raise Exception ('Unrecognized sequence')

print fragments_list

Each time it hits the "print fragment_dict" line, I get the correct (expected) output:

{'index': 1, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}
{'index': 2, 'fragment_type': 'Range', 'value': '{7}'}
{'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}

However, the list fragments_list ends up holding three copies of the final dict, not one entry per fragment as I expect:

[{'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}, {'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}, {'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}]

I am assuming this is because append stores a reference to the same dict instance rather than a copy of the dict from each iteration. I looked at using the list() function, but applied to the dict it just gives me a list of the dict keys.

What am I doing wrong?
I'm not wedded to the data type; I just need a way to hold 3 data elements (and maybe a 4th) for each fragment I find.

1 Answer

Editorial Team · Answer 1 · 2026-06-19T03:03:14+00:00

Nice to see that you are actually using the code from my answer to your last question. I am always happy when my answers actually help.

In your previous question you said that this parsing is only the warm-up for processing the parsed tokens afterwards. With that in mind, you might consider this:

Create a class for each token type you have. Give each class a process method that will eventually do the actual processing (instead of the wolfing, bearing, foxing and badgering I do in the code below).

Then parse a whole stream with the Stream class. You can iterate over the tokens via Stream.tokens and you can process all the tokens contained by calling Stream.process.

Put these classes in one Python file and import it into your main code. Then you only need to create an instance of Stream to parse the input and process it.

Something like this:

#! /usr/bin/python3.2

import re

class Sequence:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Sequence {}'.format (self.__raw)
    def process (self): print ('Wolfing sequence {}'.format (self.__raw) )

class Option:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Option {}'.format (self.__raw)
    def process (self): print ('Foxing option {}'.format (self.__raw) )

class Range:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Range {}'.format (self.__raw)
    def process (self): print ('Bearing range {}'.format (self.__raw) )

class Slice:
    def __init__ (self, raw): self.__raw = raw
    def __str__ (self): return 'Slice {}'.format (self.__raw)
    def process (self): print ('Badgering slice {}'.format (self.__raw) )


class Stream:
    patterns = [ ('([0-9a-fA-F]+)', Sequence),
        ('(\\([0-9a-fA-F]+\\|[0-9a-fA-F]+\\))', Option),
        ('({[0-9a-fA-F]+})', Range),
        ('(\\[[0-9a-fA-F]+:[0-9a-fA-F]+\\])', Slice) ]

    def __init__ (self, stream):
        self.__tokens = []
        while stream:
            found = False
            for pattern, cls in self.patterns:
                m = re.match (pattern, stream)
                if m:
                    m = m.groups () [0]
                    self.__tokens.append (cls (m) )
                    stream = stream [len (m):]
                    found = True
                    break
            if not found: raise Exception ('Unrecognized sequence')

    @property
    def tokens (self): return (token for token in self.__tokens)

    def process (self):
        for token in self.__tokens: token.process ()

stream = Stream ('524946(46|58){4}434452[22:33]367672736E')
print ('These are the tokens:')
for idx, token in enumerate (stream.tokens):
    print ('{} at position {}.'.format (token, idx) )

print ('\nNow let\'s process them all:')
stream.process ()

This yields:

These are the tokens:
Sequence 524946 at position 0.
Option (46|58) at position 1.
Range {4} at position 2.
Sequence 434452 at position 3.
Slice [22:33] at position 4.
Sequence 367672736E at position 5.

Now let's process them all:
Wolfing sequence 524946
Foxing option (46|58)
Bearing range {4}
Wolfing sequence 434452
Badgering slice [22:33]
Wolfing sequence 367672736E
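
As for the symptom in the question itself: the assumption there is right. append stores a reference to the one fragment_dict object, and every pass through the loop overwrites that same object, so all list entries show the last values. If you would rather keep a list of dicts than switch to classes, a minimal sketch (Python 3) is to build a new dict on every match. The function name tokenize is introduced here purely for illustration, and the patterns are transcribed from the question as raw strings:

```python
import re

# Patterns transcribed from the question, written as raw strings.
PATTERNS = [
    (r'([0-9a-fA-F]+)', 'Sequence'),
    (r'(\([0-9a-fA-F]+\|[0-9a-fA-F]+\))', 'Option'),
    (r'(\{[0-9a-fA-F]+\})', 'Range'),
    (r'(\[[0-9a-fA-F]+:([0-9a-fA-F]+|\*)\])', 'Slice'),
    (r'(\?\?)+', 'Byte_value_Wildcard'),
    (r'(\*)+', 'Byte_length_wildcard'),
]


def tokenize(sequence):
    """Split sequence into fragments, one freshly created dict per fragment.

    tokenize is a hypothetical helper name, not part of the original code.
    """
    fragments = []
    while sequence:
        for pattern, name in PATTERNS:
            m = re.match(pattern, sequence)
            if m:
                value = m.group(1)
                # The dict literal creates a new object on each match, so the
                # list holds distinct dicts instead of repeated references.
                fragments.append({
                    'index': len(fragments) + 1,
                    'fragment_type': name,
                    'value': value,
                })
                sequence = sequence[len(value):]
                break
        else:
            # The for loop finished without break: no pattern matched.
            raise ValueError('Unrecognized sequence: {!r}'.format(sequence))
    return fragments


fragments = tokenize('FFFFFF{7}FFFFFF')
for fragment in fragments:
    print(fragment)
```

In the original loop, the same effect can be had by moving fragment_dict = {} inside the "if m:" branch, or by appending a copy with fragments_list.append(dict(fragment_dict)).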
