Consider the following code:
```python
import re

sequence = 'FFFFFF{7}FFFFFF'
patterns = [('([0-9a-fA-F]+)', 'Sequence'),
            ('(\\([0-9a-fA-F]+\\|[0-9a-fA-F]+\\))', 'Option'),
            ('({[0-9a-fA-F]+})', 'Range'),
            ('(\\[[0-9a-fA-F]+:([0-9a-fA-F]+|\\*)\\])', 'Slice'),
            ('(\\?\\?)+', 'Byte_value_Wildcard'),
            ('(\\*)+', 'Byte_length_wildcard')]
fragment_counter = 0
fragment_dict = {}
fragments_list = []
while sequence:
    found = False
    for pattern, name in patterns:
        m = re.match(pattern, sequence)
        if m:
            fragment_counter += 1
            m = m.groups()[0]
            fragment_dict["index"] = fragment_counter
            fragment_dict["fragment_type"] = name
            fragment_dict["value"] = m
            print(fragment_dict)
            fragments_list.append(fragment_dict)
            sequence = sequence[len(m):]
            found = True
            break
    if not found:
        raise Exception('Unrecognized sequence')
print(fragments_list)
```
Each time it hits the “print fragment_dict” line, I get the correct (expected) output:
{'index': 1, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}
{'index': 2, 'fragment_type': 'Range', 'value': '{7}'}
{'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}
However, fragments_list ends up holding three copies of the final dict instead of the three different dicts I expected:
[{'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}, {'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}, {'index': 3, 'fragment_type': 'Sequence', 'value': 'FFFFFF'}]
I assume this is because append stores a reference to the dict instance rather than a copy of it on each iteration. I looked at using the list() function, but on a dict it just gives me a list of its keys.
What am I doing wrong?
I’m not wedded to the data type; I just need a way to hold 3 data elements (and maybe a 4th) for each fragment I find.
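The symptom above follows from `fragment_dict = {}` running only once, before the loop: each `append` stores a reference to that same dict object, so every entry in `fragments_list` shows whatever was written last. (`list()` applied to a dict iterates over it and therefore returns only the keys; `dict(fragment_dict)` or `fragment_dict.copy()` would make a shallow copy.) A minimal sketch of the usual fix, assuming Python 3 and the same regexes as in the question, builds a new dict for each fragment. The `parse` function name is an illustrative choice, not part of the original code.

```python
import re

# Same token regexes as in the question, written as raw strings.
patterns = [
    (r'([0-9a-fA-F]+)', 'Sequence'),
    (r'(\([0-9a-fA-F]+\|[0-9a-fA-F]+\))', 'Option'),
    (r'(\{[0-9a-fA-F]+\})', 'Range'),
    (r'(\[[0-9a-fA-F]+:([0-9a-fA-F]+|\*)\])', 'Slice'),
    (r'(\?\?)+', 'Byte_value_Wildcard'),
    (r'(\*)+', 'Byte_length_wildcard'),
]


def parse(sequence):
    """Split `sequence` into a list of fragment dicts (hypothetical helper)."""
    fragments_list = []
    while sequence:
        for pattern, name in patterns:
            m = re.match(pattern, sequence)
            if m:
                value = m.group(1)
                # Build a NEW dict for every fragment. Re-using one dict
                # object makes every list entry point at the same object.
                fragment = {
                    'index': len(fragments_list) + 1,
                    'fragment_type': name,
                    'value': value,
                }
                fragments_list.append(fragment)
                sequence = sequence[len(value):]
                break
        else:
            # The for loop ended without `break`: no pattern matched.
            raise ValueError('Unrecognized sequence: %r' % sequence)
    return fragments_list


print(parse('FFFFFF{7}FFFFFF'))
```

With the sample input this prints a list of three distinct dicts, matching the per-iteration output shown above. The `for ... else` clause stands in for the `found` flag: the `else` branch only runs when the loop finishes without hitting `break`.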
Nice to see that you are actually using the code from my answer to your last question. I am always happy when my answers actually help.
Taking into consideration that in your previous question you said this parsing was only the warm-up for processing the parsed tokens afterwards, you might consider this:
Create a class for each token type you have. Implement a process method inside each of those, which will eventually do the actual processing (instead of the wolfing, bearing, foxing and badgering I do in the below code). Then parse a whole stream with the Stream class. You can iterate over the tokens via Stream.tokens, and you can process all the tokens contained by calling Stream.process.
You put these classes in one Python file, import it into your main code, and you just need to create an instance of Stream to parse and process it. Something like this:
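A minimal Python 3 sketch of that design, assuming the token patterns from the question. Only `Stream`, `Stream.tokens` and `process` come from the description above; the `Token` base class, the subclass names (borrowed from the question's fragment types) and the `process` bodies, which just return a description string as a stand-in for real processing, are hypothetical.

```python
import re


class Token:
    """Hypothetical base class: one instance per parsed fragment."""
    pattern = None  # regex with one outer capturing group, set by subclasses

    def __init__(self, index, value):
        self.index = index
        self.value = value

    def process(self):
        # Hypothetical placeholder: real per-token processing would go here.
        return '%d: %s %r' % (self.index, type(self).__name__, self.value)

    def __repr__(self):
        return '%s(%d, %r)' % (type(self).__name__, self.index, self.value)


class Sequence(Token):
    pattern = r'([0-9a-fA-F]+)'


class Option(Token):
    pattern = r'(\([0-9a-fA-F]+\|[0-9a-fA-F]+\))'


class Range(Token):
    pattern = r'(\{[0-9a-fA-F]+\})'


class Slice(Token):
    pattern = r'(\[[0-9a-fA-F]+:([0-9a-fA-F]+|\*)\])'


class ByteValueWildcard(Token):
    pattern = r'(\?\?)+'


class ByteLengthWildcard(Token):
    pattern = r'(\*)+'


class Stream:
    """Parses a whole sequence string into a list of Token objects."""
    token_types = [Sequence, Option, Range, Slice,
                   ByteValueWildcard, ByteLengthWildcard]

    def __init__(self, sequence):
        self.tokens = []
        rest = sequence
        while rest:
            for token_type in self.token_types:
                m = re.match(token_type.pattern, rest)
                if m:
                    value = m.group(1)
                    # A new object per token, so tokens never share state.
                    self.tokens.append(token_type(len(self.tokens) + 1, value))
                    rest = rest[len(value):]
                    break
            else:
                raise ValueError('Unrecognized sequence: %r' % rest)

    def process(self):
        # Delegate to each token's own process() method.
        return [token.process() for token in self.tokens]


stream = Stream('FFFFFF{7}FFFFFF')
for token in stream.tokens:
    print(token)
for line in stream.process():
    print(line)
```

Because `Stream` creates a new token object for every match, the shared-dict problem from the question cannot occur here. For this input the first loop prints `Sequence(1, 'FFFFFF')`, `Range(2, '{7}')` and `Sequence(3, 'FFFFFF')`.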
This yields: