I needed a utility function earlier today to strip some data out of a file and wrote an appalling regular expression to do it. The input was a file with lots of lines with the format:
<address> <11 * ascii character value> <11 characters>
00C4F244 75 6C 74 73 3E 3C 43 75 72 72 65 ults><Curre
I wanted to strip out everything bar the 11 characters at the end and used the following expression:
'^[0-9A-F+]{8}[\\s]{2}[0-9A-F\\s]{34}'
This matched the bits I didn’t want, which I then removed from the original string. I’d like to see how you’d do this, but the particular areas I couldn’t get working were:
1: having the regex engine return the characters I wanted rather than the characters I didn’t and
2: finding a way of matching a single ASCII value followed by a space (e.g. '75 ' = [0-9A-F]{2}[\s]{1}?) and repeating that 11 times rather than grabbing 34 characters.
Looking at it again, the easiest thing to do would be to match the last 11 characters of each input line, but this isn’t very flexible and, in the interests of learning regex, I would like to see how you can match through from the start of the sequence.
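For comparison, the "last 11 characters" shortcut could be sketched roughly like this. Python is an assumption here (the post doesn't name a language), and the names SAMPLE and last_eleven are made up for illustration. It anchors at the end of the line instead of walking through from the start:

```python
import re

# Sample line from the question; the exact spacing between columns is assumed.
SAMPLE = "00C4F244 75 6C 74 73 3E 3C 43 75 72 72 65 ults><Curre"

def last_eleven(line):
    """Return the final 11 characters of a line, or None if it is shorter."""
    match = re.search(r".{11}$", line)
    return match.group(0) if match else None

print(last_eleven(SAMPLE))  # ults><Curre
```

This breaks as soon as the trailing text isn't exactly 11 characters wide, which is the inflexibility mentioned above.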
Edit: Thanks guys, this is what I wanted:
'(?:^[0-9A-F]{8} )(?:[0-9A-F]{2} ){11} (.*)'
Wish I could turn more than one of you green.
1) ^[0-9A-F+]{8}[\s]{2}[0-9A-F\s]{34}(.*)
Parens are used for grouping with extraction. How you retrieve it depends on your language context, but now some sort of $1 is set to everything after the initial pattern.
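As one example of the "language context" part, here is a minimal sketch in Python (an assumed language; the names PATTERN, SAMPLE and extract_text are hypothetical). Python's match.group(1) plays the role of $1. The sample line is written with two spaces after the address and two before the text, since that is what the {2} and {34} counts imply; the spacing in the post itself looks collapsed, so treat that as an assumption.

```python
import re

# Pattern 1 from the answer, unchanged; group 1 captures the rest of the line.
PATTERN = re.compile(r"^[0-9A-F+]{8}[\s]{2}[0-9A-F\s]{34}(.*)")

# Assumed spacing: two spaces after the address and two before the text.
SAMPLE = "00C4F244  75 6C 74 73 3E 3C 43 75 72 72 65  ults><Curre"

def extract_text(line):
    """Return the captured trailing text, or None when the line does not match."""
    match = PATTERN.match(line)
    return match.group(1) if match else None

print(extract_text(SAMPLE))  # ults><Curre
```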
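2) ^[0-9A-F+]{8}[\s]{2}(?:[0-9A-F]{2}\s){11}\s(.*)
(?:) is grouping without extraction. So (?:[0-9A-F]{2}\s){11} treats the subpattern there (two hex digits followed by one whitespace character) as a unit and looks for it repeated 11 times. The final \s consumes the extra space before the text.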
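A rough sketch of repeating a "two hex digits plus one whitespace" unit 11 times, again in Python as an assumed language, with hypothetical names. The group body is [0-9A-F]{2}\s, so each repetition consumes one whole byte value and its separator. The sample's double spacing between columns is an assumption, as the post's whitespace looks collapsed.

```python
import re

# Address, two whitespace characters, then 11 repetitions of the unit
# "two hex digits + one whitespace", one more whitespace, then the text.
PATTERN = re.compile(r"^[0-9A-F]{8}\s{2}(?:[0-9A-F]{2}\s){11}\s(.*)")

# Assumed spacing: two spaces after the address and two before the text.
SAMPLE = "00C4F244  75 6C 74 73 3E 3C 43 75 72 72 65  ults><Curre"

def extract_text(line):
    """Return the trailing text, or None if the line has the wrong shape."""
    match = PATTERN.match(line)
    # The (?:...) group is non-capturing, so the text is still group 1.
    return match.group(1) if match else None

print(extract_text(SAMPLE))                           # ults><Curre
print(extract_text("00C4F244  75 6C 74  too short"))  # None
```

Because the repeated group doesn't capture, the numbering of the groups you actually care about stays simple: the trailing text is group 1 rather than group 2.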
I’m assuming PCRE here, by the way.