This is the string that I want to parse: 2 Sep 27 Sep 28 SOME TEXT HERE 35.00
I want to parse it into a list so that the values look like:
list[0] = 'Sep 28'
list[1] = 'SOME TEXT HERE'
list[2] = '35.00'
The RegEx that I’ve been working on:
^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)
My values are:
list[0] = 'Sep 28'
list[1] = 'HERE'
list[2] = '35.00'
The list[1] value is off. I’m also probably not parsing the spaces right, but I couldn’t find any guidance in the “Pickaxe” book or online.
Your problem is in your second capture group, ([a-zA-Z0-9]*\s{1})+.
The parenthesized group is repeated, matching each of the words 'SOME', 'TEXT', and 'HERE' individually, which leaves your second capture group holding only the final match, 'HERE'. You need to put the + inside the capturing parentheses and use non-capturing parentheses (?:...) to enclose your existing group. Non-capturing parentheses, which start with (?: and end with ), group parts of a regular expression together without capturing what they match. You can apply a repetition operator (+, *, {n}, or {n,m}) to a non-capturing group and then capture the entire expression, so the second group becomes ((?:[a-zA-Z0-9]*\s{1})+).
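To see the difference, here is a small Ruby sketch that runs the pattern from the question and a corrected version against the sample string. The variable names are only illustrative, and the call to strip is an addition: the corrected group also captures the space after the last word, so it is trimmed afterwards.

```ruby
line = '2 Sep 27 Sep 28 SOME TEXT HERE 35.00'

# Pattern from the question: the capturing group itself is repeated,
# so it only remembers its last iteration.
broken = /^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}([a-zA-Z0-9]*\s{1})+(\d+.\d+)/

# Corrected pattern: a non-capturing group (?:...) carries the repetition,
# and one outer capturing group wraps the whole run of words.
fixed = /^\d{1}\s{1}[a-zA-Z]{3}\s{1}\d{2}\s{1}([a-zA-Z]{3}\s{1}\d{2})\s{1}((?:[a-zA-Z0-9]*\s{1})+)(\d+.\d+)/

broken_captures = line.match(broken).captures
# The second capture keeps its trailing space, so strip each field.
fixed_captures = line.match(fixed).captures.map(&:strip)

p broken_captures  # => ["Sep 28", "HERE ", "35.00"]
p fixed_captures   # => ["Sep 28", "SOME TEXT HERE", "35.00"]
```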
As a side note, this is a pretty clunky regex. You never need to specify {1} in a regex, since matching once is the default. Similarly, \d\d is one character shorter than \d{2}. You probably also want \w instead of [a-zA-Z0-9], though note that \w matches the underscore as well. Since you don’t seem to care about case, you can use the /i option and simplify the letter character classes to [a-z]. Together, these changes give a shorter, more idiomatic regular expression.
Finally, though the Ruby documentation for regular expressions is a little thin, Ruby uses somewhat standard Perl-compatible regular expressions, and you can find more information about regular expressions generally at regular-expressions.info
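As a rough sketch of what the simplified pattern from the side note might look like, the Ruby below drops every {1}, uses \d\d and \w, and adds the /i option. Two details go beyond the advice above and are assumptions: the dot before the decimals is escaped as \. so it only matches a literal period, and \w+ is used instead of \w* so that each repetition needs at least one word character.

```ruby
line = '2 Sep 27 Sep 28 SOME TEXT HERE 35.00'

# Simplified pattern: no {1}, \d\d instead of \d{2}, \w for word characters,
# and /i so [a-z] covers both cases. The escaped dot (\.) and \w+ are
# assumptions added here, not part of the pattern in the question.
idiomatic = /^\d\s[a-z]{3}\s\d\d\s([a-z]{3}\s\d\d)\s((?:\w+\s)+)(\d+\.\d+)/i

# The middle capture ends with a space, so strip each field.
fields = line.match(idiomatic).captures.map(&:strip)

p fields  # => ["Sep 28", "SOME TEXT HERE", "35.00"]
```

If the description can contain punctuation or other non-word characters, \w will not match it, and the middle group would need a broader character class.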