I have a text file with multiple occurrences of tables like the one shown below:
_____________________________________
Heading 1 | Heading 2
_______________ | ___________________
Label1 18857.10 | Label3 710.00
Label2 2361.50 | Label4 0.00
| Label5 2531.37
| Label6 0.00
| Label7 0.00
| Label8 0.01
________________| ___________________
16495.60 | Label9 3969.06
_______________ | ___________________
I want to store the numerical values into variables using regular expressions. Since I’m new to regular expressions, I couldn’t find a way to do it. Can anyone help me with this?
If your table is exactly what you showed, this works.
regex:
/(\w+) (\d+(\.\d+)?)/
The slashes / at the beginning and end delimit the regex.
(\w+) means "match any letter, digit or underscore, one or more times".
One space follows. You can add + after the space to match more than one, or put \s instead of the space to match any whitespace character, a tab for example.
(\d+(\.\d+)?): \d+ means one or more digits, (\.\d+) means a dot followed by one or more digits, and the question mark means that the preceding parenthesis (\.\d+) is optional.
preg_match_all stores the matches in its third parameter and returns the number of matches. In the result, $result[$i][0] is the whole match, $result[$i][1] is the first sub-expression (\w+), $result[$i][2] is the second parenthesis (\d+(\.\d+)?), and $result[$i][3] is the decimal part (\.\d+). The decimal part is already inside $result[$i][2], so you don’t need $result[$i][3]; it is only there for explanation 🙂
The code prints:
edit: sorry, it doesn’t work, it didn’t match that naked 16495.60 value. Let me think a bit more…
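The miss described in this edit can be reproduced with a minimal sketch. It is not the original snippet: the two sample rows are copied from the question's table, the variable names are illustrative, and the PREG_SET_ORDER flag is an assumption, chosen because it produces the $result[$i][0] layout described above.

```php
<?php
// Two rows copied from the question's table.
$rows = "Label1 18857.10 | Label3 710.00\n"
      . "16495.60 | Label9 3969.06\n";

// First pattern from the answer: a word, exactly one space, a number.
$pattern = '/(\w+) (\d+(\.\d+)?)/';

// PREG_SET_ORDER (an assumption) groups the result per match, so
// $result[$i][0] is the whole match and $result[$i][2] is the number.
$count = preg_match_all($pattern, $rows, $result, PREG_SET_ORDER);

foreach ($result as $match) {
    echo $match[1], ' => ', $match[2], "\n";
}
echo "matches: $count\n";
```

With these two rows the pattern finds the three labelled values, while 16495.60 is skipped because no word and space precede it.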
…
is a bit better; here’s how it works:
[a-zA-Z0-9]+ matches a non-zero amount of letters or digits.
? after the parenthesis means the whole parenthesized expression is optional.
A space followed by + matches one or more spaces.
(\d+(\.\d+)?) matches a non-zero amount of digits, optionally followed by { a dot and another non-zero amount of digits }.
This whole regex does not include | or a newline, so all matching happens within a single field of the table.
The result variable should be:
edit2: GRAB THOSE SNIPPETS AGAIN! There should be a backslash before the dot in (\.\d+)! I formatted it wrong and it disappeared. Rewrote it, should be fine now.
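For readers who want to try the second approach, here is a hedged sketch of how the values might be stored into variables. The pattern is reconstructed from the description above and is an assumption, not the answer's verbatim snippet. The sketch also assumes that the unlabelled total is preceded by at least one space in the real file, since the pattern requires a space before the number. The names $values and $totals are hypothetical.

```php
<?php
// Sample rows from the question. The leading spaces before the naked
// total are an assumption about the original file's layout.
$table = "Label2 2361.50 | Label4 0.00\n"
       . "   16495.60 | Label9 3969.06\n";

// Reconstructed pattern (assumption): optional label, one or more
// spaces, then a number with an optional decimal part.
$pattern = '/([a-zA-Z0-9]+)? +(\d+(\.\d+)?)/';

preg_match_all($pattern, $table, $result, PREG_SET_ORDER);

$values = [];   // label => number
$totals = [];   // numbers that have no label in front of them
foreach ($result as $match) {
    if ($match[1] === '') {
        // The optional label group did not participate in the match.
        $totals[] = (float) $match[2];
    } else {
        $values[$match[1]] = (float) $match[2];
    }
}

print_r($values);
print_r($totals);
```

Note that a pattern like this would also pick up a header such as "Heading 1", so header rows may need to be skipped before matching.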