Say, for instance, I have this string:
var a=23434,bc=3434,erd=5656,ddfeto='dsf3df34dff3',eof='sdfwerwer34',wer=4554;
How should I match all the initializations that are assigned an integer? Here’s my current attempt, but I don’t understand why it’s matching everything.
$pattern = '/var (.*=\d)/';
preg_match_all($pattern,$page,$matches);
EDIT: I’m trying to match each initialization:
1 => a=23434
2 => bc=3434
and so on…
EDIT: Here’s an update on my try:
$pattern = '/[^v^a^r] (.*=\d+),/';
preg_match_all($pattern,$page,$matches);
0 => 'var a=23434,bc=3434,erd=5656,'
1 => 'a=23434,bc=3434,erd=5656'
The function is using “greedy” matching. You don’t want that. In PHP, you can either follow your wildcard with a ? to specify non-greedy matching (.*? instead of .*), or use the U flag (PCRE_UNGREEDY), which makes all wildcards in the pattern use non-greedy matching.
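To make the difference concrete, here is a minimal sketch that runs the question's first pattern in its greedy form, with a lazy quantifier, and with the U flag. The variable names $greedy, $lazy and $ungreedy are only illustrative; $page holds the sample string from the question.

```php
<?php
// Sample input taken from the question.
$page = "var a=23434,bc=3434,erd=5656,ddfeto='dsf3df34dff3',eof='sdfwerwer34',wer=4554;";

// Greedy: .* runs to the end of the line, then backs up to the LAST "=" followed by a digit.
preg_match('/var (.*=\d)/', $page, $greedy);

// Lazy quantifier: .*? stops at the FIRST "=" followed by a digit.
preg_match('/var (.*?=\d)/', $page, $lazy);

// U modifier: every quantifier in the pattern becomes lazy by default.
preg_match('/var (.*=\d)/U', $page, $ungreedy);

echo $greedy[1], "\n";   // a=23434,bc=3434,erd=5656,ddfeto='dsf3df34dff3',eof='sdfwerwer34',wer=4
echo $lazy[1], "\n";     // a=2
echo $ungreedy[1], "\n"; // a=2
```

Note that the lazy versions stop after a single digit because the pattern asks for \d rather than \d+, and that “var ” appears only once, so this pattern can never yield more than one match.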
EDIT: Also, since you’re including “var” in the pattern, you would probably need to change it so that it matches any number of (.*=\d) patterns.
EDIT: Update per discussion:
PHP
Produces
Note: This filters out the entries that have the RHS enclosed in single quotes. If you don’t want that, let us know.
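A pattern with the behaviour described in the note can be sketched as follows. The exact pattern used here, /[a-zA-Z]+=\d+/, is an assumption based on the explanation in this answer, not necessarily the original snippet.

```php
<?php
// Sample input taken from the question.
$page = "var a=23434,bc=3434,erd=5656,ddfeto='dsf3df34dff3',eof='sdfwerwer34',wer=4554;";

// Assumed pattern: one or more letters, "=", then one or more digits.
// A quoted right-hand side starts with ' rather than a digit, so it never matches.
$pattern = '/[a-zA-Z]+=\d+/';

preg_match_all($pattern, $page, $matches);

// $matches[0] holds every full match, in order of appearance.
foreach ($matches[0] as $i => $match) {
    echo $i, ' => ', $match, "\n";
}
// 0 => a=23434
// 1 => bc=3434
// 2 => erd=5656
// 3 => wer=4554
```

preg_match_all numbers the matches from 0, so shift the keys if you need the list to start at 1.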
EDIT #2: My answer to your question exceeded the size of the comment box so I edited my answer.
The [a-zA-Z] expression matches only alphabetical characters of either case. Note that the updated code also removed the “ungreedy” modifier, so we actually want it to be greedy now. And since we want it to be greedy, the . will “eat” too much. Go ahead, play around with the code and see what happens when you change it to .*; it is a good opportunity to get more familiar with regex.
Since the . “eats” too much, we need to restrict it from matching all characters to matching only the ones we want. We could have used something like [^\s,]*, which would match any number of non-whitespace, non-comma characters. This would also have worked for your test cases.
But in this case, we can say confidently which characters we want to include, so instead of “blacklisting” characters, we “whitelist” them: we specify that we want to match any alphabetical character of either case.
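The three options discussed above can be compared side by side. This is a rough sketch: the full patterns are assumptions built around the fragments mentioned in the text, and the names $patterns and $results are only illustrative.

```php
<?php
// Sample input taken from the question.
$page = "var a=23434,bc=3434,erd=5656,ddfeto='dsf3df34dff3',eof='sdfwerwer34',wer=4554;";

// Assumed patterns, differing only in what may appear before the "=".
$patterns = [
    'dot'       => '/(.*=\d+)/',         // any character: greedy, eats too much
    'blacklist' => '/([^\s,]*=\d+)/',    // anything except whitespace and commas
    'whitelist' => '/([a-zA-Z]*=\d+)/',  // letters only
];

$results = [];
foreach ($patterns as $label => $pattern) {
    preg_match_all($pattern, $page, $m);
    $results[$label] = $m[1]; // contents of the first capture group
    echo $label, ': ', count($m[1]), " match(es)\n";
}
// dot: 1 match(es)
// blacklist: 4 match(es)
// whitelist: 4 match(es)
```

With the dot, the single match runs from “var” all the way to “wer=4554”, because the greedy .* only gives back characters until it finds the last “=” followed by digits. The other two patterns both return a=23434, bc=3434, erd=5656 and wer=4554.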
As is the case with many things, especially in programming, there are many ways to skin a cat. There are a number of alternative regex patterns that would also have worked for your test cases. It’s up to you to understand the limits of each, how they will perform on edge cases, and how maintainable they are, and then make a decision.
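As one illustration of such an edge case, the sketch below runs a letters-only pattern and a non-whitespace, non-comma pattern against variable names that contain digits or underscores. The input string is made up for this example and does not come from the question, and both patterns are assumptions.

```php
<?php
// Hypothetical input: names containing a digit or an underscore.
$edge = "var x1=10,y_2=20,z=30;";

// Whitelist: only letters may appear before the "=".
preg_match_all('/[a-zA-Z]+=\d+/', $edge, $whitelist);

// Blacklist: anything except whitespace and commas may appear before the "=".
preg_match_all('/[^\s,]+=\d+/', $edge, $blacklist);

echo implode(' | ', $whitelist[0]), "\n"; // z=30
echo implode(' | ', $blacklist[0]), "\n"; // x1=10 | y_2=20 | z=30
```

The letters-only pattern silently skips x1 and y_2, while the looser pattern keeps them but would also accept names that are not valid identifiers, so which one is right depends on the input you expect.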