I guess this is more or less a two-part question, but here are the basics first: I am writing some PHP that uses preg_match_all to look in a variable for strings book-ended by {}. It then iterates through each string returned and replaces it with data from a MySQL query.
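A minimal PHP sketch of the workflow described above might look like this. The `$row` array is a hypothetical stand-in for one row fetched from MySQL. The pattern `/\{([a-z_]+)\}/` is an assumption chosen only to make the example run; which pattern to use is exactly what the second question is about.

```php
<?php
// Sketch of the workflow: find every {placeholder} with preg_match_all,
// then swap each one for a value from the database.
// Assumptions: $row is a hypothetical stand-in for a MySQL result row,
// and /\{([a-z_]+)\}/ is just one simple placeholder pattern.
$template = 'Dear {first_name} {last_name}, we will write to {email}.';
$row = [
    'first_name' => 'Ada',
    'last_name'  => 'Lovelace',
    'email'      => 'ada@example.com',
];

// $matches[0] holds the full matches ("{first_name}"),
// $matches[1] holds the captured names ("first_name").
preg_match_all('/\{([a-z_]+)\}/', $template, $matches);

$output = $template;
foreach ($matches[0] as $i => $placeholder) {
    $name = $matches[1][$i];
    // Only replace placeholders we actually have data for.
    if (array_key_exists($name, $row)) {
        $output = str_replace($placeholder, $row[$name], $output);
    }
}

echo $output, "\n"; // Dear Ada Lovelace, we will write to ada@example.com.
```

The comma after `{last_name}` is never part of a match here, so no punctuation has to be stripped afterwards.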
The first question is this: Are there any good sites out there for really learning the ins and outs of PCRE? I’ve done a lot of searching on Google, but the best one I’ve been able to find so far is http://www.regular-expressions.info/. In my opinion, the information there is not well organized. Since I’d rather not get stuck asking for help whenever I need to write a complex regex, please point me at a couple of sites (or a couple of books!) that will help me not have to bother you folks in the future.
The second question is this: I have this regex
"/{.*(_){1}(.*(_){1}[a-z]{1}|.*)}/"
and I need it to catch instances such as {first_name}, {last_name}, {email}, etc. I have three problems with this regex.
The first is that it sees “{first_name} {last_name}” as one string, when it should see it as two. I’ve been able to solve this by checking for the existence of the space, then exploding on the space. Messy, but it works.
The second problem is that it includes punctuation as part of the captured string. So, if you have “{first_name} {last_name},”, it returns the comma as part of the string. I’ve been able to partially solve this by using preg_replace to delete periods, commas, and semicolons. While that works for those characters, it doesn’t handle exclamation points, question marks, or anything else.
The third problem I have with this regex is that it doesn’t match {email} at all.
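To make the three problems concrete, here is a small sketch that runs the regex above on a few strings. The sample strings are made up. The punctuation case assumes the “explode on the space” workaround is applied to the whole match, which is one plausible way the comma ends up attached, not necessarily the only one.

```php
<?php
// Reproducing the problems with the regex from the question (copied verbatim).
// The outer, unescaped { and } are treated by PCRE as literal braces
// because they don't form a valid {n} / {n,m} quantifier.
$original = '/{.*(_){1}(.*(_){1}[a-z]{1}|.*)}/';

// Problem 1: two placeholders come back as one match, because .* is
// greedy and happily runs across "} {".
preg_match($original, '{first_name} {last_name},', $m1);
var_dump($m1[0]); // string(24) "{first_name} {last_name}"

// Problem 2 (one plausible way it arises): exploding that single match
// on spaces leaves the punctuation between placeholders attached.
preg_match($original, '{last_name}, {first_name}', $m2);
var_dump(explode(' ', $m2[0])); // "{last_name}," and "{first_name}"

// Problem 3: the pattern demands at least one "_", and {email} has none.
var_dump(preg_match($original, '{email}')); // int(0)
```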
Now, if you can, are willing, and have the time to simply hand me the solution, thank you, as that will solve my immediate problem. However, even if you do, please, please provide an LMGTFY that points to good websites as references and/or a book or two that would provide a good education on this subject. Sites would be preferable as money is tight, but if a book is the solution, I’ll find the money (assuming my local library system is unable to procure said volume).
Back when I was learning this, I found PHP’s own PCRE pattern syntax reference quite good: http://uk.php.net/manual/en/reference.pcre.pattern.syntax.php
Let’s talk about your expression. It’s quite a bit more verbose than necessary; I’m going to simplify it while we go through this.
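Here is a quick check of the “more verbose than necessary” point. `x{1}` matches exactly what `x` matches, and wrapping a single `_` in parentheses only adds a capture group, so both can be trimmed without changing which strings the pattern accepts. The sketch below uses made-up sample strings to compare the overall matches of the original pattern and a trimmed version. The trimmed pattern still has all of the original’s problems; it is only shorter.

```php
<?php
// Sketch: trimming {1} and the (_) groups should not change the overall match.
// The sample strings are made up for illustration.
$verbose = '/{.*(_){1}(.*(_){1}[a-z]{1}|.*)}/';
$trimmed = '/{.*_(.*_[a-z]|.*)}/';

$samples = ['{first_name}', '{first_name} {last_name},', '{email}', '{a_b_c}'];
$results = [];
foreach ($samples as $s) {
    $a = preg_match($verbose, $s, $ma);
    $b = preg_match($trimmed, $s, $mb);
    // Compare only the overall match ($ma[0] / $mb[0]); the group
    // numbering differs because $trimmed has fewer capturing groups.
    $results[$s] = ($a === $b) && ($a === 0 || $ma[0] === $mb[0]);
    printf("%-28s %s\n", $s, $results[$s] ? 'same' : 'different');
}
```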
A rather simpler way of looking at what you’re trying to match: “find a `{`, then one or more letters or underscores, then a `}`”. A regular expression for that is (in PHP’s string-y syntax):

`'/\{[a-z_]+\}/'`

This will match all of your examples, but also some wilder ones like `{__a_b}`. If that’s not acceptable, we can go with a somewhat more complex description: “find a `{`, then a bunch of letters, then (as often as possible) an underscore followed by a bunch of letters, then a `}`”. In a regular expression:

`'/\{[a-z]+(_[a-z]+)*\}/'`

This second one maybe needs a bit more explanation. Since we want to repeat the thing that matches `_foo` segments, we need to put it in parentheses. Then we say: try finding this as often as possible, but it’s also okay if you don’t find it at all (that’s the meaning of `*`).

So now that we have something to compare your attempt to, let’s have a look at what caused your problems:
- The `.*` matches absolutely anything between the `{}`, including `}` and `{` and a whole bunch of other things. In other words, `{abcde{_fgh}` would be accepted by your regex, as would `{abcde} fg_h {ijkl}`. That is also why `{first_name} {last_name}` comes back as one match.
- There is a mandatory `_` in there, right after the first `.*`. The `(_){1}` (which means exactly the same as `_`) says: whatever happens, explode if this ain’t here! Clearly you don’t actually want that, because it’ll never match `{email}`.

Here’s a complete description in plain language of what your regex matches:
1. Find a `{`.
2. Then find anything at all (any characters except a newline, or none).
3. Then find a `_`.
4. Then find anything, then a `_`, then a single letter. Instead of the anything, the `_` and the single letter, absolutely anything is okay, too.
5. Then find a `}`.

This is probably pretty far from what you wanted. Don’t worry, though. Regular expressions take a while to get used to. I think it’s very helpful to think of them in terms of instructions: when building a regular expression, try to build it in your head as “find this, then find that”, etc. Then figure out the right syntax to achieve exactly that.
This is hard mainly because not all instructions you might come up with in your head easily translate into a piece of a regular expression… but that’s where experience comes in. I promise you that you’ll have it down in no time at all… if you are fairly methodical about making your regular expressions at first.
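Here is a sketch of that “find this, then find that” approach in PHP. The stricter pattern is assembled from one piece per instruction and then compared with the looser `[a-z_]+` version on a sample string. The variable names and the sample text are only illustrative.

```php
<?php
// Building the stricter pattern one instruction at a time.
// Variable names are illustrative only.
$open   = '\{';          // find a {
$word   = '[a-z]+';      // then a bunch of letters
$more   = '(_[a-z]+)*';  // then, as often as possible, an underscore plus letters
$close  = '\}';          // then a }
$strict = '/' . $open . $word . $more . $close . '/';

// The looser "letters or underscores" version, for comparison.
$loose  = '/\{[a-z_]+\}/';

$text = '{first_name} {last_name}, your address is {email}! {__a_b}';

preg_match_all($loose, $text, $m);
$looseMatches = $m[0];
preg_match_all($strict, $text, $m);
$strictMatches = $m[0];

echo implode(' ', $looseMatches), "\n";   // {first_name} {last_name} {email} {__a_b}
echo implode(' ', $strictMatches), "\n";  // {first_name} {last_name} {email}
```

Neither pattern can match a space or a `}` between the braces. As a result, `{first_name} {last_name}` yields two separate matches, the comma and exclamation point stay outside the matches, and `{email}` is found. Only the stricter pattern rejects `{__a_b}`.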
Good luck! 🙂