I have the following code:
/* record 863.content.en */
UPDATE language_def
SET en='<html>blah blah markup</html>'
WHERE page_id=863
AND string_id='content';
/* record_end 863.content.en */
I would like to create an expression to match that statement where:
- the data in between the periods of 863.content.en are variable BUT SPECIFIC (there will be many of these statements in a row)
- the data in between the two comments is variable but NOT specific
This is what I have so far:
'[/*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*/].*[/*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*/]'
There are a few problems with your regex.
First of all, as FrankeTheKneeMan pointed out, you need delimiters.
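# is a good choice for HTML matches (the standard choice is /, but that interferes with tags too often):
'#[/*]\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*[*/].*[/*]\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*[*/]#'
Now, while [.] is a nice way of escaping a single character, it doesn't work the same for [/*]. This is a character class that matches either / or *. The same goes for [*/]. Use this instead:
'#/\*\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*\*/.*/\*\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*\*/#'
Now .* is the remaining problem. Actually, there are two problems: one is critical, the other might not be. The first is that the dot (.) does not match line breaks by default. You can change this by using the s (singleline) modifier. The second is that the * quantifier is greedy. Should a section appear twice in the string, you would get everything from the first corresponding /* record to the last corresponding /* record_end, even if there is unrelated stuff in between. Since your records seem to be very specific, I suppose this is not the case. Still, it is generally good practice to make the quantifier ungreedy (.*?), so that it consumes as little as possible. Here is your final regex string:
'#/\*\s*record\s*specific_number[.]specific_string1[.]specific_string2\s*\*/.*?/\*\s*record_end\s*specific_number[.]specific_string1[.]specific_string2\s*\*/#s'
For your presented example, this is:
'#/\*\s*record\s*863[.]content[.]en\s*\*/.*?/\*\s*record_end\s*863[.]content[.]en\s*\*/#s'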
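To see the s modifier and the ungreedy quantifier in action, here is a minimal sketch in Python rather than PHP. This is an assumption: the question never names its language, although the talk of delimiters and modifiers points to PHP's preg functions. Python's re module takes the pattern without delimiters, and the re.DOTALL flag plays the role of the s modifier. The sample input and its second record (863.title.en) are hypothetical.

```python
import re

# Hypothetical sample input modelled on the question. The second record
# (863.title.en) is invented so there is something the pattern must skip.
SQL = """/* record 863.content.en */
UPDATE language_def
SET en='<html>blah blah markup</html>'
WHERE page_id=863
AND string_id='content';
/* record_end 863.content.en */
/* record 863.title.en */
UPDATE language_def
SET en='<h1>Title</h1>'
WHERE page_id=863
AND string_id='title';
/* record_end 863.title.en */
"""

# Pattern for the one record 863.content.en (no delimiters needed in Python):
#   /\*  and  \*/  match the comment markers literally ([/*] would be a class)
#   [.]            matches a literal period
#   .*?            lazily consumes the statement between the two comments
PATTERN = (
    r"/\*\s*record\s*863[.]content[.]en\s*\*/"
    r".*?"
    r"/\*\s*record_end\s*863[.]content[.]en\s*\*/"
)

# re.DOTALL plays the role of the s modifier: "." also matches newlines.
with_dotall = re.search(PATTERN, SQL, re.DOTALL)
without_dotall = re.search(PATTERN, SQL)

print(with_dotall.group(0).splitlines()[-1])  # /* record_end 863.content.en */
print(without_dotall)                          # None: the statement spans lines
```

Without the flag the search fails outright, because the dot cannot step over the newline that follows the opening comment.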
If you want to find all of these sections, you can make 863, content and en variable, capture them (using parentheses), and use backreferences to make sure you get the corresponding record_end:
'#/\*\s*record\s*(\d+)[.](\w+)[.](\w+)\s*\*/.*?/\*\s*record_end\s*\1[.]\2[.]\3\s*\*/#s'
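The backreference approach can be sketched the same way, again in Python as an assumed stand-in for PHP (where preg_match_all would be the natural counterpart). Groups 1 to 3 capture the page id, string id and language; \1, \2 and \3 force the record_end comment to repeat that exact key; an extra group 4 captures the statement itself. The sample input and the records dictionary are illustrative assumptions, not part of the original code.

```python
import re

# Hypothetical sample input: two records in a row, as in the question.
SQL = """/* record 863.content.en */
UPDATE language_def
SET en='<html>blah blah markup</html>'
WHERE page_id=863
AND string_id='content';
/* record_end 863.content.en */
/* record 863.title.en */
UPDATE language_def
SET en='<h1>Title</h1>'
WHERE page_id=863
AND string_id='title';
/* record_end 863.title.en */
"""

# Groups 1-3 capture the key (page id, string id, language); \1, \2 and \3
# require the record_end comment to repeat exactly that key. Group 4 is the
# statement itself, matched lazily and across newlines (re.DOTALL).
RECORD = re.compile(
    r"/\*\s*record\s*(\d+)[.](\w+)[.](\w+)\s*\*/"
    r"(.*?)"
    r"/\*\s*record_end\s*\1[.]\2[.]\3\s*\*/",
    re.DOTALL,
)

# Map each (page_id, string_id, lang) key to its trimmed SQL statement.
records = {m.group(1, 2, 3): m.group(4).strip() for m in RECORD.finditer(SQL)}

for key, statement in records.items():
    print(".".join(key), "->", statement.splitlines()[0])
# 863.content.en -> UPDATE language_def
# 863.title.en -> UPDATE language_def

# A closing comment with a different key does not count as the end.
print(RECORD.search("/* record 1.a.en */ x /* record_end 1.b.en */"))  # None
```

Note that the /* record_end ... */ comments never start a match of their own: after "record" the pattern needs whitespace and digits, and "_end" supplies neither.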