I am battling regular expressions now as I type.
I would like to determine a pattern for the following example file: b410cv11_test.ext. I want to be able to search for files that match the pattern of this example file. Where do I start (I'm so lost and confused), and what is the best way of arriving at a solution that matches the file pattern? Thanks in advance.
Further clarification of question:
I would like the pattern to be as follows: must start with ‘b’, followed by three digits, followed by ‘cv’, followed by two digits, then an underscore, followed by ‘release’, followed by ‘.ext’
Now that you have a human-readable description of your file name, it’s quite straightforward to translate it into a regular expression (at least in this case 😉).
The caret (^) anchors a regular expression to the beginning of what you want to match, so your re has to start with this symbol. Any non-special character in your re will match literally, so you just use ‘b’ for this part:

^b

How you match a digit depends a bit on which flavor of re you use:
The most general way of expressing this is to use brackets ([]). They mean ‘match any one of the characters listed within’. [ASDF], for example, would match either A or S or D or F; [0-9] would match any digit from 0 to 9.

Your re library probably has a shortcut for ‘any digit’. In sed and awk you could use [[:digit:]] [sic!]; in Python and many other languages you can use \d.

So now your re reads:

^b\d

For three digits, the simplest way is to just repeat the atom three times, like this:

\d\d\d

Again, your language might provide a shortcut: braces ({}). Sometimes you have to escape them with a backslash (if you are using sed or awk, read about ‘extended regular expressions’). They also give you a way to say ‘at least x, but no more than y occurrences of the previous atom’: {x,y}.

Now you have:

^b\d{3}

‘cv’ is literal matching again, so now we have:

^b\d{3}cv

The two digits we have already covered:

^b\d{3}cv\d{2}

The rest, ‘_release.ext’, should again all match literally, but the dot (.) is a special character. This means you have to escape it with a backslash:

^b\d{3}cv\d{2}_release\.ext

Leaving out the backslash would mean that a filename like ‘b410cv11_release_ext’ would also match, which may or may not be a problem for you.
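To see what the backslash actually changes, here is a minimal sketch using Python’s re module. Python is just one flavor picked for illustration, and the sample file names are made up:

```python
import re

# Pattern built up so far, with the dot escaped (no end anchor yet).
escaped = re.compile(r"^b\d{3}cv\d{2}_release\.ext")

# Same pattern with the backslash left out: the dot now matches any character.
unescaped = re.compile(r"^b\d{3}cv\d{2}_release.ext")

# Hypothetical sample names, for illustration only.
names = ["b410cv11_release.ext", "b410cv11_release_ext", "b410cv11_test.ext"]

for name in names:
    # Print whether each variant of the pattern finds a match in the name.
    print(name, bool(escaped.search(name)), bool(unescaped.search(name)))
```

Only the unescaped variant accepts ‘b410cv11_release_ext’; both reject ‘b410cv11_test.ext’, because ‘test’ is not the literal ‘release’.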
Finally, if you want to guarantee that there is nothing else following ‘.ext’, anchor the re to the end of the thing to match with the dollar sign ($).

Thus the complete regular expression for your specific problem would be:

^b\d{3}cv\d{2}_release\.ext$
Easy.
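Since the goal was to search for files, here is one way the complete expression might be used, sketched in Python. The helper names matching_names and find_files, the default directory, and the sample names are assumptions made for this example and not part of any library:

```python
import re
from pathlib import Path

# The complete pattern from above. Note that in Python, $ also matches just
# before a trailing newline; re.fullmatch would be the stricter choice.
PATTERN = re.compile(r"^b\d{3}cv\d{2}_release\.ext$")


def matching_names(names):
    """Return the names that match the complete pattern."""
    return [name for name in names if PATTERN.match(name)]


def find_files(directory="."):
    """Return sorted names of files in `directory` that match the pattern."""
    return sorted(
        path.name
        for path in Path(directory).iterdir()
        if path.is_file() and PATTERN.match(path.name)
    )


# Hypothetical sample names, for illustration only.
samples = [
    "b410cv11_release.ext",      # fits the description
    "b410cv11_test.ext",         # 'test' instead of 'release'
    "xb410cv11_release.ext",     # does not start with 'b'
    "b41cv11_release.ext",       # only two digits after 'b'
    "b410cv11_release.ext.bak",  # extra text after '.ext'
]

print(matching_names(samples))  # prints ['b410cv11_release.ext']
```

Note that the example file from the question, ‘b410cv11_test.ext’, does not match, because the description asks for the literal ‘release’. If any word should be allowed in that position, the pattern would need to change accordingly.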
Whatever language or library you use, there has to be a reference somewhere in the documentation that will show you what the exact syntax in your case should be. Once you have learned to break down the problem into a suitable description, understanding the more advanced constructs will come to you step by step.