I have some files to scan with patterns using preg_match like:
File name:
(a group: one)
one.txt
(another group: one-aaa)
one-aaa.txt
one-aaa_1.txt
one-aaa_b.txt
one-aaa_3.txt
one-aaa_whatever.txt
(some other group: one-bbb)
one-bbb.jpg
one-bbb_1.txt
one-bbb_2.txt
one-bbb_t.txt
one-bbb_whatever.txt
Groups are defined by name (so one, one-aaa, and one-bbb are different groups), and only .txt files count.
Please do not suggest using different directories. The files are already scattered across several directories, so I need a way to find matches by keyword, not by directory.
I can define the groups manually by specifying “one”, “one-aaa”, etc., but I am having trouble with preg_match. Both patterns I tried return “one” and “one-aaa” as a single group:
$keyword = 'one';
$match = '/(^)' . $keyword . '(.*\.txt$)/';
$match = '/\b(' . $keyword . ')\b(.*\.txt$)/';
The expected return:
one.txt
Unexpected returns:
one.txt
one-aaa.txt, etc
UPDATE 1:
When the keyword is changed to ‘one-aaa’, I want it to return one-aaa.txt, one-aaa_1.txt, and the like.
The way I group is:
$keyword = str_replace('_', ' ', $file->name);
returns: one, one-aaa, one-bbb, etc
What I want to say in plain English:
- find matches that start with “one”, returns: one_1.txt, one_2.txt
- find matches that start with “one-aaa”, returns: one-aaa_1.txt, one-aaa_2.txt, etc
Can anyone shed some light on the correct regex?
Thanks
UPDATE 2:
Somebody here previously suggested avoiding the greedy regex and using .*? instead, but the answer was deleted. It finally works this way, as per that suggestion:
$match = '/^\b(' . $keyword . ')\b(.*?.txt$)/';
Whom should I credit with the answer now? Can anyone volunteer to write a working answer like the one above, or an improvement on it?
UPDATE 3:
Oops, I spoke too soon. It didn’t work; the key was somehow reset when I changed the key|value pairs, which is why I lost track of the double inclusion. Sorry, the above is still a no-go.
UPDATE 4:
I finally got it working with an additional condition that simply excludes files from the output if they don’t match the group. It needs extra code and extra scanning, which is bad, but at least it works as expected for now. I am still using the regex suggested above.
I am still looking for the ultimate regex solution, if there is one. If not, then “no” should be the chosen answer.
Thanks
Of course. The “.*” lets those other characters in. Change it to:
“.*” means any character, appearing 0 times or more…
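To make the effect concrete, here is a small sketch that runs the question's second pattern against a few of the listed file names. The `$files` array is only an illustrative sample, not the asker's actual scanning code.

```php
<?php
// Sample names taken from the question; the array itself is illustrative.
$files = ['one.txt', 'one-aaa.txt', 'one-aaa_1.txt', 'one-bbb.jpg'];

$keyword = 'one';
// The question's second pattern. \b also matches between "e" and "-",
// so it does not stop at "one"; ".*" then accepts "-aaa" and anything
// else that comes before ".txt".
$match = '/\b(' . $keyword . ')\b(.*\.txt$)/';

$matched = [];
foreach ($files as $file) {
    if (preg_match($match, $file)) {
        $matched[] = $file;
    }
}

print_r($matched); // one.txt, one-aaa.txt, one-aaa_1.txt
```

The word boundary does not help here because "-" is a non-word character, so a boundary exists right after "one" in "one-aaa.txt" as well.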
edit:
After seeing your updates, and assuming one_10 or one_100 can also exist, you can try:
$match = '/^' . $keyword . '(_[0-9]+)?\.txt$/';
This means that the keyword may be followed by an underscore and a number.
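If the suffix after the underscore can also be letters (the question lists names such as one-aaa_b.txt and one-aaa_whatever.txt), a looser variant might look like the sketch below. The helper name `groupPattern` and the sample `$files` array are hypothetical, and the `(?:_[^.]+)?` part is an assumption about which suffixes should count as part of a group. The keyword is passed through `preg_quote` so that characters like "-" or "." in it are treated literally.

```php
<?php
/**
 * Hypothetical helper (not from the original posts): builds an anchored
 * pattern for one group keyword.
 */
function groupPattern(string $keyword): string
{
    // preg_quote escapes regex metacharacters in the keyword; the second
    // argument also escapes the "/" delimiter.
    // (?:_[^.]+)? allows an optional "_suffix" such as "_1" or "_whatever".
    return '/^' . preg_quote($keyword, '/') . '(?:_[^.]+)?\.txt$/';
}

// Illustrative sample of file names.
$files = [
    'one.txt', 'one_1.txt',
    'one-aaa.txt', 'one-aaa_1.txt', 'one-aaa_b.txt', 'one-aaa_whatever.txt',
    'one-bbb.jpg', 'one-bbb_1.txt',
];

// preg_grep keeps the original keys of matching entries; array_values reindexes.
$groupOne = array_values(preg_grep(groupPattern('one'), $files));
$groupAaa = array_values(preg_grep(groupPattern('one-aaa'), $files));

print_r($groupOne); // one.txt, one_1.txt
print_r($groupAaa); // one-aaa.txt, one-aaa_1.txt, one-aaa_b.txt, one-aaa_whatever.txt
```

Because the pattern is anchored at both ends and only allows an underscore suffix after the keyword, "one" no longer picks up "one-aaa.txt", so no extra filtering pass should be needed.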