I’m trying to learn Regular Expressions. I know the basics, and I’m not terrible

Asked: June 12, 2026

I’m trying to learn regular expressions. I know the basics and I’m not terrible at regex, but I’m no pro, hence this question. If you know regex, I bet it’ll be simple.

What I’ve got currently is this:

/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/

What I’m trying to do is write a little script for myself that tidies up my music collection by formatting all of the filenames. I know there are other tools out there already, but this is a learning experience for me. I already screwed up all the titles once while trying to replace things like “Hell Aint A Bad Place To Be” with “Hell Aint a Bad Place To Be”. In my wisdom I somehow ended up with “Hell Aint a ad Place to be” (I was looking for an A followed by a space and an uppercase character). Obviously that was a nightmare to fix, and it had to be done manually. Needless to say, I’m testing on samples first now.
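The mangled “Hell Aint a ad Place to be” is consistent with a pattern that matched the uppercase letter after “A ” and replaced it together with the “A” (an assumption, since the pattern that caused it isn’t shown). A minimal sketch, assuming PHP’s `preg_replace` and a hypothetical helper name `lowercaseArticleA`: lookarounds check the neighbouring characters without consuming them, so only the “A” itself gets rewritten.

```php
<?php
// Hypothetical helper: lowercase a standalone "A" in the middle of a title.
// (?<= ) requires a space before the "A" and (?= [A-Z]) requires a space
// plus an uppercase letter after it. Neither lookaround is part of the
// match, so the following word ("Bad") is left untouched.
function lowercaseArticleA(string $title): string
{
    return preg_replace('/(?<= )A(?= [A-Z])/', 'a', $title);
}

echo lowercaseArticleA('Hell Aint A Bad Place To Be'), "\n";
// Hell Aint a Bad Place To Be
```

The “A” at the start of “Aint” is skipped because it is followed by “i”, not by a space, and a title that starts with “A” is skipped because nothing precedes it.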

Anyway, the regex I posted at the top is sort of stage 1 of many. Eventually I want to build it up, but for now I just need to get the simple bits working.

In the end I’d like to turn:

"arctic Monkeys- a fake tales of a san francisco"

into

"Arctic Monkeys - A Fake Tales of a San Francisco"

I know I’ll need lookbehind assertions to detect when a word comes after a ‘-’, because if that word is ‘a’, ‘of’, etc., which I’d normally lowercase, I need to uppercase it instead (the above is a bad example for this use case, I know).
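A minimal sketch of the lookbehind idea, assuming PHP’s `preg_replace_callback` and a hypothetical helper `capitaliseAfterDash`. `(?<=-\s)` requires the two characters before the match to be a “-” and one whitespace character without including them in the match; keeping the lookbehind fixed-length keeps it valid on every PCRE version.

```php
<?php
// Hypothetical helper: uppercase the first word character that follows
// "-" plus one whitespace character. The lookbehind only checks the
// preceding characters, so the callback receives just the single letter.
function capitaliseAfterDash(string $title): string
{
    return preg_replace_callback(
        '/(?<=-\s)\w/',
        fn (array $m): string => strtoupper($m[0]),
        $title
    );
}

echo capitaliseAfterDash('arctic Monkeys- a fake tales of a san francisco'), "\n";
// arctic Monkeys- A fake tales of a san francisco
```

Because the lookbehind demands exactly one whitespace character, a title with no space after the dash (“Monkeys-a”) would need a slightly different pattern.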

Any way of fixing the existing regular expression would be great, as would any tips on where to look on my cheatsheet to finish the rest off. (I’m not looking for a fully-fledged answer, since I need to learn to do it myself; I just can’t figure out why \w+ is only getting one word.)
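On why `\w+` only gets one word: `\w` matches a single letter, digit or underscore, and a space is none of those, so `(\w+)` stops at the first space. In the unanchored pattern above, the first group therefore only ever sees the last word before “ - ”, and the second group only matches if the title is a single word directly followed by the dot. (`-{1}` and `\.{1}` are also the same as `-` and `\.`, since `{1}` is the default.) The sketch below uses a hypothetical filename and a loosened pattern, which is one possible next step rather than the only fix.

```php
<?php
// Hypothetical filename used only for illustration.
$file = 'Arctic Monkeys - Fake Tales Of San Francisco.mp3';

// Original pattern: \w never matches a space, so each (\w+) group can
// capture at most one word, and a multi-word title cannot match at all.
$original = '/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/';
var_dump(preg_match($original, $file)); // int(0)

// Looser sketch: (.+?) lazily takes everything up to the first dash,
// \s*-\s* tolerates a missing space on either side of the dash, and the
// ^...$ anchors make the last group grab the final ".ext".
$looser = '/^(.+?)\s*-\s*(.+)\.(\w{3,4})$/';
if (preg_match($looser, $file, $m)) {
    echo "artist: {$m[1]}\n"; // Arctic Monkeys
    echo "title:  {$m[2]}\n"; // Fake Tales Of San Francisco
    echo "ext:    {$m[3]}\n"; // mp3
}
```

An artist name that itself contains “-” (say, “Jay-Z”) would still be split in the wrong place, because the lazy `(.+?)` stops at the first dash it finds.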

1 Answer

Editorial Team · Answer 1 · 2026-06-12T02:59:20+00:00

I believe there is a much simpler way of approaching this problem: split the string into words using a far simpler regex, then apply whatever processing you want to each word. This lets you perform more complicated transformations on the text in a much cleaner way. Here’s an example:

<?php

$song = "arctic Monkeys- a fake tales of a san francisco";

// Split on runs of whitespace, or just before a "-" (the "-" itself is
// kept because (?=-) is a zero-width lookahead that consumes nothing)
$words = preg_split("/\s+|(?=-)/", $song);

/*
Output for print_r:
Array
(
    [0] => arctic
    [1] => Monkeys
    [2] => -
    [3] => a
    [4] => fake
    [5] => tales
    [6] => of
    [7] => a
    [8] => san
    [9] => francisco
)
*/
print_r($words);

$new_words = array();
foreach ($words as $k => $word) {
    $new_words[] = processWord($word, $k, $words);
}

// This will output:
// Arctic Monkeys - A Fake Tales of a San Francisco
echo implode(' ', $new_words);

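// You can add as many processing rules as you want in here, in a very clean way
function processWord($word, $idx, $words) {
    // Capitalise the word right after a "-"; check $idx first so the
    // first word does not read the non-existent index -1
    if ($idx > 0 && $words[$idx - 1] === '-') return ucfirst($word);
    // Otherwise capitalise only words longer than two characters
    return strlen($word) > 2 ? ucfirst($word) : $word;
}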

Here’s an example of this code running: http://codepad.org/t6pc8WpR
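As a sketch of where `processWord` could go next, assuming a hypothetical `SMALL_WORDS` list and helper name `titleCaseWord`: the `strlen($word) > 2` rule is swapped for an explicit list of small words, and the first word is always capitalised, which the rules above would not do if a title started with “a” or “of”.

```php
<?php
// Hypothetical list of words to keep lowercase mid-title; adjust to taste.
const SMALL_WORDS = ['a', 'an', 'and', 'the', 'of', 'to', 'in', 'on', 'at', 'for'];

function titleCaseWord(string $word, int $idx, array $words): string
{
    $lower = strtolower($word);
    // Always capitalise the first word and any word directly after a "-".
    if ($idx === 0 || $words[$idx - 1] === '-') {
        return ucfirst($lower);
    }
    // Keep listed small words lowercase; capitalise everything else.
    return in_array($lower, SMALL_WORDS, true) ? $lower : ucfirst($lower);
}

$song  = 'arctic Monkeys- a fake tales of a san francisco';
$words = preg_split('/\s+|(?=-)/', $song);

$newWords = [];
foreach ($words as $k => $word) {
    $newWords[] = titleCaseWord($word, $k, $words);
}
echo implode(' ', $newWords), "\n";
// Arctic Monkeys - A Fake Tales of a San Francisco
```

Lowercasing each word first makes the output independent of the input’s casing, but it would also flatten names with internal capitals such as “McCartney”. `strtolower` and `ucfirst` also work on bytes, so non-ASCII titles would need multibyte-aware handling.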