I’m trying to learn Regular Expressions. I know the basics, and I’m not terrible at regex, I’m just no pro – hence I’ve got a question for you guys. If you know regex, I bet it’ll be simple.
What I’ve got currently is this:
/(\w+)\s-{1}\s(\w+)\.{1}(\w{3,4})/
What I’m trying to do is create a little script for myself that tidies up my music collection by formatting all of the filenames. I know there’s other stuff out there already but this is a learning experience for me. I already screwed up all the titles once by replacing things like “Hell Aint A Bad Place To Be” with “Hell Aint a Bad Place To Be”. In my wisdom I somehow ended up with “Hell Aint a ad Place to be” (I was looking for an A followed by a space and an uppercase character). Obviously that was a nightmare to fix and it had to be done manually. Needless to say I’m testing samples first now.
Anyway, the above regex is sort of a stage 1 of many. Eventually I want to build it up, but for now I just need to get the simple bits working.
In the end I’d like to turn:
"arctic Monkeys- a fake tales of a san francisco"
into
"Arctic Monkeys - A Fake Tales of a San Francisco"
I know I’ll need lookbehind assertions to detect when a word comes right after a ‘-’, because if that word is ‘a’, ‘of’ etc., which I’d normally lowercase, I need to uppercase it (the above is a bad example for this use case, I know).
Any way of fixing the existing regular expression would be great, and any tips on where to look on my cheatsheet to finish the rest off would be great too (I’m not looking for a fully-fledged answer, since I need to learn to do it myself; I just can’t figure out why \w+ is only getting one word).
I believe there is a much simpler way of approaching this problem: split the string into words, based on a much simpler regex, and then apply whatever processing you want to those words. This will allow you to perform more complicated transformations on the text in a much cleaner way. Here’s an example, which you can see running at: http://codepad.org/t6pc8WpR
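For anyone who can’t open the link, the split-then-process idea can be sketched roughly like this. This is my own illustration in Python, not necessarily the code behind the link: the function name `tidy_title` and the `SMALL_WORDS` list are made up for the example, and it only handles the title part, not the file extension.

```python
import re

# Hypothetical list of words to keep lowercase unless they start a section.
SMALL_WORDS = {"a", "an", "and", "of", "the"}

def tidy_title(name):
    # Normalise a hyphen that already touches whitespace to " - ",
    # so "Monkeys- a" becomes "Monkeys - a" while "Jay-Z" is left alone.
    name = re.sub(r"\s+-\s*|\s*-\s+", " - ", name.strip())

    # Split on runs of whitespace; the separator "-" becomes its own word.
    words = re.split(r"\s+", name)

    result = []
    for i, word in enumerate(words):
        # The first word, and the word right after a "-", always get a capital.
        starts_section = i == 0 or words[i - 1] == "-"
        if word.lower() in SMALL_WORDS and not starts_section:
            result.append(word.lower())
        else:
            # Uppercase only the first character; leave the rest untouched.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)

print(tidy_title("arctic Monkeys- a fake tales of a san francisco"))
```

Because each word is looked at together with the word before it, the "capitalise after a hyphen" rule becomes a plain comparison and needs no lookbehind. As for the original pattern: \w matches letters, digits and underscore but not whitespace, so (\w+) stops at the first space and can only ever capture a single word.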