I have a question about regex in Python. Sorry if this topic has been discussed a million times; usually I find the answers on SO, Google, etc., but this time I'm lost among the countless answers. (To be honest, I own a regex book, but somehow I can't really make sense of it.)
For a music management system I need to extract information from file paths, supporting different sets of options. Here are two examples:
If the path is: (Case 1)
"/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac"
it should extract:
- artist: “The Prodigy”
- release: “The Fat Of The Land”
- Tracknumber: 4
- Title: “Funky Stuff”
And, for example: (Case 2)
"/[XLR 483] The Fat Of The Land/04 - The Prodigy - The Funky Stuff.flac"
should extract:
- catno: “XLR 483”
- release: “The Fat Of The Land”
- Tracknumber: 4
- artist: “The Prodigy”
- Title: “Funky Stuff”
There is no need for a single regex that covers both cases; these are just two examples. I'll then offer them as selectable options (or as a starting point for users to add their own).
Any help would be greatly appreciated!
@S.Lott: I don’t have a regex for this yet; I started by splitting the string:
parts = rel_path.split('/')
track = parts[-1]
release = parts[-2]
artist = parts[-3]
but this looks like an extremely inflexible and inelegant solution to me.
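For comparison, here is a minimal sketch of how the split-based approach could be carried through to the track number and title of Case 1. The function name `parse_by_split` and the assumption that the file name always looks like `NN - Title.ext` are illustrative, not part of the original code.

```python
def parse_by_split(rel_path):
    """Split-based parsing of '/artist/release/NN - Title.ext' (Case 1 only)."""
    parts = rel_path.strip('/').split('/')
    if len(parts) < 3:
        return None  # not enough path components
    artist, release, filename = parts[-3:]
    # Drop the extension; the stem is empty if the file name has no dot.
    stem, _, _ext = filename.rpartition('.')
    # Assumed separator between track number and title: ' - '
    number, sep, title = stem.partition(' - ')
    if not sep or not number.isdigit():
        return None  # file name does not follow the assumed layout
    return {
        'artist': artist,
        'release': release,
        'tracknumber': int(number),
        'title': title,
    }


print(parse_by_split('/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac'))
```

This works for one fixed layout, but every additional layout needs its own hand-written branch, which is the inflexibility mentioned above.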
edit:
So far I have something like:
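import re

# Raw string; the hyphen sits last in the class so it is a literal, and the dot before the extension is escaped.
pattern = re.compile(r'^/(?P<artist>[a-zA-Z0-9 ]+)/(?P<release>[a-zA-Z0-9 ]+)/(?P<track>[a-zA-Z0-9 _-]+)\.[a-zA-Z0-9]+$')
rel_path = '/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac'
match = pattern.search(rel_path)
if match:  # search() returns None when the path does not fit the pattern
    artist = match.group('artist')
    release = match.group('release')
    track = match.group('track')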
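Case 1 also asks for the track number and the title as separate fields. One way to get there, as a sketch only, is to give each its own named group. The names `CASE1` and `parse_case1` are made up for this example, and the character classes are an assumption about what may appear in each path component.

```python
import re

# Hypothetical pattern for Case 1: /artist/release/NN - Title.ext
CASE1 = re.compile(
    r'^/(?P<artist>[a-zA-Z0-9 ]+)'
    r'/(?P<release>[a-zA-Z0-9 ]+)'
    r'/(?P<tracknumber>[0-9]+) - (?P<title>[a-zA-Z0-9 ]+)'
    r'\.[a-zA-Z0-9]+$'  # escaped dot followed by the file extension
)


def parse_case1(rel_path):
    """Return the extracted fields as a dict, or None if the path does not match."""
    match = CASE1.match(rel_path)
    if match is None:
        return None
    info = match.groupdict()
    info['tracknumber'] = int(info['tracknumber'])  # '04' becomes 4
    return info


print(parse_case1('/The Prodigy/The Fat Of The Land/04 - Funky Stuff.flac'))
```

Converting the track number with `int()` drops the leading zero, which matches the "Tracknumber: 4" expected in the question.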
Although it is not strictly necessary, re is a handy choice for this problem.
I use expressions such as [a-zA-Z0-9 ] to explicitly specify the characters I expect in the string. It is just my preference to use a whitelist-style regex to make the code more secure. There are many other ways to compose equivalent patterns. You will find all you need at http://docs.python.org/library/re.html; you don’t need a book for that.
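The same whitelist idea can be reused across several patterns that are tried in order, which is roughly what offering them as options would look like. The following is a sketch under assumptions: `NAME`, `PATTERNS`, and `extract_tags` are hypothetical names, and the Case 2 pattern assumes the layout `/[catno] release/NN - artist - title.ext` with none of the fields containing a hyphen.

```python
import re

# Whitelisted characters for names; shared by all patterns (an assumption).
NAME = r'[a-zA-Z0-9 ]+'

PATTERNS = [
    # Case 1: /artist/release/NN - title.ext
    re.compile(
        rf'^/(?P<artist>{NAME})/(?P<release>{NAME})'
        rf'/(?P<tracknumber>[0-9]+) - (?P<title>{NAME})\.[a-zA-Z0-9]+$'
    ),
    # Case 2: /[catno] release/NN - artist - title.ext
    re.compile(
        rf'^/\[(?P<catno>{NAME})\] (?P<release>{NAME})'
        rf'/(?P<tracknumber>[0-9]+) - (?P<artist>{NAME}) - (?P<title>{NAME})'
        r'\.[a-zA-Z0-9]+$'
    ),
]


def extract_tags(rel_path, patterns=PATTERNS):
    """Return the named groups of the first matching pattern, or None."""
    for pattern in patterns:
        match = pattern.match(rel_path)
        if match:
            tags = match.groupdict()
            tags['tracknumber'] = int(tags['tracknumber'])
            return tags
    return None


# A path shaped like Case 2
print(extract_tags('/[XLR 483] The Fat Of The Land/04 - The Prodigy - Funky Stuff.flac'))
```

Because the whitelist excludes the hyphen, the artist group stops at the " - " separator instead of swallowing the title. Names that contain hyphens or other punctuation would need a wider character class.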