Basically I have some filenames where there is a year in the middle. I am only interested in getting any letter or number up until the year value, but only letters and numbers, not commas, dots, underscores, etc. Is it possible? Maybe with Regex?
For instance:
"A-Good-Life-2010-For-Archive"
"Any.Chararacter_Can+Come.Before!2011-RedundantInfo"
"WhatyouseeIsWhatUget.2012-Not"
"400-Gestures.In1.2000-Communication"
where I want:
"AGoodLife"
"AnyChararacterCanComeBefore"
"WhatyouseeIsWhatUget"
"400GesturesIn1"
By numbers I mean any number that doesn’t look like a year, i.e. 1 digit, 2 digits, 3 digits, 5 digits, and so on. I only want to recognize 4 digit numbers as years.
You’ll have to do this in two parts — first to remove the symbols you don’t want, and second to grab everything up to the year (or vice versa).
To grab everything up to the year, you can use a regex that captures the title and then matches a four-digit year.
I’ve restricted the year regex to the 1900s and 2000s so that four-digit numbers that aren’t years are not mistaken for one (e.g. the 1234 in “Ali-Baba-And-the-1234-Thieves.2011”).
However, if your movie title involves a year, then this won’t really work (“2001:-Space-Odyssey(1968)”).
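As a rough illustration of the "grab everything up to the year" step, here is a minimal sketch of what such a pattern could look like. The class name YearSplitter, the method TryGetRawTitle, and the group names title and year are made up for this example, and the pattern is only one possible reading of the description in this answer (greedy title, year limited to 19xx/20xx, digit lookarounds as described in the updates at the end).

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: names and pattern are illustrative assumptions.
public static class YearSplitter
{
    // Greedy title, then a 19xx/20xx year that is not glued to other digits.
    private static readonly Regex TitleBeforeYear = new Regex(
        @"^(?<title>.*)(?<!\d)(?<year>(?:19|20)\d{2})(?!\d)");

    // Returns true and the raw (still punctuated) text before the last
    // year candidate, or false if no year-like number is found.
    public static bool TryGetRawTitle(string fileName, out string rawTitle)
    {
        Match m = TitleBeforeYear.Match(fileName);
        rawTitle = m.Success ? m.Groups["title"].Value : string.Empty;
        return m.Success;
    }
}
```

With this sketch, "A-Good-Life-2010-For-Archive" gives the raw title "A-Good-Life-", which still needs its symbols removed in the second step.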
To then remove all the non-alphanumeric characters, replace "[^a-zA-Z0-9]" with "". (I’ve allowed digits because a movie might have legitimate numbers in the title.)

UPDATED from comments below:
- Changed .*? to .* in the regex so that the title is as greedy as possible and only the last year candidate is used as the year.
- Added (?!\d) to the end of the year regex and (?<!\d) to the start so that a longer digit run is not split: in “My-title-120012-fdsa” it no longer matches “2001” as the year with “My-title-1” as the title. (I didn’t add the boundary \b because the title might be “A-Good-Life2010”, which has no boundary before the year.)
- Made the regex a verbatim string (@"...") so I don’t need to worry about escaping backslashes in the regex because of C# interpreting backslashes.
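Putting the two steps together, a minimal sketch could look like the following. The class name TitleCleaner and the method ExtractTitle are hypothetical, and the year pattern is an assumption built from the points above (greedy .*, year limited to 19xx/20xx, digit lookarounds, verbatim string); it is not necessarily the exact regex this answer originally showed.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: names and patterns are illustrative assumptions.
public static class TitleCleaner
{
    // Step 1: greedy title up to the last standalone 19xx/20xx number.
    private static readonly Regex TitleBeforeYear = new Regex(
        @"^(?<title>.*)(?<!\d)(?:19|20)\d{2}(?!\d)");

    // Step 2: anything that is not an ASCII letter or digit.
    private static readonly Regex NonAlphanumeric = new Regex("[^a-zA-Z0-9]");

    // Returns the cleaned title, or an empty string if no year is found.
    public static string ExtractTitle(string fileName)
    {
        Match m = TitleBeforeYear.Match(fileName);
        if (!m.Success)
        {
            return string.Empty;
        }
        return NonAlphanumeric.Replace(m.Groups["title"].Value, "");
    }
}
```

Calling TitleCleaner.ExtractTitle on the four sample filenames from the question yields "AGoodLife", "AnyChararacterCanComeBefore", "WhatyouseeIsWhatUget" and "400GesturesIn1". Returning an empty string when no year is found is just one choice; you may prefer to return the whole cleaned filename in that case.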