I have no experience with regular expressions and would love some help and suggestions on a possible solution to deleting parts of file names contained in a csv file.
Problem:
A list of exported file names contains a random unique identifier that I need isolated. The unique identifier has no predictable pattern, however the aspects which need removing do. Each file name ends with one of the following variations:
V, -V, or %20V followed by a random number sequence with possible spaces, additional “-“,”” and ending with .PDF
examples:
GTD-LVOE-43-0021 V10 0.PDF
GTD-LVOE-43-0021-V34-2.PDF
GTD-LVOE-43-0021_V02_9.PDF
GTD-LVOE-43-0021 V49.9.PDF
Solution:
My plan was to write a script to select of the first occurrence of a V from the end of the string and then delete it and everything to the right of it. Then the file names can be cleaned up by deleting any “-” or “_” and white space that occurs at the end of a string.
Question:
How can I do this with a regular expression and is my line of thinking even close to the right approach to solving this?
REGEX:
[\s\-_]V.*?\.PDFMight do the trick. You’d still need to replace away any leading – and _, but it should get you down the path, hopefully.
This would read as follows..
start with a
whitespace,-OR_followed by aV. Then take everything until you get to the first.PDF