Let’s say I have a string that contains a Unix-style local path to a file, as in the following examples:
String s1 = "something something ./files/icon.gif";
String s2 = "The files are texts/text1.txt and texts/text2.txt";
String s3 = "<img src=\"images/img/run.png\" alt=\"\" />";
I need to extract only the file paths:
"./files/icon.gif"
"texts/text1.txt", "texts/text2.txt"
"images/img/run.png"
I’ve come up with the following regex:
\.?[[a-zA-Z0-9]*/]+\.[a-zA-Z0-9]+
And it does the job for these test cases.
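For reference, here is a minimal sketch of how I apply it in Java. The class and method names (PathRegexDemo, extract) are made up for this post. Note that in Java the nested brackets form a union, so [[a-zA-Z0-9]*/] matches a single letter, digit, '*' or '/'.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PathRegexDemo {
    // The regex from above, with backslashes doubled for a Java string literal.
    // [[a-zA-Z0-9]*/] is a character-class union: one letter, digit, '*' or '/'.
    static final Pattern PATH_LIKE =
            Pattern.compile("\\.?[[a-zA-Z0-9]*/]+\\.[a-zA-Z0-9]+");

    // Collects every non-overlapping match, scanning left to right.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PATH_LIKE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [./files/icon.gif]
        System.out.println(extract("something something ./files/icon.gif"));
        // prints [texts/text1.txt, texts/text2.txt]
        System.out.println(extract("The files are texts/text1.txt and texts/text2.txt"));
        // prints [images/img/run.png]
        System.out.println(extract("<img src=\"images/img/run.png\" alt=\"\" />"));
    }
}
```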
My worry is that this could also pull out text that is not a file path and only looks like one because it has slashes and dots in the right places.
Is there a better way to handle this problem (possibly even without using regular expressions)?
You can’t do it. Unix file names can contain literally anything except NULs and /s, so any non-empty string with no embedded NULs is a valid path. That means all your strings, taken whole, are valid file paths. If you want to extract everything that looks like a “reasonable” path, you must first define “reasonable”, and even then you’ll probably fail because of something like “TCP/IP” in the source text.
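To see how easily a pattern-based definition of “reasonable” goes wrong, here is a small sketch that runs the regex from the question over an ordinary sentence. The class name and the sample sentence are invented for illustration. It is a demonstration of the problem, not a fix for it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FalsePositiveDemo {
    // Same pattern as in the question, escaped for a Java string literal.
    static final Pattern PATH_LIKE =
            Pattern.compile("\\.?[[a-zA-Z0-9]*/]+\\.[a-zA-Z0-9]+");

    // Returns every substring the pattern considers path-like.
    static List<String> extract(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PATH_LIKE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // None of these matches is a local file path the author meant to reference.
        // prints [1.5, example.com, /index.html]
        System.out.println(extract("Version 1.5 uses TCP/IP, see example.com/index.html"));
    }
}
```

This particular pattern happens to skip “TCP/IP” because it insists on a dot followed by a suffix, but it picks up a version number and splits a URL into two bogus “paths”. Loosen the pattern to allow extensionless files and “TCP/IP” comes back.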