I’ve been trying to come up with a regular expression that extracts all valid Unix paths from a given text but does not match any URL (such as http://...).
The following paths are all valid:
/home/username/some_file.txt
/home/username/some_file.longext
"/path/to/file/some file.longext"
But it should not match any of these:
http://www.somelink.com
ftp://www.somelink.co.uk
https://www.somelink.com and so on
I came up with the following, but it also matches all the URLs, which is exactly what I’m trying to avoid:
"?[a-zA-Z0-9\/].*\.[a-zA-Z0-9].*"?
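The small check below shows why the attempt also catches URLs. It is written in Python only because the question doesn't name a language. The first character class accepts letters, so the "h" of "http" can start a match, and ".*\.[a-zA-Z0-9]" is then satisfied by the ".c" in ".com". Nothing in the pattern requires the text to start with a slash. The helper name matches_attempt is purely illustrative.

```python
import re

# The pattern from the question, copied unchanged (the \/ escape is harmless in Python).
ATTEMPT = re.compile(r'"?[a-zA-Z0-9\/].*\.[a-zA-Z0-9].*"?')


def matches_attempt(s: str) -> bool:
    """Return True if the question's pattern finds a match anywhere in s."""
    return ATTEMPT.search(s) is not None


for s in ["/home/username/some_file.txt",
          "http://www.somelink.com",
          "ftp://www.somelink.co.uk"]:
    print(s, "->", matches_attempt(s))  # prints True for all three
```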
EDIT:
I should mention that the input is actually the contents of a file that contains URLs as well as valid Unix paths, so the regex needs to match a path anywhere in the text while still not matching URLs.
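It seems as simple as matching a slash at the beginning of the string, assuming that your paths are absolute and that there is no need to check whether a path exists, is readable, or similar. A URL begins with a scheme such as http: or ftp: rather than a slash, so the pattern should begin like
^"?/. That will be enough to filter out URLs. If the file holds one path per line, enable multiline mode so that ^ matches at the start of every line rather than only at the start of the whole text.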
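Here is a minimal Python sketch of this start-of-line idea applied to the sample input from the question. The answer only gives the ^"?/ prefix. The rest of the pattern is a hypothetical completion that assumes each path starts a line and is either double-quoted (and may then contain spaces) or unquoted (and then contains no whitespace). PATH_RE, extract_paths and SAMPLE are illustrative names.

```python
import re
from typing import List

# Hypothetical completion of the ^"?/ prefix (an assumption, not the answer's full regex):
# a quoted path may contain spaces; an unquoted one runs until the first whitespace.
# re.MULTILINE makes ^ match at the start of every line, not just the start of the text.
PATH_RE = re.compile(r'^(?:"/[^"\n]*"|/\S*)', re.MULTILINE)


def extract_paths(text: str) -> List[str]:
    """Return absolute paths found at line starts, with surrounding quotes removed."""
    return [m.strip('"') for m in PATH_RE.findall(text)]


SAMPLE = """/home/username/some_file.txt
/home/username/some_file.longext
"/path/to/file/some file.longext"
http://www.somelink.com
ftp://www.somelink.co.uk
https://www.somelink.com"""

print(extract_paths(SAMPLE))
# ['/home/username/some_file.txt', '/home/username/some_file.longext', '/path/to/file/some file.longext']
```

Because of the ^ anchor, a path that appears mid-line (for example after other words) is not picked up. That is the trade-off of the start-of-line approach.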