I am searching for an efficient way to test whether any files exist whose names match a certain pattern.
Examples using wildcards:
- ????.*
- ???????.*
- *.png
- *.jpg
Examples using regular expressions:
- [012]{4}.*
- [012]{7}.*
The problem is that the directory I have to test contains up to 500,000 files.
The only way I know to perform such tests is to use the methods of the File class:
String[] list()
String[] list(FilenameFilter filter)
File[] listFiles()
File[] listFiles(FileFilter filter)
File[] listFiles(FilenameFilter filter)
The problem is that they are all implemented in basically the same way: first they call list() to get all available files, and then they apply the filter to the result.
Imagine what happens when this is applied to a folder containing 500,000 files…
Is there any alternative in Java for retrieving the name of the first matching file in a directory without having to enumerate all of them?
If JNI is the only option, is there a library that can do this and that comes with pre-compiled binaries for the six major platforms (Linux, Windows and OS X, each in 32 and 64 bit)?
I think that you are confused. As far as I know, no current OS supports pattern listing/searching in its filesystem interface. All utilities that support patterns do so by listing the directory (e.g. by using readdir() on POSIX systems) and then performing string matching.
Therefore, there is no generic low-level way to do that more efficiently in Java or any other language. That said, you should investigate at least the following approaches:
- Making sure that you only retrieve the file names and that you do not probe the file nodes themselves for additional metadata (e.g. their size), as that would cause additional operations for each file.
- Retrieving the file list once and caching it, perhaps in association with a filesystem event notification interface for updates (e.g. JNotify or the Java 7 WatchService interface).
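The caching approach could be sketched roughly as follows. This is only a minimal illustration, assuming Java 8 or later; the class name CachedDirectory and its methods are made up for this example. It stores file names only (no attributes are read) and applies pending WatchService events before each query. Note that some platforms deliver watch events with a noticeable delay, so the cache can be briefly stale.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;
import static java.nio.file.StandardWatchEventKinds.*;

// Hypothetical helper: keeps the file names of one directory in memory.
final class CachedDirectory implements AutoCloseable {
    private final Path dir;
    private final WatchService watcher;
    private final Set<String> names = ConcurrentHashMap.newKeySet();

    CachedDirectory(Path dir) throws IOException {
        this.dir = dir;
        this.watcher = dir.getFileSystem().newWatchService();
        // Register before listing so that changes made during the scan are not missed.
        dir.register(watcher, ENTRY_CREATE, ENTRY_DELETE);
        reload();
    }

    // Full enumeration: done once, and again only if events were lost.
    private void reload() throws IOException {
        names.clear();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path entry : stream) {
                names.add(entry.getFileName().toString());
            }
        }
    }

    // Applies all pending change events to the cache without blocking.
    private void refresh() throws IOException {
        WatchKey key;
        while ((key = watcher.poll()) != null) {
            for (WatchEvent<?> event : key.pollEvents()) {
                if (event.kind() == OVERFLOW) {
                    reload(); // events were dropped, so rescan
                    continue;
                }
                String name = event.context().toString();
                if (event.kind() == ENTRY_CREATE) {
                    names.add(name);
                } else {
                    names.remove(name);
                }
            }
            key.reset();
        }
    }

    // True if at least one cached file name matches the whole pattern.
    boolean anyMatch(Pattern pattern) throws IOException {
        refresh();
        return names.stream().anyMatch(n -> pattern.matcher(n).matches());
    }

    @Override
    public void close() throws IOException {
        watcher.close();
    }
}
```

After the first scan, a query such as `cache.anyMatch(Pattern.compile("[012]{4}\\..*"))` only touches the in-memory set, so repeated searches do not hit the filesystem again.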
EDIT:
I had a look at my Java implementation. The only obvious drawback in the methods of the File class is that listing a directory does not stop once a match is found. That would only matter, however, if you only perform the search once – otherwise it would still be far more efficient to cache the full directory list.
If you can use a relatively recent Java version, you might want to have a look at the Java NIO classes (1, 2), which do not seem to have the same weakness.
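For the one-off search, a sketch with the Java 7 NIO DirectoryStream might look like this (Optional and the lambda need Java 8; the class and method names are made up for this example). The stream is iterated lazily, so returning on the first hit avoids building an array of all 500,000 names. The implementation may still read directory entries ahead in batches, and in the worst case (no match) the whole directory is scanned anyway.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.PathMatcher;
import java.util.Optional;

// Hypothetical helper: stops iterating as soon as one entry matches.
final class FirstMatch {

    // Wildcard version, e.g. glob = "????.*" or "*.png".
    static Optional<Path> findFirst(Path dir, String glob) throws IOException {
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path entry : stream) {
                return Optional.of(entry); // first hit: stop here
            }
        }
        return Optional.empty();
    }

    // Regular expression version; the regex is matched against the file name only.
    static Optional<Path> findFirstRegex(Path dir, String regex) throws IOException {
        PathMatcher matcher = dir.getFileSystem().getPathMatcher("regex:" + regex);
        DirectoryStream.Filter<Path> filter = entry -> matcher.matches(entry.getFileName());
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, filter)) {
            for (Path entry : stream) {
                return Optional.of(entry);
            }
        }
        return Optional.empty();
    }
}
```

Note that in a regular expression an unescaped dot matches any character, so a pattern meant to require a literal dot after four digits would be written as `[012]{4}\\..*` in Java source.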