I’m working on a project, one part of which reads the files in a given folder.
The program descends recursively, collects file names and other info, wraps each entry in my own DFile class, and puts it into a collection for further work.
It worked when it was single-threaded (using a recursive read), but I want to do it with multiple threads, ignoring the fact that multithreading won’t make disk I/O any faster. I want it for learning purposes.
So far I’ve been jumping from one design to another and can’t get it right. Your help would be appreciated.
What I want is to supply a root folder name and have my program start several worker threads (the number is user-defined). Each thread reads the contents of a given folder:
– When it finds a file, it wraps it in a DFile and puts it into a collection shared between the threads.
– When it finds a folder, it puts the folder (as a File object) into a jobQueue for another available thread to pick up.
I can’t get this system to work correctly. I keep changing the code and my idea of what the classes should be, from one class with static collections to many classes.
Here are the few classes I have so far:
DirectoryCrawler http://pastebin.com/8tVGpGT9
I won’t publish the rest of my work (maybe in another topic, because the overall purpose of the program isn’t covered here). The program should read a folder and make a list of the files in it, then sort it (where I’ll probably use multithreading too), then search for files with the same hash, while a constantly running thread writes those groups of equal files into a result file. I don’t need any performance gain; the files are going to be small. At first I was working on speed, but I don’t need it now.
Any help regarding the design of the reading part would be appreciated.
EDIT:
So much headache :(( It still doesn’t work correctly. Here is what I have so far:
Crawler (like a minithread for reading one folder; found files go to fileList, which lives in another class, and folders go to the queue): http://pastebin.com/AkJLAUhD
Scanner class (I don’t even know whether it should be Runnable or not). DirectoryScanner (the main class; it should control the crawlers and hold the main file list): http://pastebin.com/2abGMgG9
DFile itself: http://pastebin.com/8uqPWh6Z (something went wrong with hashing; now all files get the same hash when sorting, although it worked before. Hashing is for another, unrelated task.)
FileList: http://pastebin.com/Q2yM6ZwS
Test code:
    DirectoryScanner reader = new DirectoryScanner(4);
    for (int i = 0; i < 4; i++) {
        reader.runTask(new DirectoryCrawler("myroot", reader));
    }
    try {
        reader.kill();
        while (!reader.isDone()) {
            System.out.println("notdone");
        }
        reader.getFileList().print();
    }
myroot is a folder with some files for testing.
I can’t even decide whether the scanner itself should be Runnable, or only the crawlers, because while scanning I don’t actually want to start doing other things like sorting (there is nothing to sort until all files have been gathered).
You need the Executor thread pool and some classes:
An Fsearch class. This contains your container for the results. It also has a factory method that returns an Ffolder, counting up a ‘foldersOutstanding’ counter, and an OnComplete method that counts them back in by counting down ‘foldersOutstanding’.
You need an Ffolder class to represent a folder. Its constructor is passed the folder path along with the Fsearch instance, and its run method iterates over that folder.
Create an Fsearch, load it with the root folder and fire it into the pool. It creates an Ffolder, passing it the root path and itself, and loads that onto the pool. Then it waits on a ‘searchComplete’ event.
That first Ffolder iterates its folder, creating (or depooling) a DFile for each ‘ordinary’ file and pushing it into the Fsearch container. If it finds a folder, it gets another Ffolder from the Fsearch, loads it with the new path and loads that onto the pool as well.
When an Ffolder has finished iterating its own folder, it calls the ‘OnComplete’ method of the Fsearch. OnComplete counts down ‘foldersOutstanding’ and, when it is decremented to zero, all the folders have been scanned and all files processed. The thread that did this final decrement signals the searchComplete event so that the Fsearch can continue. The Fsearch could then call some ‘OnSearchComplete’ event that was passed to it when it was created.
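The design described above might look roughly like this in Java. This is only a minimal sketch under some assumptions: the method names submitFolder, addFile and search are made up for illustration, plain File objects stand in for DFile, and the caller's thread (not a pool thread) waits on the completion latch. Symbolic-link cycles are not handled.

```java
import java.io.File;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

/** Owns the result container and the count of folders still being scanned. */
class Fsearch {
    private final ExecutorService pool;
    private final Queue<File> results = new ConcurrentLinkedQueue<>(); // File stands in for DFile
    private final AtomicInteger foldersOutstanding = new AtomicInteger();
    private final CountDownLatch searchComplete = new CountDownLatch(1);

    Fsearch(ExecutorService pool) { this.pool = pool; }

    /** Factory (hypothetical name): counts the folder up BEFORE queueing it. */
    void submitFolder(File dir) {
        foldersOutstanding.incrementAndGet();
        pool.execute(new Ffolder(dir, this));
    }

    void addFile(File f) { results.add(f); }

    /** Counts a folder back in; the final decrement signals completion. */
    void onComplete() {
        if (foldersOutstanding.decrementAndGet() == 0) {
            searchComplete.countDown();
        }
    }

    /** Starts at root and blocks the CALLER until every folder is done. */
    Queue<File> search(File root) throws InterruptedException {
        submitFolder(root);
        searchComplete.await();
        return results;
    }
}

/** Scans one folder: files go to the results, subfolders go back to the pool. */
class Ffolder implements Runnable {
    private final File dir;
    private final Fsearch search;

    Ffolder(File dir, Fsearch search) { this.dir = dir; this.search = search; }

    @Override
    public void run() {
        try {
            File[] entries = dir.listFiles(); // null if unreadable or not a directory
            if (entries != null) {
                for (File e : entries) {
                    if (e.isDirectory()) search.submitFolder(e);
                    else search.addFile(e);
                }
            }
        } finally {
            search.onComplete(); // always count back in, even on failure
        }
    }
}

class FsearchDemo {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            File root = new File(args.length > 0 ? args[0] : ".");
            Queue<File> files = new Fsearch(pool).search(root);
            System.out.println(files.size() + " files found");
        } finally {
            pool.shutdown();
        }
    }
}
```

The counter cannot reach zero early because each subfolder is counted up inside its parent's run(), before the parent counts itself down in the finally block.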
It goes almost without saying that the Fsearch callbacks must be thread-safe.
Such an exercise does not have to be academic.
The container in the Fsearch, where all the DFiles go, could be a producer-consumer queue. Other threads could start processing the DFiles as the search is in progress, instead of waiting until the end.
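As a rough illustration of the producer-consumer idea, here is a small sketch using a BlockingQueue. The assumptions are mine: file names as Strings stand in for DFiles, upper-casing stands in for real processing, and a sentinel object (END, a made-up name) tells the consumer that the search is over.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class ResultPipeline {
    // Hypothetical end-of-search marker; compared by identity, never by value.
    static final String END = new String("<end-of-search>");

    /** Consumer: takes items until it sees END, returning what it processed. */
    static List<String> consume(BlockingQueue<String> queue) throws InterruptedException {
        List<String> processed = new ArrayList<>();
        while (true) {
            String item = queue.take();          // blocks while the queue is empty
            if (item == END) return processed;   // identity check against the marker
            processed.add(item.toUpperCase());   // stand-in for real DFile processing
        }
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Producer thread: stands in for the running search.
        Thread producer = new Thread(() -> {
            for (String name : new String[] {"a.txt", "b.txt", "c.txt"}) {
                queue.add(name);
            }
            queue.add(END); // signal that no more results will arrive
        });
        producer.start();
        System.out.println(consume(queue)); // processing overlaps with producing
        producer.join();
    }
}
```

With several consumer threads, one END marker per consumer would be needed, or some other shutdown signal.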
I have done this before (but not in Java) and it works OK. A design like this can easily do multiple searches in parallel. It’s fun to issue an Fsearch for several hard drive roots at once: the clattering noise is impressive.
Forgot to say: the big gain from such a design comes when searching several networked drives with high latency. They can all be searched in parallel. The speedup over a miserable single-threaded sequential search can be many-fold. By the time a single-threaded search has finished queueing up the DFiles for one drive, the multi-search has searched four drives and already had most of its DFiles processed.
NOTE:
1) If implemented strictly as above, the thread pool thread that executes the Fsearch is blocked on the ‘searchComplete’ event until the search is over, so it ‘uses up’ one thread. There must therefore be more thread pool threads than live Fsearch instances, or there will be no threads left over to do the actual searching (yes, of course that happened to me :).
2) Unlike a single-threaded search, results don’t come back in any sort of predictable or repeatable order. If, for example, you signal your results to a GUI thread as they come in and try to display them in a TreeView, the path through the TreeView component will likely be different for each result, so updating the visual tree will be lengthy. This can result in the Windows GUI input queue getting full (10000-message limit) because the GUI cannot keep up. Or, if you are using object pools for the Ffolder etc., the pool can empty, slugging performance; if the GUI thread then tries to get an Ffolder from the empty pool to issue a new search and blocks, you get all-round deadlock with every Ffolder instance stuck in Windows messages (yes, of course that happened to me :). It’s best not to let such things happen!
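One possible way around the problem in note 1, offered as a sketch rather than as part of the original design: instead of parking a pool thread on an event, let the thread that performs the final decrement complete a CompletableFuture. The class and method names here (CompletionTracker, taskQueued, taskFinished, whenDone) are invented for illustration, and a chain of nested tasks stands in for nested folders. The demo deliberately uses a pool with a single thread to show that no pool thread is needed for waiting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Counts outstanding tasks and completes a future on the final decrement. */
class CompletionTracker {
    private final AtomicInteger outstanding = new AtomicInteger();
    private final CompletableFuture<Void> done = new CompletableFuture<>();

    void taskQueued() { outstanding.incrementAndGet(); }

    void taskFinished() {
        if (outstanding.decrementAndGet() == 0) {
            done.complete(null); // runs on whichever thread finished last
        }
    }

    CompletableFuture<Void> whenDone() { return done; }
}

class NonBlockingDemo {
    /** Submits a chain of `depth` nested tasks, standing in for nested folders. */
    static void submit(ExecutorService pool, CompletionTracker tracker, int depth) {
        tracker.taskQueued(); // count up before the task is queued
        pool.execute(() -> {
            try {
                if (depth > 1) submit(pool, tracker, depth - 1);
            } finally {
                tracker.taskFinished();
            }
        });
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor(); // only ONE pool thread
        CompletionTracker tracker = new CompletionTracker();
        submit(pool, tracker, 5);
        // The main thread waits here; the single pool thread is never parked.
        tracker.whenDone().get(5, TimeUnit.SECONDS);
        System.out.println("search complete");
        pool.shutdown();
    }
}
```

A callback could also be attached with whenDone().thenRun(...), so that nothing has to block at all.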
Example: I found something like this. It’s quite old Windows/C++ Builder code but it still works. I tried it on my RAD Studio 2009, removed all the legacy/proprietary gunge and added some extra comments. All it does here is count up the folders and files, just as an example. There are only a couple of ‘runnable’ classes. The myPool->submit() method loads a runnable onto the pool and its run() method gets executed. The base ctor has an ‘OnComplete’ event handler (TNotifyEvent) delegate parameter that gets fired by the pool thread when the run() method returns.
//******************************* CLASSES ********************************
//******************************* METHODS ********************************
..and here it is, working: