How come that Solution 2 is more efficient than Solution 1?
(The time is the average of 100 runs, and the total folders they go through is 13217)
// Solution 1 (2608,9ms)
let rec folderCollector path =
async { let! dirs = Directory.AsyncGetDirectories path
do! [for z in dirs -> folderCollector z]
|> Async.Parallel |> Async.Ignore }
// Solution 2 (2510,9ms)
let rec folderCollector path =
let dirs = Directory.GetDirectories path
for z in dirs do folderCollector z
I would have thought that Solution 1 would be faster because it’s async, and that I run it in Parallel. What am I’m missing?
As Daniel and Brian already clearly explained, your solution is probably creating too many short-lived asynchronous computations (so the overhead is more than the gains from parallelism). The
AsyncGetDirectoriesoperation also probably isn’t really non-blocking as it is not doing much work. I don’t see a truly async version of this operation anywhere – how is it defined?Anyway, using the ordinary
GetDirectories, I tried the following version (which creates only a small number of parallel asyncs):Calling a simple synchronous version after certain number of recursive calls is a common trick – it is used when parallelizing any tree-like structure that is very deep. Using
folderCollector path 2, this will start only tens of parallel tasks (as opposed to thousands), so it will be more efficient.On a sample directory I used (with 4800 sub-dirs and 27000 files), I get:
folderCollectorSync pathtakes 1 secondfolderCollector path 2takes takes 600ms (result is similar for any nesting between 1 and 4)