I’m currently playing with the MailboxProcessor, so I made a few agents that crawl a directory on the computer and all of its subdirectories, and then print the files in each directory:
open System.IO
let fileCollector =
    MailboxProcessor.Start(fun self ->
        let rec loop() =
            async { let! file = self.Receive()
                    printfn "%s" file
                    return! loop() }
        loop())
let folderCollector =
    MailboxProcessor.Start(fun self ->
        let rec loop() =
            async { let! dir = self.Receive()
                    do! Async.StartChild(
                            async { let! files = Directory.AsyncGetFiles dir
                                    for z in files do fileCollector.Post z }) |> Async.Ignore
                    return! loop() }
        loop())
let crawler =
    MailboxProcessor.Start(fun self ->
        let rec loop() =
            async { let! dir = self.Receive()
                    folderCollector.Post dir
                    do! Async.StartChild(
                            async { let! dirs = Directory.AsyncGetDirectories dir
                                    for z in dirs do self.Post z }) |> Async.Ignore
                    return! loop() }
        loop())
crawler.Post @"C:\Projects"
printfn "Done" // Message getting fired right away, due to the async stuff.
Now, how can I tell when the folderCollector, fileCollector and crawler are done, so that the printfn statement at the end is called AFTER the crawler has crawled all subdirectories and printed all files?
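One possible approach, sketched here as an assumption rather than taken from this thread, is to have a single agent count the directories that are still being processed and complete a `TaskCompletionSource` when that count drops to zero. `CrawlMsg` and `crawlAll` are hypothetical names, and the sketch uses the synchronous `Directory.GetFiles`/`Directory.GetDirectories` instead of the `AsyncGetFiles`/`AsyncGetDirectories` extensions used above.

```fsharp
open System.IO
open System.Threading.Tasks
open System.Collections.Concurrent

// Hypothetical message type: a directory to crawl, or a notice that one crawl finished.
type CrawlMsg =
    | Crawl of string
    | Finished

// Crawls `root` and returns every file found; blocks until the whole tree is done.
let crawlAll (root: string) : string list =
    let found = ConcurrentBag<string>()
    let completion = TaskCompletionSource<unit>()
    let agent =
        MailboxProcessor<CrawlMsg>.Start(fun inbox ->
            // `pending` = number of directories received but not yet finished.
            let rec loop pending =
                async {
                    let! msg = inbox.Receive()
                    match msg with
                    | Crawl dir ->
                        Async.Start(async {
                            try
                                for f in Directory.GetFiles dir do found.Add f
                                // Subdirectories are posted before Finished, so the agent
                                // counts them before it counts this directory as done.
                                for d in Directory.GetDirectories dir do inbox.Post(Crawl d)
                            with _ -> () // e.g. access denied: skip this directory
                            inbox.Post Finished })
                        return! loop (pending + 1)
                    | Finished ->
                        if pending = 1 then completion.SetResult()
                        else return! loop (pending - 1)
                }
            loop 0)
    agent.Post(Crawl root)
    completion.Task.Wait()
    List.ofSeq found
```

Because each directory posts its subdirectories before its own `Finished` message, and messages posted from one thread are queued in order, the count cannot reach zero while work is still outstanding. `completion.Task.Wait()` blocks the calling thread, which suits a script's top level; inside other async code, `Async.AwaitTask completion.Task` would be the non-blocking alternative.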
Update:
By using the technique shown by Tomas Petricek in http://tomasp.net/blog/parallel-extra-image-pipeline.aspx, I’ve come up with the following code:
open System.IO
open System.Threading
let folders = new BlockingQueueAgent<string>(100)
let files = new BlockingQueueAgent<string>(100)
let rec folderCollector path =
    async { do! folders.AsyncAdd(path)
            do! Async.StartChild(
                    async { let! dirs = Directory.AsyncGetDirectories path
                            for z in dirs do
                                do! folderCollector z }) |> Async.Ignore }
let fileCollector =
    async { while true do
                let! dir = folders.AsyncGet()
                do! Async.StartChild(
                        async { let! fs = Directory.AsyncGetFiles dir
                                for z in fs do
                                    do! files.AsyncAdd z }) |> Async.Ignore }
let rec printFiles() =
    async { let! file = files.AsyncTryGet(75)
            match file with
            | Some s ->
                printfn "%s" s
                return! printFiles()
            | None -> () }
let cts = new CancellationTokenSource()
Async.Start(folderCollector @"C:\Projects", cts.Token)
Async.Start(fileCollector, cts.Token)
Async.RunSynchronously(printFiles(), cancellationToken = cts.Token)
printfn "DONE!"
Update: Alright, so I’ve put together the following code:
let folders = new BlockingQueueAgent<string option>(10)
let files = new BlockingQueueAgent<string option>(10)
let folderCollector path =
    async { let rec loop path =
                async { do! folders.AsyncAdd(Some path)
                        let! dirs = Directory.AsyncGetDirectories path
                        do! [ for z in dirs -> loop z ] |> Async.Parallel |> Async.Ignore }
            do! loop path
            do! folders.AsyncAdd(None) }
let rec fileCollector() =
    async { let! dir = folders.AsyncGet 125
            match dir with
            | Some s ->
                let fs = Directory.GetFiles s
                do! [ for z in fs -> printfn "%s" z; files.AsyncAdd(Some z) ] |> Async.Parallel |> Async.Ignore // <-- Fails silently if files is full
                do! fileCollector() // <-- unreachable
            | None -> printfn "Done!"; () }
That looks fine, eh? For some reason, though, the do! fileCollector() line in the fileCollector() function won’t execute if the files BlockingQueueAgent is full. Instead, it fails silently.
However, if I do this:
let folderCollector path =
    async { let rec loop path =
                async { do! folders.AsyncAdd(Some path)
                        let! dirs = Directory.AsyncGetDirectories path
                        do! [ for z in dirs -> loop z ] |> Async.Parallel |> Async.Ignore }
            do! loop path
            do! folders.AsyncAdd(None) }
let rec fileCollector() =
    async { let! dir = folders.AsyncGet 75
            match dir with
            | Some s ->
                let fs = Directory.GetFiles s
                do! Async.StartChild(async { do! [ for z in fs -> printfn "%s" z; files.AsyncAdd(Some z) ]
                                                 |> Async.Parallel |> Async.Ignore } ) |> Async.Ignore
                do! fileCollector()
            | None -> printfn "Done!"; () }
It works just fine. However, now I can’t keep track of when the fileCollector is done: it starts a bunch of child async computations, so even when it reaches None in the queue, it might still have some work to do. What’s going on?
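In the version that works, `Async.StartChild(...) |> Async.Ignore` discards the handle that `StartChild` returns (its result is an `Async<Async<'T>>`), so the child keeps running but nothing can wait for it. A minimal sketch of keeping those handles and awaiting them before finishing, using a hypothetical helper `startAllAndWait`:

```fsharp
// Hypothetical helper: start every work item as a child, then wait for all of them.
let startAllAndWait (items: string list) (work: string -> Async<unit>) : Async<unit> =
    async {
        let handles = ResizeArray<Async<unit>>()
        for item in items do
            // StartChild starts `work item` now and returns a handle to await later.
            let! handle = Async.StartChild(work item)
            handles.Add handle
        // Only complete once every child has completed.
        for handle in handles do
            do! handle
    }
```

Awaiting the handles tells you when the children are done, but it also means the caller waits on them again, so on its own this does not help if a child is stuck on a full queue.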
Update:
I’ve modified the fileCollector to the same “style” as the folderCollector, but the problem remains. The modified version:
let fileCollector() =
    async { let rec loop() =
                async { let! dir = folders.AsyncGet 750
                        match dir with
                        | Some s ->
                            let! fs = Directory.AsyncGetFiles s
                            do! [ for z in fs -> printfn "%A" z; files.AsyncAdd(Some z) ]
                                |> Async.Parallel |> Async.Ignore
                            return! loop()
                        | None -> printfn "Done!"; () }
            do! loop()
            printfn "after" // Never gets this far...
            do! files.AsyncAdd(None) }
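`BlockingQueueAgent` is not part of FSharp.Core; it comes from the blog post linked above. The sketch below is a simplified, hypothetical re-implementation (`QueueMsg` and `BoundedQueue` are made-up names), written on the assumption that the original behaves the same way: when the queue is full, `AsyncAdd` does not throw, it waits asynchronously until a consumer takes an item. If that assumption holds, a likely reason "after" is never printed is that nothing reads from `files`, so once ten items are queued the remaining `AsyncAdd` calls simply wait forever.

```fsharp
// Hypothetical messages for a minimal bounded queue agent.
type QueueMsg<'T> =
    | Add of 'T * AsyncReplyChannel<unit>
    | Get of AsyncReplyChannel<'T>

// Simplified sketch in the spirit of BlockingQueueAgent (not the original code).
type BoundedQueue<'T>(capacity: int) =
    let items = System.Collections.Generic.Queue<'T>()
    let agent =
        MailboxProcessor<QueueMsg<'T>>.Start(fun inbox ->
            let rec loop () = async {
                if items.Count = 0 then
                    // Empty: accept only Add; Get requests stay in the mailbox.
                    do! inbox.Scan(function
                        | Add (v, reply) -> Some (async { items.Enqueue(v); reply.Reply() })
                        | Get _ -> None)
                elif items.Count >= capacity then
                    // Full: accept only Get; Add requests (and their callers) keep waiting.
                    do! inbox.Scan(function
                        | Get reply -> Some (async { reply.Reply(items.Dequeue()) })
                        | Add _ -> None)
                else
                    let! msg = inbox.Receive()
                    match msg with
                    | Add (v, reply) -> items.Enqueue(v); reply.Reply()
                    | Get reply -> reply.Reply(items.Dequeue())
                return! loop () }
            loop ())
    member x.AsyncAdd(v: 'T) = agent.PostAndAsyncReply(fun reply -> Add (v, reply))
    member x.AsyncGet() = agent.PostAndAsyncReply(fun reply -> Get reply)
```

With this behaviour, a stuck producer looks like a silent failure: no exception is raised, the workflow is just suspended. Running a consumer of `files` alongside it (as `printFiles` does in the first update) would let the adds complete.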
To answer your second question (from the comment) regarding the updated version based on pipelines – I think you could use BlockingQueueAgent<option<string>> and use the value None when you finished generating all files (the None value would then propagate through the pipeline and you could end all workflows when they get None).
To do that, you need to modify folderCollector to actually detect when it finishes iterating. It is not tested, but the following should work (the point is that you need to wait for a completion of the recursive call):
All workflows would potentially get None as a result of AsyncGet. When that happens, they should send None to the next worker in the pipeline. The last one can terminate when it receives None:
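A minimal sketch of that idea, under several assumptions: plain `MailboxProcessor` stages stand in for `BlockingQueueAgent`, the synchronous `Directory.GetFiles`/`GetDirectories` replace the async extensions, error handling (e.g. access-denied folders) is omitted, and `crawlFiles`, `walk` and `sink` are hypothetical names. The walk waits for its recursive calls via `Async.Parallel` before sending `None`, and the last stage finishes when it receives `None`.

```fsharp
open System.IO
open System.Threading.Tasks

// Sketch: a two-stage pipeline in which `None` marks the end of the stream.
let crawlFiles (root: string) : string list =
    let completed = TaskCompletionSource<string list>()
    // Last stage: collects file names and finishes when it receives None.
    let sink =
        MailboxProcessor<string option>.Start(fun inbox ->
            let rec loop acc = async {
                let! msg = inbox.Receive()
                match msg with
                | Some file -> return! loop (file :: acc)
                | None -> completed.SetResult(List.rev acc) }
            loop [])
    // First stage: each call waits for all of its recursive calls to finish.
    let rec walk dir = async {
        for file in Directory.GetFiles dir do sink.Post(Some file)
        let children = [ for sub in Directory.GetDirectories dir -> walk sub ]
        do! children |> Async.Parallel |> Async.Ignore }
    // None is sent downstream only after the whole walk has completed.
    Async.Start(async {
        do! walk root
        sink.Post None })
    completed.Task.Result
```

Since `None` is posted only after `walk root` has completed, every `Some` message is already in the sink's mailbox by then, so the sink sees the end marker last.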