I have an algorithm that performs a breadth-first search of resources:
    def crawl(starting_node)
      items = [starting_node]
      until items.empty?
        item = items.shift
        kids = item.slow_network_action # takes seconds
        kids.each{ |kid| items << kid }
      end
    end
I’d like to use a few concurrent threads to parallelize the slow_network_action.
What’s a reasonable way to do this?
Here’s a technique that works, but I feel certain is not the right approach:
    def crawl(starting_node)
      mutex = Mutex.new
      items = [starting_node]
      4.times.map{
        Thread.new do # each worker needs its own thread for join to work
          loop do
            unless item=mutex.synchronize{ items.shift }
              # queue looked empty: wait out any in-flight request, then retry once
              sleep LONGER_THAN_LONGEST_NETWORK_ACTION
              break unless item=mutex.synchronize{ items.shift }
            end
            kids = item.slow_network_action
            mutex.synchronize{
              kids.each{ |kid| items << kid }
            }
          end
        end
      }.each(&:join)
    end
Ideally, the threads would actually sleep while waiting for an item to be added to the queue, wake up when one is added, and all exit once every thread is waiting and no items remain.
This alternate code almost works but for the deadlocks that can (and do) occur, and the total lack of a proper exit strategy:
    require 'thread'
    def crawl(starting_node)
      items = Queue.new
      items << starting_node
      4.times.map{
        Thread.new do
          # Queue#shift blocks while the queue is empty, so this loop never ends
          while item=items.shift
            kids = item.slow_network_action
            kids.each{ |kid| items << kid }
          end
        end
      }.each(&:join)
    end
This should point you in the right direction:
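A minimal sketch along these lines, using MonitorMixin from Ruby's standard library. The Node struct, the thread_count parameter, the waiting and done bookkeeping, and the returned visited list are assumptions added for illustration. They are not part of the question's code, and the termination check is just one way to detect that all work is finished.

```ruby
require 'monitor'

# Hypothetical stand-in for a crawlable resource (not from the question).
Node = Struct.new(:name, :children) do
  def slow_network_action
    sleep 0.01 # simulate a slow request
    children
  end
end

def crawl(starting_node, thread_count = 4)
  items = [starting_node]
  items.extend(MonitorMixin)   # the array doubles as the monitor
  item_added = items.new_cond  # condition variable owned by that monitor
  waiting = 0                  # idle threads; only touched inside synchronize
  done = false                 # set once every thread is idle and items is empty
  visited = []                 # illustration only: items in the order they were claimed

  threads = Array.new(thread_count) do
    Thread.new do
      kids = [] # empty on the first pass, so one synchronized block is enough
      loop do
        item = items.synchronize do
          # Publish the previous pass's results and wake any sleepers.
          unless kids.empty?
            items.concat(kids)
            item_added.broadcast
          end
          # Sleep until there is work, or until everyone is out of work.
          while items.empty? && !done
            waiting += 1
            if waiting == thread_count
              done = true          # nobody is working, nothing is queued
              item_added.broadcast # let the other threads see done and exit
            else
              item_added.wait
              waiting -= 1
            end
          end
          unless done
            visited << items.first
            items.shift
          end
        end
        break if item.nil?
        kids = item.slow_network_action # runs outside the lock
      end
    end
  end
  threads.each(&:join)
  visited
end
```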
This makes the items array into a monitor and does all synchronization through it, along with an associated ConditionVariable created from the monitor.
This is similar to how a Queue works internally, except that this also checks for when all work is finished (which actually adds a bit of complexity).
Each thread’s main loop starts with an empty kids array that gets added to items, in order to avoid needing two separate synchronized blocks in the loop and the race conditions that would go with them.
Note that this uses broadcast, which causes all waiting threads to wake and could potentially cause a thundering herd. I don’t think this should cause any problems here. The alternative would be to add the elements of kids one at a time and call signal for each one. This would add more complexity for dealing with the case when all work is finished, though.