I’m working with 6 csv files that each contain the attributes of an object. I can read them one at a time, but the idea of splitting each one to a thread to do in parallel is very appealing.
I’ve created a database object (no relational DBs or ORMs allowed) that has an array for each of the objects it is holding. I’ve tried the following to make each CSV open and initialize concurrently, but have seen no impact on speed.
threads = []
CLASS_FILES.each do |klass, filename|
threads << Thread.new do
file_to_objects(klass, filename)
end
end
threads.each {|thread| thread.join}
update
end
def self.load(filename)
CSV.open("data/#{filename}", CSV_OPTIONS)
end
def self.file_to_objects(klass, filename)
file = load(filename)
method_name = filename.sub("s.csv","")
file.each do |line|
instance = klass.new(line.to_hash)
Database.instance.send("#{method_name}") << instance
end
end
How can I speed things up in ruby (MRI 1.9.3)? Is this a good case for Rubinius?
Even though Ruby 1.9.3 uses native threads in order to implement concurrency, it has a global interpreter lock which makes sure only one thread executes at a time.
Therefore, nothing really runs in parallel in C Ruby. I know that JRuby imposes no internal lock on any thread, so try using it to run your code, if possible.
This answer by Jörg W Mittag has a more in-depth look at the threading models of the several Ruby implementations. It isn’t clear to me whether Rubinius is fit for the job, but I’d give it a try.