I am developing a Rails app. I would like to use an array to hold 2,000,000 records and then insert them into the database, like this:
large_data = Get_data_Method() #get 2,000,000 raw data
all_values = Array.new
large_data.each{ |data|
all_values << data[1] #e.g. data[1] has the format "(2,'john','2002-09-12')"
}
sql="INSERT INTO cars (id,name,date) VALUES "+all_values.join(',')
ActiveRecord::Base.connection.execute(sql)
When I run the code, it spends a very long time in the large_data.each{...} block. In fact, I am still waiting for it to finish: it has been running for an hour and has not yet got past the large_data.each{...} part.
Is this because 2,000,000 elements are too many for a Ruby array to hold? Or can a Ruby array hold that many elements, and is it reasonable to wait this long?
I would like to use a bulk insert to speed up loading this much data into the MySQL database, so I want to use only one INSERT INTO statement; that is why I wrote the code above. If this is a bad design, can you recommend a better way?
Some notes:
Don’t use the pattern “empty array + each + push”; use Enumerable#map instead.
Is it possible to write get_data so that it returns items lazily? If so, look into enumerators and use them to do batched inserts into the database instead of putting all the objects in at once.
That said, there are projects for doing efficient bulk insertions: check ar-extensions and activerecord-import for Rails >= 3.
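The two notes about Enumerable#map and lazy, batched inserts can be sketched roughly as follows. This is only an illustration under stated assumptions: lazy_rows and each_insert_statement are hypothetical helper names, the generated rows are made-up stand-ins for the real data source, and the batch size is arbitrary. It prints the statements rather than executing them so it runs without a database.

```ruby
# Hypothetical stand-in for the data source: produces rows one at a time
# instead of building a 2,000,000-element array up front.
# Each row mimics the question's shape, where row[1] is a "(...)" values tuple.
def lazy_rows(count)
  Enumerator.new do |yielder|
    count.times do |i|
      yielder << [i, "(#{i},'name#{i}','2002-09-12')"]
    end
  end
end

# Yields one multi-row INSERT statement per batch of rows.
# Table and column names are taken from the question.
def each_insert_statement(rows, batch_size)
  rows.each_slice(batch_size) do |batch|
    # map replaces the "empty array + each + push" pattern
    values = batch.map { |row| row[1] }
    yield "INSERT INTO cars (id,name,date) VALUES #{values.join(',')}"
  end
end

# Usage: 5 rows in batches of 2 gives 3 statements.
each_insert_statement(lazy_rows(5), 2) do |sql|
  # In the Rails app this would be ActiveRecord::Base.connection.execute(sql)
  puts sql
end
```

Only one batch of rows is held in memory at a time, and each statement stays well below the size of a single 2,000,000-row INSERT. A real batch size would need tuning against the server's packet size limit.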