below is the code to download a txt file from internet approx 9000 lines and populate the database, I have tried a lot but it takes a lot of time more than 7 minutes. I am using win 7 64 bit and ruby 1.9.3. Is there a way to do it faster ??
require 'open-uri'
require 'dbi'
dbh = DBI.connect("DBI:Mysql:mfmodel:localhost","root","")
#file = open('http://www.amfiindia.com/spages/NAV0.txt')
file = File.open('test.txt','r')
lines = file.lines
2.times { lines.next }
curSubType = ''
curType = ''
curCompName = ''
lines.each do |line|
line.strip!
if line[-1] == ')'
curType,curSubType = line.split('(')
curSubType.chop!
elsif line[-4..-1] == 'Fund'
curCompName = line.split(" Mutual Fund")[0]
elsif line == ''
next
else
sCode,isin_div,isin_re,sName,nav,rePrice,salePrice,date = line.split(';')
sCode = Integer(sCode)
sth = dbh.prepare "call mfmodel.populate(?,?,?,?,?,?,?)"
sth.execute curCompName,curSubType,curType,sCode,isin_div,isin_re,sName
end
end
dbh.do "commit"
dbh.disconnect
file.close
106799;-;-;HDFC ARBITRAGE FUND RETAIL PLAN DIVIDEND OPTION;10.352;10.3;10.352;29-Jun-2012
This is the format of data to be inserted in the table. Now there are 8000 such lines and how can I do an insert by combining all that and call the procedure just once. Also, does mysql support arrays and iteration to do such a thing inside the routine. Please give your suggestions.Thanks.
EDIT
I have to make insertion’s into the tables depending on whether they are already exist or not, also I need to make use of conditional comparison’s before inserting into the table. I definitely can’t write SQL statements for these, so I wrote SQL stored procedures. Now I have a list @the_data, how do I pass that to the procedure and then iterate through it all on MySQL side. Any ideas ?
insert into mfmodel.company_masters (company_name) values
#{@the_data.map {|str| "('#{str[0]}')"}.join(',')}
this makes 100 insertions but 35 of them are redundant so I need to search the table for existing entries before doing a insertion.
Any Ideas ? thanks
From your comment, it looks like you are spending all your time executing DB queries. On a recent Ruby project, I also had to optimize some slow code which was importing data from CSV files into the database. I got about a 500x performance increase by importing all the data by using a single bulk
INSERTquery, rather than 1 query for each row of the CSV file. I accumulated all the data in an array, and then built a single SQL query using string interpolation andArray#join.From your comments, it seems that you may not know how to build and execute dynamic SQL for a bulk
INSERT. First get your data in a nested array, with the fields to be inserted in a known order. Just for an example, imagine we have data like this:You seem to be using Rails and MySQL, so the dynamic SQL will have to use MySQL syntax. To build and execute the
INSERT, you can do something like:You said that you need to insert data into 2 different tables. That’s not a problem; just accumulate the data for each table in a different array, and execute 2 dynamic queries, perhaps inside a transaction. 2 queries will be much faster than 9000.
Again, you said in the comments that you may need to update some records rather than inserting. That was also the case in the “CSV import” case which I mentioned above. The solution is only slightly more complicated:
Then you must build a dynamic
INSERTand a dynamicUPDATEfor each table involved. Google forUPDATEsyntax if you need it, and go wild with all your favorite string processing functions!Going back to the sample code above, note the difference between numeric and string fields. If it is possible that the strings may contain single quotes, you will have to make sure that all the single quotes are escaped. The behavior of
String#gsubmay be surprise you when you try to do this: it assigns a special meaning to\'. The best way I have found so far to escape single quotes is:string.gsub("'") { "\\'" }. Perhaps other posters know a better way.If you are inserting dates, make sure they are converted to MySQL’s date syntax.
Yes, I know that “roll-your-own” SQL sanitization is very iffy. There may even be security bugs with the above approach; if so, I hope my better-informed peers will set me straight. But the performance gains are just too great to ignore. Again, if this can be done using a prepared query with placeholders, and you know how, please post!
Looking at your code, it looks like you are inserting the data using a stored procedure (
mfmodel.populate). Even if you do want to use a stored procedure for this, why do you havedbh.preparein the loop? You should be able to move that line outside oflines.each.