I’m currently developing a custom ticket system for internal use, for which I’ve written a Rails application. The ticket data comes from another system that only provides daily CSV extracts. I’ve written a task that downloads the CSV file over FTP and then runs the task below to import it into a MySQL DB using ActiveRecord.
However, it is extremely slow! Each daily CSV file contains approximately 20,000-40,000 rows with a total size of 8-10MB. Each row is a ticket that has been created or modified in the past day, which is why the code below checks whether the ticket already exists.
Running in production mode without debug output doesn’t make much of a difference.
desc 'Takes Orion csv file and parses into DB.'
task :importcsv, [:local_file_path] => :environment do |t, args|
  require 'csv'
  @error_count = 0
  @success_count = 0
  csv = CSV.read(args.local_file_path, col_sep: ",", encoding: "ISO8859-1", headers: true)
  csv.each do |row|
    if /PR(.*)/.match(row[0]) # Skip PR tickets because they're a waste of space right now
      @error_count += 1
      next
    end
    break if row[0] == " " # Break loop if ticketid is just whitespace
    break if row[0].empty? # Break loop if no ticketid
    ticket = Ticket.find_or_create_by_ticketid(row[0],
      :severity => row[7],
      :status => row[1],
      :causecode => row[17],
      :title => row[25],
      :reportergrp => row[18],
      :resolvergrp => row[5],
      :resolvername => row[27],
      :opendate => row[14],
      :closedate => row[13],
      :accountname => row[23],
      :resolutiondesc => row[26])
    @success_count += 1
  end
  Rails.logger.info " #{@success_count} out of #{@error_count + @success_count} tickets were added or updated."
end
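The per-row existence check in the task above can also be done in memory. As a minimal sketch, assuming the stored ticket ids fit in memory and can be fetched in one query, the rows could be split up front into new and already-known tickets. `partition_rows`, `existing_ids`, and the sample rows are hypothetical names and data for illustration, not part of the original task:

```ruby
require 'set'

# Hypothetical helper: split parsed rows into [new, already known] using one
# in-memory Set instead of one database lookup per row.
# `rows` are arrays whose first element is the ticket id (as in the task above);
# `existing_ids` stands in for the ids already stored in the tickets table.
def partition_rows(rows, existing_ids)
  known = existing_ids.to_set
  rows.partition { |row| !known.include?(row[0]) }
end

# Sample data for illustration only.
rows = [
  ["03052019", "CLOSED"],
  ["03089753", "RESTORED"],
  ["03236150", "CLOSED"]
]
existing_ids = ["03089753"]

to_insert, to_update = partition_rows(rows, existing_ids)
puts "insert: #{to_insert.map(&:first).inspect}"
puts "update: #{to_update.map(&:first).inspect}"
```

In the Rails task, `existing_ids` might come from a single query such as `Ticket.pluck(:ticketid)` (available from Rails 3.2). Note also that `find_or_create_by_ticketid` returns an existing record unchanged, so rows for modified tickets would still need an explicit update.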
Here is a sample of the development debug output:
############### START PARSING ORION DATA ###############
Fetching data for date 2012-02-01...
Data already exists locally. Did not download.
Adding data to DB...
Ticket Load (99.8ms) SELECT `tickets`.* FROM `tickets` WHERE `tickets`.`ticketid` = '03052019' LIMIT 1
(0.3ms) BEGIN
Ticket Exists (24.2ms) SELECT 1 FROM `tickets` WHERE `tickets`.`ticketid` = BINARY '03052019' LIMIT 1
SQL (1.4ms) INSERT INTO `tickets` (`accountname`, `causecode`, `closedate`, `created_at`, `opendate`, `reportergrp`, `resolutiondesc`, `resolvergrp`, `resolvername`, `severity`, `status`, `ticketid`, `title`, `updated_at`) VALUES ('WESTPAC', 'AP_DATA', '2010-12-30 00:00:00', '2012-02-21 04:55:09', '2010-05-19 00:00:00', 'HDNZ', '-', 'DINZ', 'Sam Gardner', 3, 'CLOSED', '03052019', 'HTML GENERATED REPORTS CONT. OF FAULT: 03042', '2012-02-21 04:55:09')
(2.3ms) COMMIT
Ticket Load (69.1ms) SELECT `tickets`.* FROM `tickets` WHERE `tickets`.`ticketid` = '03089753' LIMIT 1
(0.4ms) BEGIN
Ticket Exists (19.8ms) SELECT 1 FROM `tickets` WHERE `tickets`.`ticketid` = BINARY '03089753' LIMIT 1
SQL (0.9ms) INSERT INTO `tickets` (`accountname`, `causecode`, `closedate`, `created_at`, `opendate`, `reportergrp`, `resolutiondesc`, `resolvergrp`, `resolvername`, `severity`, `status`, `ticketid`, `title`, `updated_at`) VALUES ('WESTPAC', 'SW_PROGRAMCODE', NULL, '2012-02-21 04:55:09', '2010-07-20 00:00:00', 'HDNZ', '-', 'IANZ', 'Mitch Bell', 3, 'RESTORED', '03089753', 'CEE: EDS ERROR', '2012-02-21 04:55:09')
(1.7ms) COMMIT
Ticket Load (66.2ms) SELECT `tickets`.* FROM `tickets` WHERE `tickets`.`ticketid` = '03236150' LIMIT 1
(0.2ms) BEGIN
Ticket Exists (21.5ms) SELECT 1 FROM `tickets` WHERE `tickets`.`ticketid` = BINARY '03236150' LIMIT 1
SQL (0.4ms) INSERT INTO `tickets` (`accountname`, `causecode`, `closedate`, `created_at`, `opendate`, `reportergrp`, `resolutiondesc`, `resolvergrp`, `resolvername`, `severity`, `status`, `ticketid`, `title`, `updated_at`) VALUES ('WESTPAC', 'AP_DATA', '2011-12-12 00:00:00', '2012-02-21 04:55:09', '2011-03-04 00:00:00', 'HDNZ', '-', 'DINZ', 'Liam Fitzpatrick', 3, 'CLOSED', '03236150', 'SAMETIME CONNECTION ISSUES', '2012-02-21 04:55:09')
(1.5ms) COMMIT
Ticket Load (64.5ms) SELECT `tickets`.* FROM `tickets` WHERE `tickets`.`ticketid` = '03261509' LIMIT 1
(0.2ms) BEGIN
Ticket Exists (20.8ms) SELECT 1 FROM `tickets` WHERE `tickets`.`ticketid` = BINARY '03261509' LIMIT 1
SQL (0.4ms) INSERT INTO `tickets` (`accountname`, `causecode`, `closedate`, `created_at`, `opendate`, `reportergrp`, `resolutiondesc`, `resolvergrp`, `resolvername`, `severity`, `status`, `ticketid`, `title`, `updated_at`) VALUES ('WESTPAC', ' ', NULL, '2012-02-21 04:55:09', '2011-05-08 00:00:00', 'OPSNZ', '-', 'ANNZ', 'Anusha Konduti', 3, 'OPEN', '03261509', 'P2PTSM002:-INFOMAN ONLY (TONZ): ANR2578W SCHEDULE WEEKLY_SYS', '2012-02-21 04:55:09')
(1.4ms) COMMIT
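To see where the time goes, the timings in the log above can be added up. This is only a rough sketch: it uses the four sampled tickets, counts only the database time that the development log reports (not CSV parsing or Ruby overhead), and extrapolates linearly. The constant and method names are made up for illustration:

```ruby
# Per-statement timings in ms, copied from the four sampled tickets in the log:
# [load SELECT, BEGIN, exists SELECT, INSERT, COMMIT]
SAMPLES = [
  [99.8, 0.3, 24.2, 1.4, 2.3],
  [69.1, 0.4, 19.8, 0.9, 1.7],
  [66.2, 0.2, 21.5, 0.4, 1.5],
  [64.5, 0.2, 20.8, 0.4, 1.4]
]

total_ms  = SAMPLES.flatten.sum
# Time spent in the two SELECT statements issued for each ticket.
select_ms = SAMPLES.sum { |load_ms, _begin, exists_ms, _insert, _commit| load_ms + exists_ms }

avg_ms       = total_ms / SAMPLES.size
select_share = select_ms / total_ms

# Linear extrapolation of logged database time to a full file, in minutes.
def estimated_minutes(row_count, ms_per_row)
  row_count * ms_per_row / 1000.0 / 60.0
end

puts format("average per ticket: %.0f ms", avg_ms)
puts format("share spent in SELECTs: %.0f%%", select_share * 100)
puts format("20,000 rows: about %.0f min", estimated_minutes(20_000, avg_ms))
puts format("40,000 rows: about %.0f min", estimated_minutes(40_000, avg_ms))
```

With these four samples the average comes to roughly 99 ms per ticket, about 97% of it in the two SELECTs, which extrapolates to roughly 33 to 66 minutes of database time for 20,000 to 40,000 rows. The INSERT and COMMIT steps account for only a few milliseconds per ticket in this sample.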
Turns out, for whatever reason, InnoDB was just taking an extremely long time. I switched to MyISAM and the import completes MUCH faster.