I am trying to process a text file of more than 1 GB and save the data into a MySQL database using Python.
I have pasted some sample code below:
import os
import MySQLdb as mdb

conn = mdb.connect(user='root', passwd='redhat', db='Xml_Data', host='localhost', charset="utf8")
file_path = "/home/local/user/Main/Module-1.0.4/file_processing/part-00000.txt"
file_open = open('part-00000','r')
for line in file_open:
    result_words = line.split('\t')
    query = "insert into PerformaceReport (campaignID, keywordID, keyword, avgPosition)"
    query += " VALUES (%s,%s,'%s',%s) " % (result_words[0],result_words[1],result_words[2],result_words[3])
    cursor = conn.cursor()
    cursor.execute(query)
    conn.commit()
The data is actually inserted into more than 18 columns; I have pasted only four as an example.
When I run the above code, it takes several hours to finish.
My questions are:
- Is there an alternative way to process the 1 GB text file in Python much faster?
- Is there any framework that can process the 1 GB text file and save the data into the database quickly?
- How can a large text file (1 GB) be processed and saved into the database within minutes, and is that even possible?
My main concern is that we need to process the 1 GB file as fast as possible, in minutes rather than hours.
Edited Code
query += " VALUES (%s,%s,'%s',%s) " % (
    int(result_words[0] if result_words[0] != '' else ''),
    int(result_words[2] if result_words[2] != '' else ''),
    result_words[3] if result_words[3] != '' else '',
    result_words[4] if result_words[4] != '' else '')
I am actually submitting the values in the above format, checking first whether each field is empty.
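One way to cut the run time is to stop issuing one INSERT and one commit per line, and instead send rows in batches with a parameterized query. The following is a minimal sketch of that idea, not a drop-in replacement: the helper names (parse_line, batches, load_file), the batch size of 10000, and the choice to insert empty fields as NULL are all assumptions made for illustration, and only the four sample columns from the question are handled. It also assumes every line has at least four tab-separated fields.

```python
from itertools import islice

# Column list matches the four sample columns from the question;
# the real table has more. %s is the placeholder style used by MySQLdb.
INSERT_SQL = (
    "INSERT INTO PerformaceReport "
    "(campaignID, keywordID, keyword, avgPosition) "
    "VALUES (%s, %s, %s, %s)"
)


def parse_line(line):
    """Split one tab-separated line into the four sample columns.

    Empty fields become None so the driver sends NULL (an assumption;
    substitute whatever default the table actually needs).
    """
    fields = line.rstrip("\r\n").split("\t")
    return tuple(field if field != "" else None for field in fields[:4])


def batches(iterable, size):
    """Yield lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def load_file(conn, lines, batch_size=10000):
    """Insert parsed lines in batches, committing once per batch.

    `conn` is any DB-API connection whose driver accepts %s placeholders.
    Returns the number of rows sent.
    """
    cursor = conn.cursor()
    total = 0
    rows = (parse_line(line) for line in lines)
    for chunk in batches(rows, batch_size):
        cursor.executemany(INSERT_SQL, chunk)
        conn.commit()
        total += len(chunk)
    cursor.close()
    return total


# Usage (needs MySQLdb and a reachable server, so it is left commented out):
# import MySQLdb as mdb
# conn = mdb.connect(user='root', passwd='redhat', db='Xml_Data',
#                    host='localhost', charset='utf8')
# with open(file_path) as file_open:
#     load_file(conn, file_open)
```

Passing the values as parameters instead of formatting them into the SQL string also avoids quoting problems when a keyword contains a quote character.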
The code is untested and might contain minor errors, but it should be faster, though not as fast as using LOAD DATA INFILE.
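For comparison, here is a rough sketch of how LOAD DATA INFILE could be issued from Python. The helper names (build_load_statement, bulk_load) are hypothetical, and the LOCAL variant is an assumption: it reads the file from the client side and must be enabled on both the server and the connection (for MySQLdb, via the local_infile connect argument). The column list has to follow the order of the fields in the file, so with the real 18+ column file it would need to name every field, not just the four shown.

```python
def build_load_statement(table, columns):
    """Build a LOAD DATA LOCAL INFILE statement for a tab-separated file.

    The file path is left as a %s placeholder so the driver can quote it.
    `table` and `columns` are trusted identifiers, not user input.
    """
    column_list = ", ".join(columns)
    return (
        "LOAD DATA LOCAL INFILE %s "
        "INTO TABLE " + table + " "
        "FIELDS TERMINATED BY '\\t' "
        "LINES TERMINATED BY '\\n' "
        "(" + column_list + ")"
    )


def bulk_load(conn, file_path, table, columns):
    """Run the bulk load on a DB-API connection and commit once."""
    cursor = conn.cursor()
    cursor.execute(build_load_statement(table, columns), (file_path,))
    conn.commit()
    cursor.close()


# Usage (needs MySQLdb and a server that allows LOCAL INFILE):
# import MySQLdb as mdb
# conn = mdb.connect(user='root', passwd='redhat', db='Xml_Data',
#                    host='localhost', charset='utf8', local_infile=1)
# bulk_load(conn, file_path, "PerformaceReport",
#           ["campaignID", "keywordID", "keyword", "avgPosition"])
```

With this approach the server parses the file itself, so Python no longer loops over the lines at all.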