I am trying to process a text file of more than 1GB and save the data into a MySQL database


Editorial Team

Asked: June 14, 2026


I am trying to process a text file of more than 1GB and save the data into a MySQL database using Python.

I have pasted some sample code below:

import MySQLdb as mdb

conn = mdb.connect(user='root', passwd='redhat', db='Xml_Data', host='localhost', charset="utf8")

file_path = "/home/local/user/Main/Module-1.0.4/file_processing/part-00000.txt"

with open(file_path, 'r') as file_open:
    for line in file_open:
        result_words = line.split('\t')
        query = "insert into PerformaceReport (campaignID, keywordID, keyword, avgPosition)"
        query += " VALUES (%s,%s,'%s',%s) " % (result_words[0],result_words[1],result_words[2],result_words[3])
        # a new cursor, one INSERT and one commit for every single line of the file
        cursor = conn.cursor()
        cursor.execute( query )
        conn.commit()

The real table has more than 18 columns; I have only shown four of them here as an example.

When I run the code above, it takes several hours to finish.

My questions are:

Is there an alternative, much faster way to process a 1GB text file in Python?
Is there a framework that can process a 1GB text file and save the data into a database quickly?
How can a large (1GB) text file be processed and saved into a database within minutes, if that is possible at all?
My main concern is that we need to process the 1GB file as fast as possible: in minutes, not hours.

Edited Code

query += " VALUES (%s,%s,'%s',%s) " % (int(result_words[0] if result_words[0] != '' else ''),int(result_words[2] if result_words[2] != '' else ''),result_words[3] if result_words[3] != '' else '',result_words[4] if result_words[4] != '' else '')

In practice, I build the values in the format above, checking whether each field is empty before using it.
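Building the VALUES clause with `%` formatting, as in the edited line above, breaks on empty fields: `int('')` raises a `ValueError`, and an empty string formatted into `%s` leaves a gap in the SQL. A minimal sketch follows, assuming empty fields should be stored as SQL NULL. Each field is converted to a Python value, with `None` for empty fields, and the driver fills in the placeholders. The helpers `build_insert` and `convert_row` and the converter list are hypothetical. Only the four example columns are listed, and the type chosen for `avgPosition` is a guess.

```python
from typing import Callable, Sequence

# Column names from the example; the real table has 18+ columns, listed in file order
COLUMNS = ["campaignID", "keywordID", "keyword", "avgPosition"]

# Hypothetical converters, one per column: int for the IDs, str for keyword, float for avgPosition
CONVERTERS = [int, int, str, float]


def build_insert(table: str, columns: Sequence[str]) -> str:
    """Return an INSERT statement with one %s placeholder per column.

    The table and column names are trusted constants; only the values are parameters.
    """
    placeholders = ", ".join(["%s"] * len(columns))
    return "INSERT INTO {} ({}) VALUES ({})".format(table, ", ".join(columns), placeholders)


def convert_row(fields: Sequence[str], converters: Sequence[Callable[[str], object]]) -> tuple:
    """Apply one converter per field; empty fields become None, which the driver sends as NULL."""
    return tuple(conv(f) if f != "" else None for f, conv in zip(fields, converters))


line = "123\t\tred shoes\t2.5\n"
fields = line.rstrip("\n").split("\t")
print(build_insert("PerformaceReport", COLUMNS))
# → INSERT INTO PerformaceReport (campaignID, keywordID, keyword, avgPosition) VALUES (%s, %s, %s, %s)
print(convert_row(fields, CONVERTERS))
# → (123, None, 'red shoes', 2.5)
```

With MySQLdb, pass the statement and the values separately: `cursor.execute(query, values)` for one row, or `cursor.executemany(query, list_of_value_tuples)` for a batch. This way, quoting and NULL handling are done by the driver rather than by string formatting.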


1 Answer

Editorial Team · Answer 1 · 2026-06-14T12:56:36+00:00

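import csv
import MySQLdb as mdb

file_path = "/home/local/user/Main/Module-1.0.4/file_processing/part-00000.txt"

def read_file(path, batch_size=500):
    """Yield the tab-separated rows of the file in lists of at most batch_size rows."""
    with open(path, 'r', newline='', encoding='utf-8') as infile:
        # QUOTE_NONE: quote characters in the data are read as ordinary text
        file_open = csv.reader(infile, delimiter='\t', quoting=csv.QUOTE_NONE)
        cache = []
        for line in file_open:
            # keep only the columns named in the query (four in this example)
            cache.append(line[:4])
            if len(cache) >= batch_size:
                yield cache
                cache = []
        if cache:  # last, partially filled batch
            yield cache

conn = mdb.connect(user='root', passwd='redhat', db='Xml_Data', host='localhost', charset="utf8")
cursor = conn.cursor()
# the %s placeholders are filled in and quoted by the driver, not by string formatting
query = "insert into PerformaceReport (campaignID, keywordID, keyword, avgPosition) VALUES (%s,%s,%s,%s)"
for rows in read_file(file_path):
    try:
        # for an INSERT ... VALUES statement, MySQLdb sends the batch as one multi-row INSERT
        cursor.executemany(query, rows)
    except mdb.Error:
        # the whole batch is discarded; log the error here if the failed rows matter
        conn.rollback()
    else:
        # one commit per batch instead of one per line
        conn.commit()
conn.close()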

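The code is untested and may contain minor errors, but it should be considerably faster. It reuses one cursor, commits once per batch of 500 rows instead of once per line, and lets executemany send each batch as a single multi-row INSERT. It will still not be as fast as loading the file directly with MySQL's LOAD DATA INFILE.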
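For comparison, here is a sketch of the LOAD DATA route, which lets the server read the whole file in one statement. It rests on several assumptions:

- The file is plain tab-separated text with Unix (`\n`) line endings.
- The column list names every column of the file, in file order.
- Both client and server allow `LOCAL` loading: `local_infile` must be enabled on the server, and MySQLdb's `connect()` accepts a `local_infile` argument.

The helpers `build_load_data` and `load_file` are hypothetical, and the connection settings are copied from the question. Note that LOAD DATA does not read empty fields as NULL; a NULL has to appear as `\N` in the file.

```python
from typing import Sequence

# Only the four example columns; the real list must name every column of the file, in file order
COLUMNS = ["campaignID", "keywordID", "keyword", "avgPosition"]


def build_load_data(table: str, columns: Sequence[str]) -> str:
    """Return a LOAD DATA statement for a tab-separated, newline-terminated file.

    The file name stays a %s placeholder; MySQLdb substitutes it client-side
    as a quoted string literal when the statement is executed with arguments.
    """
    return (
        "LOAD DATA LOCAL INFILE %s INTO TABLE {} "
        "FIELDS TERMINATED BY '\\t' LINES TERMINATED BY '\\n' ({})"
    ).format(table, ", ".join(columns))


def load_file(path: str, columns: Sequence[str] = COLUMNS) -> None:
    """Bulk-load a tab-separated file into PerformaceReport with a single statement."""
    import MySQLdb as mdb  # imported here so build_load_data works without the driver installed

    # local_infile=1 allows this client to send a local file; the server must allow it too
    conn = mdb.connect(user='root', passwd='redhat', db='Xml_Data', host='localhost',
                       charset="utf8", local_infile=1)
    try:
        cursor = conn.cursor()
        # the path is passed as a query argument, so the driver quotes and escapes it
        cursor.execute(build_load_data("PerformaceReport", columns), (path,))
        conn.commit()
    finally:
        conn.close()
```

Usage would be `load_file("/home/local/user/Main/Module-1.0.4/file_processing/part-00000.txt")`. With this approach, Python no longer parses the rows at all. That is the main reason this route is usually much faster than any row-by-row or batched INSERT loop.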
