The Archive Base

Editorial Team
Asked: May 22, 2026 (2026-05-22T15:38:52+00:00)

Trying to load a file into python. It’s a very big file (1.5Gb), but

I’m trying to load a file into Python. It’s a very big file (1.5 GB), but I have the memory available and I only need to do this once: I just need to sort the file one time, so Python was an easy choice.

My issue is that loading this file results in way too much memory usage. When I’ve loaded about 10% of the lines into memory, Python is already using 700 MB, which is clearly too much. At around 50% the script hangs, using 3.03 GB of real memory (and slowly rising).

I know this isn’t the most memory-efficient way to sort a file, but I just want it to work so I can move on to more important problems 😀 So, what is wrong with the following Python code that’s causing the massive memory usage:

print 'Loading file into memory'
input_file = open(input_file_name, 'r')
input_file.readline() # Toss out the header
lines = []
totalLines = 31164015.0
currentLine = 0.0
printEvery100000 = 0
for line in input_file:
    currentLine += 1.0
    lined = line.split('\t')
    printEvery100000 += 1
    if printEvery100000 == 100000:
        print str(currentLine / totalLines)
        printEvery100000 = 0;
    lines.append( (lined[timestamp_pos].strip(), lined[personID_pos].strip(), lined[x_pos].strip(), lined[y_pos].strip()) )
input_file.close()
print 'Done loading file into memory'

EDIT: In case anyone is unsure, the general consensus seems to be that each object allocated adds its own memory overhead. I “fixed” it in this case by calling readlines(), which still loads all the data but has only one string object of overhead per line. This loads the entire file using about 1.7 GB. Then, when I call lines.sort(), I pass a function as key that splits on tabs and returns the right column value, converted to an int. This is slow computationally, and memory-intensive overall, but it works. Learned a ton about variable allocation overhead today 😀
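The approach described in the edit above might look roughly like the following sketch. It is written for Python 3 rather than the Python 2 of the original script, and the helper name `sort_lines_by_column`, the sample data, and the choice of column 0 as the sort key are all assumptions made for illustration; the real file and column positions are not shown in the question.

```python
import io

def sort_lines_by_column(lines, column, sep="\t"):
    """Sort raw lines in place by one separated column, read as an int.

    Hypothetical helper: list.sort() computes the key once per line,
    so each line is split only once during the sort.
    """
    lines.sort(key=lambda line: int(line.split(sep)[column]))
    return lines

# Tiny in-memory stand-in for the real file; the values are invented.
sample = io.StringIO(
    "timestamp\tpersonID\tx\ty\n"
    "30\tp1\t1.0\t2.0\n"
    "10\tp2\t3.0\t4.0\n"
    "20\tp3\t5.0\t6.0\n"
)
sample.readline()            # toss out the header, as in the question
lines = sample.readlines()   # one str object per line, nothing split yet
sort_lines_by_column(lines, column=0)
print([line.split("\t")[0] for line in lines])  # ['10', '20', '30']
```

Keeping each line as a single string avoids creating four extra string objects and a tuple per line; the cost is the repeated splitting work at sort time.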

1 Answer

  1. Editorial Team
    Added an answer on May 22, 2026 at 3:38 pm

    Here is a rough estimate of the memory needed, based on the constants derived from your example. At a minimum you have to account for Python’s internal object overhead for each split line, plus the overhead for each string it holds.

    The script below estimates that about 9.1 GB is needed to store the file in memory. It assumes the following constants, which are off by a bit, since you’re only using part of each line:

    • 1.5 GB file size
    • 31,164,015 total lines
    • each line split into a list with 4 pieces

    Code:

    import sys
    def sizeof(lst):
        return sys.getsizeof(lst) + sum(sys.getsizeof(v) for v in lst)
    
    GIG = 1024**3
    file_size = 1.5 * GIG
    lines = 31164015
    num_cols = 4
    avg_line_len = int(file_size / float(lines))
    
    val = 'a' * (avg_line_len / num_cols)
    lst = [val] * num_cols
    
    line_size = sizeof(lst)
    print 'avg line size: %d bytes' % line_size
    print 'approx. memory needed: %.1f GB' % ((line_size * lines) / float(GIG))
    

    Returns:

    avg line size: 312 bytes
    approx. memory needed: 9.1 GB
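To see where the per-line overhead in the estimate above comes from, here is a small Python 3 sketch that compares one raw line string against the tuple of four stripped fields built from it. The sample line and the helper name `deep_size` are invented for illustration, and the exact byte counts depend on the Python version and platform, so none are quoted here.

```python
import sys

def deep_size(fields):
    """Size of the tuple itself plus each string it references."""
    return sys.getsizeof(fields) + sum(sys.getsizeof(f) for f in fields)

# An invented line of about 51 characters, close to the average line
# length used in the estimate; the fifth column stands for unused data.
line = "1274197352\tperson_000123\t51.507351\t-0.127758\textra\n"

# Mirror the question's loop: keep four stripped columns in a tuple.
fields = tuple(part.strip() for part in line.split("\t")[:4])

raw_size = sys.getsizeof(line)
split_size = deep_size(fields)
print("raw line:    %d bytes" % raw_size)
print("split tuple: %d bytes" % split_size)
```

On CPython the split form comes out noticeably larger than the raw line even though it holds fewer characters, because every string and the tuple each carry a fixed object header.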
    