The Archive Base

Editorial Team
Asked: May 23, 2026 (2026-05-23T15:04:05+00:00)

I was trying to read data from some huge file and write them back,

I was trying to read data from a huge file and write it back, but I realised that the main cost came from assigning the data to a list rather than from reading or writing the file.

    rows = [None] * 1446311
    begin = datetime.datetime.now()
    for i in range( 1446311 ):
       row = csvReader.next()
       rows[i] = row
    print datetime.datetime.now() - begin

The code above takes 18 seconds, but only 5 seconds if I comment out line 5 (rows[i] = row). I have built the list in advance (i.e. reserved the memory), so why is it still so slow? Is there anything I could do to make it faster? I tried row for row in csvReader, but it performs worse.

regards,
John

1 Answer

  1. Editorial Team
    Added an answer on May 23, 2026 at 3:04 pm

    I get similar results, though not quite as dramatic as yours. (Note the use of the timeit module for timing code execution, and note that I've factored out the list creation since it's common to both test cases.)

    import csv
    from timeit import Timer
    
    def write_csv(f, n):
        """Write n records to the file named f."""
        w = csv.writer(open(f, 'wb'))
        for i in xrange(n):
            w.writerow((i, "squared", "equals", i**2))
    
    def test1(rows, f, n):
        for i, r in enumerate(csv.reader(open(f))):
            rows[i] = r
    
    def test2(rows, f, n):
        for i, r in enumerate(csv.reader(open(f))):
            pass
    
    def test(t): 
        return (Timer('test%d(rows, F, N)' % t,
                      'from __main__ import test%d, F, N; rows = [None] * N' % t)
                .timeit(number=1))
    
    >>> N = 1446311
    >>> F = "test.csv"
    >>> write_csv(F, N)
    >>> test(1)
    2.2321770191192627
    >>> test(2)
    1.7048690319061279
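
The harness above is written for Python 2 (xrange, binary-mode 'wb'). For readers on Python 3, here is a rough port as a sketch only: the names read_and_store, read_and_discard and run_benchmark are made up for this example, the default record count is much smaller than the N used above, and it writes to a temporary file instead of test.csv. Timings will differ from the figures quoted in this answer.

```python
import csv
import os
import tempfile
from timeit import timeit


def write_csv(path, n):
    """Write n records to the file at path."""
    # newline="" is what the csv module expects for text-mode files in Python 3.
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for i in range(n):
            writer.writerow((i, "squared", "equals", i ** 2))


def read_and_store(rows, path):
    """Counterpart of test1: keep every parsed record alive in rows."""
    with open(path, newline="") as fh:
        for i, record in enumerate(csv.reader(fh)):
            rows[i] = record


def read_and_discard(path):
    """Counterpart of test2: parse every record but keep none of them."""
    with open(path, newline="") as fh:
        for record in csv.reader(fh):
            pass


def run_benchmark(n=20000):
    """Return (store_seconds, discard_seconds, rows) for n records."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        write_csv(path, n)
        rows = [None] * n  # built outside the timed region, as in the answer
        store = timeit(lambda: read_and_store(rows, path), number=1)
        discard = timeit(lambda: read_and_discard(path), number=1)
    finally:
        os.remove(path)
    return store, discard, rows


if __name__ == "__main__":
    store, discard, _ = run_benchmark()
    print("store:   %.3f s" % store)
    print("discard: %.3f s" % discard)
```

A single run with number=1 is noisy, so it is worth repeating the measurement a few times before reading much into the difference.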
    

    Here’s my guess as to what is going on. In both tests, the CSV reader reads a record from the file and creates a data structure in memory representing that record.

    In test2, where the record is not stored, the data structure gets deleted more or less immediately: on the next iteration of the loop the loop variable r is rebound, so the reference count of the previous record drops to zero and its memory is reclaimed. This makes the memory used for the previous record available for reuse. That memory is already mapped in the process's virtual memory tables, and is probably still in the cache, so reusing it is (relatively) fast.

    In test1, where the record is stored, no memory is freed, so each record needs a new region of memory. That region has to be requested from the operating system and brought into the cache, so it's (relatively) slow.

    So the time is not taken up by list assignment, but by memory allocation.
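
The reference-counting part of that guess can be checked directly. The following is a small sketch, assuming CPython on Python 3 (other implementations may free objects later). Record is a hypothetical list subclass that exists only because plain lists cannot be weakly referenced; csv.reader itself returns ordinary lists.

```python
import weakref


class Record(list):
    """Hypothetical stand-in for a parsed CSV row that supports weak references."""


def lifetime_demo():
    """Return (discarded_row_was_freed, stored_row_is_alive)."""
    kept = []

    # Discard case, as in test2: rebinding the name drops the last reference.
    r = Record(["0", "squared", "equals", "0"])
    probe_discarded = weakref.ref(r)
    r = Record(["1", "squared", "equals", "1"])

    # Store case, as in test1: the container still holds a reference.
    r = Record(["2", "squared", "equals", "4"])
    probe_kept = weakref.ref(r)
    kept.append(r)
    r = Record(["3", "squared", "equals", "9"])

    return probe_discarded() is None, probe_kept() is not None


print(lifetime_demo())  # (True, True) on CPython
```

This only shows when the objects are released. It says nothing about how long allocation takes, which is what the timings measure.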


    Here are a couple more tests that illustrate what's going on, without the complicating factor of the csv module. In test3 we create a new 100-element list for each row and store it. In test4 we also create a new 100-element list for each row, but instead of storing it we throw it away, so that its memory can be reused on the next pass through the loop.

    def test3(rows, f, n):
        for i in xrange(n):
            rows[i] = [i] * 100
    
    def test4(rows, f, n):
        for i in xrange(n):
            temp = [i] * 100
            rows[i] = None
    
    >>> test(3)
    9.2103338241577148
    >>> test(4)
    1.5666921138763428
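
To see the memory side of test3 and test4 rather than the timing, the peak allocation of each loop can be compared with tracemalloc. This is a sketch assuming Python 3.4 or later (tracemalloc does not exist in the Python 2 used above). The function name peak_bytes and its parameters are invented for this example, and n is kept small so that it runs quickly.

```python
import tracemalloc


def peak_bytes(store, n=20000, width=100):
    """Return the peak traced memory while building n lists of width items."""
    rows = [None] * n  # allocated before tracing starts, so it is not counted
    tracemalloc.start()
    try:
        for i in range(n):
            temp = [i] * width
            if store:
                rows[i] = temp  # like test3: every list stays alive
            # otherwise, like test4: the list is dropped when temp is rebound
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    print("stored:   ", peak_bytes(True))
    print("discarded:", peak_bytes(False))
```

The storing loop's peak grows with n, while the discarding loop's peak stays at roughly the size of a couple of lists. That is consistent with the explanation above, although it measures memory use, not time.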
    

    So I think the lesson is that if you do not need to store all the rows in memory at the same time, don’t do that! If you can, read them in one at a time, process them one at a time, and then forget about them so that Python can deallocate them.
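
As a sketch of what processing rows one at a time can look like (Python 3 assumed): the per-row work here, summing the last column, is only a placeholder for whatever the real job needs, and sum_last_column is a name made up for the example. The sample data uses the same record layout as write_csv.

```python
import csv
import io


def sum_last_column(lines):
    """Add up the last field of every CSV record, holding one record at a time."""
    total = 0
    for record in csv.reader(lines):
        # record is rebound on each iteration, so the previous row can be freed.
        total += int(record[-1])
    return total


sample = io.StringIO(
    "0,squared,equals,0\n"
    "1,squared,equals,1\n"
    "2,squared,equals,4\n"
)
print(sum_last_column(sample))  # prints 5
```

The same function accepts an open file object in place of the StringIO, since csv.reader takes any iterable of lines.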
