The Archive Base

Editorial Team
Asked: May 27, 2026

In Python I have a file stream, and I want to copy some part

In Python I have a file stream, and I want to copy part of it into a StringIO. I want this to be as fast as possible, with minimal copying.

But if I do:

data = file.read(SIZE)
stream = StringIO(data)

I think two copies are made, no? One copy from the file into data, and another inside StringIO into its internal buffer. Can I avoid one of the copies? I don’t need the temporary data, so I think one copy should be enough.

1 Answer

  1. Editorial Team
    Added an answer on May 27, 2026 at 1:19 am

    In short: you can’t avoid 2 copies using StringIO.

    Some assumptions:

    • You’re using cStringIO, otherwise it would be silly to optimize this much.
    • It’s speed and not memory efficiency you’re after. If not, see Jakob Bowyer’s solution, or use a variant using file.read(SOME_BYTE_COUNT) if your file is binary.
    • You’ve already stated this in the comments, but for completeness: you want to actually edit the contents, not just view it.

    Long answer: Since Python strings are immutable and the StringIO buffer is not, a copy has to be made sooner or later; otherwise you’d be altering an immutable object! For what you want to be possible, the StringIO object would need a dedicated method that reads directly from a file object given as an argument. There is no such method.
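To illustrate the immutability point, here is a minimal sketch. It assumes Python 3, where io.BytesIO stands in for the cStringIO object discussed in this answer, and the sample data is made up for the example.

```python
import io

# Immutable source data, standing in for the result of file.read(SIZE).
data = b"hello world"

# The stream behaves as a separate, mutable buffer initialised from data.
stream = io.BytesIO(data)

# Overwrite the first byte through the stream.
stream.seek(0)
stream.write(b"J")

# The stream's contents changed; the original bytes object did not.
edited = stream.getvalue()
print(edited)
print(data)
```

Because the write cannot touch the original bytes object, the stream has to own its own storage, which is where the second copy comes from.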

    Outside of StringIO, there are solutions that avoid the extra copy. Off the top of my head, this will read a file directly into a modifiable byte array, no extra copy:

    import numpy as np
    a = np.fromfile("filename.ext", dtype="uint8")
    

    It may be cumbersome to work with, depending on your intended use, since it’s an array of values from 0 to 255, not an array of characters. But it can play the same role as a StringIO object, and np.fromstring, np.tostring, np.tofile and slicing notation should get you where you want (newer NumPy releases favour np.frombuffer and the tobytes method over np.fromstring and tostring). You might also need np.insert, np.delete and np.append.

    I’m sure there are other modules that will do similar things.
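One standard-library option along the same lines is to read straight into a pre-allocated bytearray with readinto. This is only a sketch under assumptions: it targets Python 3, SIZE and the temporary file are hypothetical stand-ins for your own values, and it has not been benchmarked against the numpy approach above.

```python
import os
import tempfile

SIZE = 8  # hypothetical number of bytes to copy from the file

# Create a small throwaway file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789abcdef")
    path = tmp.name

# Pre-allocate a mutable buffer and let the file object fill it in place,
# so no intermediate immutable string is created.
buf = bytearray(SIZE)
with open(path, "rb") as f:
    n = f.readinto(buf)  # number of bytes actually read

del buf[n:]         # trim the buffer if the file was shorter than SIZE
buf[0] = ord("X")   # edit the data in place

os.remove(path)
print(bytes(buf))
```

A bytearray supports slicing and in-place assignment, but unlike StringIO it has no file-like read/write/seek interface, so whether it fits depends on how the data is used afterwards.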

    TIMEIT:

    How much does all this really matter? Well, let’s see. I’ve made a 100MB file, largefile.bin. Then I read in the file using both methods and change the first byte.

    $ python -m timeit -s "import numpy as np" "a = np.fromfile('largefile.bin', 'uint8'); a[0] = 1"
    10 loops, best of 3: 132 msec per loop
    $ python -m timeit -s "from cStringIO import StringIO" "a = StringIO(); a.write(open('largefile.bin').read()); a.seek(0); a.write('1')"
    10 loops, best of 3: 203 msec per loop
    

    So in my case, using StringIO is roughly 50% slower than using numpy (203 msec versus 132 msec).

    Lastly, for comparison, editing the file directly:

    $ python -m timeit "a = open('largefile.bin', 'r+b'); a.seek(0); a.write('1')"
    10000 loops, best of 3: 29.5 usec per loop
    

    So, it’s nearly 4500 times faster than the numpy version (29.5 usec versus 132 msec). Of course, it’s extremely dependent on what you’re going to do with the file. Altering the first byte is hardly representative. But with this method you do have a head start on the other two, and since most operating systems buffer disk access well, the speed may be very good too.
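As a script rather than a one-liner, the in-place edit might look like the sketch below. It assumes Python 3 (so the written value is a bytes literal) and uses a small temporary file as a hypothetical stand-in for largefile.bin.

```python
import os
import tempfile

# Throwaway file standing in for largefile.bin.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 1024)
    path = tmp.name

# "r+b" opens the file for reading and writing without truncating it.
with open(path, "r+b") as f:
    f.seek(0)
    f.write(b"1")  # overwrite only the first byte

# Read the file back to confirm that only the first byte changed.
with open(path, "rb") as f:
    contents = f.read()

os.remove(path)
print(contents[:4])
```

Only the bytes that are written get touched; the rest of the file is never loaded into memory by this code.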

    (If you’re not allowed to edit the file and so want to avoid the cost of making a working copy, there are a couple of possible ways to increase the speed. If you can choose the filesystem, Btrfs has a copy-on-write file copy operation, which makes taking a copy of a file virtually instant. The same effect can be achieved using an LVM snapshot of any filesystem.)
