What is the most efficient way to delete an arbitrary chunk of a file,

Question

0

Asked: May 16, 20262026-05-16T07:55:13+00:00 2026-05-16T07:55:13+00:00

What is the most efficient way to delete an arbitrary chunk of a file,

0

What is the most efficient way to delete an arbitrary chunk of a file, given the start and end offsets? I’d prefer to use Python, but I can fall back to C if I have to.

Say the file is this

..............xxxxxxxx----------------

I want to remove a chunk of it:

..............[xxxxxxxx]----------------

After the operation it should become:

..............----------------

Reading the whole thing into memory and manipulating it in memory is not a feasible option.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-16T07:55:14+00:00

The best performance will almost invariably be obtained by writing a new version of the file and then having it atomically write the old version, because filesystems are strongly optimized for such sequential access, and so is the underlying hardware (with the possible exception of some of the newest SSDs, but, even then, it’s an iffy proposition). In addition, this avoids destroying data in the case of a system crash at any time — you’re left with either the old version of the file intact, or the new one in its place. Since every system could always crash at any time (and by Murphy’s Law, it will choose the most unfortunate moment;-), integrity of data is generally considered very important (often data is more valuable than the system on which it’s kept — hence, “mirroring” RAID solutions to ensure against disk crashes losing precious data;-).

If you accept this sane approach, the general idea is: open old file for reading, new one for writing (creation); copy N1 bytes over from the old file to the new one; then skip N2 bytes of the old file; then copy the rest over; close both files; atomically rename new to old. (Windows apparently has no “atomic rename” system call usable from Python — to keep integrity in that case, instead of the atomic rename, you’d do three step: rename old file to a backup name, rename new file to old, delete backup-named file — in case of system crash during the second one of these three very fast operations, one rename is all it will take to restore data integrity).

N1 and N2, of course, are the two parameters saying where the deleted piece starts, and how long it is. For the part about opening the files, with open('old.dat', 'rb') as oldf: and with open('NEWold.dat', 'wb') as newf: statements, nested into each other, are clearly best (the rest of the code until the rename step must be nested in both of them of course).

For the “copy the rest over” step, shutil.copyfileobj is best (be sure to specify a buffer length that’s comfortably going to fit in your available RAM, but a large one will tend to give better performance). The “skip” step is clearly just a seek on the oldf open-for-reading file object. For copying exactly N1 bytes from oldf to newf, there is no direct support in Python’s standard library, so you have to write your own, e.g:

def copyN1(oldf, newf, N1, buflen=1024*1024):
    while N1:
      newf.write(oldf.read(min(N1, buflen)))
      N1 -= buflen

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

What is the most efficient way to delete an arbitrary chunk of a file,

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply