At the outset this looks pretty simple, however this was an interview question and

Question

Asked: May 26, 2026 (2026-05-26T00:35:37+00:00)

At first glance this looks pretty simple; however, it was an interview question, and the trick is as follows:

I wrote simple code that copies one file to another byte by byte and returns a count that is incremented in the while(!feof) loop. However, my interviewer said that running this loop on a 1 GB file would take an hour because it copies byte by byte, and that this is not what happens in real life. Could someone tell me how huge files are actually copied on computers, and what the underlying algorithm is? Also, remember that I need to return the number of bytes copied.

1 Answer

Editorial Team · Answer 1 · 2026-05-26T00:35:38+00:00

He’s probably just plain wrong.

Unless you wrote the code in something like assembly language, reading/writing one character at a time will almost certainly have only a fairly minimal effect on overall speed. The reason is fairly simple: almost anything higher level than assembly language will do (at least some) buffering for you when you do character-oriented I/O.

Just for example, consider code in C like this:

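#include <stdio.h>

int main(int argc, char **argv) {
    if (argc != 3) {
        fprintf(stderr, "usage: %s <source> <destination>\n", argv[0]);
        return 1;
    }

    FILE *infile = fopen(argv[1], "rb");
    FILE *outfile = fopen(argv[2], "wb");
    if (infile == NULL || outfile == NULL) {
        perror("fopen");
        return 1;
    }

    unsigned long count = 0;
    int ch;

    /* getc and putc go through the stdio buffers, so most
       iterations never make a system call */
    while (EOF != (ch = getc(infile))) {
        ++count;
        putc(ch, outfile);
    }
    fclose(infile);
    fclose(outfile);  /* flushes whatever is still buffered */
    printf("%lu bytes copied\n", count);
    return 0;
}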

The reality is that this will probably run a little slower than a typical file copy, but only a little. Assuming a halfway decent implementation of C, getc and putc (along with most of the rest of the standard I/O) will do buffering for you behind the scenes. In fact, getc and putc will often be implemented as macros, so most of the code will be expanded inline as well. Though it varies from one compiler to another, typical code will look something like this:

#define getc(f) ((f)->__pos < (f)->__len ? (f)->__buf[(f)->__pos++] : __filbuf(f))
#define putc(ch, f) ((f)->__pos < (f)->__len ? ((f)->__buf[(f)->__pos++] = (ch)) : __flshbuf((f), (ch)))

This will be accompanied by code something like this:

#define BUF_SIZE 4096

typedef struct {
    char __buf[BUF_SIZE];
    size_t __pos;       /* index of the next byte to read or write */
    size_t __len;       /* usable bytes in __buf, at most BUF_SIZE */
    int __file_number;
} FILE;
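
To put rough numbers on that buffering, here is a small sketch. It assumes the illustrative 4096-byte BUF_SIZE from above (real implementations pick their own size), and kernel_calls is a hypothetical helper, not part of any library:

```c
#define BUF_SIZE 4096  /* illustrative size taken from the sketch above */

/* Hypothetical helper: how many times a stream has to call into the
   kernel to move file_size bytes when each call transfers up to
   buf_size bytes. Rounds up so a partial final buffer still counts. */
unsigned long long kernel_calls(unsigned long long file_size,
                                unsigned long long buf_size) {
    return (file_size + buf_size - 1) / buf_size;
}
```

For a 1 GiB file, kernel_calls(1ULL << 30, BUF_SIZE) gives 262144 calls per stream, compared with 1073741824 if every getc or putc really went to the kernel. The character-at-a-time loop pays for a comparison and an increment on each byte, but the expensive part happens only once per 4096 bytes.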

Now, it’s certainly true that you can improve on this:

Since you know you’re going to use the whole file sequentially, you can use a bigger buffer to reduce the number of round-trips to/from kernel mode.
Since you know you’re going to write the data exactly as it was read, you can read into a buffer, then write from that same buffer, instead of copying the data from one buffer to another.
Since you know you’re copying files, and chances are most of that data won’t be used again soon, you can probably tell your OS that it shouldn’t be cached.
If the source and destination are on physically separate disks, asynchronous I/O may help by allowing the reading and writing to happen at the same time.
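
The first two points can be sketched in standard C roughly as follows. This is only one possible shape, not a tuned implementation: the 64 KiB COPY_BUF_SIZE is an assumed value you would want to measure, and copy_stream and copy_file are hypothetical names. It still returns the number of bytes copied, as the question requires.

```c
#include <stdio.h>

#define COPY_BUF_SIZE (1 << 16)  /* 64 KiB; an assumed size, tune by measuring */

/* Copy everything from `in` to `out` through one shared buffer.
   Returns the number of bytes written; sets *ok to 0 on an I/O error. */
unsigned long long copy_stream(FILE *in, FILE *out, int *ok) {
    char buf[COPY_BUF_SIZE];
    unsigned long long total = 0;
    size_t n;

    *ok = 1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        /* write from the same buffer we just read into */
        size_t written = fwrite(buf, 1, n, out);
        total += written;
        if (written != n) {
            *ok = 0;
            return total;
        }
    }
    if (ferror(in))
        *ok = 0;
    return total;
}

/* Hypothetical wrapper: open both files, copy, and return the byte
   count, or -1 if anything failed. */
long long copy_file(const char *src, const char *dst) {
    FILE *in = fopen(src, "rb");
    if (in == NULL)
        return -1;
    FILE *out = fopen(dst, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    int ok;
    unsigned long long total = copy_stream(in, out, &ok);
    fclose(in);
    if (fclose(out) != 0)  /* the final flush can fail too */
        ok = 0;
    return ok ? (long long)total : -1;
}
```

A call such as copy_file("source.bin", "dest.bin") returns the byte count on success. The last two points (cache hints and asynchronous I/O) need OS-specific calls and are left out of this sketch.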

Note, however, that these will probably increase development time quite a bit, and even at best you shouldn’t expect anything like the speed difference suggested by your interviewer. Even a 10x improvement is unlikely, let alone the ~1000x suggested by your interviewer.
