At first glance this looks pretty simple, but it was an interview question, and here is the catch:
I wrote simple code that copies one file to another byte by byte and returns a count that is incremented in a while(!feof) loop. My interviewer said that running this loop to copy a 1 GB file would take an hour because it copies bytewise, which is not what happens in real life. Could someone tell me how huge files are actually copied on computers, and what the underlying algorithm is? Also, remember that I need to return the number of bytes copied.
He’s probably just plain wrong.
Unless you wrote the code in something like assembly language, reading/writing one character at a time will almost certainly have only a fairly minimal effect on overall speed. The reason is fairly simple: almost anything higher level than assembly language will do (at least some) buffering for you when you do character-oriented I/O.
Just for example, consider code in C like this:
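A minimal sketch of such a character-at-a-time copy, written for illustration. The function name copy_bytewise, the long long return type, and the -1 error convention are assumptions, not part of the original question. It tests the result of getc against EOF rather than looping on feof, since feof only becomes true after a read has already failed.

```c
#include <stdio.h>

/* Copy src_path to dst_path one character at a time.
   Returns the number of bytes copied, or -1 on any error.
   (Hypothetical helper; name and error convention are illustrative.) */
long long copy_bytewise(const char *src_path, const char *dst_path)
{
    FILE *in = fopen(src_path, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    long long count = 0;
    int ch; /* int, not char, so EOF can be told apart from data bytes */

    /* getc and putc work on the stream's in-memory buffer; the library
       only goes to the operating system when that buffer is empty or full. */
    while ((ch = getc(in)) != EOF) {
        if (putc(ch, out) == EOF)
            break;
        ++count;
    }

    int failed = ferror(in) || ferror(out);
    fclose(in);
    if (fclose(out) != 0) /* the final flush can fail too */
        failed = 1;

    return failed ? -1 : count;
}
```

A caller would use it as `long long n = copy_bytewise("in.dat", "out.dat");` and treat a negative result as failure.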
The reality is that this will probably run a little slower than a typical file copy, but only a little. The reason is fairly simple: at least assuming a halfway decent implementation of C, getc and putc (along with most of the rest of the standard I/O) will do buffering for you behind the scenes. In fact, getc and putc will often be implemented as macros, so most of the code will be expanded inline as well. Though it varies from one compiler to another, the typical pattern is a macro that takes the next character straight from an in-memory buffer, accompanied by a helper function that refills the buffer when it runs empty.
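To make the buffering concrete, here is a rough sketch of how a getc-style macro and its refill function might fit together. Everything in it (my_file, my_getc, my_fill, MY_BUFSIZE, MY_EOF) is a hypothetical name invented for illustration, not taken from any real C library, and it assumes a POSIX system where read() is available.

```c
#include <stddef.h>
#include <unistd.h> /* read(), ssize_t; POSIX assumption */

#define MY_BUFSIZE 4096
#define MY_EOF (-1)

/* Hypothetical stand-in for FILE: a descriptor plus an input buffer. */
typedef struct {
    int fd;                        /* underlying file descriptor */
    size_t pos;                    /* index of next unread byte in buf */
    size_t len;                    /* number of valid bytes in buf */
    unsigned char buf[MY_BUFSIZE];
} my_file;

/* Slow path: refill the whole buffer with a single read() call and
   return the first byte, or MY_EOF at end of input or on error. */
static int my_fill(my_file *f)
{
    ssize_t n = read(f->fd, f->buf, sizeof f->buf);
    if (n <= 0)
        return MY_EOF;
    f->len = (size_t)n;
    f->pos = 0;
    return f->buf[f->pos++];
}

/* Fast path: one comparison and one array access, expanded inline.
   Note that the argument is evaluated more than once. */
#define my_getc(f) \
    ((f)->pos < (f)->len ? (f)->buf[(f)->pos++] : my_fill(f))
```

With a 4096-byte buffer, only one call in 4096 reaches the operating system; the rest never leave the inline fast path, which is why a character-at-a-time loop costs far less than one system call per byte.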
Now, it’s certainly true that you can improve on this:
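As one illustration of such an improvement, here is a minimal sketch that copies in large blocks with fread and fwrite instead of one character at a time. The name copy_blockwise, the 1 MiB block size, and the -1 error convention are assumptions chosen for the example; the best block size depends on the system.

```c
#include <stdio.h>
#include <stdlib.h>

#define COPY_BLOCK_SIZE (1u << 20) /* 1 MiB; an arbitrary illustrative choice */

/* Copy src_path to dst_path in large blocks.
   Returns the number of bytes copied, or -1 on any error.
   (Hypothetical helper; name and error convention are illustrative.) */
long long copy_blockwise(const char *src_path, const char *dst_path)
{
    FILE *in = fopen(src_path, "rb");
    if (in == NULL)
        return -1;

    FILE *out = fopen(dst_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    unsigned char *buf = malloc(COPY_BLOCK_SIZE);
    if (buf == NULL) {
        fclose(in);
        fclose(out);
        return -1;
    }

    long long count = 0;
    int failed = 0;
    size_t n;

    /* fread returns how many bytes it actually delivered, so the
       running total is still an exact byte count. */
    while ((n = fread(buf, 1, COPY_BLOCK_SIZE, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) {
            failed = 1;
            break;
        }
        count += (long long)n;
    }
    if (ferror(in))
        failed = 1;

    free(buf);
    fclose(in);
    if (fclose(out) != 0) /* the final flush can fail too */
        failed = 1;

    return failed ? -1 : count;
}
```

The byte count comes for free from the return values of fread, so the requirement to report the number of bytes copied does not force a per-character loop.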
Note, however, that chances are these will increase development time quite a bit, and even at best you shouldn’t plan on seeing anything like the speed difference suggested by your interviewer. Even a 10x improvement is unlikely, not to mention the ~1000x suggested by your interviewer.