I have a file (10-20MB) containing data, where each line is a single piece of data.
I have a C program that reads the file from the filesystem, and then based on command line input, it reads each line of the file, does a calculation on each line to determine if that line should be returned, and then return a subset of the data.
Assume that the program does an fread and reads the entire file into memory at the beginning, and then parses it directly from memory.
Would the program execute faster if, instead of reading it from the filesystem, I compiled the data into the program directly, by creating an array such as the following?
char *dataArray[] = {"data1", "data2", "data3"....};
Since the OS needs to read the entire binary from the filesystem, my gut feeling is that the execution time of both techniques would be similar, since reading from the filesystem would be the high order bit. However, would anyone have more definitive ideas on this?
Defining everything as a program literal will certainly be faster.
You do not need the relatively slow “open” call for the data file and you don’t need to move the data from the buffer to your storage.
This was a common optimization circa. 1970, and every programming/coding style book since then stongly recommends you do not do this. The actual performance increase is minimal and what you gain in performance you lose in maintainability and flexibility.
Should you want a quick maintainable optimisation for this type of problem then look at the “mmap” call which makes the buffer directly available to your program and minimises data movement.