I have a very large data file, and each record in this data file has 4 lines. I have written a very simple C program to analyze files of this type and print out some useful information. The basic idea of the program is this.
int main()
{
char buffer[BUFFER_SIZE];
while(fgets(buffer, BUFFER_SIZE, stdin))
{
fgets(buffer, BUFFER_SIZE, stdin);
do_some_simple_processing_on_the_second_line_of_the_record(buffer);
fgets(buffer, BUFFER_SIZE, stdin);
fgets(buffer, BUFFER_SIZE, stdin);
}
print_out_result();
}
This of course leaves out some details (sanity/error checking, etc), but that is not relevant to the question.
The program works fine, but the data files I’m working with are huge. I figured I would try to speed up the program by parallelizing the loop with OpenMP. After a bit of searching, though, it appears that OpenMP can only handle for loops where the number of iterations is know beforehand. Since I don’t know the size of the files beforehand, and even simple commands like wc -l take a long time to run, how can I parallelize this program?
Have you checked that your process is actually CPU-bound and not I/O-bound? Your code looks very much like I/O-bound code, which would gain nothing from parallelization.