I’m using the following C program to filter a log file with about 200,000 lines, but the program stops responding after about 12,000 lines. Why does this happen, and how can I fix it?
The code is compiled with GCC on Windows.
PS: The code is executing properly and giving desired output for small files.
#include <stdio.h>
#include <string.h>

int check(char *url)
{
    //some code to filter the data and return either 0 or 1 depending upon input
}

int main()
{
    FILE *fpi, *fpo;
    fpi=fopen("access.log","r");
    fpo=fopen("edited\\filter.txt","w");
    char date[11],time[9],ip[16],url[500],temp[3];
    while(!feof(fpi))
    {
        printf(".");
        fscanf(fpi," %s %s %s %s %s %s",date,time,temp,ip,temp,url);
        if(check(url))
            fprintf(fpo,"%s %s %s %s %s %s\n",date,time,temp,ip,temp,url);
    }
    fclose(fpi);
    fclose(fpo);
    printf("\n\n\nDONE! :)");
    return 0;
}
It is possible that one of the lines in the input file contains a field that is longer than the string variable you pass to fscanf(). That would cause a buffer overflow, which could later lead to an infinite loop somewhere. This is just speculation. I suggest limiting each %s in the fscanf() format string to the maximum length of its destination string, which is the array size minus one so that there is room for the terminating null character. That ensures there are no buffer overflows and that the resulting strings are terminated.
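As a minimal sketch of the width-limited format, assuming the buffer sizes from the question: the struct log_fields and the helper read_fields() below are hypothetical names introduced only for illustration, not part of the original program.

```c
#include <stdio.h>

/* Buffer sizes copied from the question's declarations. */
struct log_fields {
    char date[11], time[9], temp[3], ip[16], url[500];
};

/* Hypothetical helper: reads one six-field record from fp.
   Each %s width is the destination array size minus one, so fscanf()
   can never write past the end of a buffer and always has room for
   the terminating '\0'.
   Returns the number of fields converted (6 on success) or EOF. */
int read_fields(FILE *fp, struct log_fields *f)
{
    return fscanf(fp, " %10s %8s %2s %15s %2s %499s",
                  f->date, f->time, f->temp, f->ip, f->temp, f->url);
}
```

Note that a width limit only prevents the overflow. If a field is longer than its limit, the remainder is read as the next field, so the rest of that record will be misaligned.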
Also, you are reading into temp twice. The second read overwrites the first. Is this what you intended?
Another improvement, assuming that the input file is line-terminated and each log entry is on a separate line, is to use fgets() to read a line and only then use sscanf() on the intermediate buffer. This way you ensure that no formatting error extends beyond a single line. Also, sscanf() returns the number of items read, which should be 6 in your case. It would be safer to check the return value.
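The line-by-line approach could look roughly like this. It is a sketch under stated assumptions: the 1024-byte line buffer is an assumed upper bound on line length, and log_fields, parse_line() and filter_log() are hypothetical names. The question's check(url) filter is left out, so every well-formed line is written.

```c
#include <stdio.h>

/* Buffer sizes copied from the question's declarations. */
struct log_fields {
    char date[11], time[9], temp[3], ip[16], url[500];
};

/* Hypothetical helper: parses one already-read line.
   Returns 1 only if sscanf() converted all six fields, else 0. */
int parse_line(const char *line, struct log_fields *f)
{
    int n = sscanf(line, " %10s %8s %2s %15s %2s %499s",
                   f->date, f->time, f->temp, f->ip, f->temp, f->url);
    return n == 6;
}

/* Hypothetical helper: copies well-formed records from in to out.
   The loop ends when fgets() returns NULL, so no feof() test is needed.
   Returns the number of records written. */
long filter_log(FILE *in, FILE *out)
{
    char line[1024];            /* assumed maximum line length */
    struct log_fields f;
    long kept = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (!parse_line(line, &f))
            continue;           /* malformed line: skip it and carry on */
        /* The question's check(f.url) test would go here. */
        fprintf(out, "%s %s %s %s %s %s\n",
                f.date, f.time, f.temp, f.ip, f.temp, f.url);
        kept++;
    }
    return kept;
}
```

A line longer than the buffer is returned by fgets() in several pieces, and each piece is parsed on its own, so the buffer should be sized generously for the log format in use.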