I wrote this small program to find all occurrences of a substring in a larger string (a needle in a haystack). When I run the program locally, it seems to work just fine. However, when I submit it to an online contest for judging, it gives a SIGABRT error. I assumed this was because of poor memory management, so I deleted the free() calls; the SIGABRT error disappeared, but I then got a Time Limit Exceeded error instead. Does removing the free() calls slow the program down? And are there any leaks in my program?
Here's the contest I was talking about:
Needle in the Haystack
Here’s the code:
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

#define RAW_INPUT_SIZE 10000
#define BOOL unsigned int
#define NO 0
#define YES 1

int main (int argc, char **argv)
{
    int needleLength;
    char *rawNeedle = (char *)malloc(RAW_INPUT_SIZE);
    char *rawHaystack = (char *)malloc(RAW_INPUT_SIZE);
    char *needle; // to be allocated later
    char *haystack; // to be allocated later, but not deallocated

    while (scanf("%i\n%s\n%s", &needleLength, rawNeedle, rawHaystack) != EOF)
    {
        needle = (char *)malloc(needleLength);
        strncpy(needle, rawNeedle, needleLength);
        haystack = strchr(rawHaystack, needle[0]);
        int i = haystack - rawHaystack;
        BOOL matchesFound = NO;
        if (i + needleLength - 1 < strlen(rawHaystack))
        {
            while (haystack != NULL)
            {
                if (i + needleLength - 1 < strlen(rawHaystack))
                {
                    char *substr = (char *)malloc(needleLength);
                    strncpy(substr, haystack, needleLength);
                    if (strcmp(needle, substr) == 0)
                    {
                        printf("%i\n", i);
                        matchesFound = YES;
                    }
                    free(substr);
                    substr = NULL;
                }
                haystack = strchr(haystack+1, needle[0]);
                i = haystack - rawHaystack;
            }
        }
        if (matchesFound == NO)
            printf("\n");
        free(needle);
        needle = NULL;
    }
    free(rawNeedle);
    free(rawHaystack);
    rawNeedle = NULL;
    rawHaystack = NULL;
    return 0;
}
Transcription of input and output specification from the question
Input
The input consists of a number of test cases. Each test case is composed of three lines, containing:
- the length of the needle,
- the needle itself,
- the haystack.
The length of the needle is only limited by the memory available to your program, so do not make any assumptions – instead, read the length and allocate memory as needed. The haystack is not limited in size, which implies that your program should not read the whole haystack at once. The KMP algorithm is stream-based, i.e. it processes the haystack character by character, so this is not a problem.
The test cases come one after another, each occupying three lines, with no additional space or line breaks in between.
Output
For each test case your program should output all positions of the needle’s occurrences within the haystack. If a match is found, the output should contain the position of the first character of the match. Characters in the haystack are numbered starting with zero.
For a given test case, the positions output should be sorted in ascending order, and each of these should be printed in a separate line. For two different test cases, the positions should be separated by an empty line.
Why use any memory allocation? If the specification includes a maximum needle length of 10,000, simply use local arrays.
Read directly into these; don’t copy them around.
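A minimal sketch of the local-array idea, assuming the 10,000-byte limit implied by RAW_INPUT_SIZE in the question really holds (the contest text suggests it may not). The helper name read_case and the constant MAX_INPUT are hypothetical, and sscanf() on a string stands in for scanf() on standard input so the example is self-contained:

```c
#include <stdio.h>

#define MAX_INPUT 10000  /* assumed limit, mirroring RAW_INPUT_SIZE in the question */

/* Parse one test case from `text` straight into caller-provided arrays.
   Returns 1 if all three fields were read, 0 otherwise.
   (Hypothetical helper; the real program would call scanf() instead.) */
int read_case(const char *text, int *needle_len,
              char needle[MAX_INPUT], char haystack[MAX_INPUT])
{
    /* %9999s stores at most MAX_INPUT - 1 characters plus the '\0',
       so neither array can overflow. Comparing the result with 3
       catches a short read, which a comparison against EOF would miss. */
    return sscanf(text, "%d %9999s %9999s", needle_len, needle, haystack) == 3;
}
```

The caller would declare char needle[MAX_INPUT], haystack[MAX_INPUT]; as ordinary local variables and pass them in, so there is nothing to malloc() or free().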
It is not clear that your needle length includes the trailing null. If it does not, then malloc(needleLength) followed by strncpy(needle, rawNeedle, needleLength) neither allocates enough space nor guarantees null termination (the same applies to substr), and either problem can easily lead to SIGABRT.
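If you do keep the dynamic allocation, one way to address both problems is to allocate one extra byte and write the terminator yourself. This is only a sketch; copy_prefix is a hypothetical helper name, not something from the question:

```c
#include <stdlib.h>
#include <string.h>

/* Return a heap copy of the first `len` characters of `src`, always
   null-terminated. The caller must free() the result.
   Returns NULL if the allocation fails. */
char *copy_prefix(const char *src, size_t len)
{
    char *dst = malloc(len + 1);      /* +1 leaves room for the '\0' */
    if (dst == NULL)
        return NULL;
    strncpy(dst, src, len);           /* copies at most len characters */
    dst[len] = '\0';                  /* strncpy does not add this when src is len or longer */
    return dst;
}
```

With needle and substr built this way, passing them to strcmp() no longer reads past the end of the allocation.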
Repeatedly using strlen() on your haystack will make your program run slowly. You can calculate the lengths without needing to do strlen() more than once on each of the needle and haystack.
Unless you are guaranteed no blanks in the data, your scanf() code will read less than you expect. You should always check that you get all the values you expect.
You should look up the function strstr().
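As a rough illustration of how strstr() could replace the manual strchr()/strncpy()/strcmp() loop, here is a sketch that assumes the whole haystack fits in memory as a null-terminated string (the contest statement says it may not, so treat this as a starting point only). The function name print_matches is hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Print the zero-based start index of every occurrence of `needle`
   in `haystack`, one per line, including overlapping occurrences.
   Returns the number of matches found. */
size_t print_matches(const char *haystack, const char *needle)
{
    size_t count = 0;

    if (needle[0] == '\0')
        return 0;   /* strstr() matches an empty needle at every position */

    const char *hit = strstr(haystack, needle);
    while (hit != NULL) {
        printf("%td\n", hit - haystack);   /* %td prints a ptrdiff_t */
        count++;
        hit = strstr(hit + 1, needle);     /* step one character to catch overlaps */
    }
    return count;
}
```

No strlen() call is needed at all here, and nothing is allocated per candidate position. For example, print_matches("banananobano", "na") prints 2 and 4 and returns 2.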