FINAL EDIT: This turned out to be a stack overflow, unrelated to either the string functions or malloc. GDB reported the fault a few lines above where it actually occurred, which confused me, but as soon as I took the time to run the program under Valgrind I figured it out.
I’ve written a bidirectional breadth-first search program for finding shortest paths within a very large directed graph (~6 million nodes). With a test input file of 100 nodes, everything works great. With the full inputs, much more memory is used, and the program then segfaults.
GDB says it’s segfaulting at the start of the search function, where I clear the result buffer with the statement n = sprintf(result, ""). Here’s the relevant function:
char *bidirbfs(int x, int y, char *result){
    int n;
    n = sprintf(result, "");
    ...
Here’s the call to it and the allocation of the result buffer:
int main (){
    int n=0;
    char *result;
    result = (char *)malloc(sizeof(char)*2000);
    if(result == NULL){
        printf("MALLOC FAILED!");
        exit(1);
    }
    //Methods for initializing graph
    readStructureFromFile();
    calcArticlesIn();
    //Search the graph
    result = bidirbfs(1,2, result);
    printf("%s\n", result);
    ...
}
Again, with small inputs everything works. When I use the full-size inputs, the program reads everything in fine, but then segfaults. When I instead use a very similar call to strncpy to empty the array, I get the same behavior, so it seems like a general problem with string functions. I’m not sure what could be going on.
It seems like sprintf doesn’t like the pointer it’s getting, which makes me wonder if malloc is doing something odd. When using the full inputs, malloc gets called 13 million times*, so I wonder if it might be showing odd behavior because of this and overwriting the string buffer with something weird. At the same time, I’m very hesitant to blame the library.
Any ideas what could be going on?
*Sadly I think this is actually necessary. Each element in the graph has an array for inbound edges and outbound edges. Each array’s size is unknown until the inputs are read, so it has to be dynamically allocated to the correct size by malloc.
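For illustration, here is a minimal sketch of that per-node allocation pattern. The struct and function names (Node, node_init, node_free) are hypothetical and not taken from the actual program; the point is only that two arrays per node over roughly 6 million nodes adds up to the malloc count mentioned above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node layout: one array of inbound and one of outbound edges. */
typedef struct {
    size_t num_in, num_out;
    int *in;   /* indices of nodes with an edge into this node */
    int *out;  /* indices of nodes this node has an edge to */
} Node;

/* Allocates both edge arrays at their exact sizes.
 * Returns 0 on success, -1 if an allocation fails. */
int node_init(Node *n, size_t num_in, size_t num_out)
{
    assert(n != NULL);
    n->num_in = num_in;
    n->num_out = num_out;
    n->in = NULL;
    n->out = NULL;

    /* Skip empty arrays: malloc(0) may legally return NULL. */
    if (num_in > 0) {
        n->in = malloc(num_in * sizeof *n->in);
        if (n->in == NULL)
            return -1;
    }
    if (num_out > 0) {
        n->out = malloc(num_out * sizeof *n->out);
        if (n->out == NULL) {
            free(n->in);
            n->in = NULL;
            return -1;
        }
    }
    return 0;
}

/* Releases both arrays; safe to call on a node with empty arrays. */
void node_free(Node *n)
{
    free(n->in);
    free(n->out);
    n->in = NULL;
    n->out = NULL;
    n->num_in = 0;
    n->num_out = 0;
}
```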
EDIT: Valgrind returned the following. I’m working on figuring out what it might mean, but at first glance it might actually be some kind of stack overflow.
==27263== Warning: client switching stacks? SP change: 0xbea50634 --> 0xbb815340
==27263== to suppress, use: --max-stackframe=52671220 or greater
==27263== Invalid write of size 4
==27263== at 0x8048D78: bidirbfs (load_data.c:184)
==27263== by 0x80491CD: main (load_data.c:304)
==27263== Address 0xbb815348 is on thread 1's stack
==27263==
==27263==
==27263== Process terminating with default action of signal 11 (SIGSEGV)
==27263== Access not within mapped region at address 0xBB815348
==27263== at 0x8048D78: bidirbfs (load_data.c:184)
==27263== If you believe this happened as a result of a stack
==27263== overflow in your program's main thread (unlikely but
==27263== possible), you can try to increase the size of the
==27263== main thread stack using the --main-stacksize= flag.
==27263== The main thread stack size used in this run was 8388608.
==27263==
==27263== Process terminating with default action of signal 11 (SIGSEGV)
==27263== Access not within mapped region at address 0xBB81533C
==27263== at 0x401F4DD: _vgnU_freeres (vg_preloaded.c:58)
==27263== If you believe this happened as a result of a stack
==27263== overflow in your program's main thread (unlikely but
==27263== possible), you can try to increase the size of the
==27263== main thread stack using the --main-stacksize= flag.
==27263== The main thread stack size used in this run was 8388608.
==27263==
==27263== HEAP SUMMARY:
==27263== in use at exit: 1,021,539,288 bytes in 13,167,791 blocks
==27263== total heap usage: 13,167,792 allocs, 1 frees, 1,047,874,864 bytes allocated
==27263==
==27263== LEAK SUMMARY:
==27263== definitely lost: 0 bytes in 0 blocks
==27263== indirectly lost: 0 bytes in 0 blocks
==27263== possibly lost: 0 bytes in 0 blocks
==27263== still reachable: 1,021,539,288 bytes in 13,167,791 blocks
==27263== suppressed: 0 bytes in 0 blocks
==27263== Rerun with --leak-check=full to see details of leaked memory
==27263==
==27263== For counts of detected and suppressed errors, rerun with: -v
==27263== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 12 from 7)
EDIT 2:
Final solution: it was a stack overflow. Right after the sprintf statement I declared an array whose size is proportional to the number of nodes. Since I wasn’t using malloc, it was created directly on the stack and overflowed it. Switching to malloc solved the problem and everything now runs as expected. Thanks to everyone for the suggestions!
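A minimal sketch of that kind of change, assuming the array holds one int per node. The original declaration isn’t shown here, so the names visited, num_nodes, alloc_node_array and search_sketch are hypothetical stand-ins rather than the actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Before (hypothetical):  int visited[num_nodes];
 * That array lives in the function's stack frame, so its size is bounded
 * by the stack limit. The version below puts the same array on the heap. */
int *alloc_node_array(size_t num_nodes)
{
    assert(num_nodes > 0);

    /* calloc returns zero-initialized memory, or NULL on failure. */
    int *visited = calloc(num_nodes, sizeof *visited);
    if (visited == NULL)
        fprintf(stderr, "could not allocate %zu ints\n", num_nodes);
    return visited;
}

/* Sketch of the search function's setup and teardown only.
 * Returns 0 on success, -1 if the allocation fails. */
int search_sketch(size_t num_nodes)
{
    int *visited = alloc_node_array(num_nodes);
    if (visited == NULL)
        return -1;

    /* Touch both ends to show the whole range is usable. */
    visited[0] = 1;
    visited[num_nodes - 1] = 1;
    int ok = (visited[0] + visited[num_nodes - 1] == 2) ? 0 : -1;

    free(visited);  /* unlike a stack array, heap memory is released explicitly */
    return ok;
}
```

Calling search_sketch(6000000) allocates about 24 MB on the heap with 4-byte ints, an amount that would not fit in an 8 MB stack.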
Run your program in valgrind. See what it says. I bet you’ll find the output enlightening.