I’m trying to sort an array whose elements are read from a file which is approximately 5 GB in size and contains approximately 500000 data elements.
Once the data size exceeds 300.000.000, the program terminates with a segmentation fault during sorting.
I think the problem is caused by insufficient memory being allocated to the program. How can I change that in my C code?
Could you help me with this? Thank you.
int arraysize = atoi(argv[1]);
int* array = malloc(sizeof(int)*arraysize);
int* temp = malloc(sizeof(int)*arraysize);
int i;
FILE *fi;
char buffer[20];
fi = fopen("DATASET.dat", "r");
for(i=0; i<arraysize; i++){
    fgets(buffer, 20, fi);
    array[i] = atoi(buffer);
}
fclose(fi);
//function is called to perform the sorting
mergesort_array(array, arraysize, temp);
In general, whenever you allocate memory, check that the allocation succeeded:
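A minimal sketch of such a check follows. The helper name `checked_alloc` is hypothetical and not part of the question's code; it simply wraps `malloc()` so that an overflowing size or a failed allocation is reported instead of producing a crash later on.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of the question's code): allocate room for
 * `count` elements of `elem_size` bytes each. Returns NULL and prints a
 * message if the request overflows size_t or if malloc() fails. */
void *checked_alloc(size_t count, size_t elem_size)
{
    assert(elem_size > 0);  /* a zero element size is a programming error */

    /* Make sure count * elem_size cannot wrap around. */
    if (count > (size_t)-1 / elem_size) {
        fprintf(stderr, "allocation of %zu elements overflows size_t\n", count);
        return NULL;
    }

    void *p = malloc(count * elem_size);
    if (p == NULL)
        perror("malloc");
    return p;
}
```

In the question's code, the two `malloc` calls could then become `checked_alloc(arraysize, sizeof(int))`, with the program exiting early if either `array` or `temp` is NULL. The return value of `fopen` deserves the same kind of check.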
Then, one possibility would be to implement the merge sort on disk, using a file of integers (you can also mmap() the file).
But I find it strange that allocating 300000 integers on the heap (4.8 megabytes at the most for both arrays, even with 64-bit integers) can cause an allocation error, so I think the problem is in the mergesort implementation, perhaps something to do with a recursive implementation.
I’d start with compiling the program with full debug information, and checking the core dump with gdb.

A “simple” malloc problem
Having to handle a very large array of ASCII strings representing numbers, you could start by first converting it to a file of integers.
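The conversion could look roughly like the sketch below. The function name `convert_to_binary`, the choice of `int32_t` as the fixed size, and the assumption of exactly one decimal number per line (no blank lines) are all mine, not taken from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read one decimal number per line from `text_path`
 * and append each one to `bin_path` as a raw int32_t.
 * Returns the number of values written, or -1 on error. */
long convert_to_binary(const char *text_path, const char *bin_path)
{
    assert(text_path != NULL && bin_path != NULL);

    FILE *in = fopen(text_path, "r");
    if (in == NULL) {
        perror(text_path);
        return -1;
    }
    FILE *out = fopen(bin_path, "wb");
    if (out == NULL) {
        perror(bin_path);
        fclose(in);
        return -1;
    }

    char buffer[64];
    long count = 0;
    while (fgets(buffer, sizeof buffer, in) != NULL) {
        /* Assumption: every line holds one number that fits in 32 bits. */
        int32_t value = (int32_t)strtol(buffer, NULL, 10);
        if (fwrite(&value, sizeof value, 1, out) != 1) {
            perror("fwrite");
            fclose(in);
            fclose(out);
            return -1;
        }
        count++;
    }

    fclose(in);
    if (fclose(out) != 0) {  /* buffered data may fail to flush here */
        perror("fclose");
        return -1;
    }
    return count;
}
```

With the file names used in this thread, the call would be `convert_to_binary("DATASET.dat", "INTEGER.dat")`. Only one line is held in memory at a time, so the size of the input file does not matter.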
Now you have an INTEGER.dat file which is made of integers of fixed size. It is, to all intents and purposes, a file copy of an array in memory. Same goes for the temporary array.

And you can tell the system to treat that file as if it was an array in memory.
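On a POSIX system this is what `mmap()` does. The sketch below assumes the file holds raw `int32_t` values; the helper names `map_int_file` and `unmap_int_file` are hypothetical. Passing a non-zero `create_count` creates or resizes the file, which is one way to get a file-backed temporary array.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map `path` read/write as an array of int32_t.
 * If create_count > 0, the file is created or resized to hold that many
 * elements; otherwise its existing size is used. The element count is
 * stored in *count. Returns NULL on failure. */
int32_t *map_int_file(const char *path, size_t create_count, size_t *count)
{
    assert(path != NULL && count != NULL);

    int fd = open(path, O_RDWR | (create_count ? O_CREAT : 0), 0644);
    if (fd < 0) {
        perror(path);
        return NULL;
    }
    if (create_count > 0 &&
        ftruncate(fd, (off_t)(create_count * sizeof(int32_t))) != 0) {
        perror("ftruncate");
        close(fd);
        return NULL;
    }

    struct stat st;
    if (fstat(fd, &st) != 0) {
        perror("fstat");
        close(fd);
        return NULL;
    }
    if (st.st_size == 0) {
        fprintf(stderr, "%s: file is empty\n", path);
        close(fd);
        return NULL;
    }

    /* MAP_SHARED: stores to the array are written back to the file. */
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }

    *count = (size_t)st.st_size / sizeof(int32_t);
    return p;
}

/* Release a mapping obtained from map_int_file(). */
void unmap_int_file(int32_t *array, size_t count)
{
    if (array != NULL)
        munmap(array, count * sizeof(int32_t));
}
```

A caller could map INTEGER.dat with `create_count` set to 0, map a second scratch file of the same element count for the temporary array, and pass both pointers to `mergesort_array`. That assumes `int` is 32 bits wide on the platform, which is common but not guaranteed. The operating system then pages the data in and out as needed, so the sort is no longer limited by what `malloc` can provide.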