I have a large body of legacy code that I inherited. It has worked fine until now. Suddenly, at a customer trial that I cannot reproduce in-house, it crashes in malloc. I think I need to add instrumentation, e.g. put my own malloc on top of malloc that stores some meta information about each allocation, such as who made the malloc call. When it crashes, I can then look up the meta information and see what was happening. I did something similar years ago but cannot recall it now. I am sure people have come up with better ideas, and I would be glad to have input.
Thanks
Is memory allocation broken?
Try valgrind.
Malloc is still crashing.
Okay, I’m going to have to assume that you mean SIGSEGV (segmentation fault) is firing in malloc. This is usually caused by heap corruption. Heap corruption, that itself does not cause a segmentation fault, is usually the result of an array access outside of the array’s bounds. This is usually nowhere near the point where you call malloc.

malloc stores a small header of information "in front of" the memory block that it returns to you. This information usually contains the size of the block and a pointer to the next block. Needless to say, changing either of these will cause problems. Usually, the next-block pointer is changed to an invalid address, and the next time malloc is called, it eventually dereferences the bad pointer and segmentation faults. Or it doesn’t and starts interpreting random memory as part of the heap. Eventually its luck runs out.

Note that free can have the same thing happen, if the block being released or the free block list is messed up.

How you catch this kind of error depends entirely on how you access the memory that malloc returns. A malloc of a single struct usually isn’t a problem; it’s malloc of arrays that usually gets you. Using a negative (-1 or -2) index will usually give you the block header for your current block, and indexing past the array end can give you the header of the next block. Both are valid memory locations, so there will be no segmentation fault.

So the first thing to try is range checking. You mention that this appeared at the customer’s site; maybe it’s because the data set they are working with is much larger, or that the input data is corrupt (e.g. it says to allocate 100 elements and then initializes 101), or they are performing things in a different order (which hides the bug in your in-house testing), or doing something you haven’t tested. It’s hard to say without more specifics. You should consider writing something to sanity check your input data.
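
To make the range-checking idea concrete, here is a rough sketch of the kind of malloc wrapper the original question describes. It records who made each allocation and surrounds the block with guard bytes, so an off-by-one write can be caught and traced to its allocation site instead of silently damaging malloc's own header. Every name here (dbg_malloc, dbg_check, dbg_free, DBG_MALLOC, the guard size and fill byte) is made up for illustration, and the layout is an assumption, not how any particular malloc works. It needs a C11 compiler for _Alignof and max_align_t.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical debug allocator; all names and sizes are illustrative. */
#define GUARD_SIZE ((size_t)8)   /* guard bytes on each side of the block */
#define GUARD_BYTE 0xA5          /* fill pattern for the guards */

typedef struct {
    size_t size;        /* bytes the caller asked for */
    const char *file;   /* source file of the allocation site */
    int line;           /* source line of the allocation site */
} dbg_info;

/* Prefix = info + front guard, rounded up so the user block stays aligned. */
#define DBG_ALIGN  (_Alignof(max_align_t))
#define DBG_PREFIX (((sizeof(dbg_info) + GUARD_SIZE + DBG_ALIGN - 1) / DBG_ALIGN) * DBG_ALIGN)

/* Layout: [dbg_info | padding | front guard][user block][rear guard] */
void *dbg_malloc(size_t size, const char *file, int line)
{
    if (size > SIZE_MAX - DBG_PREFIX - GUARD_SIZE)
        return NULL;                      /* total size would overflow */
    unsigned char *raw = malloc(DBG_PREFIX + size + GUARD_SIZE);
    if (raw == NULL)
        return NULL;
    dbg_info *info = (dbg_info *)raw;
    info->size = size;
    info->file = file;
    info->line = line;
    unsigned char *user = raw + DBG_PREFIX;
    memset(user - GUARD_SIZE, GUARD_BYTE, GUARD_SIZE);  /* front guard */
    memset(user + size, GUARD_BYTE, GUARD_SIZE);        /* rear guard */
    return user;
}

/* Returns 0 if intact, 1 if the front guard is damaged, 2 if the rear is. */
int dbg_check(const void *ptr)
{
    if (ptr == NULL)
        return 0;
    const unsigned char *user = ptr;
    const dbg_info *info = (const dbg_info *)(user - DBG_PREFIX);
    const unsigned char *front = user - GUARD_SIZE;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (front[i] != GUARD_BYTE) {
            /* An underrun may also have hit dbg_info, so do not trust it. */
            fprintf(stderr, "front guard damaged for block %p\n", ptr);
            return 1;
        }
    }
    const unsigned char *rear = user + info->size;
    for (size_t i = 0; i < GUARD_SIZE; i++) {
        if (rear[i] != GUARD_BYTE) {
            fprintf(stderr, "rear guard damaged: %zu bytes allocated at %s:%d\n",
                    info->size, info->file, info->line);
            return 2;
        }
    }
    return 0;
}

/* Checks the guards one last time, releases the block, returns the status. */
int dbg_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    int status = dbg_check(ptr);
    free((unsigned char *)ptr - DBG_PREFIX);
    return status;
}

#define DBG_MALLOC(n) dbg_malloc((n), __FILE__, __LINE__)

/* Reproduces the "allocate 100 elements, initialize 101" case from above. */
int dbg_demo(void)
{
    int n = 100;
    int *a = DBG_MALLOC((size_t)n * sizeof *a);
    if (a == NULL)
        return -1;
    for (int i = 0; i <= n; i++)   /* deliberate off-by-one */
        a[i] = i;
    return dbg_free(a);            /* reports the damaged rear guard */
}
```

Calling dbg_demo() from a test program should flag the rear guard and print the file and line of the DBG_MALLOC call. In the real code you would replace malloc and free with the wrappers in the suspect modules and sprinkle dbg_check calls around the array-handling code to narrow down where the overrun happens. This only catches small overruns that land in the guard bytes; a write far past the end will still slip through, which is where a tool like valgrind does a better job.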