I have a multithreaded program that crashes after running for a day or two. Moreover, the gdb backtrace of the core dump does not lead anywhere: there are no symbols at the point where it crashes.
The machine that generates the core file has 3 GB of physical memory and 5 GB of swap space, but the core dump we get is around 25 GB. Isn't a core dump actually a memory dump? Why is it so large?
Can anyone give me more leads on how to debug in such a situation?
If you are running a 64-bit OS then you can have file-backed mappings that exceed many times the amount of available physical memory + swap space.
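To see how much of a process's address space is file-backed rather than anonymous, one option is to total the mapping sizes listed in /proc/<pid>/maps. The following is a minimal sketch, not a definitive tool: the helper names summarize_maps and summarize_pid are hypothetical, it assumes the usual "address perms offset dev inode pathname" line layout, and it treats any mapping with a non-zero inode as file-backed. Mapped size is only an upper bound on what may end up in a core file.

```python
from collections import defaultdict


def summarize_maps(maps_text):
    """Total mapped bytes per category from /proc/<pid>/maps-style text.

    Hypothetical helper: a mapping counts as file-backed when its inode
    field is non-zero, and as shared when the 4th permission flag is 's'.
    """
    totals = defaultdict(int)
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        addr_range, perms, _offset, _dev, inode = fields[:5]
        start, end = (int(part, 16) for part in addr_range.split("-"))
        backing = "file-backed" if inode != "0" else "anonymous"
        sharing = "shared" if perms[3] == "s" else "private"
        totals[f"{backing} {sharing}"] += end - start
    return dict(totals)


def summarize_pid(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and summarize it."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return summarize_maps(maps_file.read())
```

Calling summarize_pid(1234) for a hypothetical PID 1234 returns a dictionary of byte totals. A large "file-backed" total alongside a small "anonymous" one would be consistent with a core file far bigger than RAM plus swap.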
Since kernel version 2.6.23, Linux provides a mechanism to control what gets included in the core dump file, called the core dump filter. The value of the filter is a bit-field manipulated via the /proc/<pid>/coredump_filter file (see the core(5) man page):
- bit 0 (0x01) - anonymous private mappings (e.g. dynamically allocated memory)
- bit 1 (0x02) - anonymous shared mappings
- bit 2 (0x04) - file-backed private mappings
- bit 3 (0x08) - file-backed shared mappings (e.g. shared libraries)
- bit 4 (0x10) - ELF headers
- bit 5 (0x20) - private huge pages
- bit 6 (0x40) - shared huge pages
The default value is 0x33, which corresponds to dumping all anonymous mappings as well as the ELF headers (but only if the kernel is compiled with CONFIG_CORE_DUMP_DEFAULT_ELF_HEADERS) and the private huge pages. Reading from this file returns the hexadecimal value of the filter. Writing a new hexadecimal value to coredump_filter changes the filter for that particular process, e.g. to enable dumping of all possible mappings listed above one would run echo 0x7f > /proc/<pid>/coredump_filter (where <pid> is the PID of the process).
The value of the core dump filter is inherited by child processes created by fork().
Some Linux distributions might change the filter value for the init process early in the OS boot stage, e.g. to enable dumping the file-backed mappings. This would then affect any process started later.
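To check which filter a crashing process actually inherited, the bit-field can be decoded programmatically. The sketch below is an illustration under stated assumptions: the names COREDUMP_FILTER_BITS, decode_filter, read_filter and write_filter are hypothetical, the table covers only the seven bits listed above (newer kernels may define more), and the read/write helpers work only on Linux with sufficient permissions for the target process.

```python
# Bits of /proc/<pid>/coredump_filter as listed above (see core(5)).
COREDUMP_FILTER_BITS = {
    0x01: "anonymous private mappings",
    0x02: "anonymous shared mappings",
    0x04: "file-backed private mappings",
    0x08: "file-backed shared mappings",
    0x10: "ELF headers",
    0x20: "private huge pages",
    0x40: "shared huge pages",
}


def decode_filter(value):
    """Return the names of the mapping types enabled in a filter value."""
    return [name for bit, name in COREDUMP_FILTER_BITS.items() if value & bit]


def read_filter(pid="self"):
    """Read the current filter of a process; the file holds a hex number."""
    with open(f"/proc/{pid}/coredump_filter") as filter_file:
        return int(filter_file.read().strip(), 16)


def write_filter(value, pid="self"):
    """Set the filter of a process, e.g. 0x33 to drop file-backed mappings."""
    with open(f"/proc/{pid}/coredump_filter", "w") as filter_file:
        filter_file.write(f"{value:#x}")
```

For example, decode_filter(read_filter(1234)) for a hypothetical PID 1234 shows whether bits 0x04 or 0x08 are set. If they are, resetting the filter to 0x33 before the next crash should keep file-backed mappings out of the core file and make it much smaller.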