I have a very complex cross-platform application. Recently my team and I have been running stress tests and have encountered several crashes (and core dumps accompanying them). Some of these core dumps are very precise, and show me the exact location where the crash occurred with around 10 or more stack frames. Others sometimes have just one stack frame with ?? being the only symbol!
What I’d like to know is:
- Is there a way to increase the probability of core dumps pointing in the right direction?
- Why isn’t the number of stack frames reported consistent?
- Is there any best-practice advice for managing core dumps?
Here’s how I compile the binaries (in release mode):
- Compiler and platform: g++ with glibc-2.3.2-95.50 on CentOS 3.6 x86_64 — This helps me maintain compatibility with older versions of Linux.
- All files are compiled with the -g flag.
- Debug symbols are stripped from the final binary and saved in a separate file.
- When I have a core dump, I use GDB with the executable which created the core, and the symbols file. GDB never complains that there’s a mismatch between the core/binary/symbols.
Yet I sometimes get core dumps with no symbols at all! I understand that I’m linking against non-debug versions of libstdc++ and libgcc, but it would be nice if the stack trace at least showed me where in my code the faulty call originated (even if it ultimately ends in ??).
There can be many reasons for that, among others:
- -fomit-frame-pointer, or asm units that do so
Note that the frame-pointer case can occur simply because, for example, glibc itself was compiled that way. Having the debug info for such system libraries installed could mitigate this (something like what the glibc-debug{info,source} packages are on openSUSE).
gdb has more control over the program than glibc, so glibc’s backtrace call would naturally be unable to print a backtrace if gdb cannot do so either.
But shipping the source would be much easier 🙂
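For reference, here is a minimal sketch of the glibc backtrace call mentioned above, assuming Linux with glibc and its execinfo.h header. The helper name capture_backtrace and the default limit of 64 frames are made up for illustration and are not from the question. It is written in plain C++98 so that an old g++ should accept it.

```cpp
// Minimal sketch: capture the current call stack with glibc's backtrace().
// Assumes Linux with glibc; capture_backtrace is a hypothetical helper name.
#include <execinfo.h>  // backtrace, backtrace_symbols (glibc-specific)
#include <cstddef>     // NULL
#include <cstdlib>     // std::free
#include <string>
#include <vector>

// Returns one human-readable line per stack frame, innermost frame first.
// Function names appear only if the binary exports them (for example when
// linked with -rdynamic); otherwise expect module names and raw addresses.
std::vector<std::string> capture_backtrace(int max_frames = 64) {
    std::vector<std::string> frames;
    if (max_frames <= 0) return frames;

    std::vector<void*> addresses(max_frames);
    // backtrace() fills the buffer and returns how many frames it found.
    // Like gdb, it may stop early if the stack is damaged or the unwind
    // information for a frame is missing.
    int count = backtrace(&addresses[0], max_frames);

    // backtrace_symbols() returns one malloc'd array holding all strings.
    char** symbols = backtrace_symbols(&addresses[0], count);
    if (symbols == NULL) return frames;

    for (int i = 0; i < count; ++i) frames.push_back(std::string(symbols[i]));
    std::free(symbols);  // free the array only, not the individual strings
    return frames;
}
```

Calling capture_backtrace() and printing each returned string gives a rough in-process stack trace. Inside a crash signal handler, backtrace_symbols_fd is probably the safer choice because it writes straight to a file descriptor without calling malloc. Either way, the result can be no better than what gdb recovers from the same stack.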