We get core files from running our software on a Customer’s box. Unfortunately, because we’ve always compiled with -O2 and without debugging symbols, this has led to situations where we could not figure out why it was crashing. We’ve modified the builds so that they now compile with -g and -O2 together, and we then advise the Customer to run a -g binary so it becomes easier to debug.
I have a few questions:
- What happens when a core file is generated from a Linux distro other than the one we are running in Dev? Is the stack trace even meaningful?
- Are there any good books for debugging on Linux, or Solaris? Something example oriented would be great. I am looking for real-life examples of figuring out why a routine crashed and how the author arrived at a solution. Something more on the intermediate to advanced level would be good, as I have been doing this for a while now. Some assembly would be good as well.
Here’s an example of a crash that requires us to tell the Customer to get a -g ver. of the binary:
Program terminated with signal 11, Segmentation fault.
#0 0xffffe410 in __kernel_vsyscall ()
(gdb) where
#0 0xffffe410 in __kernel_vsyscall ()
#1 0x00454ff1 in select () from /lib/libc.so.6
...
<omitted frames>
Ideally I’d like to find out why exactly the app crashed. I suspect it’s memory corruption, but I am not 100% sure.
Remote debugging is strictly not allowed.
Thanks
If the executable is dynamically linked, as yours is, the stack GDB produces will (most likely) not be meaningful.
The reason: GDB knows that your executable crashed by calling something in libc.so.6 at address 0x00454ff1, but it doesn’t know what code was at that address. So it looks into your copy of libc.so.6, discovers that this address is in select, and prints that.
But the chances that 0x00454ff1 is also in select in your customer’s copy of libc.so.6 are quite small. Most likely the customer had some other procedure at that address, perhaps abort.
You can use disas select and observe that 0x00454ff1 is either in the middle of an instruction, or that the previous instruction is not a CALL. If either of these holds, your stack trace is meaningless.
You can, however, help yourself: you just need to get a copy of all the libraries listed by (gdb) info shared from the customer’s system. Have the customer tar them up and send the archive to you. Then, on your system, unpack them and point GDB at the unpacked copies before loading the core.
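A minimal sketch of that exchange, assuming GNU tar and a GDB that accepts set solib-absolute-prefix (an alias for set sysroot). The function names, the archive name, and the directory names are hypothetical, and the library list is only a placeholder for whatever info shared actually reports. The -h flag is an addition so that symlinked libraries are stored as real files.

```shell
#!/bin/sh
# Sketch only: names and paths below are illustrative assumptions.

# Customer side: pack the libraries reported by "info shared".
#   $1 = root to pack from (normally /)
#   $2 = absolute path of the archive to create
#   remaining args = library paths relative to $1
pack_libs() (
    root=$1; archive=$2; shift 2
    # -h follows symlinks, so lib/libc.so.6 is stored as a real file
    cd "$root" && tar -chzf "$archive" "$@"
)

# Developer side: unpack into a directory that mirrors the customer's /.
#   $1 = archive, $2 = destination directory
unpack_libs() {
    mkdir -p "$2" && tar -xzf "$1" -C "$2"
}

# Developer side: print a backtrace using the customer's libraries.
#   $1 = executable, $2 = core file, $3 = unpacked tree
# The prefix has to be set before the core is loaded, otherwise GDB
# has already resolved the shared libraries from the local system.
show_backtrace() {
    gdb -batch \
        -ex "set solib-absolute-prefix $3" \
        -ex "core-file $2" \
        -ex "where" \
        "$1"
}
```

On the customer's machine this might be called as pack_libs / /tmp/to-you.tar.gz lib/libc.so.6 lib/ld-linux.so.2, and on yours as unpack_libs to-you.tar.gz /tmp/from-customer followed by show_backtrace /path/to/binary core /tmp/from-customer.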
A much better approach is:
- Build with -g -O2 -o myexe.dbg
- Run strip -g myexe.dbg -o myexe
- Distribute myexe to customers
- When a customer gets a core, use myexe.dbg to debug it
You’ll have full symbolic info (file/line, local variables), without having to ship a special binary to the customer, and without revealing too many details about your sources.
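The workflow above can be sketched as a pair of shell functions. This assumes a C compiler reachable as cc (or via CC), GNU strip, and a single source file. The function names are hypothetical, and a real project would put the same flags into its existing build system rather than call a script like this.

```shell
#!/bin/sh
# Sketch only: CC, the function names, and the one-file layout are assumptions.

# Build the in-house binary and the shippable binary from one compile.
#   $1 = C source file, $2 = output base name (e.g. myexe)
build_pair() (
    src=$1; exe=$2
    # Optimized build with debug info; keep this file in-house.
    "${CC:-cc}" -g -O2 -o "$exe.dbg" "$src" &&
    # Same code with the debug sections removed; this is what ships.
    strip -g "$exe.dbg" -o "$exe"
)

# Open a customer's core against the unstripped binary.
#   $1 = the .dbg binary, $2 = core file produced by the stripped binary
debug_customer_core() {
    gdb "$1" "$2"
}
```

For example, build_pair main.c myexe produces myexe.dbg and myexe, and debug_customer_core myexe.dbg core opens the customer's core with file/line information available. This relies on both files coming from the same compile, so myexe.dbg has to be archived for every release that ships.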