I’d like the compiler to output a file containing the pointers to all global variables in the source code it is compiling, along with their sizes.
Is this possible? Is there a way to do it in any C compiler?
This information is available in the symbol table of the binary, though it might not mean what you expect it to.
The compiler takes one or more source files, compiles the code to object code, and generates an object file (.o on Unix, .obj on Windows). All variables and functions referenced in the source file are mentioned in the symbol table. Variables and functions that are defined in the source file have specific addresses and sizes, while symbols not defined in the source file are marked as undefined and must be linked later. All symbols are listed relative to a particular section. Common sections are “.text” for executable code, “.bss” for variables that are initialized to zero when the program starts, and “.data” for variables initialized with non-zero values.
The linker takes one or more object files, combines their sections (merging the code from every object file into one code section, and likewise for the data), and writes an output file. This output file may be an executable or a shared library. An executable on disk still doesn’t have a pointer for each variable; it stores the variable’s offset from the beginning of its section.
When an executable is run, the operating system’s dynamic loader reads the executable, finds each section, and allocates memory for it. It may also set different permissions on each section: “.text” is often marked read-only, and on processors that support it, data sections are sometimes marked non-executable. Only then does a variable get a pointer: when the code needs to access a particular variable, it adds the address of the beginning of the section to the variable’s offset within that section.
You can use various tools to investigate each binary’s symbol table. The GNU toolchain’s objdump (used on Linux) is one such tool.
For a simple C hello-world program:
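I compile (but don’t link) it on my Linux box. With GCC, the -c option does this: it stops after producing the object file.
Now I can look at the symbol table, which objdump prints when given the -t option.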
The first column is the address of each symbol, relative to the beginning of the section. Each symbol has various flags, and some of the symbols are used as hints to the rest of the toolchain and the debugger. (If I built with debugging symbols, I’d see many entries devoted to them as well.) My simple program has only one variable:
The fifth column tells me the symbol message has size 0xe (14 bytes).