I have been playing around with yasm in an attempt to grasp a basic understanding of x86 assembly. From my tests, it seems you call functions from the kernel by setting the EAX register with the number of the function you want. Then, you push the function arguments onto the stack and issue a syscall (0x80) to execute the instruction. This is Mac OS X / BSD style, I know Linux uses registers to hold arguments instead of using the stack. Does this sound right? Is this the basic idea?
I am a little confused because where are the functions documented? How would I know what arguments, and in what order, to push them onto the stack? Should I look in syscall.h for the answers? It seems there would be a specific reference for supported kernel calls other than C headers.
Also, do standard C functions like printf() rely on the kernel’s built-in functions for say, writing to stdout? In other words, does the C compiler know what the kernel functions are and is it trying to “figure out” how to take C code and translate it to kernel functions (which the assembler then translates to machine code)?
C code -> C compiler -> kernel calls / asm -> assembler -> machine binary
I’m sure these are really basic questions, but my understanding of everything that happens after the C compiler is rather muddy.
System Call Documentation
Make sure you have the XCode Developer Tools installed for the UNIX manpages for Mac OS X and then run
man 2 introon the commandline. For a list of system calls, you can use syscall.h (which is useful for the system call numbers) or you can runman 2 syscalls. Then to look up each specific system call, you can runman 2 syscall_namei.e. for read, you can runman 2 read.UNIX manpages are a historically significant documentation reference for UNIX systems. Pretty much any low-level POSIX function or system call will be documented using them, as well as most commands. Section 2 covers just system calls, and so when you run
man 2 pagename, you’re asking for the manpage in the system calls section. Section 3 also deals with library functions, so you can runman 3 sprintfthe next time you want to read about sprintf.How C Libraries relate to System Calls
As for how C libraries implement their functionality, usually they build everything on top of system calls, especially in UNIX-like operating systems. malloc internally uses mmap() or brk() on a lot of platforms to get a hold of the actual memory for your process and I/O functions will often use buffers with read, write calls. If there’s some other mechanism or library providing the needed functionality, they may also choose to use those instead (i.e. some C libraries for DOS may make use of direct BIOS interrupts instead of calling only DOS interrupts, whereas C libraries for Windows might use Win32 API calls).
Often only a subset of the library functions will need system calls or underlying mechanisms to be implemented though, since the remainder can be written in terms of that subset.
To actually know what’s going on with your specific implementation, you should investigate what’s happening in a debugger (just keep stepping into all the function calls) or browse the source code of the C library you’re using.
How your C code using C libraries relates to machine code
In your question you also suggested:
This is combining two very different concepts. Functions and function calls are supported at the machine code and assembly level, so your C code has a very direct mapping to machine code:
That is, the compiler translates your function calls in C to function calls in Assembly and system calls in C to system calls in Assembly.
However on most platforms, that machine code contains references to shared libraries and functions in those libraries, so your machine code might have a function that calls other functions from a shared library. The OS then loads that shared library’s machine code (if it hasn’t been loaded yet for something else) and then runs the machine code for the library function. Then if that library function calls system calls via interrupts, the kernel receives the system call request and does low-level operations directly with the hardware or the BIOS.
So in a protected mode OS, your machine code can be seen as doing the following:
You can, of course, call system calls directly in your program as well, skipping the need for any libraries, but unless you’re writing your own library, there’s usually very little need to do this. If you want even lower-level access, you have to write kernel-level code such as drivers or kernel subsystems.