How can we implement a system call using sysenter/syscall directly in x86 Linux? Can anybody provide help? It would be even better if you could also show the code for the amd64 platform.
I know in x86, we can use
__asm__(
" movl $1, %eax \n"
" movl $0, %ebx \n"
" call *%gs:0x10 \n"
);
to route to sysenter indirectly.
But how can we code using sysenter/syscall directly to issue a system call?
I found some material at http://damocles.blogbus.com/tag/sysenter/ but still find it difficult to figure out.
First of all, you can’t safely use GNU C Basic asm(""); syntax for this (without input/output/clobber constraints). You need Extended asm to tell the compiler about registers you modify. See the inline asm in the GNU C manual and the inline-assembly tag wiki for links to other guides for details on what things like "D"(1) mean as part of an asm() statement.
You also need asm volatile because that’s not implicit for Extended asm statements with 1 or more output operands.
I’m going to show you how to execute system calls by writing a program that writes Hello World! to standard output by using the write() system call. The program calls a custom system call function instead of the libc wrapper; I named it my_write in order to avoid name clashes with the "normal" write provided by libc. The rest of this answer contains the source of my_write for i386 and amd64.
i386
System calls in i386 Linux are implemented using interrupt vector 128 (0x80), i.e. by executing int 0x80 in your assembly code, having set the parameters accordingly beforehand, of course. It is possible to do the same via SYSENTER, but that instruction is actually executed by code in the VDSO, which the kernel maps into each running process. Since SYSENTER was never meant as a direct replacement of the int 0x80 API, it’s never directly executed by userland applications. Instead, when an application needs to access some kernel code, it calls the routine mapped in the VDSO (that’s what the call *%gs:0x10 in your code is for), which contains all the code supporting the SYSENTER instruction. There’s quite a lot of it because of how the instruction actually works.
If you want to read more about this, have a look at this link. It contains a fairly brief overview of the techniques applied in the kernel and the VDSO. See also The Definitive Guide to (x86) Linux System Calls. Some system calls like gettimeofday and clock_gettime are so simple that the kernel can export code + data that runs in user-space, so the VDSO never needs to enter the kernel, making it much faster even than sysenter could be.
It’s much easier to use the slower int $0x80 to invoke the 32-bit ABI.
Using the int 0x80 API is relatively simple. The number of the syscall goes in the eax register, while the parameters needed for the syscall go into ebx, ecx, edx, esi, edi, and ebp, in that order. System call numbers can be obtained by reading the file /usr/include/asm/unistd_32.h.
Prototypes and descriptions of the functions are available in the 2nd section of the manual, so in this case write(2).
The kernel saves/restores all the registers (except EAX), so we can use them as input-only operands to the inline asm. See What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64.
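The int 0x80 convention described above can be sketched as follows. This is a minimal sketch rather than the answer's original listing: the name my_write comes from the text, but the exact operand layout is an assumption, and the #else branch (which simply defers to libc's syscall()) is only there so the file still builds when it is not compiled as 32-bit x86 code.

```c
#define _GNU_SOURCE          /* declares syscall() for the fallback branch */
#include <stddef.h>          /* size_t */
#include <sys/types.h>       /* ssize_t */
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* syscall(), used only by the fallback */

/* my_write: write(2) issued through the legacy 32-bit int $0x80 gate. */
ssize_t my_write(int fd, const void *buf, size_t size)
{
#if defined(__i386__)
    long ret;
    __asm__ volatile (
        "int $0x80"
        : "=a"(ret)                /* EAX out: byte count, or -errno on failure */
        : "0"((long)SYS_write),    /* EAX in: call number (same register as operand 0) */
          "b"(fd),                 /* EBX: 1st argument */
          "c"(buf),                /* ECX: 2nd argument */
          "d"(size)                /* EDX: 3rd argument */
        : "memory"                 /* the kernel reads the bytes buf points to */
    );
    return ret;
#else
    /* Assumption: not an i386 build. Note that libc's syscall() reports
     * failure as -1 plus errno instead of a raw -errno value. */
    return syscall(SYS_write, fd, buf, size);
#endif
}
```

Built as 32-bit code (for example with gcc -m32), calling my_write(1, "Hello World!\n", 13) from main should print the greeting. No registers other than EAX appear as outputs or clobbers because the kernel preserves them across int $0x80.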
Keep in mind that the clobber list also needs a "memory" clobber, which tells the compiler that the asm statement references memory (via the buf parameter). (A pointer input to inline asm does not imply that the pointed-to memory is also an input. See How can I indicate that the memory *pointed* to by an inline ASM argument may be used?)
amd64
Things look different on the AMD64 architecture, which sports a new instruction called SYSCALL. It is very different from the original SYSENTER instruction, and definitely much easier to use from userland applications. It really resembles a normal CALL, actually, and adapting the old int 0x80 to the new SYSCALL is pretty much trivial. (Except it uses RCX and R11 instead of the kernel stack to save the user-space RIP and RFLAGS so the kernel knows where to return.)
In this case, the number of the system call is still passed in the register rax, but the registers used to hold the arguments now nearly match the function calling convention: rdi, rsi, rdx, r10, r8 and r9, in that order. (syscall itself destroys rcx, so r10 is used instead of rcx, letting libc wrapper functions just use mov r10, rcx / syscall.)
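A minimal sketch of the same wrapper using SYSCALL, under the register assignments listed above. The operand layout is an assumption rather than the answer's original listing, and the #else branch falls back to libc's syscall() purely so the file still compiles on other architectures.

```c
#define _GNU_SOURCE          /* declares syscall() for the fallback branch */
#include <stddef.h>          /* size_t */
#include <sys/types.h>       /* ssize_t */
#include <sys/syscall.h>     /* SYS_write */
#include <unistd.h>          /* syscall(), used only by the fallback */

/* my_write: write(2) issued with the x86-64 SYSCALL instruction. */
ssize_t my_write(int fd, const void *buf, size_t size)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile (
        "syscall"
        : "=a"(ret)                /* RAX out: byte count, or -errno on failure */
        : "0"((long)SYS_write),    /* RAX in: call number */
          "D"((long)fd),           /* RDI: 1st argument */
          "S"(buf),                /* RSI: 2nd argument */
          "d"(size)                /* RDX: 3rd argument */
        : "rcx", "r11",            /* SYSCALL overwrites these with RIP and RFLAGS */
          "memory"                 /* the kernel reads the bytes buf points to */
    );
    return ret;
#else
    /* Assumption: not an x86-64 build. libc's syscall() reports failure
     * as -1 plus errno instead of a raw -errno value. */
    return syscall(SYS_write, fd, buf, size);
#endif
}
```

Compared with the int 0x80 convention, the differences are the argument registers, the instruction itself, and the two extra clobbers for RCX and R11.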
Do notice how practically the only things that needed changing were the register names and the actual instruction used for making the call. This is mostly thanks to the input/output lists provided by gcc’s extended inline assembly syntax, which automagically provides the appropriate move instructions needed for executing the instruction list.
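To illustrate how the operand lists steer values into registers, here is a small sketch that passes the call number to SYSCALL in two equivalent ways: with a matching constraint ("0", meaning "same register as output operand 0") and by naming the register directly ("a"). The helper names getpid_matching and getpid_named are hypothetical, and the #else branches defer to libc's syscall() only so the file builds on other architectures.

```c
#define _GNU_SOURCE          /* declares syscall() for the fallback branch */
#include <sys/syscall.h>     /* SYS_getpid */
#include <unistd.h>          /* syscall(), used only by the fallback */

/* Call number supplied through a matching constraint tied to operand 0. */
long getpid_matching(void)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"((long)SYS_getpid)   /* same register as "=a" */
                      : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(SYS_getpid);  /* assumption: non-x86-64 build */
#endif
}

/* Call number supplied by naming the register explicitly. */
long getpid_named(void)
{
#if defined(__x86_64__)
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_getpid)   /* RAX named directly */
                      : "rcx", "r11", "memory");
    return ret;
#else
    return syscall(SYS_getpid);  /* assumption: non-x86-64 build */
#endif
}
```

Both functions should compile to the same instructions, since the "=a" output leaves the compiler only one register to choose from.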
The "0"(callnum) matching constraint could be written as "a" because operand 0 (the "=a"(ret) output) only has one register to pick from; we know it will pick EAX. Use whichever you find more clear.
Note that non-Linux OSes, like MacOS, use different call numbers, and even different arg-passing conventions for 32-bit.