I am implementing my own system call in Linux. It calls the rename system call internally, and passes a user-supplied argument (see the code below) on to rename as one of the file names.
Here is the basic code:
int sys_mycall(const char __user * inputFile) {
//
// Code to generate the "fileName"
//
//
old_fs = get_fs();
set_fs(KERNEL_DS);
ans = sys_renameat(AT_FDCWD, fileName, AT_FDCWD, inputFile);
set_fs(old_fs);
return ans;
}
I have two doubts here.
- I am using old_fs = get_fs();, set_fs(KERNEL_DS); and set_fs(old_fs); to hack around the actual call to sys_rename because there was an error. I got the answer from this question: allocate user-space memory from kernel … Is this the right workaround?
- How else can I call a system call from within a system call?
EDIT:
int sys_myfunc(const char __user * inputFileUser) {
char inputFile[255];
int l = 0;
while(inputFileUser[l] != '\0') l++;
if(l==0)
return -10;
if(copy_from_user(inputFile,inputFileUser,l+1)< 0 ) return -20;
//
//GENERATE fileName here
//
//
char fileName[255];
return sys_renameat(AT_FDCWD, inputFile, AT_FDCWD, fileName);
}
The code above still returns -1. Why? I copied the data to kernel space.
I wanted to show exactly how to achieve what footy wants the correct way, but my original answer grew too long, so I decided to put the solution in a separate answer. I’ll split the code into parts, and explain what each fragment does.
Remember that since we reuse kernel code, the code in this post and the resulting function must be licensed under the GPLv2 license.
First, we start by declaring a one-parameter syscall.
In the kernel, stack space is a scarce resource. You do not create local arrays; you always use dynamic memory management. Fortunately, there are some very useful functions like __getname(), so it is very little additional code. The important thing is to remember to release whatever memory you use when you are done with it.
As this syscall is basically a variant of rename, we reuse almost all of the fs/namei.c:sys_renameat() code. First, the local variable declarations. There are a lot of them, too; as I said, stack is scarce in the kernel, and you won’t see many more local variables than this in any syscall function:
The first change to sys_renameat() is already on the char *to = __getname(); line above. It allocates PATH_MAX bytes dynamically, and the buffer must be released using __putname() after it is no longer needed. This is the correct way to declare a temporary buffer for a file or directory name.
To construct the new path (to), we also need to be able to access the old name (from) directly. Because of the kernel-userspace barrier, we cannot just access oldname directly. So, we create an in-kernel copy of it:
Although many C programmers have been taught that goto is evil, this is the exception: error handling. Instead of having to remember all the cleanup we need to do (and we already need to do __putname(to) at minimum), we put the cleanup at the end of the function, and skip to the correct point, exit being the last one. error holds the error number, of course.
At this point in our function, we can access from[0] up to the first '\0', or up to (and including) from[PATH_MAX - 1], whichever is first. It is normal kernel-side data, and is accessed in the ordinary fashion you would in any C code.
You have also reserved the memory for the new name as to[0] up to and including to[PATH_MAX - 1]. Remember to make sure it too is terminated using '\0' (in to[PATH_MAX - 1] = '\0' or an earlier index).
After constructing the contents for to, we need to do the path lookups. Unlike renameat(), we cannot use user_path_parent(). We can, however, look at what user_path_parent() does, and do the same work, adapting it to our own needs, of course. It turns out it just calls do_path_lookup() with error checking. So, the two user_path_parent() calls and their error checks can be replaced with:
Note that exit0 is a new label not found in the original renameat(). We need a new label because at exit, we only have to; but at exit0, we have both to and from. After exit0, we have to, from, and oldnd, and so on.
Next, we can reuse the bulk of sys_renameat(). It does all the hard work of renaming. To conserve space, I’ll omit my ramblings on exactly what it does, since you can trust that if rename() works, it’ll work too.
At this point, all the work has been done, and all that is left is releasing the locks, memory, and so on taken by the code above. If everything was successful at this point, error == 0, and we do all the cleanup. If we had a problem, error contains the error code, and we have jumped to the correct label to do the cleanup necessary for the point where the error occurred. If vfs_rename() (which does the actual operation) failed, we also do all the cleanup.
However, compared to the original code, we got hold of to very first (exit), from just after (exit0), followed by the dentry lookups. So, we need to move releasing them to their correct locations (near the very end, since they were acquired first; cleanups occur, of course, in the reverse order):
And here we are done.
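The goto-based cleanup described above can be tried out in ordinary userspace C. The following is only a minimal sketch of the pattern, not the syscall code from this answer: build_backup_name() and the ".bak" suffix are hypothetical, and malloc()/free() stand in for __getname()/__putname() and the in-kernel copy of the old name. What it illustrates is the ordering: to is acquired first and released last (exit), from is acquired second and released just before it (exit0).

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical userspace helper: writes "<oldname>.bak" into out.
 * Returns 0 on success or a negative errno-style value on failure. */
int build_backup_name(const char *oldname, char *out, size_t outlen)
{
    int error = 0;
    int n;
    size_t len;
    char *from;
    char *to;

    assert(oldname != NULL && out != NULL);

    to = malloc(PATH_MAX);              /* stand-in for __getname() */
    if (!to)
        return -ENOMEM;                 /* nothing to clean up yet */

    len = strlen(oldname);
    from = malloc(len + 1);             /* stand-in for the in-kernel copy */
    if (!from) {
        error = -ENOMEM;
        goto exit;                      /* only to is held here */
    }
    memcpy(from, oldname, len + 1);

    /* snprintf() always NUL-terminates when the size is nonzero. */
    n = snprintf(to, PATH_MAX, "%s.bak", from);
    if (n < 0 || n >= PATH_MAX) {
        error = -ENAMETOOLONG;
        goto exit0;                     /* both to and from are held */
    }
    if ((size_t)n >= outlen) {
        error = -ERANGE;
        goto exit0;
    }
    memcpy(out, to, (size_t)n + 1);

exit0:                                  /* cleanup runs in reverse order */
    free(from);
exit:
    free(to);                           /* stand-in for __putname(to) */
    return error;
}
```

Every failure path sets error and jumps to the label matching what has been acquired so far, and the success path simply falls through the same labels, so there is exactly one place where each resource is released.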
Of course, there are a lot of details to consider above in the parts we copied from sys_renameat(), and like I said in the other answer, you should not just copy code like this, but refactor the common code into a helper function; that makes maintenance much easier. Fortunately, because we kept all the checks from renameat() (we do the path manipulation before any of the copied renameat() code runs), we can be sure that all the necessary checks are done. It’s just as if the user specified the manipulated path herself and called renameat().
If you were to do the modification after some checks have already been done, the situation would be much more complicated. You would have to think about what those checks are, how your modifications impact them, and, almost always, redo those checks.
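For comparison, this is roughly what "the user specified the manipulated path herself and called renameat()" looks like from userspace. It is a sketch under assumptions: rename_with_suffix() and the way the new name is derived (appending a suffix) are made up for illustration, since the question does not say how fileName is generated. Only renameat() and AT_FDCWD come from the discussion above.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>      /* AT_FDCWD */
#include <limits.h>
#include <stdio.h>      /* renameat(), snprintf() */

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/* Hypothetical userspace helper: rename oldname to oldname + suffix.
 * Returns 0 on success, or -1 with errno set, like renameat() itself. */
int rename_with_suffix(const char *oldname, const char *suffix)
{
    /* A stack buffer is acceptable in userspace; in the kernel you would
     * allocate it dynamically instead, as explained in the answer. */
    char newname[PATH_MAX];
    int n;

    assert(oldname != NULL && suffix != NULL);

    n = snprintf(newname, sizeof newname, "%s%s", oldname, suffix);
    if (n < 0 || (size_t)n >= sizeof newname) {
        errno = ENAMETOOLONG;           /* the new name would not fit */
        return -1;
    }

    /* The kernel performs every path and permission check here. */
    return renameat(AT_FDCWD, oldname, AT_FDCWD, newname);
}
```

Because the path is manipulated before the rename is requested, none of the kernel's checks are bypassed, which is the same property the in-kernel version relies on.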
To remind any reader: the reason you cannot just create a filename or any other string in your own syscall and then call another syscall is that your just-created string resides on the kernel side of the kernel-userspace boundary, while syscalls expect the data to reside on the other, userspace side. While on x86 you can accidentally pierce the boundary from the kernel side, it does not mean you should do so: there are copy_from_user() and copy_to_user() and their derivatives like strncpy_from_user() that must be used for this purpose. It is not a question of having to do magic to call another syscall, but about where (in-kernel, or userspace) the supplied data is.
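The code in the question's EDIT walks the user pointer directly to find the '\0' and then copies into a 255-byte array without checking that the string fits. The bounded copy that the helpers above are meant to provide can be illustrated in plain userspace C. This is a hypothetical stand-in only: copy_name_bounded() is not the kernel's strncpy_from_user(), it cannot detect faulting addresses, and its return convention is an assumption chosen for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a bounded string copy.
 * Copies src into dst, never writing more than dstsize bytes.
 * Returns the string length (excluding the terminator) on success,
 * or a negative errno-style value if the string does not fit. */
long copy_name_bounded(char *dst, const char *src, size_t dstsize)
{
    size_t i;

    assert(dst != NULL && src != NULL);
    if (dstsize == 0)
        return -EINVAL;

    for (i = 0; i < dstsize; i++) {
        dst[i] = src[i];
        if (src[i] == '\0')
            return (long)i;             /* copied, terminator included */
    }

    dst[dstsize - 1] = '\0';            /* keep dst a valid string */
    return -ENAMETOOLONG;               /* no terminator within the bound */
}
```

The point of the sketch is that the length check and the copy happen in one pass over the source, and the destination size bounds that pass, so an overlong name is rejected instead of overflowing the buffer.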