I’m playing with kretprobes and I am facing a problem. In response to certain events from a user process (e.g. specific syscalls), I would like to read data from that process's address space. Since we’re in interrupt context in the kretprobe entry handler, I can’t get the user pages from there (it may sleep), so I defer the work to the system workqueue, system_wq, with schedule_work().
To be sure that the user process won’t change its memory before my deferred work is done, I put it in TASK_INTERRUPTIBLE and use set_tsk_need_resched(). I was expecting the flag to be tested during the iret, so that the scheduler would pick another task. It does not seem to work like that: the user task is back on the CPU right after the interrupt and changes its memory before I have had a chance to look at it.
Is there something else to do to ensure the task switch occurs directly after the iret?
Thanks in advance
Well, I found out today that this is actually the right way to do it. The process kept running because I was not in interrupt context: the kprobe was optimized (i.e. a jmp instruction instead of an int3 on x86), so my code was executing in process context in kernel mode. This would have been handled smoothly if kprobe_optimized() had worked correctly: in that case we can call schedule() directly after setting the task to TASK_INTERRUPTIBLE, instead of returning with iret and letting the exit path of the interrupt handler check the TIF_NEED_RESCHED flag.

In fact, kprobe_optimized() always returns false for a kretprobe. This is due to the way a kretprobe is handled internally: it uses an aggregator of kprobes, and the optimized flag is set correctly on the aggregator but not on the kprobes in its list. I worked around this by exporting the function get_kprobe() and using it to retrieve the address of the kprobe aggregator, from which I can finally check correctly whether it is optimized or not.

I think the best way (performance-wise) to fix this in the kernel is to replicate the optimized flag from the aggregator to each kprobe in its list. This way kprobe_optimized() will return the proper value. Another way would be to add more code in kprobe_optimized() to check whether the kprobe is part of an aggregator list and, if so, check the aggregator rather than the kprobe itself.

Anyway, this was fun!
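
To make the two proposed fixes concrete, here is a small userspace model of the flag problem. It is only a sketch and not kernel code: struct model_probe, MODEL_FLAG_OPTIMIZED and all function names are made up for illustration and do not match the real definitions in kernel/kprobes.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag bit for this model; not the kernel's definition. */
#define MODEL_FLAG_OPTIMIZED 0x4u

/* Simplified stand-in for a probe; an aggregator uses `next` as list head. */
struct model_probe {
    unsigned int flags;
    struct model_probe *aggr;  /* owning aggregator, NULL if standalone */
    struct model_probe *next;  /* next probe in the aggregator's list */
};

/* Naive check: only looks at the probe it is given (the behaviour described above). */
static bool naive_optimized(const struct model_probe *p)
{
    return (p->flags & MODEL_FLAG_OPTIMIZED) != 0;
}

/* Fix 1: replicate the aggregator's optimized bit to every probe in its list. */
static void propagate_optimized(struct model_probe *aggr)
{
    for (struct model_probe *c = aggr->next; c != NULL; c = c->next)
        c->flags = (c->flags & ~MODEL_FLAG_OPTIMIZED)
                 | (aggr->flags & MODEL_FLAG_OPTIMIZED);
}

/* Fix 2: if the probe belongs to an aggregator, check the aggregator instead. */
static bool aggr_aware_optimized(const struct model_probe *p)
{
    const struct model_probe *target = p->aggr ? p->aggr : p;
    return (target->flags & MODEL_FLAG_OPTIMIZED) != 0;
}

/* Walks through the scenario: optimized aggregator, unflagged child probe. */
int model_demo(void)
{
    struct model_probe aggr  = { .flags = MODEL_FLAG_OPTIMIZED };
    struct model_probe child = { .flags = 0, .aggr = &aggr };
    aggr.next = &child;

    assert(!naive_optimized(&child));      /* the wrong answer */
    assert(aggr_aware_optimized(&child));  /* fix 2 gives the right answer */

    propagate_optimized(&aggr);            /* fix 1 ... */
    assert(naive_optimized(&child));       /* ... makes the naive check right */
    return 0;
}
```

Fix 1 pays the cost once, when the aggregator's state changes, and keeps the check itself a single bit test, which is why it looks like the cheaper option on the hot path. Fix 2 avoids keeping two copies of the flag in sync but adds a pointer chase to every check.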