In my program, I need to know periodically which pages the program has accessed. Every 0.5 or 1 second, I check which pages were accessed and calculate a checksum from the contents of those pages.
I use the mprotect function to mark the memory areas that need to be watched, and I install a SIGSEGV signal handler for each thread. At the start of each period I set the protection to PROT_READ; when a page fault occurs, I note the page's address and then give the page both read and write access.
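A minimal sketch of the scheme just described, assuming Linux and an anonymous mapping as the watched area. The names `tracker_init`, `tracker_begin_period`, `dirty` and `MAX_PAGES` are made up for illustration and are not from any library. Because the pages are set to PROT_READ, only writes fault, so this records written pages rather than every access.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 1024              /* hypothetical upper bound on watched pages */

static char *region;                /* start of the watched area (page aligned) */
static size_t region_len;           /* length in bytes, a multiple of page_size */
static size_t page_size;
static volatile unsigned char dirty[MAX_PAGES];   /* one flag per watched page */

/* SIGSEGV handler: note the faulting page, then make it writable again.
 * mprotect is not on the POSIX async-signal-safe list; calling it here
 * relies on Linux behaviour and is an assumption of this sketch. */
static void on_fault(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)region;
    if (addr < base || addr >= base + region_len)
        _exit(1);                   /* a genuine crash outside the watched area */
    size_t idx = (addr - base) / page_size;
    dirty[idx] = 1;
    mprotect(region + idx * page_size, page_size, PROT_READ | PROT_WRITE);
}

/* Map the watched area and install the handler. Returns 0 on success.
 * Signal dispositions set with sigaction apply to the whole process. */
static int tracker_init(size_t npages)
{
    struct sigaction sa;
    if (npages == 0 || npages > MAX_PAGES)
        return -1;
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    region_len = npages * page_size;
    region = mmap(NULL, region_len, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Start of a period: clear the flags and write-protect every watched page. */
static int tracker_begin_period(void)
{
    assert(region != NULL);         /* tracker_init must have succeeded */
    for (size_t i = 0; i < region_len / page_size; i++)
        dirty[i] = 0;
    return mprotect(region, region_len, PROT_READ);
}
```

At the end of a period, the pages whose `dirty` flag is set are the ones to checksum.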
However, I notice that this method makes my program pretty slow, and since I am doing it for each thread, performance degrades further. Is there any way to make this procedure faster? In particular, is it possible to do this at the process level, so that, for example, if Thread A triggers a page fault and the page is made writable, Thread B already has write permission when it accesses the page?
A faster method is to build a special compiler pass that instruments memory accesses so that each one sets a flag in shadow memory. That is, for each read or write operation, the pass adds an extra write to a thread-specific area (the shadow memory).
The AddressSanitizer project (http://code.google.com/p/address-sanitizer/) works as an additional pass in the LLVM compiler. Its shadow memory is 8 times smaller than the memory it covers. AddressSanitizer uses this pass to detect invalid memory accesses such as out-of-bounds accesses and use-after-free: http://code.google.com/p/address-sanitizer/wiki/AddressSanitizerAlgorithm The slowdown is only 1.5x-3x.
For page-granular flagging, the shadow memory will be very small (1-4 bytes of shadow per 4096 bytes of ordinary memory).
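To make the idea concrete, here is a hand-written sketch of what such a pass might emit, assuming 4096-byte pages and one shadow byte per page. It is not how AddressSanitizer itself is implemented, and `heap_area`, `shadow_mark`, `tracked_store` and `shadow_collect_and_reset` are hypothetical names; a real pass would insert the equivalent of `shadow_mark` before each load and store automatically.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SHIFT    12            /* assumes 4096-byte pages */
#define TRACKED_PAGES 256           /* hypothetical size of the tracked area */

static_assert((1u << PAGE_SHIFT) == 4096, "sketch assumes 4 KiB pages");

/* Stand-in for the application memory being tracked, page aligned. */
static _Alignas(4096) unsigned char heap_area[TRACKED_PAGES << PAGE_SHIFT];

/* One shadow byte per page, private to each thread. */
static _Thread_local unsigned char shadow[TRACKED_PAGES];

/* The extra write the pass would add before each access to p. */
static inline void shadow_mark(const void *p)
{
    uintptr_t off = (uintptr_t)p - (uintptr_t)heap_area;
    if (off < sizeof heap_area)     /* ignore addresses outside the tracked area */
        shadow[off >> PAGE_SHIFT] = 1;
}

/* Example of an instrumented store: flag the page, then do the store. */
static inline void tracked_store(unsigned char *p, unsigned char v)
{
    shadow_mark(p);
    *p = v;
}

/* End of a period: count the pages this thread touched, then clear the flags. */
static size_t shadow_collect_and_reset(void)
{
    size_t touched = 0;
    for (size_t i = 0; i < TRACKED_PAGES; i++)
        touched += shadow[i];
    memset(shadow, 0, sizeof shadow);
    return touched;
}
```

The per-access cost is a subtraction, a bounds check, a shift and one byte store, with no page faults or signals involved.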
If you don’t want to do this in the compiler, or can’t (e.g. for closed-source applications), you can use the kernel's existing copy-on-write (COW) mechanism: http://en.wikipedia.org/wiki/Copy-on-write The kernel does COW on each fork via page protection flags. You can fork the process at time t1, stop the child, wait 1 or 2 seconds, and then compare the mappings of the stopped child (it performs no writes, so its mappings are the same as they were at t1) with those of your process (pages that were written have been remapped to new copies). This variant is faster, but it only gives information about writes, not about what each thread did.
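A rough sketch of the fork-based snapshot, with one simplification that should be stated plainly: instead of comparing page mappings, it compares page contents, which is easier to show but misses writes that store the value already present. The child blocks on a pipe instead of being stopped with a signal. All names here (`snapshot_take`, `snapshot_diff`, `data`, `NPAGES`) are hypothetical, and the page size is assumed to be 4096 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define PAGE   4096                 /* assumed page size */
#define NPAGES 8                    /* hypothetical size of the tracked area */

static _Alignas(4096) unsigned char data[NPAGES * PAGE];   /* tracked area */

typedef struct { pid_t pid; int go_fd; int snap_fd; } snapshot_t;

/* Fork a child that keeps the copy-on-write image of data as of now (t1). */
static int snapshot_take(snapshot_t *s)
{
    int go[2], snap[2];
    assert(s != NULL);
    if (pipe(go) < 0)
        return -1;
    if (pipe(snap) < 0) { close(go[0]); close(go[1]); return -1; }
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {                 /* child: idle until asked, then send old pages */
        char c;
        close(go[1]); close(snap[0]);
        if (read(go[0], &c, 1) == 1) {
            size_t off = 0;
            while (off < sizeof data) {
                ssize_t n = write(snap[1], data + off, sizeof data - off);
                if (n <= 0) break;
                off += (size_t)n;
            }
        }
        _exit(0);
    }
    close(go[0]); close(snap[1]);
    s->pid = pid; s->go_fd = go[1]; s->snap_fd = snap[0];
    return 0;
}

/* Compare current pages with the snapshot. Sets changed[i] to 1 if page i
 * differs and returns the number of changed pages, or -1 on error. */
static int snapshot_diff(snapshot_t *s, unsigned char changed[NPAGES])
{
    unsigned char old[PAGE];
    int count = 0;
    assert(s != NULL);
    if (write(s->go_fd, "g", 1) != 1)
        return -1;
    for (int i = 0; i < NPAGES; i++) {
        size_t got = 0;
        while (got < PAGE) {        /* read exactly one page of the old image */
            ssize_t n = read(s->snap_fd, old + got, PAGE - got);
            if (n <= 0) return -1;
            got += (size_t)n;
        }
        changed[i] = memcmp(old, data + (size_t)i * PAGE, PAGE) != 0;
        count += changed[i];
    }
    close(s->go_fd); close(s->snap_fd);
    waitpid(s->pid, NULL, 0);       /* reap the snapshot child */
    return count;
}
```

Usage would be `snapshot_take(&s)` at the start of a period and `snapshot_diff(&s, changed)` at the end. Comparing the actual mappings on Linux would probably need something like /proc/<pid>/pagemap, which is not shown here.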
Alternatively, you can modify the COW page fault handler in the kernel. This hack is harder, but it would give you information about which thread performed each write.