Say I have two instances of an application, with the same inputs and the same execution sequence. One instance is therefore redundant and is used to compare its data in memory against the other instance, as a kind of error detection mechanism.
Now, I want all memory allocations and deallocations to happen in exactly the same manner in the two processes. What is the easiest way to achieve that? Write my own malloc and free? And what about memory allocated with other functions such as mmap?
I’m wondering what you are trying to achieve. If your process is deterministic, then the pattern of allocation / deallocation should be the same.
The only possible difference could be the addresses returned by `malloc`. But you should probably not depend on them (the easiest way being not to use pointers as map keys or in other data structures). And even then, there should only be a difference if the allocation is not done through `sbrk` (glibc uses anonymous `mmap` for large allocations), or if you are using `mmap` yourself (as by default the address is selected by the kernel).
If you really want to have exactly the same addresses, one option is to have a large static buffer and to write a custom allocator that uses memory from this buffer. This has the disadvantage of forcing you to know beforehand the maximum amount of memory you’ll ever need.
In a non-PIE executable (`gcc -fno-pie -no-pie`), a static buffer will have the same address every time. For a PIE executable you can disable the kernel’s address space layout randomization (ASLR) for loading programs. In a shared library, disabling ASLR and running the same program twice should lead to the same choices by the dynamic linker for where to map libraries.
If you don’t know beforehand the maximum size of the memory you want to use, or if you don’t want to recompile each time this size increases, you can also use `mmap` to map a large anonymous buffer at a fixed address. Simply pass the size of the buffer and the address to use as parameters to your process, and use the returned memory to implement your own `malloc` on top of it.
By using `MAP_FIXED`, we are telling the kernel to replace any existing mappings that overlap with this new one at `buf_addr`.
(Editor’s note: `MAP_FIXED` is probably not what you want. Specifying `buf_addr` as a hint instead of `NULL` already requests that address if possible. With `MAP_FIXED`, `mmap` will either return an error or the address you gave it. The `malloc_buffer != (void*)buf_addr` check makes sense for the non-`FIXED` case, which won’t replace an existing mapping of your code or a shared library or anything else. Linux 4.17 introduced `MAP_FIXED_NOREPLACE`, which you can use to make `mmap` return an error instead of memory at a wrong address you don’t want to use. But still leave the check in so your code works on older kernels.)
If you use this block to implement your own `malloc` and don’t use other non-deterministic operations in your code, you can have complete control of the pointer values.
This supposes that your usage pattern of malloc / free is deterministic, and that you don’t use libraries that are non-deterministic.
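The fixed-address mapping described above could be sketched roughly as follows. This is only an illustration under assumptions, not a complete allocator: the names `arena_init` and `arena_alloc` are hypothetical, the allocator is a simple bump allocator that never frees, and the address is expected to be page-aligned and chosen by the caller. It uses `MAP_FIXED_NOREPLACE` where the headers provide it and keeps the address check for older kernels.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0   /* older headers: the address is only a hint */
#endif

/* Hypothetical arena state; all names here are illustrative. */
static unsigned char *arena_base;
static size_t arena_size;
static size_t arena_used;

/* Map an anonymous buffer at buf_addr (page-aligned).
 * Returns 0 on success, -1 if the address could not be obtained. */
int arena_init(uintptr_t buf_addr, size_t size)
{
    void *malloc_buffer = mmap((void *)buf_addr, size,
                               PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED_NOREPLACE,
                               -1, 0);
    if (malloc_buffer == MAP_FAILED)
        return -1;

    /* Kernels older than 4.17 ignore the unknown flag and may place the
     * mapping elsewhere, so the returned address still has to be checked. */
    if (malloc_buffer != (void *)buf_addr) {
        munmap(malloc_buffer, size);
        return -1;
    }

    arena_base = malloc_buffer;
    arena_size = size;
    arena_used = 0;
    return 0;
}

/* Bump allocation with 16-byte alignment; returns NULL when the arena is full. */
void *arena_alloc(size_t n)
{
    assert(arena_base != NULL);
    size_t rounded = (n + 15) & ~(size_t)15;
    if (rounded < n || rounded > arena_size - arena_used)
        return NULL;
    void *p = arena_base + arena_used;
    arena_used += rounded;
    return p;
}
```

With the same `buf_addr`, the same `size`, and the same sequence of `arena_alloc` calls, both processes get identical pointer values. A real replacement for `malloc` would also need `free`, `realloc`, and thread safety.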
However, I think a simpler solution is to keep your algorithms deterministic and not to depend on addresses being identical. This is possible. I’ve worked on a large-scale project where multiple computers had to update their state deterministically (so that each program had the same state, while only transmitting inputs). If you don’t use pointers for anything other than referencing objects (the most important thing is to never use a pointer value for anything else: not as a hash, not as a key in a map, …), then your state will stay deterministic.
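One way to follow that rule is to give every object an ID assigned in creation order and to sort or hash by that ID instead of by address. The sketch below assumes a hypothetical `struct entity`; none of these names come from the original discussion.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical object type: each object carries an ID assigned in creation order. */
struct entity {
    uint32_t id;
    int value;
};

static uint32_t next_id;

struct entity *entity_new(int value)
{
    struct entity *e = malloc(sizeof *e);
    if (e != NULL) {
        e->id = next_id++;   /* same sequence in every process */
        e->value = value;
    }
    return e;
}

/* qsort comparator for an array of struct entity *: orders by ID, never by address. */
int entity_cmp_by_id(const void *a, const void *b)
{
    const struct entity *ea = *(const struct entity *const *)a;
    const struct entity *eb = *(const struct entity *const *)b;
    return (ea->id > eb->id) - (ea->id < eb->id);
}

/* Bucket index derived from the ID, so it does not depend on where malloc put the object. */
uint32_t entity_bucket(const struct entity *e, uint32_t nbuckets)
{
    assert(nbuckets > 0);
    return (e->id * 2654435761u) % nbuckets;
}
```

Because iteration order and bucket placement depend only on the IDs, two processes that create objects in the same order behave identically even if `malloc` returns different addresses.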
Unless what you want is to be able to snapshot the whole process memory and do a binary diff to spot divergence. I think that’s a bad idea, because how will you know that both processes have reached the same point in their computation? It is much easier to compare the output, or to have each process compute a hash of its state and use that to check that they are in sync, because you control when this is done (and thus it becomes deterministic too; otherwise your measurement is non-deterministic).
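A state hash of that kind could look like the sketch below. It assumes a hypothetical `struct app_state` and uses the FNV-1a 64-bit hash as one possible choice. The fields are hashed one by one so that padding bytes and pointer values never enter the result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical application state; the pointer field is deliberately not hashed. */
struct app_state {
    uint64_t step;
    int32_t  counters[4];
    void    *scratch;   /* address may differ between processes */
};

/* FNV-1a, 64-bit: fold n bytes into the running hash h. */
uint64_t fnv1a_update(uint64_t h, const void *data, size_t n)
{
    const unsigned char *p = data;
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;   /* FNV prime */
    }
    return h;
}

/* Hash the logical fields only, skipping padding and pointer values. */
uint64_t state_hash(const struct app_state *s)
{
    assert(s != NULL);
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    h = fnv1a_update(h, &s->step, sizeof s->step);
    h = fnv1a_update(h, s->counters, sizeof s->counters);
    return h;
}
```

Each process would call `state_hash` at the same well-defined points, for example after every N processed inputs, and the two values would then be compared.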