I’m working on some stuff where I want to memory map some large files containing numeric data. The problem is that the data can be in any of a number of formats, including real byte/short/int/long/float/double and complex byte/short/int/long/float/double. Naturally, handling all those types all the time quickly gets unwieldy, so I was thinking of implementing a memory mapping interface that can do real-time type conversion for the user.
I really like the idea of mapping a file so you get a pointer in memory back, doing whatever you need and then unmapping it. No bufferology or anything else needed. So a function that reads the data and does the type conversion for me would take a lot away from that.
I was thinking I could memory map the file being operated on, simultaneously map an anonymous region, and somehow catch page fetches/stores and do the type conversion on demand. I’ll be working on 64-bit, so this would leave you with a 63-bit address space in these cases, but oh well.
Does anyone know if this sort of mmap hooking would be possible, and if so, how might it be accomplished?
Yes(-ish). You can create inaccessible mmap regions. Whenever anybody tries to touch one, handle the SIGSEGV raised by fixing its permissions, filling it, and resuming. (Untested, but something along these lines ought to work…)
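A minimal sketch of that approach, assuming Linux and a source of 32-bit floats that should appear to the user as doubles. The names lazy_view, on_segv and the g_ globals are made up for illustration, and the source array stands in for the memory-mapped file. Note that mprotect is not on the POSIX list of async-signal-safe functions; calling it from the handler is an assumption that holds in practice on Linux.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical global state describing one lazily converted view. */
static const float *g_src;             /* source data (would be the mapped file) */
static double *g_view;                 /* PROT_NONE region handed to the user */
static size_t g_count;                 /* number of elements */
static size_t g_view_bytes;            /* size of the view, rounded up to pages */
static size_t g_page;                  /* system page size */
static volatile sig_atomic_t g_faults; /* number of pages filled so far */

static void on_segv(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)g_view;

    if (addr < base || addr >= base + g_view_bytes) {
        /* Not our region: restore the default action and return, so the
           faulting instruction re-executes and crashes as usual. */
        signal(SIGSEGV, SIG_DFL);
        return;
    }

    /* Make the faulting page accessible, then fill it with converted data. */
    uintptr_t page = addr & ~((uintptr_t)g_page - 1);
    if (mprotect((void *)page, g_page, PROT_READ | PROT_WRITE) != 0)
        _exit(1);

    size_t first = (page - base) / sizeof(double);
    size_t per_page = g_page / sizeof(double);
    for (size_t i = first; i < first + per_page && i < g_count; i++)
        g_view[i] = (double)g_src[i];

    g_faults++;
    /* Returning from the handler resumes at the faulting instruction,
       which now succeeds. */
}

/* Returns a view of src as doubles, converted one page at a time on first
   touch, or NULL on failure. */
static double *lazy_view(const float *src, size_t count)
{
    assert(src != NULL && count > 0);

    g_page = (size_t)sysconf(_SC_PAGESIZE);
    g_src = src;
    g_count = count;
    g_view_bytes = (count * sizeof(double) + g_page - 1) / g_page * g_page;

    void *p = mmap(NULL, g_view_bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    g_view = p;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_segv;
    sa.sa_flags = SA_SIGINFO; /* needed to receive siginfo_t and the context */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0)
        return NULL;

    return g_view;
}
```

Calling lazy_view(src, n) and then reading view[i] faults once per page; later accesses to the same page run at full speed. In this sketch a filled page stays readable and writable, and writes are not converted back to the source, so storing through the view would need an extra write-back step (for example when unmapping).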
If you don’t want to fill a whole page at once, that’s still doable I think… the third argument to the signal handler (when it is installed with SA_SIGINFO) can be cast to a ucontext_t *, with which you can decode the instruction being executed and fix it up as if it had performed the expected operation, while leaving the memory PROT_NONE to catch further accesses… but it’ll be a lot slower, since you’re trapping every access rather than just the first.