I think I have the opposite problem to the one described here. I have one process writing data to a log, and I want a second process to read it, but I don’t want the second process to be able to modify the contents. This is potentially a large file, and I need random access, so I’m using Python’s mmap module.
If I create the mmap as read/write (for the second process), I have no problem creating a ctypes object as a “view” of the mmap object using from_buffer(). From a cursory look at the C code, it looks like this is a cast, not a copy, which is what I want. However, this breaks if I create the mmap with ACCESS_READ: from_buffer() throws an exception saying it requires write privileges.
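For anyone who wants to reproduce this, here is a minimal sketch of the behaviour as I understand it on a current CPython 3 (my original tests were on 2.7, so the exact exception may differ there). The Header layout and the helper names make_sample_file and try_from_buffer are made up for illustration.

```python
import ctypes
import mmap
import os
import tempfile


class Header(ctypes.Structure):
    # Hypothetical 8-byte record header, used only for this illustration.
    _fields_ = [("size", ctypes.c_uint32), ("kind", ctypes.c_uint32)]


def make_sample_file():
    """Create a small temp file big enough to hold one Header; return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * ctypes.sizeof(Header))
    return path


def try_from_buffer(path, access):
    """Return 'ok' if Header.from_buffer accepts the mapping, else the exception name."""
    mode = "rb" if access == mmap.ACCESS_READ else "r+b"
    with open(path, mode) as f:
        mm = mmap.mmap(f.fileno(), 0, access=access)
        try:
            view = Header.from_buffer(mm)  # a view onto the mapping, not a copy
            del view  # drop the exported buffer so the mmap can be closed
            return "ok"
        except TypeError:
            # Raised when the underlying buffer is not writable.
            return "TypeError"
        finally:
            mm.close()
```

Calling try_from_buffer with mmap.ACCESS_WRITE should succeed, while mmap.ACCESS_READ should hit the TypeError branch.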
I think I want to use the ctypes from_address() method instead, which doesn’t appear to need write access. I’m probably missing something simple, but I’m not sure how to get the address of a location within an mmap. I know I can use ACCESS_COPY (so write operations show up in memory but aren’t persisted to disk), but I’d rather keep things read-only.
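One workaround that does keep the mapping read-only is from_buffer_copy(), which accepts a read-only buffer plus an offset. To be clear, this is a copy of the struct's bytes, not the cast I was asking for, so it only makes sense when the struct is small relative to the file. A rough sketch, assuming a hypothetical little-endian Header layout (read_header and demo are illustrative names):

```python
import ctypes
import mmap
import os
import struct
import tempfile


class Header(ctypes.LittleEndianStructure):
    # Hypothetical layout: two little-endian uint32 fields.
    _fields_ = [("size", ctypes.c_uint32), ("kind", ctypes.c_uint32)]


def read_header(buf, offset):
    """Copy one Header out of a (possibly read-only) buffer at the given offset."""
    return Header.from_buffer_copy(buf, offset)


def demo():
    """Write a header at offset 16 of a temp file, then read it via a read-only mmap."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"\x00" * 16)               # unrelated leading bytes
            f.write(struct.pack("<II", 24, 7))  # size=24, kind=7
        with open(path, "rb") as f:
            mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            try:
                hdr = read_header(mm, 16)
                return hdr.size, hdr.kind
            finally:
                mm.close()  # safe: the copy holds no reference to the mapping
    finally:
        os.remove(path)
```

Because the result is a copy, the mmap can be closed right away without the "exported pointers exist" problem that a from_buffer() view would cause.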
Any suggestions?
OK, from looking at the mmap .c code, I don’t believe it supports this use case. Also, I found that the performance is poor for my use case. I’d be curious what kind of performance others see, but I found that it took about 40 seconds to walk through a 500 MB binary file in Python. The walk consisted of creating an mmap, turning the current location into a ctypes object with from_buffer(), and using that object to read the size of the record so I could step to the next one. I tried doing the same thing directly in C++ with MSVC. Obviously there I could cast directly to an object of the correct type, and it was fast: less than a second (this is with a Core 2 Quad and an SSD).
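For comparison, the same kind of walk can be written without ctypes at all, using a precompiled struct.Struct and unpack_from() on a read-only mapping. I have not benchmarked this against the numbers above, so treat it as a sketch rather than a fix. The record format here (a little-endian uint32 giving the total record size, header included) is an assumption for illustration, not my real file format.

```python
import mmap
import struct

# Hypothetical header: total record size in bytes, including this 4-byte field.
HEADER = struct.Struct("<I")


def iter_record_offsets(buf):
    """Yield (offset, size) for each length-prefixed record in buf."""
    offset = 0
    end = len(buf)
    # Stop when fewer than HEADER.size bytes remain (trailing bytes are ignored).
    while offset + HEADER.size <= end:
        (size,) = HEADER.unpack_from(buf, offset)
        if size < HEADER.size or offset + size > end:
            raise ValueError("corrupt record at offset %d" % offset)
        yield offset, size
        offset += size  # step to the next record


def index_file(path):
    """Build a list of (offset, size) pairs using a read-only mmap of the file."""
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        try:
            return list(iter_record_offsets(mm))
        finally:
            mm.close()
```

iter_record_offsets works on any bytes-like object, so the same function can be checked against an in-memory bytes value before pointing it at a large file.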
I did find that I could get a pointer with the following
This doesn’t get around the original problem – the mmap isn’t read-only, since I still need to use from_buffer for the first call. In this config, it still took around 40 sec to process the whole file, so it looks like the conversion from a pointer into ctypes structs is killing the performance. That’s just a guess, but I don’t see a lot of value in tracking it down further.
I’m not sure my plan will help anyone else, but I’m going to try to create a C module specific to my needs, based on the mmap code. I think I can use fast C code to index the binary file, then expose only small parts of the file at a time through calls into ctypes/Python objects. Wish me luck.
Also, as a side note, Python 2.7.2 was released today (6/12/11), and one of the changes is an update to the mmap code so that you can use a Python long to set the file offset. This lets you use mmap for files over 4GB on 32-bit systems. See Issue #4681 here
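For reference, mapping a window at a file offset looks roughly like the sketch below. The offset passed to mmap has to be a multiple of mmap.ALLOCATIONGRANULARITY, so the helper rounds down and skips the extra leading bytes. The name read_at is made up, and this is written against a current Python 3 rather than 2.7.2.

```python
import mmap


def read_at(path, offset, length):
    """Return `length` bytes starting at file `offset`, via a read-only window mapping."""
    gran = mmap.ALLOCATIONGRANULARITY
    base = (offset // gran) * gran  # round down to an allowed mapping offset
    delta = offset - base           # bytes to skip inside the window
    with open(path, "rb") as f:
        mm = mmap.mmap(f.fileno(), delta + length,
                       access=mmap.ACCESS_READ, offset=base)
        try:
            return mm[delta:delta + length]  # slicing copies the bytes out
        finally:
            mm.close()
```

Only the window is mapped, not the whole file, which is what makes this usable for files larger than the address space; the caller is responsible for keeping offset + length within the file size.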