I’ve been playing around with memory mapping today on VC++ 2008 and I still haven’t completely understood how to use it or if it’s correct for my purposes. My goal here is to quickly read a very large binary file.
I have a struct:
typedef struct _data
{
int number;
char character[512];
float *entries;
}Data;
which is written many times into a file. The “entries” variable is an array of floats. After writing this file (10000 Data structs, each with an “entries” array of 90000 floats), I tried to memory map it with the following function so that I could read the data faster. Here’s what I have so far:
void readDataMmap(char *fname, //name of file containing my data
int arraySize, //number of values in struct Data
int entrySize) //number of values in each "entries" array
{
//Read and mem map the file
HANDLE hFile = INVALID_HANDLE_VALUE;
HANDLE hMapFile;
char* pBuf;
int fd = open(fname, O_RDONLY);
if(fd == -1){
printf("Error: could not open file\n");
exit(-1);
}
close(fd); // fd is not used again; close it so the descriptor does not leak
hFile = CreateFileA(fname, // fname is a char*, so call the ANSI version instead of casting to TCHAR*
GENERIC_READ, // open for reading
FILE_SHARE_READ, // allow other read handles to the same file
NULL, // default security
OPEN_EXISTING, // existing file only
FILE_ATTRIBUTE_NORMAL, // normal file
NULL); // no template
if (hFile == INVALID_HANDLE_VALUE)
{
printf("First CreateFile failed\n");
return;
}
hMapFile = CreateFileMapping(hFile,
NULL, // default security
PAGE_READONLY, // must match the GENERIC_READ access used to open the file
0, // max. object size
0, // buffer size
NULL); // name of mapping object
if(hMapFile == NULL){ // CreateFileMapping returns NULL on failure
printf("File Mapping failed\n");
CloseHandle(hFile);
return;
}
pBuf = (char*) MapViewOfFile(hMapFile, // handle to map object
FILE_MAP_READ, // read-only access
0,
0,
0); //Was NULL, 0 should represent full file bytesToMap size
if (pBuf == NULL)
{
printf("Could not map view of file\n");
CloseHandle(hMapFile);
CloseHandle(hFile);
return;
}
//Allocate data structure
Data *inData = new Data[arraySize];
for(int i = 0; i<arraySize; i++)inData[i].entries = new float[entrySize];
int pos = 0;
for(int i = 0; i < arraySize; i++)
{
//This is where I'm not sure what to do with the memory block
}
}
At the end of the function, after the file is mapped and I have a pointer “pBuf” to the beginning of the memory block, I don’t know how to read this block back into my data structure. Eventually I would like to transfer this block of memory back into my array of 10000 Data structs. Of course, I could be doing this completely wrong…
Dealing with a memory mapped file is really no different than dealing with any other kind of pointer to memory. The memory mapped file is just a block of data that you can read and write to from any process using the same name.
I’m assuming you want to load the file into a memory map, read and update it at will there, and dump it to a file at some regular or known interval, right? If that’s the case then just read from the file and copy the data to the memory map pointer, and that’s it. Later you can read data from the map, cast it into your memory-aligned structure, and use your structure at will.
If I were you I’d probably create a few helper functions like
data ReadData(void *ptr)
and
void WriteData(data *ptrToData, void *ptr)
where *ptr is the memory map address and *ptrToData is a pointer to your data structure to write to memory. Really, at this point it doesn’t matter whether it’s memory mapped or not; if you wanted to read from the file loaded into local memory you could do that too.
You can read and write to it exactly the same way you would with any other block of data, using memcpy to copy data from the source to the target and pointer arithmetic to advance the location in the data. Don’t worry about the “memory map”; it’s just a pointer to memory and you can treat it as such.
Also, since you are dealing with direct memory pointers, you don’t need to write each element into the mapped file one by one; you can write them all in one batch like
memcpy(mapPointer, data->entries, sizeof(float)*number)
which copies sizeof(float)*number bytes from data->entries to the map pointer’s start address. Obviously you can copy it however you want and wherever you want; this is just an example. See http://www.devx.com/tips/Tip/13291.
To read the data back in you would do something similar, but you want to explicitly copy the data behind each pointer to a known location, so imagine flattening your structure out. Instead of a layout where your pointers point to other memory elsewhere, copy the memory so that each array sits in the map directly after the object that owns it.
So this way you can “re-serialize” the data from the memory map to your local structure. Remember, arrays are just pointers to memory. The memory doesn’t have to be sequential with the object, since it could have been allocated at another time. You need to make sure to copy the actual data the pointers point to into your memory map. A common way of doing this is to write the object straight into the memory map, then follow the object with all the flattened arrays. Reading it back in, you first read the object, then increment the pointer by sizeof(object) and read in the next array, then increment the pointer again by arraysize, etc.
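As a minimal sketch of that write-then-read-back pattern: the question does not show how the file was written, so this assumes each record is stored as number, then the 512-byte character array, then entrySize floats, and that the entries pointer itself is never stored. The names CHARACTER_BYTES, recordSize, writeRecord and readRecord are hypothetical, and any plain byte buffer can stand in for the mapped view.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

const std::size_t CHARACTER_BYTES = 512;

// Same fields as the struct in the question.
struct Data {
    int number;
    char character[CHARACTER_BYTES];
    float *entries;  // points at memory allocated elsewhere
};

// Bytes taken by one flattened record: the fixed fields, then the floats.
// The pointer value is not stored; it would be meaningless after reloading.
std::size_t recordSize(std::size_t entrySize) {
    return sizeof(int) + CHARACTER_BYTES + entrySize * sizeof(float);
}

// Copy one record to dst and return the address just past it.
char *writeRecord(char *dst, const Data &d, std::size_t entrySize) {
    assert(dst != 0 && d.entries != 0);
    std::memcpy(dst, &d.number, sizeof(int));
    dst += sizeof(int);
    std::memcpy(dst, d.character, CHARACTER_BYTES);
    dst += CHARACTER_BYTES;
    std::memcpy(dst, d.entries, entrySize * sizeof(float));
    return dst + entrySize * sizeof(float);
}

// Copy one record from src into d and return the address of the next record.
// d.entries must already point at room for entrySize floats.
const char *readRecord(const char *src, Data &d, std::size_t entrySize) {
    assert(src != 0 && d.entries != 0);
    std::memcpy(&d.number, src, sizeof(int));
    src += sizeof(int);
    std::memcpy(d.character, src, CHARACTER_BYTES);
    src += CHARACTER_BYTES;
    std::memcpy(d.entries, src, entrySize * sizeof(float));
    return src + entrySize * sizeof(float);
}
```

In the question’s loop this would amount to setting const char *p = pBuf before the loop and calling p = readRecord(p, inData[i], entrySize) inside it, provided the file really has this layout.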
You’ll probably also want to read up on data alignment when writing data from a struct to memory. See “working with packing structures”, “C++ struct alignment question”, and “data structure alignment”.
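To make the alignment concern concrete, here is a small sketch. Mixed is a hypothetical struct and paddingBytes and loadInt are hypothetical helpers, none of them from the question. It shows that a struct can be larger than the sum of its fields, and one way to read a value from an arbitrary byte offset in a buffer without relying on alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical struct, used only to show padding.
struct Mixed {
    char tag;
    int value;
};

// Bytes the compiler added between or after the fields.
std::size_t paddingBytes() {
    return sizeof(Mixed) - (sizeof(char) + sizeof(int));
}

// Read an int from any byte offset in a buffer. Casting p to int* and
// dereferencing it is undefined behavior when p is not suitably aligned;
// memcpy has no such requirement.
int loadInt(const char *p) {
    assert(p != 0);
    int v;
    std::memcpy(&v, p, sizeof(v));
    return v;
}
```

On most compilers paddingBytes() is greater than zero, so code that writes sizeof(Mixed) bytes and code that reads the fields one after another would disagree about where the next record starts. Copying field by field, as loadInt does, sidesteps that.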
If you don’t know the size of the data ahead of time when reading you should write the size of the data into a known position in the beginning of the memory map for later use.
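One possible shape for such a size prefix, as a sketch only: FileHeader, writeHeader and readHeader are hypothetical names, and the choice of two unsigned int fields is an assumption rather than anything the question specifies.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical header stored at offset 0 of the map.
struct FileHeader {
    unsigned int recordCount;  // how many records follow the header
    unsigned int entrySize;    // number of floats in each record's array
};

// Store the header at the start of the map; returns where the records begin.
char *writeHeader(char *mapStart, const FileHeader &h) {
    assert(mapStart != 0);
    std::memcpy(mapStart, &h, sizeof(h));
    return mapStart + sizeof(h);
}

// Load the header from the start of the map; returns where the records begin.
const char *readHeader(const char *mapStart, FileHeader &h) {
    assert(mapStart != 0);
    std::memcpy(&h, mapStart, sizeof(h));
    return mapStart + sizeof(h);
}
```

A reader would call readHeader first, allocate recordCount structures with entrySize floats each, and then walk the records starting from the returned pointer.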
Anyway, as to whether a memory map is the right tool here, I think it is. You map the whole file into virtual memory, and the OS pages it in and out of physical memory for you as you need it, creating a “lazy loading” mechanism.
All that said, memory maps are shared, so if the map is used across process boundaries you’ll want to synchronize access with a named mutex so the processes don’t overwrite each other’s data.