I am trying to understand how to use the package mmap
to access large csv files. More precisely, I’d like to
1. Create an mmap object from a csv file with mmap.csv();
2. Save the file created by mmap.csv() containing the data in binary format;
3. Be able to “map the binary data back to R” using the function mmap().
Achieving 1. and 2. is easy: just use mmap.csv() and save the tempfile()
that contains the binary data, or modify mmap.csv() to accept an extra parameter
for the output file (and modify the line tmpstruct <- tempfile() accordingly).
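A minimal sketch of the first option (keeping the temporary file) is given here. It assumes the path of the backing file can be read from names(m$filedesc), which is what the str() output of an mmap object suggests. The file names and locations are placeholders chosen for illustration.

```r
library(mmap)

# Placeholder input: write the cars data set to a csv file.
data(cars)
csv_file <- file.path(tempdir(), "cars.csv")
write.csv(cars, csv_file, row.names = FALSE)

# mmap.csv() converts the csv to binary and maps a temporary file.
m <- mmap.csv(csv_file, header = TRUE)

# Assumption: the backing file path is stored as the name of m$filedesc.
tmp_bin <- names(m$filedesc)

# Keep the record layout; it is needed to map the file again later.
record.struct <- m$storage.mode

munmap(m)

# Keep a permanent copy of the binary data (hypothetical target name).
bin_file <- file.path(tempdir(), "cars.Rmap")
copied <- file.copy(tmp_bin, bin_file, overwrite = TRUE)
```

The copy is made after munmap() so that the file is no longer mapped when it is duplicated.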
What I am having trouble with is 3. In particular, I need to construct a
C-struct for the records in the binary data from the mmap object.
Here is a simple reproducible example:
# create mmap object with its file
library(mmap)
data(cars)
m <- as.mmap(cars, file="cars.Rmap")
colnames(m) <- colnames(cars)
str(m)
munmap(m)
The information from str() can be used to construct the C-struct
record.struct that allows mapping the binary file cars.Rmap
via the function mmap.
> str(m)
<mmap:temp.Rmap> (struct) struct [1:50, 1:2] 4 ...
data :<externalptr>
bytes : num 400
filedesc : Named int 27
- attr(*, "names")= chr "temp.Rmap"
storage.mode :List of 2
$ speed:Classes 'Ctype', 'int' atomic (0)
.. ..- attr(*, "bytes")= int 4
.. ..- attr(*, "signed")= int 1
$ dist :Classes 'Ctype', 'int' atomic (0)
.. ..- attr(*, "bytes")= int 4
.. ..- attr(*, "signed")= int 1
- attr(*, "bytes")= int 8
- attr(*, "offset")= int [1:2] 0 4
- attr(*, "signed")= logi NA
- attr(*, "class")= chr [1:2] "Ctype" "struct"
pagesize : num 4096
dim :NULL
In this case, we need two 4-byte integers:
# load from disk
record.struct <- struct(speed = integer(), # int32(), 4 byte int
dist = integer() # int32(), 4 byte int
)
m <- mmap("cars.Rmap", mode=record.struct)
Inferring the right C-struct can be very impractical for “wide” csv files (i.e. files with tens or hundreds of columns). Here is my question:
How can one construct record.struct directly
from the mmap object m?
A more or less complete example of what you are asking – using mmap and mmap.csv
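As a previous answer mentions, m$storage.mode is the mode you need: it can be passed directly as the mode argument of mmap(), so the C-struct never has to be written out by hand.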
You could go one step further and store the mode in a file using some naming convention of your devising. You could also create a custom binary object utilizing the len and off args to mmap.
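A minimal sketch of the naming-convention idea follows: save m$storage.mode with saveRDS() next to the binary file, then read it back before calling mmap(). The ".mode.rds" suffix and the file locations are assumptions made for illustration, not something the package prescribes.

```r
library(mmap)

data(cars)

# Hypothetical naming convention: <binary file>.mode.rds holds the struct.
map_file  <- file.path(tempdir(), "cars.Rmap")
mode_file <- paste0(map_file, ".mode.rds")

# Create the binary file and store its record layout alongside it.
m <- as.mmap(cars, file = map_file)
colnames(m) <- colnames(cars)
saveRDS(m$storage.mode, mode_file)
munmap(m)

# Later (possibly in another session): restore the layout and re-map.
record.struct <- readRDS(mode_file)
m2 <- mmap(map_file, mode = record.struct)
first_rows <- m2[1:3]   # extract the first three records
munmap(m2)
```

Because the mode is taken from the object rather than typed by hand, the same few lines apply no matter how many columns the csv file has.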