I am a newbie to R. Assume the memory layout is the same for a data frame and a matrix.
In the following matrix
a=matrix(1:10000000,1000000,10)
it has 1M rows and 10 columns. Which is stored contiguously in physical memory, the rows or the columns? That is, does memory hold [1,1],[2,1],[3,1],...,[1M,1],[1,2],... or [1,1],[1,2],...,[1,10],[2,1],...?
Suppose the matrix with 10M elements is 100M in size and the L2 cache is 4M, so the L2 cache can’t hold all 10M elements. If we process the data sequentially, we will have a lower L2 cache miss ratio. In our case, we need to process the data row by row and read several columns at once, such as columns A, B, and C, and then compute a result. If memory stores the 10 items of the 1st row first, then the 10 items of the 2nd row, and so on, performance might be better.
Is there any way to control the memory layout?
A matrix is simply a vector with a dim attribute. The elements of the matrix are stored in the vector in column-major order. There is no way to change this. Therefore, if you need to operate row-by-row, it’s faster to transpose the matrix before looping over it.
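To see the column-major layout concretely, here is a minimal sketch on a tiny matrix. The names m, v, and tm are only for illustration. It shows the underlying vector, the index formula that follows from column-major storage, and how transposing turns each original row into a contiguous column.

```r
# A small matrix makes the storage order easy to see.
m <- matrix(1:6, nrow = 2, ncol = 3)

# Dropping the dim attribute exposes the underlying vector:
# column 1 first, then column 2, then column 3.
v <- as.vector(m)
print(v)

# Element [i, j] sits at position i + (j - 1) * nrow(m) in that vector.
i <- 2
j <- 3
stopifnot(m[i, j] == v[i + (j - 1) * nrow(m)])

# After transposing, each original row becomes a column of tm,
# so its elements are adjacent in memory.
tm <- t(m)
stopifnot(identical(tm[, 1], m[1, ]))
```

Applied to the matrix in the question, this means a[, 1] is one contiguous run of 1M values, while a[1, ] touches 10 values that are each 1M elements apart.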
If you want to dig deeper, the code for colSums, rowSums, colMeans, and rowMeans is in the do_colsum function in src/main/array.c.
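For the use case in the question (combining a few columns for every row), it may be possible to avoid a row loop altogether. The sketch below assumes the per-row work can be written as arithmetic on whole columns. The formula and the names res_vec and res_loop are made up for illustration, and the matrix is kept small so it runs quickly. Each a[, k] reads one contiguous block of the underlying vector.

```r
set.seed(1)
a <- matrix(runif(1000 * 10), nrow = 1000, ncol = 10)

# Hypothetical per-row computation using columns 1, 2 and 3,
# written as whole-column arithmetic.
res_vec <- a[, 1] + a[, 2] * a[, 3]

# The same computation as an explicit row-by-row loop.
res_loop <- numeric(nrow(a))
for (i in seq_len(nrow(a))) {
  res_loop[i] <- a[i, 1] + a[i, 2] * a[i, 3]
}
stopifnot(isTRUE(all.equal(res_vec, res_loop)))

# Row sums of a match column sums of the transposed matrix.
stopifnot(isTRUE(all.equal(rowSums(a), colSums(t(a)))))
```

Whether this helps in practice depends on the actual computation, so it is worth timing both forms with system.time() on data of realistic size.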