I am running the same code on Unix (a cluster) and on Windows (Intel Core Duo, 2 GB RAM), and I see a significant difference in running time. On Unix it uses only one core, but on Windows it may be using two cores (I'm not sure). My concern is the following:
Windows:
user system elapsed
207.12 8.82 472.04
Unix:
user system elapsed
327.765 2.493 330.819
What I don't understand is why there is such a large gap between CPU time and elapsed time on Windows. I broke the code into segments, and this happens only in the reading and writing (I/O) part. The rest of the calculations are very fast compared to Unix and show no such gap between 'user' and 'elapsed' time:
user system elapsed
48.765 0.00 52.69
I am not doing anything special, but I am reading a very big file, about 300 MB:
indata <- read.csv(mutFile, sep="\t", header = TRUE)
How can I avoid this difference to improve overall performance?
To get high performance when reading a dataset, I would recommend buying a solid state drive (SSD). However, your other hardware (mainly your SATA controller) might be a bottleneck, and SSDs are not cheap per GB. In general, the difference in performance can be explained by the difference in hardware ('normal' hard drive vs laptop hard drive), so one solution is to spend money on a faster machine. Alternatively, like @JoshuaUlrich said, spend some time optimizing how you read text files to get a good performance boost with your current hardware.
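As a rough sketch of what "optimizing how you read text files" could look like in base R: telling the reader the column types up front (`colClasses`) and switching off quote and comment scanning often reduces the work done while parsing. The small temporary file and the names `demo`, `head_rows`, `classes`, `t_default` and `t_tuned` are made up for illustration; `mutFile` stands in for the real 300 MB file. Whether this helps on your machine depends on how much of the time goes to parsing rather than the disk, so treat it as something to measure, not a guaranteed fix.

```r
# Hypothetical small tab-separated file standing in for the real 300 MB input.
mutFile <- tempfile(fileext = ".txt")
demo <- data.frame(id = 1:1000,
                   gene = sprintf("g%04d", 1:1000),
                   score = runif(1000))
write.table(demo, mutFile, sep = "\t", quote = FALSE, row.names = FALSE)

# Baseline: the call from the question; R has to guess every column type.
t_default <- system.time(
  indata <- read.csv(mutFile, sep = "\t", header = TRUE)
)

# Read a few rows first to learn the column classes, then declare them.
# Assumption: the first 100 rows are representative of the whole file.
head_rows <- read.delim(mutFile, nrows = 100, stringsAsFactors = FALSE)
classes <- vapply(head_rows, function(col) class(col)[1], character(1))

# Tuned read: fixed column classes, no quote or comment-character scanning.
t_tuned <- system.time(
  indata2 <- read.delim(mutFile, colClasses = classes,
                        comment.char = "", quote = "")
)

# Compare 'user' and 'elapsed' for both reads.
print(t_default)
print(t_tuned)
```

If 'elapsed' stays far above 'user' even for the tuned read, the time is most likely spent waiting on the disk (or on swapping with 2 GB of RAM) rather than on parsing, which points back to the hardware explanation above.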