For a personal project I need to run several machine learning algorithms against different texts in order to classify them.
I used to do this using RapidMiner but I decided to move all my development to R as I feel I have more control with it.
The issue I am seeing now (which I did not notice with RapidMiner) is that loading the models is taking a lot of time.
For example:
I have a model which checks whether the text refers to sports.
The model is 37.7 MB and it takes 8 minutes 34 seconds to load on my 2.2 GHz i7 Mac with 4 GB of RAM.
The way I am calling the model is the following:
fileNameMatrix = paste(query,query1,"-matrix.Rd", sep ="")
fileNameModel= paste(query,query1,"-model.Rd", sep ="")
load(fileNameMatrix)
load(fileNameModel)
The model was generated using RTextTools.
The query variables are there because I need to load almost 20 models and compare them against different datasets. So although 8 minutes is not a lot on its own, loading all of them takes almost 3 hours, which makes my task almost useless considering it's an almost real-time task.
Which factors should I consider to reduce loading time if reducing the size of the model is not an option?
One other thing I find suspicious is that while the matrix file is rather small (64 KB), the model is still 37.7 MB. Is it possible that the model file is bigger than necessary? Has anyone experienced something similar using RTextTools?
This is one of my first tasks using models in R, so excuse me if I am doing something which is obviously wrong.
Thanks a lot for your time and any tip in the right direction will be much appreciated!
Have you checked the RAM usage in your Activity Monitor? Compressed RData files are relatively tiny, but they uncompress to be massive. For instance, an n x n matrix of all 0's will take up essentially no space for any n (that may explain your small matrix size). Your loaded model might then be huge; I have some RData files that amount to maybe 200 MB but that cannot be loaded in memory in R. This could become a problem if you're running low on RAM, as your computer may attempt to use drive space to load the files.
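To see the gap between file size and in-memory size for yourself, here is a minimal sketch. It uses a made-up all-zero matrix as a stand-in (not your actual model, and nothing specific to RTextTools); the variable names and the matrix size are assumptions chosen purely for illustration.

```r
# Hypothetical stand-in data: an all-zero n x n matrix (not the asker's model).
n <- 2000
m <- matrix(0, nrow = n, ncol = n)

# Write the same object twice: once with save()'s default compression,
# once with compression switched off.
compressed_file <- tempfile(fileext = ".Rd")
plain_file <- tempfile(fileext = ".Rd")
save(m, file = compressed_file)
save(m, file = plain_file, compress = FALSE)

# Compare what the object occupies in RAM with what each file occupies on disk.
in_memory_bytes <- as.numeric(object.size(m))
compressed_bytes <- file.size(compressed_file)
plain_bytes <- file.size(plain_file)
cat("in memory:       ", in_memory_bytes, "bytes\n")
cat("compressed file: ", compressed_bytes, "bytes\n")
cat("plain file:      ", plain_bytes, "bytes\n")

# Time the load into a separate environment so nothing in the
# workspace is overwritten; load() returns the names it restored.
env <- new.env()
timing <- system.time(loaded <- load(compressed_file, envir = env))
print(timing)
print(loaded)
```

Running the same object.size() and system.time() checks on your own model file (after load() finishes) should tell you how large the model really is once it is in memory, and whether the 4 GB of RAM is the likely bottleneck.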