I am working on a project where a lot of analysts create statistical models in R. They usually provide me with the model objects (.RData files), and I automate running them on various datasets.
My problem is:
- Can I use a database and save these .RData files there? Any hints on doing this? (I currently store the .RData files on disk and use a database to store their locations.)
- I get a lot of R scripts from other analysts who have done some pre-processing of the data before creating the models. Does anyone have experience using PMML to make this process repeatable without manual intervention? PMML stores the pre-processing and modeling steps as markup tags, and would repeat the same steps on a new dataset.
Thank you for the suggestions and feedback.
-Harsh
Yes, this is possible using e.g. MySQL linked to R with the RMySQL and DBI packages, or via the RODBC or RJDBC package. I’m not 100% sure if they all support blobs, but in the worst case you could use the ASCII representation and put it in a text field.

The trick is using the function serialize(), which turns an R object (for example a fitted model) into a raw vector, here called obj. Now you can store or retrieve obj in a database. It’s actually no more than a vector of ASCII (or binary) codes; ascii=FALSE gives you the binary representation. After retrieving it, you use unserialize() to restore the original object.
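A minimal sketch of that round trip, using base R only. The simulated data and the names fit, obj_bin and obj_txt are made up for illustration, and the actual database insert and select are left out because they depend on which driver you use.

```r
# Fit a small example model on simulated data (illustration only).
set.seed(1)
x <- rnorm(100)
y <- 5 * x + 4 + rnorm(100, 0, 0.3)
fit <- lm(y ~ x)

# Binary form: a raw vector, suitable for a BLOB column.
obj_bin <- serialize(fit, NULL)

# ASCII form: also a raw vector, but it converts to a single string
# that fits in a TEXT column if blobs are not available.
obj_txt <- rawToChar(serialize(fit, NULL, ascii = TRUE))

# ... store obj_bin or obj_txt in the database, retrieve it later ...

# Restore the model from either representation.
fit_from_bin <- unserialize(obj_bin)
fit_from_txt <- unserialize(charToRaw(obj_txt))

# The restored models can be used like the original one.
new_data <- data.frame(x = c(-1, 0, 1))
pred_bin <- predict(fit_from_bin, new_data)
pred_txt <- predict(fit_from_txt, new_data)
```

The binary form is more compact; the text form is the fallback for drivers or columns that cannot hold raw bytes.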
Edit: regarding PMML, there’s a pmml package on CRAN. Maybe that one gets you somewhere?
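As a rough starting point, exporting a fitted model with that package might look like the sketch below. It assumes the pmml and XML packages are installed; the mtcars model and the output file name are placeholders, and I have not checked how much of an analyst's pre-processing can be captured this way.

```r
library(pmml)  # CRAN package mentioned above; it builds on the XML package

# Placeholder model on a built-in dataset (illustration only).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Convert the fitted model to a PMML document (an XML node).
model_pmml <- pmml(fit)

# Write the PMML to disk; the file name is arbitrary.
out_file <- file.path(tempdir(), "mpg_model.pmml")
XML::saveXML(model_pmml, file = out_file)
```

The resulting XML file could then be stored in the database as text, next to or instead of the serialized R object.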