I have a vectorization question in R using matrices. I have two columns that need to be regressed against each other, separately for each group defined by an index vector. The data is:
matrix_senttoR = [ ...
0.11 0.95
0.23 0.34
0.67 0.54
0.65 0.95
0.12 0.54
0.45 0.43 ] ;
indices_forR = [ ...
1
1
1
2
2
2 ] ;
Col1 of the matrix holds the data for, say, MSFT and GOOG (3 rows each), and Col2 is the return of the benchmark StkIndex on the corresponding dates. The data is in matrix format because it is sent from MATLAB.
I currently use:
slope <- by(data.frame(matrix_senttoR), indices_forR,
            FUN = function(x) zyp.sen(X1 ~ X2, data = x)$coeff[2])
betasFac <- sapply(slope, function(x) x + 0)
I’m using data.frame above because I could not use cbind(). If I use cbind(), MATLAB gives an error because it doesn’t understand that data format. I’m running these commands from inside MATLAB (http://www.mathworks.com/matlabcentral/fileexchange/5051). You can replace zyp.sen (from the zyp package) with lm.
by() is slow here (maybe because of the data frames?). Is there a better way to do it? It takes more than 14 seconds for 150k rows of data. Can I use matrix vectorization in R instead? Thanks.
I still think that you are overcomplicating things by moving from MATLAB to R and back. And passing 150k rows of data must be slowing things down considerably.
zyp.sen is actually pretty trivial to port to MATLAB. Here you go:

I checked this using R’s example(zyp.sen), and it gives the same answer. You should really do some further checking though, just to be sure.
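If you would rather stay in R, here is a rough sketch of the same idea without by() or data.frame. It assumes that the slope zyp.sen reports is the Theil-Sen estimate, i.e. the median of all finite pairwise slopes; that is my understanding of the zyp package, so please check it against zyp.sen on your own data. The function names sen_slope and group_sen_slopes are made up for this example, and the matrix is just the sample data from the question.

```r
# Assumption: the Sen slope is the median of all finite pairwise slopes
# (y[i] - y[j]) / (x[i] - x[j]). sen_slope is a hypothetical helper name.
sen_slope <- function(x, y) {
  if (length(x) < 2) return(NA_real_)
  dy <- outer(y, y, "-")              # all pairwise differences in y
  dx <- outer(x, x, "-")              # all pairwise differences in x
  s  <- (dy / dx)[upper.tri(dy)]      # each pair counted once
  median(s[is.finite(s)])             # drop pairs with identical x values
}

# Hypothetical wrapper: one slope per group, working directly on the matrix.
# Column 1 is the response and column 2 the predictor, matching X1 ~ X2.
group_sen_slopes <- function(m, g) {
  rows <- split(seq_len(nrow(m)), g)  # row numbers belonging to each group
  vapply(rows, function(i) sen_slope(x = m[i, 2], y = m[i, 1]), numeric(1))
}

# Sample data from the question.
matrix_senttoR <- matrix(c(0.11, 0.95,
                           0.23, 0.34,
                           0.67, 0.54,
                           0.65, 0.95,
                           0.12, 0.54,
                           0.45, 0.43), ncol = 2, byrow = TRUE)
indices_forR <- c(1, 1, 1, 2, 2, 2)

betasFac <- group_sen_slopes(matrix_senttoR, indices_forR)
print(betasFac)
```

Note that outer() builds an n-by-n matrix per group, so this only makes sense when the individual groups are small, as with a few rows per stock. I have not timed it against the by() version on 150k rows.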