I am trying to adapt some (to me) very complicated code to work with my data.
I think the crux of my problem is that some of my variables lose dimension when I begin with a two-dimensional matrix, and I need to know how to make variables retain their dimensions.
I am starting with two variables, e (a data.frame), a portion of which looks like this:
e <-
structure(list(X2hr = c(0.106, 0, 0, 0, 0.01, 0.042), X6hr = c(1,
0.083, 0.006, 0, 1, 0.967), X12hr = c(0.049, 0.057, 0.098, 0.405,
0.046, 0.029), X24hr = c(0.264, 0.301, 0.025, 0.15, 0.58, 0.487
), X36hr = c(0.284, 1, 0.114, 1, 0.671, 1), X48hr = c(0.274,
0.235, 0.299, 0.253, 0.617, 0.636), X72hr = c(0.098, 0.021, 1,
0.325, 0.283, 0.35)), .Names = c("X2hr", "X6hr", "X12hr", "X24hr",
"X36hr", "X48hr", "X72hr"), row.names = c("cgd1_10", "cgd1_100",
"cgd1_1000", "cgd1_1010", "cgd1_1020", "cgd1_1030"), class = "data.frame")
and m (a 2-dimensional matrix, with one column and 2913 rows), a portion of which looks like this:
m <-
structure(c(0, 0, 1.174805088, 1.174805088, 0, 0), .Dim = c(6L,
1L), .Dimnames = list(c("cgd1_10", "cgd1_100", "cgd1_1000", "cgd1_1010",
"cgd1_1020", "cgd1_1030"), "X4_1110_2.motif2"))
I load the glmnet package and define two functions, IDC.glmnet and PBM.glmnet.getCoefs:
library(glmnet)
IDC.glmnet <- function(e, m, mode="coef", randomize=F, alpha=0.5) {
  nona <- !is.na(e)
  enona <- e[nona]
  mnona <- m[nona,]
  if(ncol(m)==1)
    dim(mnona) <- c(sum(nona),ncol=1)
  e.cv <- cv.glmnet( mnona, enona, nfolds=10)
  l <- e.cv$lambda.min
  #print(l)
  if (randomize == TRUE) {
    enona <- sample(enona)
  }
  e.fits <- glmnet( mnona, enona, family="gaussian", alpha=alpha, nlambda=100)
  if (mode == "predict") {
    cor.test(predict(e.fits, mnona, type="response", s=l), enona)$estimate
  } else {
    as.matrix(predict(e.fits, s=l, type="coefficients")[-1,])
  }
}
PBM.glmnet.getCoefs <- function(e, m, alpha=0.05, randomize=F, center=FALSE) {
  e.coef <<- apply(e, 2, IDC.glmnet, m, mode="coefficients",
                   alpha=alpha, randomize=randomize)
  if (dim(e)[2] > 1) {
    e.coef.s <- t(apply(e.coef, 1, scale, center=center))
  } else {
    e.coef.s <- e.coef
  }
  rownames(e.coef.s) <- colnames(m)
  colnames(e.coef.s) <- colnames(e)
  e.coef.s
}
Then I try to execute PBM.glmnet.getCoefs on my variables:
coefs <- PBM.glmnet.getCoefs(e, m)
And I get the following error message:
Error in t(apply(e.coef, 1, scale, center = center)) :
error in evaluating the argument 'x' in selecting a method for function 't':
Error in apply(e.coef, 1, scale, center = center) :
dim(X) must have a positive length
The problem occurs only when m is a single-column matrix; with multiple columns, everything works. But I can’t use multiple columns because it skews the results, so I really need a single-column m to work. As far as I can tell with my limited troubleshooting abilities, this line in the PBM.glmnet.getCoefs function is where the trouble begins:
e.coef <<- apply(e, 2, IDC.glmnet, m, mode="coefficients",
alpha=alpha, randomize=randomize)
e.coef is a vector when I use a single-column m. Then since e.coef is dimensionless, I get the error in t(apply) listed above.
e.coef looks like this:
> e.coef
X2hr X6hr X12hr X24hr X36hr X48hr
0.025701875 0.004066947 0.043836383 0.020151361 0.003512643 -0.035211133
X72hr
-0.034503722
How can I make sure that e.coef retains the proper dimensions (1 row and 7 columns, column headings taken from top row of e, row values determined somewhere in the IDC.glmnet function)?
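You correctly identified the line causing the issue. The behavior is described in the Value section of ?apply: when each call to FUN returns a single value, “‘apply’ returns a vector if ‘MARGIN’ has length 1”. With a single-column m, each IDC.glmnet call returns exactly one coefficient, so e.coef comes back as a plain named vector with no dim attribute. So make this small change to ensure the dimensions are correct: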
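One way such a change could look, as a sketch rather than a definitive fix: rebuild the matrix explicitly right after the apply call. The example below does not need glmnet. The toy data and the function fake.coefs are hypothetical stand-ins (fake.coefs plays the role of IDC.glmnet and, like it, returns one value per column of m).

```r
# Toy data with the same shapes as in the question: e has 7 columns, m has 1.
set.seed(1)
e <- as.data.frame(matrix(runif(42), nrow = 6,
                          dimnames = list(NULL, paste0("X", 1:7, "hr"))))
m <- matrix(c(0, 0, 1.17, 1.17, 0, 0), ncol = 1,
            dimnames = list(NULL, "motif2"))

# Hypothetical stand-in for IDC.glmnet: returns an ncol(m) x 1 matrix.
fake.coefs <- function(y, m) {
  crossprod(m, y)
}

# With a single-column m, every call returns one value,
# so apply() collapses the result into a named vector.
e.coef <- apply(e, 2, fake.coefs, m)
is.null(dim(e.coef))   # TRUE: the dim attribute is gone

# Restore the shape: one row per column of m, one column per column of e.
e.coef <- matrix(e.coef, nrow = ncol(m), ncol = ncol(e),
                 dimnames = list(colnames(m), colnames(e)))
dim(e.coef)            # 1 7
```

Because matrix() fills column by column, the same wrapping line should also leave the result unchanged when m has several columns and apply already returns an ncol(m) by ncol(e) matrix.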