I was running discriminant analysis using R. The code is the following:
fit <- lda(group~ A+C1_1+C2+D1a_1+D2_1+D3_1+D3_2+D3_3+E1a_1+E1b_1+E1b_2+E2_1+E3_1+E3_2+E3_3+F2+G_1+G_2+G_3+G_4+H1_1+H2a_1+H2b_1+H3_1+H4_1_1+H1_2+H2a_2+H2b_2+ H3_2+H4_1_2+J1_1+J2_1+J3_1+K1a+K2_1+K2_2+K2_3+K2_4,data=data1)
But unfortunately I was getting the following error:
Error in x - group.means[g, ] : non-conformable arrays
This is the str(data1) output:
'data.frame': 210 obs. of 133 variables:
$ A : int 1 1 1 1 1 1 1 2 1 2 ...
$ C1_1 : int 22 29 12 12 25 15 30 20 30 15 ...
$ C2 : int 2 2 2 2 2 2 2 1 2 2 ...
$ D1a_1 : int 40 50 160 15 150 105 150 45 100 80 ...
$ D2_1 : int 100 100 100 100 100 100 100 90 95 100 ...
$ D3_1 : int 5 15 40 10 30 25 30 40 25 60 ...
$ D3_2 : int 10 30 30 15 30 25 60 40 20 10 ...
$ D3_3 : int 10 30 30 10 10 15 10 20 20 30 ...
$ E1a_1 : int 80 25 140 30 150 120 80 30 100 100 ...
$ E1b_1 : int 100 50 50 25 80 70 80 75 10 75 ...
$ E1b_2 : int 0 50 50 75 20 30 20 25 90 25 ...
$ E2_1 : int 20 60 75 70 60 80 75 100 60 80 ...
$ E3_1 : int 5 20 20 5 30 20 25 25 10 30 ...
$ E3_2 : int 10 20 40 15 30 20 50 50 10 30 ...
$ E3_3 : int 10 20 15 10 10 20 25 25 10 40 ...
$ G_1 : int 5 50 20 25 80 10 30 25 35 5 ...
$ G_2 : int 0 10 50 50 10 10 30 30 30 10 ...
$ G_3 : int 90 30 20 25 10 50 5 30 15 80 ...
$ G_4 : int 5 10 10 0 0 30 35 15 20 5 ...
$ H1_1 : int 1 3 3 2 3 2 3 3 2 3 ...
$ H2a_1 : int NA NA NA 1 NA 2 NA NA 1 NA ...
$ H2b_1 : int NA 2 1 NA 2 NA 1 1 NA 1 ...
$ H3_1 : int 2 2 2 2 2 3 3 3 2 2 ...
$ H4_1_1 : int 6 5 7 6 3 6 5 6 5 5 ...
$ J1_1 : int 4 6 4 4 4 4 6 7 3 3 ...
$ J2_1 : int 2 6 5 3 4 4 1 2 3 3 ...
$ J3_1 : int 4 5 3 3 4 4 6 7 3 4 ...
$ K1a : int 2 2 2 2 2 2 2 2 1 1 ...
$ K2_1 : int NA NA NA NA NA NA NA NA 0 0 ...
$ K2_2 : int NA NA NA NA NA NA NA NA 1 0 ...
$ K2_3 : int NA NA NA NA NA NA NA NA 0 1 ...
$ K2_4 : int NA NA NA NA NA NA NA NA 0 0 ...
[list output truncated
]]
Second, can anybody please tell me how to get the significance level of the variables used in the discriminant analysis.
Works fine with a randomly generated data set with no
NAvalues:I can reproduce the error if I add enough
NAs:results in
(if I knock out fewer I get warnings about collinearity instead)
For a start, try removing all variables with any
NAvalues (or those with more than a few) and see if you can get it to work.For the p value part of the question: googling “+r MASS lda discriminant analysis” leads to http://www.statmethods.net/advstats/discriminant.html and suggests (and provides a link to) MANOVA for these p values.
Based on a little bit of googling, it looks like people usually use MANOVA with Wilks’ lambda for tests in the context of LDA: for example, http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discrim.pdf says
They go on to show an example of using Wilks’ lambda, although
?manovasays that the Pillai-Bartlett test (which is the default inmanova) may be better … in any case, it’s pretty easy to do the test.This of course is not exactly what you asked for — you asked (I think) for the significance level associated with individual variables rather than with the overall test. I can imagine you could do something via appropriately multiplicity-corrected logistic regression, but this is turning into a statistical rather than an R question. If you don’t get any further answers here you might consider asking an appropriately reformulated question on http://stats.stackexchange.com , referencing this question …