I was running discriminant analysis using R. The code is the following: fit <-

Question

0

Asked: May 25, 20262026-05-25T11:29:26+00:00 2026-05-25T11:29:26+00:00

I was running discriminant analysis using R. The code is the following: fit <-

0

I was running discriminant analysis using R. The code is the following:

fit <- lda(group~ A+C1_1+C2+D1a_1+D2_1+D3_1+D3_2+D3_3+E1a_1+E1b_1+E1b_2+E2_1+E3_1+E3_2+E3_3+F2+G_1+G_2+G_3+G_4+H1_1+H2a_1+H2b_1+H3_1+H4_1_1+H1_2+H2a_2+H2b_2+ H3_2+H4_1_2+J1_1+J2_1+J3_1+K1a+K2_1+K2_2+K2_3+K2_4,data=data1)

But unfortunately I was getting the following error:

Error in x - group.means[g, ] : non-conformable arrays

This is the str(data1) output:

'data.frame':   210 obs. of  133 variables:

 $ A               : int  1 1 1 1 1 1 1 2 1 2 ...

 $ C1_1            : int  22 29 12 12 25 15 30 20 30 15 ...
 $ C2              : int  2 2 2 2 2 2 2 1 2 2 ...
 $ D1a_1           : int  40 50 160 15 150 105 150 45 100 80 ...
 $ D2_1            : int  100 100 100 100 100 100 100 90 95 100 ...
 $ D3_1            : int  5 15 40 10 30 25 30 40 25 60 ...
 $ D3_2            : int  10 30 30 15 30 25 60 40 20 10 ...
 $ D3_3            : int  10 30 30 10 10 15 10 20 20 30 ...
 $ E1a_1           : int  80 25 140 30 150 120 80 30 100 100 ...
 $ E1b_1           : int  100 50 50 25 80 70 80 75 10 75 ...
 $ E1b_2           : int  0 50 50 75 20 30 20 25 90 25 ...
 $ E2_1            : int  20 60 75 70 60 80 75 100 60 80 ...
 $ E3_1            : int  5 20 20 5 30 20 25 25 10 30 ...
 $ E3_2            : int  10 20 40 15 30 20 50 50 10 30 ...
 $ E3_3            : int  10 20 15 10 10 20 25 25 10 40 ...
 $ G_1             : int  5 50 20 25 80 10 30 25 35 5 ...
 $ G_2             : int  0 10 50 50 10 10 30 30 30 10 ...
 $ G_3             : int  90 30 20 25 10 50 5 30 15 80 ...
 $ G_4             : int  5 10 10 0 0 30 35 15 20 5 ...
 $ H1_1            : int  1 3 3 2 3 2 3 3 2 3 ...
 $ H2a_1           : int  NA NA NA 1 NA 2 NA NA 1 NA ...

 $ H2b_1           : int  NA 2 1 NA 2 NA 1 1 NA 1 ...

 $ H3_1            : int  2 2 2 2 2 3 3 3 2 2 ...

 $ H4_1_1          : int  6 5 7 6 3 6 5 6 5 5 ...

 $ J1_1            : int  4 6 4 4 4 4 6 7 3 3 ...
 $ J2_1            : int  2 6 5 3 4 4 1 2 3 3 ...
 $ J3_1            : int  4 5 3 3 4 4 6 7 3 4 ...

 $ K1a             : int  2 2 2 2 2 2 2 2 1 1 ...

 $ K2_1            : int  NA NA NA NA NA NA NA NA 0 0 ...
 $ K2_2            : int  NA NA NA NA NA NA NA NA 1 0 ...
 $ K2_3            : int  NA NA NA NA NA NA NA NA 0 1 ...
 $ K2_4            : int  NA NA NA NA NA NA NA NA 0 0 ...

  [list output truncated

]]

Second, can anybody please tell me how to get the significance level of the variables used in the discriminant analysis.

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-25T11:29:27+00:00

Works fine with a randomly generated data set with no NA values:

set.seed(101)
z <- matrix(runif(210*133),nrow=210)
zz <- data.frame(A=sample(1:2,size=210,replace=TRUE),z)
m <- MASS::lda(A~.,data=zz)

I can reproduce the error if I add enough NAs:

z2 <- z
z2[sample(length(z),size=2000)] <- NA
zz2 <- data.frame(A=sample(1:2,size=210,replace=TRUE),z2)
m <- MASS::lda(A~.,data=zz2)

results in

Error in x - group.means[g, ] : non-conformable arrays

(if I knock out fewer I get warnings about collinearity instead)

For a start, try removing all variables with any NA values (or those with more than a few) and see if you can get it to work.

For the p value part of the question: googling “+r MASS lda discriminant analysis” leads to http://www.statmethods.net/advstats/discriminant.html and suggests (and provides a link to) MANOVA for these p values.

Based on a little bit of googling, it looks like people usually use MANOVA with Wilks’ lambda for tests in the context of LDA: for example, http://userwww.sfsu.edu/~efc/classes/biol710/discrim/discrim.pdf says

Discriminant function analysis is broken into a 2-step process: (1) testing
significance of a set of discriminant functions, and; (2) classification. The first
step is computationally identical to MANOVA.

They go on to show an example of using Wilks’ lambda, although ?manova says that the Pillai-Bartlett test (which is the default in manova) may be better … in any case, it’s pretty easy to do the test.

> summary(manova(z~zz$A),test="Wilks")
           Df   Wilks approx F num Df den Df Pr(>F)
zz$A        1 0.38164  0.92587    133     76 0.6545
Residuals 208

This of course is not exactly what you asked for — you asked (I think) for the significance level associated with individual variables rather than with the overall test. I can imagine you could do something via appropriately multiplicity-corrected logistic regression, but this is turning into a statistical rather than an R question. If you don’t get any further answers here you might consider asking an appropriately reformulated question on http://stats.stackexchange.com , referencing this question …

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I was running discriminant analysis using R. The code is the following: fit <-

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply