This is a follow up question from R: t-test over all columns Suppose I

Question

0

Asked: May 31, 20262026-05-31T08:50:53+00:00 2026-05-31T08:50:53+00:00

This is a follow up question from R: t-test over all columns Suppose I

0

This is a follow up question from R: t-test over all columns

Suppose I have a huge data set, and then I created numerous subsets based on certain conditions. The subsets should have the same number of columns. Then I want to do t-test on two subsets at a time (outer loop) and then for each combination of subsets go through all columns one column at a time (inner loop).

Here is what I have come up with based on previous answer. This one stops with an error.

C <- c("c1","c1","c1","c1","c1",
   "c2","c2","c2","c2","c2",
   "c3","c3","c3","c3","c3",
   "c4","c4","c4","c4","c4",
   "c5","c5","c5","c5","c5",
   "c6","c6","c6","c6","c6",
   "c7","c7","c7","c7","c7",
   "c8","c8","c8","c8","c8",
   "c9","c9","c9","c9","c9",
   "c10","c10","c10","c10","c10")
X <- rnorm(n=50, mean = 10, sd = 5)
Y <- rnorm(n=50, mean = 15, sd = 6)
Z <- rnorm(n=50, mean = 20, sd = 5)
Data <- data.frame(C, X, Y, Z)

Data.c1 = subset(Data, C == "c1",select=X:Z)
Data.c2 = subset(Data, C == "c2",select=X:Z)
Data.c3 = subset(Data, C == "c3",select=X:Z)
Data.c4 = subset(Data, C == "c4",select=X:Z)
Data.c5 = subset(Data, C == "c5",select=X:Z)

Data.Subsets = c("Data.c1",
                 "Data.c2",
                 "Data.c3",
                 "Data.c4",
                 "Data.c5") 

library(plyr)

combo1 <- combn(length(Data.Subsets),1)
adply(combo1, 1, function(x) {

  combo2 <- combn(ncol(Data.Subsets[x]),2)
  adply(combo2, 2, function(y) {

      test <- t.test( Data.Subsets[x][, y[1]], Data.Subsets[x][, y[2]], na.rm=TRUE)

      out <- data.frame("Subset" = rownames(Data.Subsets[x]),
                    , "Row" = colnames(x)[y[1]]
                    , "Column" = colnames(x[y[2]])
                    , "t.value" = round(test$statistic,3)
                    ,  "df"= test$parameter
                    ,  "p.value" = round(test$p.value, 3)
                    )
      return(out)
  } )
} )

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-31T08:50:54+00:00

First of all, you can more easily define you dataset using gl, and by avoiding creating individual variables for the columns.

Data <- data.frame(
  C = gl(10, 5, labels = paste("c", 1:10, sep = "")),
  X = rnorm(n = 50, mean = 10, sd = 5),
  Y = rnorm(n = 50, mean = 15, sd = 6),
  Z = rnorm(n = 50, mean = 20, sd = 5)
)

Convert this to “long” format using melt from the reshape package. (You can also use the base reshape function.)

longData <- melt(Data, id.vars = "C")

Now Use pairwise.t.test to compute t tests on all pairs of X/Y/Z for for each level of C.

with(longData, pairwise.t.test(value, interaction(C, variable)))

Note that it is important to use pairwise.t.test rather than just lots of individual calls to t.test because you need to adjust your p values if you run lots of tests. (See, e.g., xkcd for explanation.)

In general, pairwise t tests are inferior to a regression so be careful about their usage.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

This is a follow up question from R: t-test over all columns Suppose I

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply