Let’s say that you have a normally distributed variable y with a 3-group categorical


Asked: May 29, 2026


Let’s say that you have a normally distributed variable y with a 3-group categorical predictor x that has the orthogonal contrasts c1 and c2. I am trying to create a program in R that, given x, c1, and c2, creates y such that c1 and c2 have effect sizes r1 and r2 specified by the user.

For example, let’s say that x, c1, c2, r1, and r2 were created like the following:

x <- factor(rep(c(1, 2, 3), 100))
contrasts(x) <- matrix(c(0, -.5, .5, -2/3, 1/3, 1/3), 
  nrow = 3, ncol = 2, dimnames = list(c("1", "2", "3"), c("c1", "c2")))

contrasts(x)
    c1         c2
1  0.0 -0.6666667
2 -0.5  0.3333333
3  0.5  0.3333333

r1 <- .09
r2 <- 0

I would like the program to create y such that the proportion of variance in y accounted for by c1 equals r1 (.09) and the proportion accounted for by c2 equals r2 (0).

Does anybody know how I might go about this? I know that I should be using the rnorm function, but I’m stuck on which population means / sds rnorm should use when it does its sampling.


1 Answer

Editorial Team · Answer 1 · 2026-05-29T09:11:25+00:00

Courtesy of some generous advice from my colleagues, I now have a function that creates simulated data given a specified number of groups, a set of contrasts, a set of regression coefficients, a specified N per cell, and a specified within-group variance:

sim.factor <- function(levels, contr, beta, perCell, errorVar){
  # Build design matrix X
  X <- cbind(rep(1,levels*perCell), kronecker(contr, rep(1,perCell)))
  # Generate y
  y <- X %*% beta + rnorm(levels*perCell, sd=sqrt(errorVar))
  # Build and return data frame
  dat <- cbind.data.frame(y, X[,-1])
  names(dat)[-1] <- colnames(contr)
  return(dat)
}
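
As a quick illustration of the design-matrix step, here is a minimal sketch with only two observations per cell (that small perCell is chosen purely for display and is not part of the original test). It suggests that kronecker() repeats each row of the contrast matrix perCell times, so the rows of X end up grouped by level.

```r
# Helmert contrasts for 3 groups, with illustrative column names
contr <- contr.helmert(3)
colnames(contr) <- c("c1", "c2")
perCell <- 2  # illustrative value only

# Intercept column plus each contrast row repeated perCell times
X <- cbind(1, kronecker(contr, rep(1, perCell)))
print(X)
```

Rows 1-2 of X belong to group 1, rows 3-4 to group 2, and rows 5-6 to group 3.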

I also wrote a function that, given a set of regression coefficients, the N per cell, the number of groups, a set of orthogonal contrasts, and the desired delta R^2 for the contrast of interest (taken to be the first contrast), returns the required within-group variance:

ws.var <- function(levels, contr, beta, perCell, dc){
  # Build design matrix X
  X <- cbind(rep(1,levels), contr)
  # Generate the expected means
  means <- X %*% beta
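  # Find the sum of squares due to each contrast
  ssContr <- (t(means) %*% contr)^2 / apply(contr^2 / perCell, 2, sum)
  # Within-conditions sum of squares: SS_total (= SS_c1 / dc) minus SS_between;
  # the contrast of interest is assumed to be the first column of contr
  wvar <- ssContr[1] / dc - sum(ssContr)
  # Convert the sum of squares to a variance (df_within = levels * (perCell - 1))
  errorVar <- wvar / (levels * (perCell - 1))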
  return(errorVar)
}
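
To make the arithmetic concrete, the following sketch walks through the same steps by hand for the settings used in the test (Helmert contrasts, beta = c(0, 1, 0), 50 per cell, desired delta R^2 of .08). The names ssC1, ssC2, ssTotal, and ssWithin are introduced here for illustration only and do not appear in the function.

```r
contr <- contr.helmert(3)          # columns: c1 = (-1, 1, 0), c2 = (-1, -1, 2)
beta <- c(0, 1, 0)                 # intercept, c1, c2
perCell <- 50
levels <- 3
dc <- 0.08

means <- cbind(1, contr) %*% beta  # expected cell means: -1, 1, 0

# Sum of squares for each contrast: (sum c_j * mean_j)^2 / sum(c_j^2 / n)
ssC1 <- sum(contr[, 1] * means)^2 / sum(contr[, 1]^2 / perCell)  # 4 / 0.04 = 100
ssC2 <- sum(contr[, 2] * means)^2 / sum(contr[, 2]^2 / perCell)  # 0 / 0.12 = 0

ssTotal <- ssC1 / dc                             # 100 / 0.08 = 1250
ssWithin <- ssTotal - (ssC1 + ssC2)              # 1150
errorVar <- ssWithin / (levels * (perCell - 1))  # 1150 / 147, about 7.82
print(errorVar)
```

With three groups, this is the same value that ws.var returns for these inputs.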

After doing some testing as follows, the functions seem to generate the desired delta R^2 for contrast c1.

contr <- contr.helmert(3)
colnames(contr) <- c("c1","c2")
beta <- c(0, 1, 0)
perCell <- 50
levels <- 3
dc <- .08    # desired delta R^2 for c1
N <- 1000    # number of simulated data sets

# Calculate the error variance
errorVar <- ws.var(levels, contr, beta, perCell, dc)

# To store delta R^2 values
d1 <- vector("numeric", length = N)

# Use the functions
library(lmSupport)  # provides lm.sumSquares()
for (i in 1:N) {
  d <- sim.factor(levels = levels,
                  contr = contr,
                  beta = beta,
                  perCell = perCell,
                  errorVar = errorVar)
  d1[i] <- lm.sumSquares(lm(y ~ c1 + c2, data = d))[1, 2] # From the lmSupport package
}

m <- round(mean(d1), digits = 3)

bmp("Testing simulation functions.bmp")
hist(d1, xlab = "Percentage of variance due to c1", main = "")
text(.18, 180, labels = paste("Mean =", m))
dev.off()
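
If lmSupport is not available, the delta R^2 for c1 can also be checked with base R. The sketch below assumes that delta R^2 means the drop in R^2 when c1 is removed from the full model. It simulates a single data set inline rather than calling sim.factor, and hard-codes the error variance to 1150 / 147 (about 7.82), which is what ws.var returns for the settings above. The seed is arbitrary.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Simulate one balanced data set with the same settings as the test
contr <- contr.helmert(3)
colnames(contr) <- c("c1", "c2")
perCell <- 50
X <- cbind(1, kronecker(contr, rep(1, perCell)))
y <- as.vector(X %*% c(0, 1, 0)) + rnorm(nrow(X), sd = sqrt(1150 / 147))
d <- data.frame(y = y, c1 = X[, 2], c2 = X[, 3])

# Delta R^2 for c1: R^2 of the full model minus R^2 of the model without c1
full <- lm(y ~ c1 + c2, data = d)
reduced <- lm(y ~ c2, data = d)
dR2 <- summary(full)$r.squared - summary(reduced)$r.squared
print(dR2)
```

Because the design is balanced and the contrasts are orthogonal, this difference equals the squared correlation between y and c1, which gives a second way to sanity-check the result.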

Patrick
