Here is my problem, just hard for me… I want to generate multiple datasets,

Question

Asked: May 25, 2026 (2026-05-25T10:22:43+00:00)

Here is my problem, just hard for me…

I want to generate multiple datasets, then apply a function to each of them and collect the corresponding output in a single dataset or in several (whichever is possible)…

Here is my example, although in practice I need to generate a much larger number of variables and datasets:

seed <- round(runif(10)*1000000)

datagen <- function(x){
set.seed(x)
var <- rep(1:3, c(rep(3, 3)))
yvar <- rnorm(length(var), 50, 10)
matrix <- matrix(sample(1:10, c(10*length(var)), replace = TRUE), ncol = 10)
mydata <- data.frame(var, yvar, matrix)
}

gdt <- lapply (seed,  datagen) 

# resulting list (I believe is correct term) has 10 dataframes: 
# gdt[1] .......to gdt[10]

# my function, this will perform anova in every component data frames and 
#output probability coefficients...  
anovp <- function(x){
          ind <- 3:ncol(x) 
          out <- lm(gdt[x]$yvar ~ gdt[x][, ind[ind]])
          pval <- out$coefficients[,4][2]
          pval <- do.call(rbind,pval) 
         }

plist <- lapply (gdt,  anovp) 

Error in gdt[x] : invalid subscript type 'list'

This is not working. I tried different options but could not figure it out… so I finally decided to bother the experts, sorry for that…

My questions are:

(1) Is it possible to handle this situation in this way, or are there other alternatives for handling multiple generated datasets?

(2) If this is the right way, how can I do it?

Thank you for your attention, I will appreciate your help…

1 Answer

Editorial Team · Answer 1 · 2026-05-25T10:22:43+00:00

You have the basic idea right, in that you should create a list of data frames and then use lapply to apply the function to each element of the list. Unfortunately, there are several oddities in your code.

There is no point in randomly generating a seed, then setting it. You only need to use set.seed in order to make random numbers reproducible. Cut the lines

seed <- round(runif(10)*1000000)

and maybe

set.seed(x)

rep(1:3, c(rep(3, 3))) is the same as rep(1:3, each = 3).

Don’t call your variables var or matrix, since it’s confusing: those are also the names of existing R functions.

3:ncol(x) is dangerous. If x has fewer than 3 columns it doesn’t do what you think it does: the sequence counts down (3:2 is c(3, 2)) instead of being empty.
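To pull the points above together, here is a minimal sketch of how the generator could look. The names make_dataset, group, scores and predictor_cols are illustrative choices, not part of the original code, and the defaults simply mirror the sizes used in the question.

```r
# Illustrative rewrite of the generator (all names here are assumptions).
make_dataset <- function(n_groups = 3, n_per_group = 3, n_scores = 10) {
  group <- rep(seq_len(n_groups), each = n_per_group)
  yvar <- rnorm(length(group), mean = 50, sd = 10)
  scores <- matrix(
    sample(1:10, n_scores * length(group), replace = TRUE),
    ncol = n_scores
  )
  # An unnamed matrix without column names becomes columns X1, X2, ...
  data.frame(group, yvar, scores)
}

set.seed(42)  # set once, only so that the whole run is reproducible
gdt <- replicate(10, make_dataset(), simplify = FALSE)

# Safer than 3:ncol(df): returns an empty vector when there are fewer than 3 columns
predictor_cols <- function(df) seq_len(ncol(df))[-(1:2)]
```

Calling replicate with simplify = FALSE returns a plain list of data frames, which is the same shape that lapply(seed, datagen) produced.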

… and now, the problem you actually wanted solving.

The problem is in the line out <- lm(gdt[x]$yvar ~ gdt[x][, ind[ind]]).

lapply passes the data frames themselves into anovp, not indices, so inside gdt[x] the subscript x is a data frame (which is a list). That is what throws the "invalid subscript type 'list'" error.

One more thing. While you are rewriting that line, note that lm takes a data argument, so you don’t need to do things like gdt$some_column; you can just reference some_column directly.
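A small sketch may make the difference between elements and indices concrete. The list toy is a made-up stand-in for gdt.

```r
# toy is a hypothetical two-element list of data frames
toy <- list(a = data.frame(u = 1:2), b = data.frame(u = 3:4))

# lapply/sapply hand each element (a data frame) to the function
elem_classes <- sapply(toy, class)

# To work with positions instead, iterate over the indices and extract with [[ ]]
row_counts <- sapply(seq_along(toy), function(i) nrow(toy[[i]]))

# Single brackets keep the list wrapper; double brackets return the data frame
wrapped <- toy[1]
extracted <- toy[[1]]
```

This is also why gdt[1]$yvar would not find the column even with a valid index: gdt[1] is a list of length one, while gdt[[1]] is the data frame inside it.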
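As a hedged sketch of what a repaired anovp might look like, the version below takes a data frame, builds the formula from its column names and passes the data frame through the data argument. The toy data (toy_list, with columns group, yvar, X1, X2) is invented for illustration and uses 30 rows; with the 9 rows and 10 predictors from the question, the model would have no residual degrees of freedom and the p-values would not be meaningful.

```r
set.seed(1)  # only to make the toy data reproducible
toy_list <- replicate(3, data.frame(
  group = rep(1:3, each = 10),
  yvar  = rnorm(30, 50, 10),
  X1    = sample(1:10, 30, replace = TRUE),
  X2    = sample(1:10, 30, replace = TRUE)
), simplify = FALSE)

anovp <- function(df) {
  # Assumes the predictors are every column after the first two
  predictors <- names(df)[-(1:2)]
  fit <- lm(reformulate(predictors, response = "yvar"), data = df)
  # p-values are in the coefficient table returned by summary()
  summary(fit)$coefficients[, "Pr(>|t|)"]
}

plist <- lapply(toy_list, anovp)
pvals <- do.call(rbind, plist)  # one row per dataset, one column per coefficient
```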

EDIT: Further advice.

You appear to always use the formula yvar ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 + X10. Since it’s the same each time, create it before your call to lapply.

independent_vars <- paste(colnames(gdt[[1]])[-1:-2], collapse = " + ")
model_formula <- formula(paste("yvar", independent_vars, sep = " ~ "))

I probably wouldn’t bother with the anovp function. Just do

models <- lapply(gdt, function(data) lm(model_formula, data))

Then include a further call to lapply to work with the coefficients if necessary. The next line replicates your anovp code, but it won’t work because model$coefficients is a named vector, not a matrix, so the [,4] indexing fails with an "incorrect number of dimensions" error. Adjust it to retrieve the part you actually want.

coeffs <- lapply(models, function(model) do.call(rbind, model$coefficients[,4][2]))
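If the part you want is the p-values, one possible adjustment is to read them from summary(model)$coefficients, which is a matrix with a "Pr(>|t|)" column. The sketch below uses the built-in mtcars data split by cyl purely as a stand-in for a list of data frames; the model mpg ~ wt + hp is likewise only an example.

```r
# Stand-in list of fitted models (mtcars and the formula are illustrative)
models <- lapply(
  split(mtcars, mtcars$cyl),
  function(d) lm(mpg ~ wt + hp, data = d)
)

# model$coefficients is a plain named vector, so it has no dimensions
has_dims <- !is.null(dim(models[[1]]$coefficients))

# The summary's coefficient table is a matrix; take its p-value column
pvals <- t(sapply(models, function(m) summary(m)$coefficients[, "Pr(>|t|)"]))
```

Here pvals has one row per model and one column per coefficient, so a single predictor's p-values across all datasets can be read off as a column, for example pvals[, "wt"].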
