I have made a start to create some training and test sets using 10

Question

Asked: June 3, 2026

I have started creating training and test sets using 10-fold cross-validation for an artificial dataset:

rows <- 1000

X1 <- sort(runif(n = rows, min = -1, max = 1))
occ.prob <- 1/(1+exp(-(0.0 + 3.0*X1)))
true.presence <- rbinom(n = rows, size = 1, prob = occ.prob)

# combine data as data frame and save
data <- data.frame(X1, true.presence)

id <- sample(1:10,nrow(data),replace=TRUE)
ListX <- split(data,id) 
fold1 <- data[id==1,] 
fold2 <- data[id==2,] 
fold3 <- data[id==3,] 
fold4 <- data[id==4,] 
fold5 <- data[id==5,] 
fold6 <- data[id==6,] 
fold7 <- data[id==7,] 
fold8 <- data[id==8,] 
fold9 <- data[id==9,] 
fold10 <- data[id==10,] 

trainingset <- subset(data, id %in% c(2,3,4,5,6,7,8,9,10))
testset <- subset(data, id %in% c(1))

Is there an easier way to achieve this? And how could I perform stratified cross-validation, which ensures that the class priors (true.presence) are roughly the same in all folds?


1 Answer

Editorial Team · Answer 1 · 2026-06-03T08:02:06+00:00

There is probably a more efficient way to code this, and almost certainly a function in some package that simply returns the folds, but here is some simple code that shows how one might do it. The idea is to assign the fold labels separately within each class of true.presence, so that every fold receives roughly the same share of each class:

rows <- 1000

X1 <- sort(runif(n = rows, min = -1, max = 1))
occ.prob <- 1/(1+exp(-(0.0 + 3.0*X1)))
true.presence <- rbinom(n = rows, size = 1, prob = occ.prob)

# combine data as data frame and save
dat <- data.frame(X1, true.presence)

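library(plyr)

# Assign the fold labels 1..k to the rows of x in random order,
# so that fold sizes within x differ by at most one
createFolds <- function(x, k) {
    n <- nrow(x)
    x$folds <- rep(1:k, length.out = n)[sample(n, n)]
    x
}

# Apply createFolds separately to each class of true.presence
folds <- ddply(dat, .(true.presence), createFolds, k = 10)

# Proportion of true.presence in each fold:
ddply(folds, .(folds), summarise, prop = mean(true.presence))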

   folds      prop
1      1 0.5049505
2      2 0.5049505
3      3 0.5100000
4      4 0.5100000
5      5 0.5100000
6      6 0.5100000
7      7 0.5100000
8      8 0.5100000
9      9 0.5050505
10    10 0.5050505
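
For anyone who prefers to avoid the plyr dependency, the same idea can be sketched in base R with ave(), followed by a loop that builds the training and test set for each fold. This is only a minimal sketch: the seed, the column name fold and the object prop_by_fold are my own choices for illustration and are not part of the code above.

```r
set.seed(42)  # assumed seed, only so that the example is reproducible

rows <- 1000
X1 <- sort(runif(n = rows, min = -1, max = 1))
occ.prob <- 1 / (1 + exp(-(0.0 + 3.0 * X1)))
true.presence <- rbinom(n = rows, size = 1, prob = occ.prob)
dat <- data.frame(X1, true.presence)

k <- 10

# Within each class of true.presence, deal out the labels 1..k
# in random order (ave() applies FUN to each group separately)
dat$fold <- ave(seq_len(nrow(dat)), dat$true.presence,
                FUN = function(idx) sample(rep(1:k, length.out = length(idx))))

# Proportion of true.presence in each fold
prop_by_fold <- tapply(dat$true.presence, dat$fold, mean)
print(prop_by_fold)

# Each fold serves once as the test set; the other k - 1 folds form the training set
for (i in 1:k) {
    trainingset <- dat[dat$fold != i, ]
    testset <- dat[dat$fold == i, ]
    # fit the model on trainingset and evaluate it on testset here
}
```

The loop replaces the ten hand-written fold1 ... fold10 objects and the two subset() calls from the question. As for a packaged alternative, the caret package provides a createFolds() function that, as far as I know, stratifies on the outcome when it is passed as a factor; note that it has the same name as the helper defined above, so avoid loading both into the same session without care.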
