Let say I have the following data in R. training = factor(c(1,1,3,2,1,3,2,34,67,34)) test =

Question


Asked: June 5, 2026


Let's say I have the following data in R:

training = factor(c(1,1,3,2,1,3,2,34,67,34))
test = factor(c(1,1,2,30,65,30))

(My real data is much more complicated; this is a simplified example.)

I want to check whether each level in the test set exists in the training set and, if it does not, replace it with the nearest value from the training set.
For example, the levels 30 and 65 in the test set do not exist in the training set, so I want to replace them with 34 and 67, respectively.

Currently, I use the following code:

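replacefactor <- function(dat, new_factor, near_factor) {
  # add the replacement value as a level if it is not one already
  if (!(near_factor %in% levels(dat))) {
    levels(dat) <- c(levels(dat), near_factor)
  }
  # replace the unmatched value, then drop the now-unused level
  dat[dat == new_factor] <- near_factor
  factor(dat)
}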

test <- replacefactor(test,30,34)
test <- replacefactor(test,65,67)

It works, but I have to specify each replacement by hand, which is not practical given the size of my data.

I am not sure how to find the nearest value in the training set automatically.
Once I can, I could use a for loop to apply the replacements.
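The for-loop idea can be sketched roughly as follows. This is a minimal sketch, assuming every level is a numeric string (so `as.numeric()` on the levels is meaningful); `train_vals`, `lev` and `nearest` are illustrative names, and `replacefactor()` is the function above with an explicit return value.

```r
training <- factor(c(1,1,3,2,1,3,2,34,67,34))
test <- factor(c(1,1,2,30,65,30))

# replacefactor() from the question, returning the recoded factor explicitly
replacefactor <- function(dat, new_factor, near_factor) {
  if (!(near_factor %in% levels(dat))) {
    levels(dat) <- c(levels(dat), near_factor)
  }
  dat[dat == new_factor] <- near_factor
  factor(dat)
}

# assumes the training levels are numeric strings
train_vals <- as.numeric(levels(training))

# loop over the test levels that never occur in training
for (lev in setdiff(levels(test), levels(training))) {
  # closest training value; which.min() keeps the first one on ties
  nearest <- train_vals[which.min(abs(train_vals - as.numeric(lev)))]
  test <- replacefactor(test, lev, nearest)
}

as.character(test)
```

Each pass rebuilds the factor, so this is simple but not the fastest option when there are many unmatched levels.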

1 Answer

Editorial Team · Answer 1 · 2026-06-05T19:37:54+00:00

First, get the levels of test that are not in training:

test.missing <- levels(test)[!levels(test) %in% levels(training)]
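The same vector can also be obtained with base R's `setdiff()`, which returns the elements of its first argument that are absent from the second. A small sketch on the example data:

```r
training <- factor(c(1,1,3,2,1,3,2,34,67,34))
test <- factor(c(1,1,2,30,65,30))

# levels of test that never occur in training
test.missing <- setdiff(levels(test), levels(training))
test.missing
# [1] "30" "65"
```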

Then write a function that runs along them and finds the nearest training level for each:

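myfun <- function(x, y) {
  # nearest level of y to x; as.numeric() also handles non-integer levels,
  # and which.min() returns the first match on ties
  levels(y)[which.min(abs(as.numeric(levels(y)) - as.numeric(x)))]
}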

> unlist(lapply(test.missing, myfun, training))
[1] "34" "67"
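The `unlist(lapply(...))` step can also be written with `vapply()`, which checks that every call returns exactly one string. A minimal sketch; `myfun` is redefined here (comparing levels with `as.numeric()`) so the snippet runs on its own, and `nearest` is an illustrative name.

```r
training <- factor(c(1,1,3,2,1,3,2,34,67,34))
test <- factor(c(1,1,2,30,65,30))
test.missing <- levels(test)[!levels(test) %in% levels(training)]

myfun <- function(x, y) {
  levels(y)[which.min(abs(as.numeric(levels(y)) - as.numeric(x)))]
}

# FUN.VALUE = character(1): each call must return a single string
nearest <- vapply(test.missing, myfun, character(1), y = training,
                  USE.NAMES = FALSE)
nearest
# [1] "34" "67"
```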

The result can then be assigned to the unmatched levels:

levels(test)[!levels(test) %in% levels(training)] <- unlist(lapply(test.missing, myfun, training))

> levels(test)
[1] "1"  "2"  "34" "67"
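Because the real data is large, comparing each missing level against every training level may get slow. One alternative, sketched here under the assumption that all levels are numeric strings, sorts the training values once and uses `findInterval()` (a binary search) to locate the two neighbours of each missing value. `nearest_value()` and `match_to_training()` are hypothetical helper names, not part of the answer above.

```r
training <- factor(c(1,1,3,2,1,3,2,34,67,34))
test <- factor(c(1,1,2,30,65,30))

# Hypothetical helper: closest value in ref for each element of x.
# Ties go to the smaller value, like which.min() on sorted levels.
nearest_value <- function(x, ref) {
  ref <- sort(unique(ref))
  i <- findInterval(x, ref)             # largest ref[i] <= x, 0 if below all
  lo <- ref[pmax(i, 1)]                 # neighbour below (or first value)
  hi <- ref[pmin(i + 1, length(ref))]   # neighbour above (or last value)
  ifelse(abs(x - lo) <= abs(hi - x), lo, hi)
}

# Hypothetical wrapper: relabel unmatched test levels with training levels
match_to_training <- function(test, training) {
  missing <- !levels(test) %in% levels(training)
  new_vals <- nearest_value(as.numeric(levels(test)[missing]),
                            as.numeric(levels(training)))
  levels(test)[missing] <- as.character(new_vals)
  test
}

test <- match_to_training(test, training)
levels(test)
# [1] "1"  "2"  "34" "67"
```

If two unmatched test levels map to the same training level, `levels<-` merges them into one level, just as in the answer's last step.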
