I have a large dataset and have defined outliers to be those values which

Question

Asked: June 14, 2026 (2026-06-14T04:40:45+00:00)

I have a large dataset and have defined outliers to be those values which fall either above the 99th or below the 1st percentile.

I’d like to take the mean of each outlier together with its preceding and following data points, then replace all 3 values with that average in a new dataset.

If there’s anyone who knows how to do this I’d be very grateful for a response.

1 Answer

Editorial Team · Answer 1 · 2026-06-14T04:40:46+00:00

If you have a vector of indices giving the outliers' positions in the data, e.g. from:

quan0.99 = quantile(df$value, 0.99)
out_idx = which(df$value > quan0.99)
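
The line above only covers the upper tail. As a minimal sketch, assuming the values live in a numeric vector (the toy `value` vector below is made up and stands in for df$value), both percentile cut-offs can be computed in one `quantile()` call and the outlier positions on either side collected with `which()`:

```r
# Toy data standing in for df$value (assumed for illustration only).
set.seed(42)
value = c(rnorm(200), 15, -15)

# quantile() returns both cut-offs in one call;
# names = FALSE drops the "1%" / "99%" labels from the result.
cutoffs = quantile(value, probs = c(0.01, 0.99), names = FALSE)
quan0.01 = cutoffs[1]
quan0.99 = cutoffs[2]

# Positions of values below the 1st or above the 99th percentile.
out_idx = which(value < quan0.01 | value > quan0.99)
```

Computing the cut-offs once, before any values are replaced, keeps the definition of "outlier" tied to the original data.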

With vec holding the values (e.g. vec = df$value), you can then do something like:

for(idx in out_idx) {
  vec[(idx-1):(idx+1)] = mean(vec[(idx-1):(idx+1)])
}
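
To see what the loop does, here is a small worked example on made-up data, with the value 30 at position 3 playing the outlier:

```r
vec = c(1, 2, 30, 4, 5)   # toy data; 30 plays the outlier
out_idx = 3               # position of the outlier

for (idx in out_idx) {
  # mean(c(2, 30, 4)) is 12, which overwrites positions 2 to 4
  vec[(idx - 1):(idx + 1)] = mean(vec[(idx - 1):(idx + 1)])
}

vec
# [1]  1 12 12 12  5
```

Note that this simple form assumes the outlier is not the last element: there, idx + 1 points past the end of the vector, so the mean becomes NA and the assignment lengthens the vector.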

You can wrap this in a function, making the bandwidth (the bandwith argument, i.e. how many neighbours to include on each side) and the summary function optional parameters:

average_outliers = function(vec, outlier_idx, bandwith = 1, func = "mean") {
  # iterate over outliers
  for (idx in outlier_idx) {
    # Clamp the window to the bounds of vec so that outliers at either end
    # do not index outside the vector.
    window = max(1, idx - bandwith):min(length(vec), idx + bandwith)
    # Slices can be used for extracting values or, as here, for assigning to
    # them. do.call calls func (e.g. mean) with the window's values as input.
    vec[window] = do.call(func, list(vec[window]))
  }
  return(vec)
}

This also lets you use, for example, the median with a bandwidth of 2. To use the function:

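# Compute both cut-offs once, from the original data, before anything is replaced.
quan0.99 = quantile(vec, 0.99)
quan0.01 = quantile(vec, 0.01)
# Apply average_outliers twice, feeding the result of the first call into the
# second: first for the 0.99 quantile, then for the 0.01 quantile.
vec = average_outliers(vec, which(vec > quan0.99))
vec = average_outliers(vec, which(vec < quan0.01))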

or:

vec = average_outliers(vec, which(vec > quan0.99), bandwith = 2, func = "median")
vec = average_outliers(vec, which(vec < quan0.01), bandwith = 2, func = "median")

which uses a bandwidth of 2 and replaces each window with its median.
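
Since the question asks for the result in a new dataset, one possible variant is sketched below. It is not the function from the answer: `smooth_outliers` is a hypothetical helper that finds the outliers itself, reads every window from the untouched input, and writes into a copy, so an earlier replacement never feeds into a later average. The `probs` argument and the toy data are assumptions made for illustration.

```r
# Hypothetical helper: reads from the original x, writes into a copy.
smooth_outliers = function(x, probs = c(0.01, 0.99), bandwith = 1, func = mean) {
  # Lower and upper cut-offs, computed once from the original data.
  cutoffs = quantile(x, probs = probs, names = FALSE, na.rm = TRUE)
  out_idx = which(x < cutoffs[1] | x > cutoffs[2])

  result = x
  for (idx in out_idx) {
    # Clamp the window so outliers at either end stay inside the vector.
    window = max(1, idx - bandwith):min(length(x), idx + bandwith)
    # Summarise the ORIGINAL values, then store them in the copy.
    result[window] = func(x[window])
  }
  result
}

# Toy data with one obvious outlier at position 5.
x = c(10, 11, 9, 10, 500, 10, 11, 9, 10, 11)
smoothed = smooth_outliers(x)   # positions 4 to 6 become mean(c(10, 500, 10))
```

Here `x` is left unchanged and `smoothed` is the new vector. When two outliers sit close enough for their windows to overlap, the later window overwrites the shared positions, which may or may not be what you want.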
