When we are using any decision tree algorithm and our data set consists of numerical values, why are nodes split on values that are not in the data set?

Editorial Team

Asked: May 22, 2026 (2026-05-22T23:42:36+00:00)

When we use any decision tree algorithm on a data set consisting of numerical values, I have found that the program splits nodes on values that do not even exist in the data set.

Example:
Classification results:

attrib2 <= 3.761791861252009 : groupA
attrib2 > 3.761791861252009 : groupB

whereas my data set contains no attrib2 value like 3.76179.
Why is that?

1 Answer

Editorial Team · Answer 1 · 2026-05-22T23:42:37+00:00

There are several ways to choose the splitting value for an attribute, and not all of them pick a value that appears in the data set.

A common (though somewhat simplistic) approach is to take the mean. It is possible that 3.76179… is the mean of all attrib2 values in your data set.
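As a minimal sketch of the mean-based rule, assume a hypothetical `attrib2` column (the question does not show the real data) and a helper `mean_threshold` invented for illustration. The mean becomes the threshold even though no row holds that value:

```python
from statistics import mean

def mean_threshold(values):
    """Hypothetical helper: use the arithmetic mean of the attribute as the split threshold."""
    return mean(values)

# Hypothetical attrib2 column; the real data set is not shown in the question.
attrib2 = [1.0, 2.0, 4.0, 8.0]
t = mean_threshold(attrib2)

left = [v for v in attrib2 if v <= t]   # rows sent to the "<=" branch
right = [v for v in attrib2 if v > t]   # rows sent to the ">" branch

print(f"attrib2 <= {t} : {left}")   # attrib2 <= 3.75 : [1.0, 2.0]
print(f"attrib2 > {t} : {right}")   # attrib2 > 3.75 : [4.0, 8.0]
print(t in attrib2)                 # False
```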

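For example, if your data set is one-dimensional and consists of the values -10, -9, ..., -2, -1, 1, 2, ..., 9, 10 (say, with the negative values in one class and the positive values in the other), then a good splitting value would be 0, even though it's not in your data set. It is both the mean of the values and the midpoint between the neighbouring values -1 and 1.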
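The example above can be checked with a small sketch. It assumes, hypothetically, that the negative values form one class and the positive values another. It tries the midpoint between each pair of adjacent sorted values as a candidate threshold and scores each candidate by weighted Gini impurity. This midpoint rule is a different strategy from the mean, though a common one (scikit-learn's decision trees, for instance, place thresholds halfway between adjacent values). The function names are made up for illustration:

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def best_midpoint_split(xs, ys):
    """Hypothetical helper: try the midpoint between each pair of adjacent
    distinct sorted values and return (threshold, weighted Gini) of the best."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_t, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold fits between two equal values
        t = (lo + hi) / 2
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score

# -10..-1 and 1..10; 0 itself is deliberately absent.
xs = [v for v in range(-10, 11) if v != 0]
ys = ["neg" if v < 0 else "pos" for v in xs]  # hypothetical labels
t, score = best_midpoint_split(xs, ys)
print(t, score)  # 0.0 0.0
```

The winning threshold 0.0 separates the two classes perfectly, yet it is not one of the inputs.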

Another possibility, especially if you're dealing with random forests (ensembles of several decision trees), is that the splitting value is chosen at random from a probability distribution centered around the median value. Some algorithms split according to a Gaussian centered on the mean or median value, with a standard deviation equal to that of the data set.
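Here is a sketch of the Gaussian variant. It assumes the median is used as the centre and the population standard deviation as the spread. The answer allows either the mean or the median and names no specific library, and `random_gaussian_threshold` is a hypothetical name. Because the threshold is drawn from a continuous distribution, it will essentially never coincide with a value in the data set:

```python
import random
from statistics import median, pstdev

def random_gaussian_threshold(values, rng):
    """Hypothetical helper: draw a split threshold from a Gaussian centred on the
    median of the values, with standard deviation equal to their population std dev."""
    return rng.gauss(median(values), pstdev(values))

rng = random.Random(42)          # fixed seed only so the example is reproducible
attrib2 = [1.0, 2.0, 4.0, 8.0]   # hypothetical column
thresholds = [random_gaussian_threshold(attrib2, rng) for _ in range(5)]
print(thresholds)
```

Each call yields a different threshold, so different trees (or nodes) in an ensemble built this way would split on different, generally non-data, values.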
