I’m trying to do some machine learning stuff that involves a lot of factor-type

Question


Asked: May 27, 2026


I’m trying to do some machine learning work that involves a lot of factor-type variables (words, descriptions, times, basically non-numeric stuff). I usually rely on randomForest, but it doesn’t work with factors that have more than 32 levels.

Can anyone suggest some good alternatives?


1 Answer

Editorial Team · Answer 1 · 2026-05-27T16:51:33+00:00

Tree methods struggle here because the number of possible splits grows exponentially with the number of levels: an unordered factor with c levels can be divided into two groups in 2^(c-1) - 1 ways. With words, however, this is typically addressed by creating an indicator variable for each word (of the description, etc.), so a split can test one word at a time (present/absent) instead of searching over all possible combinations of levels. In general you can always expand levels into indicators, and some models (glm, for example) do this implicitly. The same applies when handling text with other ML methods such as SVMs.

So the answer may be that you need to rethink the structure of your input data more than the method. Alternatively, if the levels have a natural order, you can treat the factor as ordered, so there are only c-1 possible splits.
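Here is a rough, standard-library-only Python sketch of the three ideas above: counting splits for an unordered factor, expanding words into indicator columns, and encoding ordered levels. The function names are hypothetical, not from any package. In practice you would more likely use an existing encoder (for example scikit-learn's `CountVectorizer(binary=True)` or R's `model.matrix`) than these toy versions.

```python
from itertools import combinations


def unordered_splits(levels):
    """Enumerate every way to split an unordered factor's levels into two
    non-empty groups; returns the left-hand group of each split."""
    first, rest = levels[0], list(levels[1:])
    splits = []
    # Pin the first level to the left group so mirror-image splits
    # (left/right swapped) are counted only once.
    # k stops at len(rest) - 1 so the right group is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            splits.append((first,) + extra)
    return splits


def n_unordered_splits(c):
    # Closed form of the enumeration above: 2^(c-1) - 1 splits.
    return 2 ** (c - 1) - 1


def n_ordered_splits(c):
    # With an order on the levels, only "level <= threshold" splits are
    # possible, giving c - 1 candidates.
    return c - 1


def word_indicators(descriptions):
    """Expand free-text descriptions into one 0/1 column per word."""
    vocab = sorted({w for d in descriptions for w in d.lower().split()})
    rows = []
    for d in descriptions:
        words = set(d.lower().split())
        rows.append([1 if w in words else 0 for w in vocab])
    return vocab, rows


def ordinal_encode(values, order):
    """Map ordered levels to integer ranks so a tree can split on thresholds."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]


print(len(unordered_splits("abcd")))  # 7
print(n_unordered_splits(33))         # 4294967295
print(n_ordered_splits(33))           # 32

vocab, X = word_indicators(["red car", "blue car", "red bike"])
print(vocab)  # ['bike', 'blue', 'car', 'red']
print(X)      # [[0, 0, 1, 1], [0, 1, 1, 0], [1, 0, 0, 1]]

print(ordinal_encode(["low", "high", "mid"], ["low", "mid", "high"]))  # [0, 2, 1]
```

The brute-force enumeration is only practical for a handful of levels; for a 33-level factor it would have to visit over four billion splits. That growth is why the indicator and ordering tricks help: each indicator column has exactly one split, and an ordered factor has only c-1.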
