The Archive Base

Editorial Team
Asked: May 16, 2026

One of the basic data types in R is the factor. In my experience factors are basically a pain and I never use them; I always convert them to character vectors. I feel oddly like I’m missing something.

Are there some important examples of functions that use factors as grouping variables where the factor data type becomes necessary? Are there specific circumstances when I should be using factors?

  1. Editorial Team
    Answered: May 16, 2026 at 5:36 am

    You should use factors. Yes, they can be a pain, but my theory is that 90% of why they’re a pain is that read.table and read.csv used stringsAsFactors = TRUE by default (R 4.0.0 changed the default to FALSE), and most users missed this subtlety. Factors are useful because model-fitting packages like lme4 use factors and ordered factors to fit models differently and to decide which type of contrasts to use. Graphing packages also use them for grouping. ggplot and most model-fitting functions coerce character vectors to factors, so the result is the same; however, depending on your R version, you may end up with warnings in your code:

    lm(Petal.Length ~ -1 + Species, data=iris)
    
    # Call:
    # lm(formula = Petal.Length ~ -1 + Species, data = iris)
    
    # Coefficients:
    #     Speciessetosa  Speciesversicolor   Speciesvirginica  
    #             1.462              4.260              5.552  
    
    iris.alt <- iris
    iris.alt$Species <- as.character(iris.alt$Species)
    lm(Petal.Length ~ -1 + Species, data=iris.alt)
    
    # Call:
    # lm(formula = Petal.Length ~ -1 + Species, data = iris.alt)
    
    # Coefficients:
    #     Speciessetosa  Speciesversicolor   Speciesvirginica  
    #             1.462              4.260              5.552  
    
    # Warning message:
    # In model.matrix.default(mt, mf, contrasts) :
    #   variable Species converted to a factor
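The point about contrasts can be checked in base R without lme4. Below is a minimal sketch, assuming the session still uses the default `options(contrasts = c("contr.treatment", "contr.poly"))`. The `size` vector is made-up illustration data, not from the question. Unordered factors get treatment (dummy) coding and ordered factors get polynomial trend coding:

```r
# Hypothetical illustration data: an ordinal variable stored as text
size <- c("small", "large", "medium", "small", "large")

# Unordered factor: levels are sorted alphabetically (large, medium, small)
size_unordered <- factor(size)

# Ordered factor: levels in the order we specify
size_ordered <- factor(size, levels = c("small", "medium", "large"),
                       ordered = TRUE)

# Default coding for unordered factors: treatment (dummy) contrasts,
# one column per non-baseline level
contrasts(size_unordered)

# Default coding for ordered factors: polynomial contrasts,
# a linear (.L) and a quadratic (.Q) trend column
contrasts(size_ordered)

# A character vector carries no level order, so the model can only treat it
# as an unordered factor (alphabetical levels, treatment contrasts)
colnames(model.matrix(~ size))
colnames(model.matrix(~ size_ordered))
```

The only difference between the two factors is the `ordered` flag, and that alone changes the columns the model sees.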

    One tricky thing is the whole drop=TRUE bit. For factor vectors, drop=TRUE works well to remove levels that aren’t present in the subset. For example:

    s <- iris$Species
    s[s == 'setosa', drop=TRUE]
    #  [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [11] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [21] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [31] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [41] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # Levels: setosa
    s[s == 'setosa', drop=FALSE]
    #  [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [11] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [21] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [31] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [41] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # Levels: setosa versicolor virginica
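Leftover levels matter in practice because grouping functions such as table() and tapply() report one group per level, including levels with no data. This is a minimal sketch on the same iris subset; the variable names are illustrative only:

```r
s <- iris$Species

setosa_kept <- s[s == "setosa"]                  # unused levels kept
setosa_dropped <- s[s == "setosa", drop = TRUE]  # unused levels removed

# table() returns one count per level, so empty levels show up as zeros
table(setosa_kept)     # counts: setosa 50, versicolor 0, virginica 0
table(setosa_dropped)  # counts: setosa 50

# tapply() returns NA for levels that have no observations
res <- tapply(iris$Sepal.Length[s == "setosa"], setosa_kept, mean)
res
```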
    

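    However, with data.frames, [.data.frame() behaves differently (see this email or ?"[.data.frame"). There, drop=TRUE controls whether a single selected column is simplified to a vector; it does not control whether unused factor levels are removed. So drop=TRUE on data.frames does not work as you’d imagine: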

    x <- subset(iris, Species == 'setosa', drop=TRUE)  # subsetting with [ behaves the same way
    x$Species
    #  [1] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [11] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [21] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [31] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # [41] setosa setosa setosa setosa setosa setosa setosa setosa setosa setosa
    # Levels: setosa versicolor virginica
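In `[.data.frame`, the drop argument only decides whether a single selected column is simplified to a vector. This base-R sketch shows that the simplified column still carries all three levels:

```r
rows <- iris$Species == "setosa"

# drop = TRUE with a single column: the result is simplified to the column itself
col_vec <- iris[rows, "Species", drop = TRUE]
class(col_vec)    # "factor"
nlevels(col_vec)  # 3: unused levels are still there

# drop = FALSE keeps the data.frame structure
col_df <- iris[rows, "Species", drop = FALSE]
class(col_df)     # "data.frame"
```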
    

    Luckily, since R 2.12 you can easily remove unused factor levels with droplevels(), either for an individual factor or for every factor in a data.frame:

    x <- subset(iris, Species == 'setosa')
    levels(x$Species)
    # [1] "setosa"     "versicolor" "virginica" 
    x <- droplevels(x)
    levels(x$Species)
    # [1] "setosa"
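droplevels() also works on a single factor, which is handy when you only want to clean one column. This is a minimal sketch; the subset that excludes virginica is just an illustration:

```r
s <- iris$Species
no_virginica <- s[s != "virginica"]

levels(no_virginica)              # "setosa" "versicolor" "virginica"
levels(droplevels(no_virginica))  # "setosa" "versicolor"

# Calling factor() on a factor also rebuilds its levels from the values present
levels(factor(no_virginica))      # "setosa" "versicolor"
```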
    

    This is how to keep levels you’ve filtered out from showing up in ggplot legends.

    Internally, a factor is an integer vector with a levels attribute holding a character vector (see attributes(iris$Species) and class(attributes(iris$Species)$levels)), which is a clean design. Renaming a level only changes one string in that attribute. If you were using character strings instead, you would have to rewrite every matching element, which is much less efficient. And I change level names a lot, especially for ggplot legends. If you fake factors with character vectors, there’s also the risk that you’ll change just one element and accidentally create a separate new level.
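The storage and renaming points can be seen directly. This is a minimal sketch with made-up vectors `f`, `ch` and `ch2` (hypothetical data). Renaming through `levels<-` edits one string in the levels attribute, while the character version has to rewrite every matching element:

```r
# Hypothetical illustration data
f <- factor(c("lo", "hi", "lo", "mid", "hi"), levels = c("lo", "mid", "hi"))

typeof(f)             # "integer": values are stored as integer codes
as.integer(f)         # 1 3 1 2 3
attributes(f)$levels  # "lo" "mid" "hi"

# Renaming a level rewrites only the levels attribute, not the codes
levels(f)[levels(f) == "hi"] <- "high"
f                     # now: lo high lo mid high (levels: lo mid high)

# With a character vector, every matching element must be rewritten...
ch <- c("lo", "hi", "lo", "mid", "hi")
ch[ch == "hi"] <- "high"

# ...and a partial edit silently creates an extra group
ch2 <- c("lo", "hi", "lo", "mid", "hi")
ch2[2] <- "high"      # only the first "hi" is changed
length(unique(ch2))   # 4: "hi" and "high" are now separate groups
```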


Related Questions

One of the basic data structures in Python is the dictionary, which allows one
Considering the basic data types like char, int, float, double etc..in any standard language
A company's internal c++ coding standards document states that even for basic data types
I have started learning HTTPUNIT and found one basic example. In this example it
Here one more basic question asked in MS interview recently class A { public
I have a few model classes with basic one-to-many relationships. For example, a book
Very basic question - how to get one value from a generator in Python?
One update: I tried using the SummaryRow on the datagrid for its basic functionalities
Basic question : How to I create a bidirectional one-to-many map in Fluent NHibernate?
I have a basic question about implementing a Entity code first one to many

© 2021 The Archive Base. All Rights Reserved