The Archive Base

The Archive Base Latest Questions

Editorial Team
Asked: June 13, 2026

A very unexpected behavior of the useful data.frame in R arises from keeping character


A very unexpected behavior of the otherwise useful data.frame in R arises from storing character columns as factors (the default in data.frame() before R 4.0.0). This causes many problems if it is not taken into account. For example, consider the following code:

foo=data.frame(name=c("c","a"),value=1:2)
#   name value
# 1    c     1
# 2    a     2

bar=matrix(1:6,nrow=3)
rownames(bar)=c("a","b","c")
#   [,1] [,2]
# a    1    4
# b    2    5
# c    3    6

What would you expect from running bar[foo$name,]? It should return the rows of bar whose names match foo$name, that is, rows ‘c’ and ‘a’, in that order. But the result is different:

bar[foo$name,]
#   [,1] [,2]
# b    2    5
# a    1    4

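The reason is that foo$name is not a character vector but a factor, which R stores as integer codes. The levels are sorted alphabetically ("a" is 1, "c" is 2), so foo$name holds the codes 2 and 1, and bar[foo$name,] returns rows 2 and 1 of bar, that is, ‘b’ and ‘a’.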

foo$name
# [1] c a
# Levels: a c
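
To make the mechanism concrete, here is a minimal sketch that builds the factor explicitly (so it does not depend on the stringsAsFactors default of whatever R version is running) and compares indexing by factor, by integer code, and by name. The variable names codes, by_factor, by_position and by_name are only illustrative.

```r
# Build the factor explicitly so the result does not depend on the
# stringsAsFactors default of the running R version.
name <- factor(c("c", "a"))

levels(name)                 # levels are sorted alphabetically: "a" "c"
codes <- as.integer(name)    # underlying integer codes: 2 1

bar <- matrix(1:6, nrow = 3)
rownames(bar) <- c("a", "b", "c")

# Indexing with a factor uses its integer codes, so this picks rows 2 and 1.
by_factor <- bar[name, ]
by_position <- bar[codes, ]
identical(by_factor, by_position)

# Converting to character first makes R match on row names instead.
by_name <- bar[as.character(name), ]
rownames(by_name)
```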

To get the expected behavior, I manually convert it to a character vector:

foo$name = as.character(foo$name)
bar[foo$name,]
#   [,1] [,2]
# c    3    6
# a    1    4

But the problem is that it is easy to forget this conversion and end up with hidden bugs in our code. Is there a better solution?

1 Answer

  1. Editorial Team
    Added an answer on June 13, 2026 at 5:56 pm

    This is a feature and R is working as documented. This can be dealt with generally in a few ways:

    1. use the argument stringsAsFactors = FALSE in the call to data.frame(). See ?data.frame
    2. if you detest this behaviour so, set the option globally via

      options(stringsAsFactors = FALSE)
      
    3. (as noted by @JoshuaUlrich in comments) a third option is to wrap character variables in I(...). This alters the class of the object being assigned to the data frame component to "AsIs". In general this shouldn’t be a problem, as the object is still a character vector underneath (is.character() returns TRUE), so it should work as before.
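
As a rough sketch of options 1 and 3 applied to the example from the question (the names foo_chr and foo_asis are made up for illustration; option 2 is left out because it changes global state):

```r
bar <- matrix(1:6, nrow = 3)
rownames(bar) <- c("a", "b", "c")

# Option 1: ask for character columns explicitly.
foo_chr <- data.frame(name = c("c", "a"), value = 1:2,
                      stringsAsFactors = FALSE)
is.character(foo_chr$name)

# Option 3: protect a single column with I(), even when the other
# character columns are converted to factors.
foo_asis <- data.frame(name = I(c("c", "a")), value = 1:2,
                       stringsAsFactors = TRUE)
inherits(foo_asis$name, "AsIs")
is.character(foo_asis$name)

# Both columns now select rows by name, giving rows "c" and "a".
rows_chr <- rownames(bar[foo_chr$name, ])
rows_asis <- rownames(bar[foo_asis$name, ])
rows_chr
rows_asis
```

Passing stringsAsFactors explicitly at each call keeps the behaviour visible in the code itself, which is arguably safer than relying on a global option that other scripts might change.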

    You can check what the default for stringsAsFactors is on the currently running R process via:

    > default.stringsAsFactors()
    [1] TRUE
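
Note that the default changed to FALSE in R 4.0.0, so the output above reflects an older R. A check that does not depend on that helper function is to build a tiny data frame and look at what comes back. This is only a sketch; strings_become_factors is a hypothetical helper name, not part of R.

```r
# Hypothetical helper: TRUE if data.frame() currently converts
# character columns to factors by default, FALSE otherwise.
strings_become_factors <- function() {
  is.factor(data.frame(x = "a")$x)
}

strings_become_factors()
```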
    

    The issue is slightly wider than data.frame() in scope as this also affects read.table(). In that function, as well as the two options above, you can also tell R what all the classes of the variables are via argument colClasses and R will respect that, e.g.

    > tmp <- read.table(text = '"Var1","Var2"
    + "A","B"
    + "C","C"
    + "B","D"', header = TRUE, colClasses = rep("character", 2), sep = ",")
    > str(tmp)
    'data.frame':   3 obs. of  2 variables:
     $ Var1: chr  "A" "C" "B"
     $ Var2: chr  "B" "C" "D"
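
If you do not want to spell out a class for every column, read.table() also has an as.is argument. It is not mentioned above, so treat this as an additional sketch rather than part of the original answer; csv_text and tmp2 are illustrative names.

```r
csv_text <- '"Var1","Var2"
"A","B"
"C","C"
"B","D"'

# as.is = TRUE tells read.table() to leave character columns as character
# instead of converting them to factors.
tmp2 <- read.table(text = csv_text, header = TRUE, sep = ",", as.is = TRUE)
str(tmp2)
```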
    
© 2021 The Archive Base. All Rights Reserved