The Archive Base

Editorial Team
Asked: June 3, 2026

I asked a question about this a few months back, and I thought the answer had solved my problem, but I ran into the problem again and the solution didn’t work for me.

I’m importing a CSV:

orders <- read.csv("<file_location>", sep=",", header=T, check.names = FALSE)

Here’s the structure of the dataframe:

str(orders)

'data.frame':   3331575 obs. of  2 variables:
 $ OrderID  : num  -2034590217 -2034590216 -2031892773 -2031892767 -2021008573 ...
 $ OrderDate: Factor w/ 402 levels "2010-10-01","2010-10-04",..: 263 263 269 268 301 300 300 300 300 300 ...

If I run the length command on the first column, OrderID, I get this:

length(orders$OrderID)
[1] 0

If I run the length on OrderDate, it returns correctly:

length(orders$OrderDate)
[1] 3331575

This is a copy/paste of the head of the CSV.

OrderID,OrderDate
-2034590217,2011-10-14
-2034590216,2011-10-14
-2031892773,2011-10-24
-2031892767,2011-10-21
-2021008573,2011-12-08
-2021008572,2011-12-07
-2021008571,2011-12-07
-2021008570,2011-12-07
-2021008569,2011-12-07

Now, if I re-run the read.csv, but take out the check.names option, the first column of the dataframe now has an X. at the start of the name.

orders2 <- read.csv("<file_location>", sep=",", header=T)

str(orders2)

'data.frame':   3331575 obs. of  2 variables:
 $ X.OrderID: num  -2034590217 -2034590216 -2031892773 -2031892767 -2021008573 ...
 $ OrderDate: Factor w/ 402 levels "2010-10-01","2010-10-04",..: 263 263 269 268 301 300 300 300 300 300 ...

length(orders2$X.OrderID)
[1] 3331575

This works correctly.

My question is: why does R add an X. to the beginning of the first column name? As you can see from the CSV file, there are no special characters. It should be a simple load. Setting check.names = FALSE does import the name as written in the CSV, but then the data does not load in a form I can run my analysis on.

What can I do to fix this?

Side note: I realize this is minor. I'm just frustrated that I think I am loading the file correctly, yet I am not getting the result I expected. I could rename the column using colnames(orders)[1] <- "OrderID", but I still want to know why it doesn't load correctly.


1 Answer

  1. Editorial Team
    Added an answer on June 3, 2026 at 6:11 am

    read.csv() is a wrapper around the more general read.table() function. The latter has an argument, check.names, which is documented as:

    check.names: logical.  If ‘TRUE’ then the names of the variables in the
             data frame are checked to ensure that they are syntactically
             valid variable names.  If necessary they are adjusted (by
             ‘make.names’) so that they are, and also to ensure that there
             are no duplicates.
    

    If your header contains labels that are not syntactically valid, then make.names() will replace each with a valid name based upon the invalid one, translating invalid characters to a dot and possibly prepending X:

    R> make.names("$Foo")
    [1] "X.Foo"
    

    This is documented in ?make.names:

    Details:
    
        A syntactically valid name consists of letters, numbers and the
        dot or underline characters and starts with a letter or the dot
        not followed by a number.  Names such as ‘".2way"’ are not valid,
        and neither are the reserved words.
    
        The definition of a _letter_ depends on the current locale, but
        only ASCII digits are considered to be digits.
    
        The character ‘"X"’ is prepended if necessary.  All invalid
        characters are translated to ‘"."’.  A missing value is translated
        to ‘"NA"’.  Names which match R keywords have a dot appended to
        them.  Duplicated values are altered by ‘make.unique’.
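
The rules quoted above can be checked directly at the console. This is a small sketch rather than anything from the original question: apart from "$Foo", the input strings are made up for illustration.

```r
# Illustrative labels; only "$Foo" appears in the answer above.
inputs <- c("OrderID", "$Foo", "2way", "order id", "if")

# make.names() leaves valid names alone, translates invalid characters
# to ".", prepends "X" where needed, and appends "." to reserved words.
fixed <- make.names(inputs)
print(data.frame(input = inputs, output = fixed))

# Duplicates are only disambiguated when unique = TRUE, which is how
# read.table() calls make.names() when check.names = TRUE.
deduped <- make.names(c("a", "a"), unique = TRUE)
print(deduped)
```

A label that comes back unchanged was already valid, so any X. prefix on a header label points at something in that label that R considers invalid.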
    

    The behaviour you are seeing is entirely consistent with the documented way read.table() loads your data. That suggests you have syntactically invalid labels in the header row of your CSV file. Note the point above from ?make.names that what counts as a letter depends on the locale of your system. For example, the CSV file might include a character that is valid in the locale your text editor uses, but if R is not running in the same locale, that character may not be valid there.
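
One way the symptoms in the question could arise is sketched below. It assumes, purely for illustration, that the first header label starts with an invisible character (U+FEFF is used here as an example; the real file may contain something else). With check.names = FALSE such a label is kept as is, so the column is not literally called OrderID.

```r
# Hypothetical reconstruction: a data frame whose first column name starts
# with an invisible character (U+FEFF is an assumption, not a diagnosis).
orders <- data.frame(c(-2034590217, -2034590216),
                     c("2011-10-14", "2011-10-14"))
names(orders) <- c("\ufeffOrderID", "OrderDate")

# `$` finds no column called exactly "OrderID" and returns NULL, and
# length(NULL) is 0, which matches the output shown in the question.
len_by_name <- length(orders$OrderID)

# Positional access does not depend on the name, so the data are present.
len_by_pos <- length(orders[[1]])

# The first code point of the name exposes the hidden character.
first_code_point <- utf8ToInt(names(orders)[1])[1]

print(c(len_by_name, len_by_pos, first_code_point))
```

On the real data frame, comparing names(orders)[1] == "OrderID" or inspecting utf8ToInt(names(orders)[1]) would show whether the name is really what it appears to be.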

    I would look at the CSV file and identify any non-ASCII characters in the header line; there may also be non-visible characters (or escape sequences; \t?) in the header row. A lot may be going on between reading in the file with the invalid names and displaying them in the console, which might mask the invalid characters, so don't take the fact that nothing looks wrong with check.names = FALSE as an indication that the file is OK.
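
A minimal sketch of that inspection follows. The temporary file is a hypothetical stand-in for the real CSV, and the UTF-8 byte order mark written at its start is only one example of a hidden sequence; the variable names are likewise made up. Reading raw bytes avoids any decoding that might hide the problem.

```r
# Stand-in for the real file (hypothetical): a tiny CSV whose header begins
# with a UTF-8 byte order mark, bytes EF BB BF. The real file may differ.
path <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xEF, 0xBB, 0xBF)),
           charToRaw("OrderID,OrderDate\n-2034590217,2011-10-14\n")),
         path)

# Read the start of the file as raw bytes and keep only the header line.
bytes <- readBin(path, what = "raw", n = 200)
first_nl <- which(bytes == as.raw(0x0A))[1]
header_bytes <- bytes[seq_len(first_nl - 1)]
print(header_bytes)

# Positions of header bytes outside printable ASCII (0x20 to 0x7E).
codes <- as.integer(header_bytes)
suspicious <- which(codes < 32 | codes > 126)
print(suspicious)

# If, and only if, the stray bytes are a UTF-8 byte order mark, asking the
# file connection to strip it yields the plain header names.
bom_names <- names(read.csv(path, fileEncoding = "UTF-8-BOM"))
print(bom_names)
```

If suspicious is empty for the real file, the header bytes are plain ASCII and the cause lies elsewhere, for instance in the locale, which is where sessionInfo() helps.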

    Posting the output of sessionInfo() would also be useful.

© 2021 The Archive Base. All Rights Reserved