Sign Up

Sign Up to our social questions and Answers Engine to ask questions, answer people’s questions, and connect with other people.

Have an account? Sign In

Have an account? Sign In Now

Sign In

Login to our social questions & Answers Engine to ask questions answer people’s questions & connect with other people.

Sign Up Here

Forgot Password?

Don't have account, Sign Up Here

Forgot Password

Lost your password? Please enter your email address. You will receive a link and will create a new password via email.

Have an account? Sign In Now

You must login to ask a question.

Forgot Password?

Need An Account, Sign Up Here

Please briefly explain why you feel this question should be reported.

Please briefly explain why you feel this answer should be reported.

Please briefly explain why you feel this user should be reported.

Sign InSign Up

The Archive Base

The Archive Base Logo The Archive Base Logo

The Archive Base Navigation

  • SEARCH
  • Home
  • About Us
  • Blog
  • Contact Us
Search
Ask A Question

Mobile menu

Close
Ask a Question
  • Home
  • Add group
  • Groups page
  • Feed
  • User Profile
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Buy Points
  • Users
  • Help
  • Buy Theme
  • SEARCH
Home/ Questions/Q 8619453
In Process

The Archive Base Latest Questions

Editorial Team
  • 0
Editorial Team
Asked: June 12, 20262026-06-12T06:20:36+00:00 2026-06-12T06:20:36+00:00

I am new to data analysis with R. I recently got a pre-formatted environmental

  • 0

I am new to data analysis with R. I recently got a pre-formatted environmental observation-model dataset, an example subset of which is shown below:

date                   site      obs    mod      site           obs    mod
2000-09-01 00:00:00    campus    NA     61.63    city centre    66     56.69
2000-09-01 01:00:00    campus    52     62.55    city centre    NA     54.75
2000-09-01 02:00:00    campus    52     63.52    city centre    56     54.65

Basically, the data include the time series of hourly observed and modelled concentrations of a pollutant at various sites in “reoccurring columns”, i.e., site – obs – mod (in the example I only showed 2 out of the total 75 sites). I read this “wide” dataset in as a data frame, and wanted to reshape it into the “narrower” format as:

date                   site           obs    mod
2000-09-01 00:00:00    campus         NA     61.63
2000-09-01 01:00:00    campus         52     62.55
2000-09-01 02:00:00    campus         52     63.52
2000-09-01 00:00:00    city centre    66     56.69
2000-09-01 01:00:00    city centre    NA     54.75
2000-09-01 02:00:00    city centre    56     54.65

I believed that I should use the package “reshape2” to do this. Firstly I tried to melt and then dcast the dataset:

test.melt <- melt(test.data, id.vars = "date", measure.vars = c("site", "obs", "mod"))

However, it only returned half of the data, i.e., records of the site(s) (“city centre”) following the first one (“campus”) were all cut off:

date                   variable    value
2001-01-01 00:00:00    site        campus
2001-01-01 01:00:00    site        campus
2001-01-01 02:00:00    site        campus
2001-01-01 00:00:00    obs         NA
2001-01-01 01:00:00    obs         52
2001-01-01 02:00:00    obs         52
2001-01-01 00:00:00    mod         61.63
2001-01-01 01:00:00    mod         62.55
2001-01-01 02:00:00    mod         63.52

I then tried recast:

test.recast <- recast(test.data, date ~ site + obs + mod)

However, it returned with error message:

Error in eval(expr, envir, enclos) : object 'site' not found

I have tried to search for previous questions but have not found similar scenarios (correct me if I am wrong). Could someone please help me with this?

Many thanks in advance!

  • 1 1 Answer
  • 0 Views
  • 0 Followers
  • 0
Share
  • Facebook
  • Report

Leave an answer
Cancel reply

You must login to add an answer.

Forgot Password?

Need An Account, Sign Up Here

1 Answer

  • Voted
  • Oldest
  • Recent
  • Random
  1. Editorial Team
    Editorial Team
    2026-06-12T06:20:38+00:00Added an answer on June 12, 2026 at 6:20 am

    You might be better off using base R reshape after doing some variable name cleanup.

    Here’s your data.

    test <- read.table(header = TRUE, stringsAsFactors=FALSE,
    text = "date             site  obs    mod    site             obs    mod
    '2000-09-01 00:00:00'  campus   NA  61.63    'city centre'    66     56.69
    '2000-09-01 01:00:00'  campus   52  62.55    'city centre'    NA     54.75
    '2000-09-01 02:00:00'  campus   52  63.52    'city centre'    56     54.65")
    test
    #                  date   site obs   mod      site.1 obs.1 mod.1
    # 1 2000-09-01 00:00:00 campus  NA 61.63 city centre    66 56.69
    # 2 2000-09-01 01:00:00 campus  52 62.55 city centre    NA 54.75
    # 3 2000-09-01 02:00:00 campus  52 63.52 city centre    56 54.65
    

    If you did this correctly, you should get names like I got: as @chase mentions in his answer, “recurring column names is a bit of an oddity and is not normal R behaviour”–so we’ve got to fix that.

    Note: Both of these options generate a “time” variable which you can go ahead and drop. You might want to keep it just in case you wanted to reshape back into a wide format.

    • Option 1: If you got names like I did (which you should have), the solution is simple. For the first site, just append “0” to the site name and use base R reshape:

      names(test)[2:4] <- paste(names(test)[2:4], "0", sep=".")
      test <- reshape(test, direction = "long", 
                      idvar = "date", varying = 2:ncol(test))
      rownames(test) <- NULL # reshape makes UGLY rownames
      test
      #                  date time        site obs   mod
      # 1 2000-09-01 00:00:00    0      campus  NA 61.63
      # 2 2000-09-01 01:00:00    0      campus  52 62.55
      # 3 2000-09-01 02:00:00    0      campus  52 63.52
      # 4 2000-09-01 00:00:00    1 city centre  66 56.69
      # 5 2000-09-01 01:00:00    1 city centre  NA 54.75
      # 6 2000-09-01 02:00:00    1 city centre  56 54.65
      
    • Option 2: If you really do have duplicated column names, the fix is still easy, and follows the same logic. First, create nicer column names (easy to do using rep()), and then use reshape() as described above.

      names(test)[-1] <- paste(names(test)[-1], 
                               rep(1:((ncol(test)-1)/3), each = 3), sep = ".")
      test <- reshape(test, direction = "long", 
                      idvar = "date", varying = 2:ncol(test))
      rownames(test) <- NULL
      
      ### Or, more convenient:
      # names(test) <- make.unique(names(test))
      # names(test)[2:4] <- paste(names(test)[2:4], "0", sep=".")
      # test <- reshape(test, direction = "long", 
      #                 idvar = "date", varying = 2:ncol(test))
      # rownames(test) <- NULL
      
    • Optional step: The data in this form are still not totally “long”. If that is required, all that is required is one more step:

      require(reshape2)
      melt(test, id.vars = c("date", "site", "time"))
      #                   date        site time variable value
      # 1  2000-09-01 00:00:00      campus    0      obs    NA
      # 2  2000-09-01 01:00:00      campus    0      obs 52.00
      # 3  2000-09-01 02:00:00      campus    0      obs 52.00
      # 4  2000-09-01 00:00:00 city centre    1      obs 66.00
      # 5  2000-09-01 01:00:00 city centre    1      obs    NA
      # 6  2000-09-01 02:00:00 city centre    1      obs 56.00
      # 7  2000-09-01 00:00:00      campus    0      mod 61.63
      # 8  2000-09-01 01:00:00      campus    0      mod 62.55
      # 9  2000-09-01 02:00:00      campus    0      mod 63.52
      # 10 2000-09-01 00:00:00 city centre    1      mod 56.69
      # 11 2000-09-01 01:00:00 city centre    1      mod 54.75
      # 12 2000-09-01 02:00:00 city centre    1      mod 54.65
      

    Update (to try to address some questions from the comments)

    1. The reshape() documentation is quite confusing. It’s best to work through a few examples to get an understanding of how it works. Specifically, “time” does not have to refer to time (“date” in your problem), but is more for, say, panel data, where records are collected at different times for the same ID. In your case, the only “id” in the original data is the “date” column. The other potential “id” is the site, but not in the way the data are organized.

      Imagine, for a moment, if your data looked like this:

      test1 <- structure(list(date = structure(1:3, 
          .Label = c("2000-09-01 00:00:00", 
          "2000-09-01 01:00:00", "2000-09-01 02:00:00"), class = "factor"), 
          obs.campus = c(NA, 52L, 52L), mod.campus = c(61.63, 62.55, 
          63.52), obs.cityCentre = c(66L, NA, 56L), mod.cityCentre = c(56.69, 
          54.75, 54.65)), .Names = c("date", "obs.campus", "mod.campus", 
      "obs.cityCentre", "mod.cityCentre"), class = "data.frame", row.names = c(NA, 
      -3L))
      test1
      #                  date obs.campus mod.campus obs.cityCentre mod.cityCentre
      # 1 2000-09-01 00:00:00         NA      61.63             66          56.69
      # 2 2000-09-01 01:00:00         52      62.55             NA          54.75
      # 3 2000-09-01 02:00:00         52      63.52             56          54.65
      

      Now try reshape(test1, direction = "long", idvar = "date", varying = 2:ncol(test1)). You’ll see that reshape() sees the site names as “time” (that can be overridden by adding “timevar = "site"” to your reshape command).

      When direction = "long", you must specify which columns vary with “time”. In your case, that is all the columns except for the first, hence my use of 2:ncol(test) for “varying“.

    2. test2? Where is that?

    3. Question under @Chase’s answer: I think you misunderstand how melt() is supposed to work. Basically, it tries to get you the “skinniest” form of your data. In this case, the skinniest form would be the “optional step” described above since date + site would be the minimum required to comprise a unique ID variable. (I would say that “time” can safely be dropped.)

      Once your data are in the format described in the “optional step” (we’ll assume that the output has been stored as “test.melt“, you can always easily pivot the table around in different ways. As a demonstration of what I mean by that, try the following and see what they do.

      dcast(test.melt, date + site ~ variable)
      dcast(test.melt, date ~ variable + site)
      dcast(test.melt, variable + site ~ date)
      dcast(test.melt, variable + date ~ site)
      

      It is not easy to have that flexibility if you stop at “Option 1” or “Option 2”.


    Update (a few years later)

    melt from “data.table” can now “melt” multiple columns in a similar way that reshape does. It should work whether or not the column names are duplicated.

    You can try the following:

    measure <- c("site", "obs", "mod")
    melt(as.data.table(test), measure.vars = patterns(measure), value.name = measure)
    #                   date variable        site obs   mod
    # 1: 2000-09-01 00:00:00        1      campus  NA 61.63
    # 2: 2000-09-01 01:00:00        1      campus  52 62.55
    # 3: 2000-09-01 02:00:00        1      campus  52 63.52
    # 4: 2000-09-01 00:00:00        2 city centre  66 56.69
    # 5: 2000-09-01 01:00:00        2 city centre  NA 54.75
    # 6: 2000-09-01 02:00:00        2 city centre  56 54.65
    
    • 0
    • Reply
    • Share
      Share
      • Share on Facebook
      • Share on Twitter
      • Share on LinkedIn
      • Share on WhatsApp
      • Report

Sidebar

Related Questions

I am relatively new to programming. My work basically revolves around data and analysis.
How to create a new data type for Go which to can check/validate its
I have a table into which new data is frequently inserted. I need to
I am a new bee in hadoop - big data analysis. I am referring
I'm going to be designing a simple data analysis tool which processes different kinds
I'm relatively new to emacs and bayesian data analysis and the JAGS software. I
I've been using Rx on a new financial analysis project that receives all data
still fairly new to matlab, picked up this data analysis code from someone and
When adding a new data connection in Server Explorer in VS 2010 (with Azure
I normally save new data to the database like this: $this->MyTable->set(array( 'id' => $id,

Explore

  • Home
  • Add group
  • Groups page
  • Communities
  • Questions
    • New Questions
    • Trending Questions
    • Must read Questions
    • Hot Questions
  • Polls
  • Tags
  • Badges
  • Users
  • Help
  • SEARCH

Footer

© 2021 The Archive Base. All Rights Reserved
With Love by The Archive Base

Insert/edit link

Enter the destination URL

Or link to existing content

    No search term specified. Showing recent items. Search or use up and down arrow keys to select an item.