The Archive Base

Editorial Team
Asked: June 1, 2026

I’m struggling with the following.

I have a (big) data frame with the following:

  • several columns for which the combination of columns is a ‘unique’ combination, say ID
  • a time related column
  • a measure related column

I want to make sure that, for each unique ID and each time interval, a measure is available in the data frame. If it is not, I want to add a 0 (or NA) measure for that time/ID.

To illustrate the problem, create the following test data frame:

test <- data.frame(
    YearWeek   =rep(c("2012-01","2012-02"),each=4),
    ProductID  =rep(c(1,2), times=4),
    CustomerID =rep(c("a","b"), each=2, times=2),
    Quantity   =5:12
)[1:7,]

  YearWeek ProductID CustomerID Quantity
1  2012-01         1          a        5
2  2012-01         2          a        6
3  2012-01         1          b        7
4  2012-01         2          b        8
5  2012-02         1          a        9
6  2012-02         2          a       10
7  2012-02         1          b       11

The 8th row is left out, on purpose. This way I simulate a ‘missing value’ (missing Quantity) for ID ‘2-b’ (ProductID-CustomerID) for the time value “2012-02”.
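
One way to confirm which ID combinations are incomplete, before filling anything in, is to count the rows per ID and compare that count with the number of distinct weeks. This is only a sketch: the names `counts`, `n_weeks`, `n_expected` and `incomplete` are made up for illustration, `stringsAsFactors = FALSE` is added so the ID columns stay plain character, and it assumes at most one row per ID per week.

```r
# Rebuild the example data (stringsAsFactors = FALSE is an assumption,
# added so that the ID columns are plain character vectors)
test <- data.frame(
    YearWeek   = rep(c("2012-01", "2012-02"), each = 4),
    ProductID  = rep(c(1, 2), times = 4),
    CustomerID = rep(c("a", "b"), each = 2, times = 2),
    Quantity   = 5:12,
    stringsAsFactors = FALSE
)[1:7, ]

# Count the rows present for each ProductID/CustomerID combination
counts <- aggregate(Quantity ~ ProductID + CustomerID, data = test, FUN = length)
names(counts)[3] <- "n_weeks"

# An ID is incomplete if it has fewer rows than there are distinct weeks
n_expected <- length(unique(test$YearWeek))
incomplete <- counts[counts$n_weeks < n_expected, ]
print(incomplete)
```

For the example data this should list only the ‘2-b’ combination, which has a row for one week instead of two.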

What I want to do is adjust the data.frame in such a way that for all time values (these are known, in this example just “2012-01” and “2012-02”), for all ID-combinations (these are not known upfront, but this is ‘all unique ID combinations in the data frame’, thus the unique set on the ID columns), a Quantity is available in the data frame.

For this example, this should result in the following (if we choose NA for the missing value; typically I want to have control over that):

  YearWeek ProductID CustomerID Quantity
1  2012-01         1          a        5
2  2012-01         2          a        6
3  2012-01         1          b        7
4  2012-01         2          b        8
5  2012-02         1          a        9
6  2012-02         2          a       10
7  2012-02         1          b       11
8  2012-02         2          b       NA

The ultimate goal is to create time series for these ID combinations, so I want to have Quantities for all time values. I need to do different aggregations (on time), using different levels of IDs, from a big dataset.

I tried several things, for instance melt and cast from the reshape package, but so far I haven’t managed to do it. The next step would be writing a function with for-loops etc., but that is not really useful from a performance perspective.

Maybe there is an easier way to create time series directly from a data.frame like test. Does anybody have an idea on this one?

Thanks in advance!

Note that in the actual problem there are more than two ‘ID columns’.


EDIT:

I should describe the problem further. There is a difference between the ‘time’ column and the ‘ID’ columns. The first (and great!) answer to the question, by joran, maybe didn’t reflect a clear understanding of what I want (and the example I gave didn’t make the difference clear). I said above:

for all ID-combinations (these are not known upfront, but this is ‘all
unique ID combinations in the data frame’, thus the unique set on the
ID columns)

So I do not want ‘all possible ID combinations’ but ‘all ID combinations within the data’.
For each of those combinations I want a value for every unique time-value.

Let me make it clear by expanding test to test2, as follows:

> test2 <- rbind(test, c("2012-02", 3, "a", 13))
> test2
  YearWeek ProductID CustomerID Quantity
1  2012-01         1          a        5
2  2012-01         2          a        6
3  2012-01         1          b        7
4  2012-01         2          b        8
5  2012-02         1          a        9
6  2012-02         2          a       10
7  2012-02         1          b       11
8  2012-02         3          a       13

This means I want no ‘3-b’ ID combination in the resulting data frame, because this combination is not within test2. If I use the method of the first answer I will get the following:

> vals2 <- expand.grid(YearWeek = unique(test2$YearWeek),
                       ProductID = unique(test2$ProductID),
                       CustomerID = unique(test2$CustomerID))

> merge(vals2,test2,all = TRUE)
   YearWeek ProductID CustomerID Quantity
1   2012-01         1          a        5
2   2012-01         1          b        7
3   2012-01         2          a        6
4   2012-01         2          b        8
5   2012-01         3          a     <NA>
6   2012-01         3          b     <NA>
7   2012-02         1          a        9
8   2012-02         1          b       11
9   2012-02         2          a       10
10  2012-02         2          b     <NA>
11  2012-02         3          a       13
12  2012-02         3          b     <NA>

So I don’t want the rows 6 and 12 to be here.

To overcome this problem I found the solution below. Here I take the unique time values and the unique ID combinations separately and merge them. The difference from the above is thus the word ‘combination’: unique combinations of the ID columns, not unique values of every ID column.

> temp_merge <- merge(unique(test2["YearWeek"]),
                      unique(test2[c("ProductID", "CustomerID")]))

> merge(temp_merge,test2,all = TRUE)
   YearWeek ProductID CustomerID Quantity
1   2012-01         1          a        5
2   2012-01         1          b        7
3   2012-01         2          a        6
4   2012-01         2          b        8
5   2012-01         3          a     <NA>
6   2012-02         1          a        9
7   2012-02         1          b       11
8   2012-02         2          a       10
9   2012-02         2          b     <NA>
10  2012-02         3          a       13

What are the comments on this one?

Is this an elegant way, or are there better ways?
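
The two-step merge above can be wrapped in a small helper that also gives control over the fill value. This is a minimal sketch, not a definitive implementation: `complete_ids` and its arguments are hypothetical names, `test2` is rebuilt here with a one-row data frame (rather than a character vector) so that the numeric columns keep their type, and any NA already present in the value column would be overwritten as well.

```r
# Hypothetical helper (not from the original post): complete every observed
# ID combination over every observed time value, filling gaps with `fill`.
complete_ids <- function(df, time_col, id_cols, value_col, fill = NA) {
    times <- unique(df[time_col])            # one-column data frame of time values
    ids   <- unique(df[id_cols])             # ID combinations that occur in the data
    grid  <- merge(times, ids, by = NULL)    # no shared columns: Cartesian product
    out   <- merge(grid, df, all.x = TRUE)   # keep every grid row, matched or not
    # Note: this also overwrites NAs that were already present in value_col
    out[[value_col]][is.na(out[[value_col]])] <- fill
    out
}

# Example data, rebuilt so the block runs on its own
test <- data.frame(
    YearWeek   = rep(c("2012-01", "2012-02"), each = 4),
    ProductID  = rep(c(1, 2), times = 4),
    CustomerID = rep(c("a", "b"), each = 2, times = 2),
    Quantity   = 5:12,
    stringsAsFactors = FALSE
)[1:7, ]
test2 <- rbind(test, data.frame(YearWeek = "2012-02", ProductID = 3,
                                CustomerID = "a", Quantity = 13L,
                                stringsAsFactors = FALSE))

res <- complete_ids(test2, "YearWeek", c("ProductID", "CustomerID"),
                    "Quantity", fill = 0)
print(res)
```

Because the grid is built from the ID combinations that actually occur, the ‘3-b’ combination never appears in the result, and the same function works for more than two ID columns by passing a longer `id_cols` vector.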

1 Answer

  1. Editorial Team
    Added an answer on June 1, 2026 at 8:17 am

    Use expand.grid and merge:

    vals <- expand.grid(YearWeek = unique(test$YearWeek),
                        ProductID = unique(test$ProductID),
                        CustomerID = unique(test$CustomerID))
    > merge(vals,test,all = TRUE)
      YearWeek ProductID CustomerID Quantity
    1  2012-01         1          a        5
    2  2012-01         1          b        7
    3  2012-01         2          a        6
    4  2012-01         2          b        8
    5  2012-02         1          a        9
    6  2012-02         1          b       11
    7  2012-02         2          a       10
    8  2012-02         2          b       NA
    

    The NAs can be replaced after the fact with whatever values you choose using subsetting and is.na.
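
The replacement step could look roughly like this. It is a sketch under assumptions: the example data is rebuilt so the block runs on its own, `stringsAsFactors = FALSE` is added to both `data.frame` and `expand.grid` to keep the key columns as plain character, `res` is an illustrative name, and 0 is just one possible fill value.

```r
# Example data from the question, rebuilt so the block is self-contained
test <- data.frame(
    YearWeek   = rep(c("2012-01", "2012-02"), each = 4),
    ProductID  = rep(c(1, 2), times = 4),
    CustomerID = rep(c("a", "b"), each = 2, times = 2),
    Quantity   = 5:12,
    stringsAsFactors = FALSE
)[1:7, ]

# All combinations of the unique values, then merge the data onto that grid
vals <- expand.grid(YearWeek   = unique(test$YearWeek),
                    ProductID  = unique(test$ProductID),
                    CustomerID = unique(test$CustomerID),
                    stringsAsFactors = FALSE)
res <- merge(vals, test, all = TRUE)

# Logical subsetting with is.na(): overwrite only the missing quantities
res$Quantity[is.na(res$Quantity)] <- 0
print(res)
```

Swapping the 0 for another value (or skipping the last assignment to keep NA) gives control over how the gaps are represented.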

