I have a data set of the form:
df <- data.frame(var1 = c("1976-07-04", "1980-07-04", "1984-07-04"),
                 var2 = c('d', 'e', 'f'),
                 freq = 1:3)
I can expand this data.frame very quickly using indexing by:
df.expanded <- df[rep(seq_len(nrow(df)), df$freq), ]
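As a quick check of how the indexing works, the index vector can be inspected on its own. This is a small sketch using the same df as above; idx is just an illustrative name:

```r
df <- data.frame(var1 = c("1976-07-04", "1980-07-04", "1984-07-04"),
                 var2 = c("d", "e", "f"),
                 freq = 1:3)

# Row i is repeated df$freq[i] times in the index vector.
idx <- rep(seq_len(nrow(df)), df$freq)
idx
# 1 2 2 3 3 3

# Indexing with idx repeats whole rows, so the result has sum(df$freq) rows.
df.expanded <- df[idx, ]
nrow(df.expanded)
```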
However, I want to create a sequence on the date instead of a replicate, with freq giving the length of that sequence. For example, for row 3 I can create the entries to fill the exploded data.frame with:
seq(as.Date('1984-7-4'), by = 'days', length = 3)
Can anyone suggest a fast method for doing this? My current method is to use various lapply functions.
I used a combination of Gavin Simpson’s answer and a previous idea for my solution.
ExtendedSeq <- function(df, freq.col, date.col, period = 'month') {
  #' An R function to take a data frame that has a frequency column and explode
  #' the data frame to have that number of rows, based on a date sequence.
  #' Args:
  #'   df: A data.frame to be exploded.
  #'   freq.col: A column variable indicating the number of replicates in the
  #'             new dataset to make.
  #'   date.col: A column variable indicating the name or position of the date
  #'             variable.
  #'   period: The periodicity to apply to the date.
  # Replicate expanded data form
  df.expanded <- df[rep(seq_len(nrow(df)), df[[freq.col]]), ]
  DateExpand <- function(row, df.ex, freq, col.date, period) {
    #' An inner function to build out the days sequence for one row.
    #' Args:
    #'   row: Each row of a data set
    #'   df.ex: A data.frame to expand
    #'   freq: Column indicating the number of replicates to make.
    #'   col.date: Column indicating the date variable
    #' Output:
    #'   A date sequence of length freq, starting at the row's date.
    times <- df.ex[row, freq]
    # period is not used yet (the step is fixed at "days"); can edit in the
    # future if row / data driven.
    date.ex <- seq(df.ex[row, col.date], by = "days", length = times)
    return(date.ex)
  }
  dates <- lapply(seq_len(nrow(df)),
                  FUN = DateExpand,
                  df.ex = df,
                  freq = freq.col,
                  col.date = date.col,
                  period = period)
  df.expanded[[date.col]] <- as.Date(unlist(dates), origin = '1970-01-01')
  row.names(df.expanded) <- NULL
  return(df.expanded)
}
Personally, I don't like the way I need to convert the dates back from the list and supply an origin for this conversion, in case this changes in the future, but I really appreciate the ideas and help.
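One possible way around the origin issue, offered as a sketch rather than a tested drop-in: unlist() strips the Date class, whereas combining the list elements with c() via do.call() keeps it, so no origin has to be supplied. The starts, freq and all.dates names below are illustrative stand-ins for what the function builds internally:

```r
# Hypothetical stand-in for the `dates` list built inside ExtendedSeq():
# one Date vector per input row.
starts <- as.Date(c("1976-07-04", "1980-07-04", "1984-07-04"))
freq <- 1:3
dates <- lapply(seq_along(starts), function(i) {
  seq(starts[i], by = "days", length.out = freq[i])
})

# unlist() drops the Date class and returns the underlying day counts,
# which is why an origin is needed afterwards.
class(unlist(dates))

# c() has a method for Date objects, so combining the list elements with
# do.call() keeps the class and needs no origin.
all.dates <- do.call(c, dates)
class(all.dates)
```

Inside the function this would amount to replacing the as.Date(unlist(dates), origin = ...) call with do.call(c, dates), assuming the date column is already of class Date.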
Here is one way:
This iterates over the indices 1:nrow(df) (i.e. the row indices of df), applying the in-line function foo to each row of df. foo() essentially just extends var2 and freq a freq number of times and uses your seq() call for extending var1. The function makes some assumptions about the column orderings, names, etc., but you can modify that should you wish.
The only other bit is that it is far more efficient to convert var1 to a "Date" object all in one go rather than for each row in turn in extendDF(), hence first do a single conversion, here using transform():
then call extendDF().
This gives:
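A minimal sketch of the steps described above, assuming the column names from the question (var1, var2, freq). The bodies of extendDF() and foo() here are a guess at one way to write them, not necessarily the original code:

```r
# Sketch only: the column handling below is an assumption based on the
# description, with the column names hard-coded.
extendDF <- function(df) {
  # foo() builds the expanded block for row i of z.
  foo <- function(i, z) {
    times <- z[i, "freq"]
    data.frame(var1 = seq(z[i, "var1"], by = "days", length.out = times),
               var2 = rep(z[i, "var2"], times),
               freq = rep(times, times))
  }
  out <- lapply(seq_len(nrow(df)), FUN = foo, z = df)
  do.call(rbind, out)
}

df <- data.frame(var1 = c("1976-07-04", "1980-07-04", "1984-07-04"),
                 var2 = c("d", "e", "f"),
                 freq = 1:3)

# Convert var1 to Date once, up front, rather than once per row.
df <- transform(df, var1 = as.Date(var1))

# Six rows: one for 1976, two for 1980, three for 1984, with var1
# advancing by one day within each group.
df.expanded <- extendDF(df)
df.expanded
```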