I have a very big data frame consisting of data like this:
  PENR     ANFDAT     ENDDAT
1 1110 1990-02-01 1998-10-29
2 1981 1998-02-19 1998-02-20
3 6317 1994-11-01 1999-06-30
4 2039 1998-12-01 1999-04-30
(You can recreate it with this code:)
df <- structure(list(PENR = c(1110L, 1981L, 6317L, 2039L), ANFDAT = structure(c(7336, 10276, 9070, 10561), class = "Date"), ENDDAT = structure(c(10528, 10277, 10772, 10711), class = "Date")), .Names = c("PENR", "ANFDAT", "ENDDAT"), row.names = c(1L, 2L, 3L, 4L), class = "data.frame")
ANFDAT stands for the beginning of a certain status, and ENDDAT stands for the termination of this status.
I want to display this data as a bar chart. Each bar should stand for a date. The height of each bar should represent the number of records (PENR) that have the status on that date.
As the data frame is very big, I am looking for an efficient way to achieve this.
[EDIT]
It seems my question was misleading. Here is what I am trying to do:
1. Generate a data frame with one row for each date from min(df$ANFDAT) to max(df$ENDDAT). This can be done easily with
   df1 <- data.frame(DATE = seq(min(df$ANFDAT), max(df$ENDDAT), by = "day"))
2. For each row in df1$DATE, count the number of records in df that have ANFDAT <= DATE and ENDDAT >= DATE, i.e. whose status is active on that date. Store the results in df1$RECORDS.
3. Generate a barplot out of df1. This can probably be done like this (untested):
   barplot(df1$RECORDS, names.arg = format(df1$DATE))
My problem is finding an efficient way to do #2.
You could use sapply over the unique dates to count the number of records. The whole procedure:
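The code that originally followed is not preserved here, so what follows is only a minimal sketch of how the sapply approach might look, not necessarily the answerer's exact code. It assumes the status counts as active on both ANFDAT and ENDDAT (inclusive bounds) and, as a simplification, iterates over every day in the range rather than only the unique dates that occur in df. The example data is rebuilt from the table in the question.

```r
# Example data, rebuilt from the table shown in the question
df <- data.frame(
  PENR   = c(1110L, 1981L, 6317L, 2039L),
  ANFDAT = as.Date(c("1990-02-01", "1998-02-19", "1994-11-01", "1998-12-01")),
  ENDDAT = as.Date(c("1998-10-29", "1998-02-20", "1999-06-30", "1999-04-30"))
)

# Step 1: one row per calendar day between the earliest start and the latest end
df1 <- data.frame(DATE = seq(min(df$ANFDAT), max(df$ENDDAT), by = "day"))

# Step 2: for each day, count the records whose status interval covers that day
# (assumption: both ANFDAT and ENDDAT are inclusive)
df1$RECORDS <- sapply(df1$DATE, function(d) {
  sum(df$ANFDAT <= d & df$ENDDAT >= d)
})

# Step 3: one bar per day, labelled with the date
barplot(df1$RECORDS, names.arg = format(df1$DATE), border = NA)
```

Note that this scans all of df once per day, so the cost grows with the number of days times the number of rows. Whether that is fast enough depends on the size of the real data.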