I have a CSV file with timestamps and certain event-types which happened at this

Question

0

Asked: May 25, 20262026-05-25T20:08:53+00:00 2026-05-25T20:08:53+00:00

I have a CSV file with timestamps and certain event-types which happened at this

0

I have a CSV file with timestamps and certain event-types which happened at this time.
What I want is count the number of occurences of certain event-types in 6-minutes intervals.

The input-data looks like:

date,type
"Sep 22, 2011 12:54:53.081240000","2"
"Sep 22, 2011 12:54:53.083493000","2"
"Sep 22, 2011 12:54:53.084025000","2"
"Sep 22, 2011 12:54:53.086493000","2"

I load and cure the data with this piece of code:

> raw_data <- read.csv('input.csv')
> cured_dates <- c(strptime(raw_data$date, '%b %d, %Y %H:%M:%S', tz="CEST"))
> cured_data <- data.frame(cured_dates, c(raw_data$type))
> colnames(cured_data) <- c('date', 'type')

After curing the data looks like this:

> head(cured_data)
                 date type
1 2011-09-22 14:54:53    2
2 2011-09-22 14:54:53    2
3 2011-09-22 14:54:53    2
4 2011-09-22 14:54:53    2
5 2011-09-22 14:54:53    1
6 2011-09-22 14:54:53    1

I read a lot of samples for xts and zoo, but somehow I can’t get a hang on it.
The output data should look something like:

date                       type   count
2011-09-22 14:54:00 CEST   1      11
2011-09-22 14:54:00 CEST   2      19
2011-09-22 15:00:00 CEST   1      9
2011-09-22 15:00:00 CEST   2      12
2011-09-22 15:06:00 CEST   1      23
2011-09-22 15:06:00 CEST   2      18

Zoo’s aggregate function looks promising, I found this code-snippet:

# aggregate POSIXct seconds data every 10 minutes
tt <- seq(10, 2000, 10)
x <- zoo(tt, structure(tt, class = c("POSIXt", "POSIXct")))
aggregate(x, time(x) - as.numeric(time(x)) %% 600, mean)

Now I’m just wondering how I could apply this on my use case.

Naive as I am I tried:

> zoo_data <- zoo(cured_data$type, structure(cured_data$time, class = c("POSIXt", "POSIXct")))
> aggr_data = aggregate(zoo_data$type, time(zoo_data$time), - as.numeric(time(zoo_data$time)) %% 360, count)
Error in `$.zoo`(zoo_data, type) : not possible for univariate zoo series

I must admit that I’m not really confident in R, but I try. 🙂

I’m kinda lost. Could anyone point me into the right direction?

Thanks a lot!
Cheers, Alex.

Here the output of dput for a small subset of my data. The data itself is something around 80 million rows.

structure(list(date = structure(c(1316697885, 1316697885, 1316697885, 
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 
1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 1316697885, 
1316697885, 1316697885), class = c("POSIXct", "POSIXt"), tzone = ""), 
    type = c(2L, 2L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L)), .Names = c("date", 
"type"), row.names = c(NA, -23L), class = "data.frame")

Report

Leave an answer
Cancel reply

You must login to add an answer.

Need An Account,

1 Answer

Editorial Team · Answer 1 · 2026-05-25T20:08:54+00:00

We can read it using read.csv, convert the first column to a date time binned into 6 minute intervals and add a dummy column of 1’s. Then re-read it using read.zoo splitting on the type and aggregating on the dummy column:

# test data

Lines <- 'date,type
"Sep 22, 2011 12:54:53.081240000","2"
"Sep 22, 2011 12:54:53.083493000","2"
"Sep 22, 2011 12:54:53.084025000","2"
"Sep 22, 2011 12:54:53.086493000","2"
"Sep 22, 2011 12:54:53.081240000","3"
"Sep 22, 2011 12:54:53.083493000","3"
"Sep 22, 2011 12:54:53.084025000","3"
"Sep 22, 2011 12:54:53.086493000","4"'

library(zoo)
library(chron)

# convert to chron and bin into 6 minute bins using trunc
# Also add a dummy column of 1's 
# and remove any leading space (removing space not needed if there is none)

DF <- read.csv(textConnection(Lines), as.is = TRUE)
fmt <- '%b %d, %Y %H:%M:%S'
DF <- transform(DF, dummy = 1,
         date = trunc(as.chron(sub("^ *", "", date), format = fmt), "00:06:00"))

# split and aggregate

z <- read.zoo(DF, split = 2, aggregate = length)

With the above test data the solution looks like this:

> z
                    2 3 4
(09/22/11 12:54:00) 4 3 1

Note that the above has been done in wide form since that form constitutes a time series whereas the long form does not. There is one column for each type. In our test data we had types 2, 3 and 4 so there are three columns.

(We have used chron here since its trunc method fits well with binning into 6 minute groups. chron does not support time zones which can be an advantage since you can’t make one of the many possible time zone errors but if you want POSIXct anyways convert it at the end, e.g. time(z) <- as.POSIXct(paste(as.Date.dates(time(z)), times(time(z)) %% 1)) . This expression is shown in a table in one of the R News 4/1 articles except we used as.Date.dates instead of just as.Date to work around a bug that seems to have been introduced since then. We could also use time(z) <- as.POSIXct(time(z)) but that would result in a different time zone.)

EDIT:

The original solution binned into dates but I noticed afterwards that you wish to bin into 6 minute periods so the solution was revised.

EDIT:

Revised based on comment.

Sign Up

Sign In

Forgot Password

The Archive Base Latest Questions

I have a CSV file with timestamps and certain event-types which happened at this

Leave an answerCancel reply

1 Answer

Leave an answer
Cancel reply