I’ve been working with R for just a few months, and I have a problem with a zoo series with data every five minutes. There are no missing time points in the series, but there are some NaN values in the data.
>str(SerieCompleta)
‘zoo’ series from 2011-01-01 to 2011-12-31 23:55:00
Data: num [1:104737, 1] 0 0 0 0 0 0 0 0 0 0 ...
- attr(*, "na.action")=Class 'omit' num [1:383] 2017 3745 5761 6786 6787 ...
Index: POSIXct[1:104737], format: "2011-01-01 00:00:00" "2011-01-01 00:05:00" ...
I need to find the maximum of each group of data, where groups are separated by thirty or more consecutive minutes of zero values.
2011-01-02 05:15:00 0
2011-01-02 05:20:00 0
2011-01-02 05:25:00 0
2011-01-02 05:30:00 0
2011-01-02 05:35:00 0.1 |
2011-01-02 05:40:00 0.2 <--- maximum of group
2011-01-02 05:45:00 0.2 |
2011-01-02 05:50:00 0.1 |
2011-01-02 05:55:00 0.1 |
2011-01-02 06:00:00 0.1 |
2011-01-02 06:05:00 0.1 |
2011-01-02 06:10:00 0 |
2011-01-02 06:15:00 0 |
2011-01-02 06:20:00 0.1 |
2011-01-02 06:25:00 0
2011-01-02 06:30:00 0
2011-01-02 06:35:00 0
2011-01-02 06:40:00 0 thirty or more consecutive minutes with zero values on data
2011-01-02 06:45:00 0
2011-01-02 06:50:00 0
2011-01-02 06:55:00 0
2011-01-02 07:00:00 0.2 |
2011-01-02 07:05:00 2.5 <--- maximum of group
2011-01-02 07:10:00 0
Output should look like:
2011-01-02 05:40:00 0.2
2011-01-02 07:05:00 2.5
I don’t know if there’s a way to do this using an R feature. Thanks in advance for any suggestion.
I’ll call your data column `x` (`x` includes only the numeric data, not the dates and times). I’ll further assume that you have no missing time points and that all your time points are 5 minutes apart. You can write a function that returns a two-column matrix in which each row contains the start and end indices of one of your groups (ignoring zeroes at the beginning and end). Applied to your data, it gives one row of indices per group.
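As a minimal sketch of such a function (the name `find_groups` and the `min_zeros` argument are made up for illustration, and I assume `x` is non-empty with no `NA`/`NaN` left in it), you could use `rle()` to find runs of zeroes and treat any run of at least six zeroes, i.e. thirty minutes at five-minute spacing, as a separator:

```r
# Sketch only: find_groups and min_zeros are illustrative names.
# Assumes x is a non-empty numeric vector without NA/NaN.
find_groups <- function(x, min_zeros = 6) {
  runs <- rle(x == 0)                      # runs of zero / non-zero values
  n_runs <- length(runs$values)

  # A zero run separates groups only if it is long enough
  sep <- runs$values & runs$lengths >= min_zeros

  # Zeroes at the very beginning or end never belong to a group
  edge <- c(1, n_runs)
  sep[edge] <- sep[edge] | runs$values[edge]

  # Back to one flag per observation, then locate runs of "in a group"
  in_group <- rle(rep(!sep, runs$lengths))
  ends <- cumsum(in_group$lengths)
  starts <- ends - in_group$lengths + 1L

  cbind(start = starts[in_group$values], end = ends[in_group$values])
}

# The sample from the question (05:15 to 07:10)
x <- c(0, 0, 0, 0, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0, 0, 0.1,
       0, 0, 0, 0, 0, 0, 0, 0.2, 2.5, 0)

groups <- find_groups(x)
print(groups)   # two rows: indices 5 to 14 and 22 to 23
```

Whether thirty minutes means six or seven consecutive zero readings depends on how you count the interval, so adjust `min_zeros` accordingly; the sample gives the same two groups either way.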
Now simply apply the `which.max` function to each of your groups to get the indices of the maximum values.
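A sketch of that step on the sample from the question, with the group indices written out by hand so the snippet runs on its own (`groups`, `times` and `max_idx` are illustrative names, and the UTC time zone is an arbitrary choice):

```r
# The sample from the question (05:15 to 07:10), one value every 5 minutes
x <- c(0, 0, 0, 0, 0.1, 0.2, 0.2, 0.1, 0.1, 0.1, 0.1, 0, 0, 0.1,
       0, 0, 0, 0, 0, 0, 0, 0.2, 2.5, 0)
times <- seq(as.POSIXct("2011-01-02 05:15:00", tz = "UTC"),
             by = "5 min", length.out = length(x))

# Start/end indices of the two groups in the sample, entered by hand
groups <- cbind(start = c(5, 22), end = c(14, 23))

# which.max() returns a position within the group, so shift it back
# to a position within x; ties resolve to the first maximum
max_idx <- apply(groups, 1, function(g) g[1] + which.max(x[g[1]:g[2]]) - 1)

result <- data.frame(time = times[max_idx], value = x[max_idx])
print(result)   # 05:40 with 0.2 and 07:05 with 2.5
```

With the real series you would take `x` from the zoo object, for example with `coredata()`, and index the series by `max_idx` to get the times together with the values.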