I have a data set that has the beginning (STRTTIME) and ending time (ENDTIME) of each trip in military time format. My goal is to determine the number of trips that take place in each 15-minute time period from 0000 to 2359 (96 time slices). I could write 96 dummy variables in Excel and do it that way, but I would rather have some code in either R or Python (I am learning both, so my knowledge is rudimentary). I can set up a counter and increment it, but I am not sure how to deal with two time variables, and I keep hitting a dead end. My example is below. Here is some sample data (in CSV format).
- Suppose a trip starts at 0805 and ends at 0840; then each 15-minute period will have the following values:
- 0000-0015 – 0
- 0015-0030 – 0
- ….
- 0800-0815 – 2/3
- 0815-0830 – 1
- 0830-0845 – 2/3
- 0845-0900 – 0
- …
- 2330-2345 – 0
- 2345-2400 – 0
- Suppose another trip starts at 0810 and ends at 0850; then each 15-minute period will have the following values:
- 0000-0015 – 0
- 0015-0030 – 0
- ….
- 0800-0815 – 1/3
- 0815-0830 – 1
- 0830-0845 – 1
- 0845-0900 – 1/3
- …
- 2330-2345 – 0
- 2345-2400 – 0
- After processing these 2 records, the values in the 15-minute period dummy fields will be as follows (i.e. each field has been incremented by the value computed for that field in every record so far):
- 0000-0015 – 0
- 0015-0030 – 0
- ….
- 0800-0815 – 1
- 0815-0830 – 2
- 0830-0845 – 5/3
- 0845-0900 – 1/3
- …
- 2330-2345 – 0
- 2345-2400 – 0
Any code to do this is much appreciated.
As there is no answer in R yet, I will add one. I feel the solution might be a bit more elegant than in Python, but that is a matter of taste.
First, we will have to read the data:
Then, I would like to convert the times to decimal format (hours plus a fraction of an hour). For that, I use the hour and min columns provided rather than the military format. Using the military format would not be a problem, though, as you could always convert the values using simple integer arithmetic.
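Since the question allows either R or Python, the integer arithmetic mentioned above can be sketched in Python. This is only an illustration: the function name `to_decimal_hours` is made up here, and it assumes the military time arrives as a string or integer such as "0805" or 805.

```python
def to_decimal_hours(hhmm):
    """Convert a military time such as "0805" or 805 to decimal hours."""
    value = int(hhmm)                     # "0805" -> 805
    hours, minutes = divmod(value, 100)   # 805 -> (8, 5)
    return hours + minutes / 60


# The first trip from the question: starts at 0805, ends at 0840.
start = to_decimal_hours("0805")
end = to_decimal_hours("0840")
print(round(start, 4), round(end, 4))
```

Integer division by 100 gives the hour and the remainder gives the minutes, so no string slicing is needed.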
Now generate the time intervals (which we will identify by their start time).
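As a rough Python sketch of this step, the 96 intervals can be built as a list of start times in decimal hours. The names `WIDTH`, `N_INTERVALS` and `label` are illustrative and not part of the original answer.

```python
WIDTH = 0.25       # 15 minutes expressed in decimal hours
N_INTERVALS = 96   # 24 hours * 4 slices per hour

# Each interval is identified by its start time: 0.0, 0.25, ..., 23.75
intervals = [i * WIDTH for i in range(N_INTERVALS)]


def label(start):
    """Format an interval start such as 8.25 as the military-style "0815"."""
    hours = int(start)
    minutes = round((start - hours) * 60)
    return f"{hours:02d}{minutes:02d}"


print(label(intervals[0]), label(intervals[33]), label(intervals[-1]))
```

Multiples of 0.25 are represented exactly in floating point, so the interval boundaries themselves carry no rounding error.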
That part is a bit tricky…
First we will check which trips end later than our interval end. All those trips we shall assign a 1; the trips that end before our interval starts we will assign a 0. If the trip ends within the interval, we will assign the corresponding fraction between 0 and 1.
Notice the use of outer. Here, the function “-” (subtraction) is used for all combinations of endtimes and the intervals vector. All other operations are element wise. I suggest that you just test the operation step by step, then it should be obvious what is done.
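For readers following along in Python, the outer subtraction and the clamping to the range 0 to 1 might look roughly like this. It is a sketch under assumptions: `outer_minus` and `clamp01` are helper names invented here to mimic what R's outer with "-" does, and the end times are those of the two example trips from the question (0840 and 0850).

```python
WIDTH = 0.25  # 15 minutes in decimal hours


def outer_minus(xs, ys):
    """Return the matrix of x - y for every combination of x and y."""
    return [[x - y for y in ys] for x in xs]


def clamp01(value):
    """Limit a value to the range [0, 1]."""
    return max(0.0, min(1.0, value))


intervals = [i * WIDTH for i in range(96)]   # interval start times
endtimes = [8 + 40 / 60, 8 + 50 / 60]        # trips ending at 0840 and 0850

# Rows are trips, columns are intervals.
diff = outer_minus(endtimes, intervals)

# 1 if the trip ends after the interval end, 0 if it ends before the
# interval start, and the covered fraction if it ends inside the interval.
end_part = [[clamp01(d / WIDTH) for d in row] for row in diff]
```

Column 34 is the 0830-0845 interval: the first trip ends 10 minutes into it, so its entry is 2/3, while the second trip runs past it and gets a 1.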
Similarly, we will do this with the start times, but now with the signs reversed, so that a trip starting before the interval gets a 1 and a trip starting after the interval gets a 0.
That enables us to generate a matrix that has a 1 whenever the interval is fully contained within the trip, a 0 whenever the two do not overlap, and the covered fraction otherwise:
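One way to combine the two parts, sketched in Python, is to add the end-side and start-side values and subtract 1. This combination rule is an assumption made for illustration and may differ from the expression used in the original R code; it also assumes every trip ends after it starts and does not cross midnight. The name `trip_matrix` is hypothetical.

```python
WIDTH = 0.25  # 15 minutes in decimal hours


def clamp01(value):
    """Limit a value to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def trip_matrix(starts, ends):
    """Return one row per trip with the covered fraction of each interval."""
    intervals = [i * WIDTH for i in range(96)]
    rows = []
    for s, e in zip(starts, ends):
        # End side: how much of the interval lies before the trip's end.
        end_part = [clamp01((e - lo) / WIDTH) for lo in intervals]
        # Start side: how much of the interval lies after the trip's start.
        start_part = [clamp01((lo + WIDTH - s) / WIDTH) for lo in intervals]
        # Assumed combination: both parts are 1 for a fully covered interval,
        # and one of them is 0 when the interval lies outside the trip.
        rows.append([a + b - 1.0 for a, b in zip(end_part, start_part)])
    return rows


# First trip from the question: 0805 to 0840.
m = trip_matrix([8 + 5 / 60], [8 + 40 / 60])
print([round(v, 3) for v in m[0][32:36]])
```

For the 0805 to 0840 trip this reproduces the values listed in the question for the four intervals from 0800 to 0900: 2/3, 1, 2/3 and 0.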
Finally, we may sum up over all trips and obtain the number of trips within each interval:
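As an end-to-end cross-check in Python, the totals can also be computed with a plainly different shortcut: for each trip and interval, take the length of the overlap and divide it by the interval width. This is not the outer-based method described above, only a compact sketch that should give the same totals; `trips_per_interval` is a made-up name, and trips are assumed not to cross midnight.

```python
WIDTH = 0.25  # 15 minutes in decimal hours


def trips_per_interval(trips):
    """trips: list of (start, end) in decimal hours. Returns 96 totals."""
    totals = [0.0] * 96
    for start, end in trips:
        for i in range(96):
            lo = i * WIDTH
            hi = lo + WIDTH
            # Length of the part of [lo, hi] covered by the trip, if any.
            overlap = max(0.0, min(end, hi) - max(start, lo))
            totals[i] += overlap / WIDTH
    return totals


# The two trips from the question: 0805-0840 and 0810-0850.
trips = [(8 + 5 / 60, 8 + 40 / 60), (8 + 10 / 60, 8 + 50 / 60)]
totals = trips_per_interval(trips)
for i in range(32, 36):
    print(f"{i * WIDTH:5.2f}  {totals[i]:.3f}")
```

For the two example trips the intervals starting at 0800, 0815, 0830 and 0845 come out as 1, 2, 5/3 and 1/3, matching the combined values given in the question.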