I have an interesting puzzle. Suppose you have a numpy 2D array in which each line corresponds to a measurement event and each column corresponds to a different measured variable. One additional column in this array specifies the date on which the measurement was taken. The lines are sorted by time stamp, and there are several (or many) measurements on each day. The goal is to identify the line that starts each new day and subtract its values from every line of that day, including itself.
My current approach is a loop over the days that creates a boolean vector selecting the lines of each day and then subtracts the first selected line from them. This works, but feels inelegant. Are there better ways to do this?
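For reference, the loop-based approach described above might look roughly like the following sketch. The function name `subtract_first_row_per_day` and the tiny `demo` array are made up for illustration, and the day is assumed to be in column 0.

```python
import numpy as np

def subtract_first_row_per_day(data):
    """Loop over days and subtract each day's first row from that day's rows."""
    out = data.copy()
    days = data[:, 0]
    for day in np.unique(days):
        mask = days == day          # boolean vector selecting this day's rows
        first = data[mask][0, 1:]   # measured values of the day's first row
        out[mask, 1:] -= first      # the day column itself is left untouched
    return out

# Hypothetical three-row input: two measurements on day 1, one on day 2.
demo = np.array([[1, 1, 2],
                 [1, 3, 4],
                 [2, 7, 8]])
result = subtract_first_row_per_day(demo)
print(result)
```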
Here is a small example. The lines below define a matrix in which the first column is the day and the remaining two are the measured values:
import numpy as np

before = np.array([[ 1,  1,  2],
                   [ 1,  3,  4],
                   [ 1,  5,  6],
                   [ 2,  7,  8],
                   [ 3,  9, 10],
                   [ 3, 11, 12],
                   [ 3, 13, 14]])
At the end of the process, I expect to see the following array:
array([[1, 0, 0],
[1, 2, 2],
[1, 4, 4],
[2, 0, 0],
[3, 0, 0],
[3, 2, 2],
[3, 4, 4]])
PS: Please help me find a better, more informative title for this post. I'm out of ideas.
numpy.searchsorted is a convenient function for this.
Longer explanation
If you take the first column and search for its values within itself, you get, for each line, the index of the first line with the same day.
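A minimal sketch of this step, assuming the array from the question is named `before` and the day is in column 0. The variable names `days` and `first_idx` are illustrative only. This relies on the day column being sorted, which the question states.

```python
import numpy as np

before = np.array([[1,  1,  2],
                   [1,  3,  4],
                   [1,  5,  6],
                   [2,  7,  8],
                   [3,  9, 10],
                   [3, 11, 12],
                   [3, 13, 14]])

days = before[:, 0]
# With the default side='left', searchsorted returns the leftmost position
# at which each value could be inserted, i.e. the first row of each day.
first_idx = np.searchsorted(days, days)
print(first_idx)
```

For the example data this gives 0 for the three lines of day 1, 3 for the single line of day 2, and 4 for the three lines of day 3.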
You can then use these indices to construct, by indexing, the matrix that you will subtract.
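As a sketch of the indexing step (again assuming the question's array is called `before`; `first_idx` and `to_subtract` are names chosen here for illustration), indexing with an integer array repeats each day's first line once per line of that day:

```python
import numpy as np

before = np.array([[1,  1,  2],
                   [1,  3,  4],
                   [1,  5,  6],
                   [2,  7,  8],
                   [3,  9, 10],
                   [3, 11, 12],
                   [3, 13, 14]])

first_idx = np.searchsorted(before[:, 0], before[:, 0])
# Indexing with an integer array returns a new array (a copy), in which
# row i is the first row of the day that row i of `before` belongs to.
to_subtract = before[first_idx]
print(to_subtract)
```

Because the result is a copy rather than a view, it can be modified afterwards without changing `before`.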
You need to set the first column of that matrix to 0 so that the day values won't be subtracted.
Finally, subtract the two matrices to get the desired output.
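Putting the steps together, one possible end-to-end version looks like this. It is a sketch under the same assumptions as the question (day in column 0, lines sorted by day), and the names `first_idx`, `to_subtract` and `after` are illustrative.

```python
import numpy as np

before = np.array([[1,  1,  2],
                   [1,  3,  4],
                   [1,  5,  6],
                   [2,  7,  8],
                   [3,  9, 10],
                   [3, 11, 12],
                   [3, 13, 14]])

# Index of the first row of each row's day (requires a sorted day column).
first_idx = np.searchsorted(before[:, 0], before[:, 0])

# Rows to subtract: each day's first row, repeated for every row of that day.
to_subtract = before[first_idx]
to_subtract[:, 0] = 0   # zero the day column so the days are preserved

after = before - to_subtract
print(after)
```

For the example input, `after` matches the expected array given in the question, and `before` itself is left unchanged.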