I need to do some real-time data analysis to monitor for operational errors. More specifically, I’m controlling a winch on a buoy which is lowering an instrument package down through the water. I need to detect if it has hit the bottom, and stop it if it has. I’ve got the following data: the depth of the sensor and the rate at which the winch is unspooling. I get updates at 1 Hz and the entire process lasts about 5 minutes. If the sensor hits the bottom, the rate of change in depth will usually slow dramatically and eventually drop to zero.
It can be assumed that under ideal circumstances the rate of descent is constant (depth is linear in time), but due to waves, there can be a fair amount of noise.
I came up with this method:
'''
The variables sensor_depth, winch_velocity and sample_time are assumed to be updated in the background
by another thread. winch_is_running(), new_sample(), stop_winch() and allowed_slope_error are
assumed to be defined elsewhere.
'''
import numpy as np
from time import sleep

x_data = []
y_data = []
running_size = 10

while winch_is_running():
    if new_sample():
        x_data.append(sample_time)
        y_data.append(sensor_depth)
        # Wait until there are enough samples for the recent-window fit
        if len(x_data) >= running_size:
            # Get the slope for the entire procedure
            A = np.vstack([x_data, np.ones(len(x_data))]).T
            overall_slope, offset = np.linalg.lstsq(A, y_data, rcond=None)[0]
            # Get the slope for a recent set of samples
            A = np.vstack([x_data[-running_size:], np.ones(running_size)]).T
            recent_slope, offset = np.linalg.lstsq(A, y_data[-running_size:], rcond=None)[0]
            if overall_slope - recent_slope > allowed_slope_error:
                stop_winch()
    else:
        sleep(.2)
Does this make sense, or is there a better way?
Here’s some sample data from the current system. It wasn’t a particularly rough day, and there was no bottom strike. The current system uses a Motorola 68k based TattleTale controller running their version of BASIC. The bottom strike algorithm just compares every x samples, and if the difference isn’t big enough, it stops. While this works, it is prone to false positives when it is rough, and has poor response in calm conditions:
Date Time Temp Cond Sal DO DEPTH Turb Chlor
11/11/10 15:00:19 14.24 18.44 10.97 2.53 0.092 0.5 13.5
11/11/10 15:00:20 14.24 18.44 10.97 2.53 0.126 0.7 9.5
11/11/10 15:00:21 14.24 18.45 10.97 2.53 0.132 0.6 13.0
11/11/10 15:00:22 14.24 18.44 10.96 2.53 0.152 0.6 8.6
11/11/10 15:00:23 14.24 18.44 10.96 2.53 0.139 0.7 13.6
11/11/10 15:00:24 14.24 18.44 10.97 2.52 0.120 0.7 13.5
11/11/10 15:00:25 14.24 18.44 10.97 2.52 0.128 1.4 7.1
11/11/10 15:00:26 14.24 18.44 10.96 2.52 0.128 0.6 7.9
11/11/10 15:00:27 14.24 18.44 10.97 2.52 0.141 0.9 12.4
11/11/10 15:00:28 14.24 18.44 10.97 2.51 0.135 1.3 12.7
11/11/10 15:00:29 14.24 18.44 10.96 2.51 0.145 1.3 12.8
11/11/10 15:00:30 14.24 18.44 10.96 2.51 0.163 0.6 4.8
11/11/10 15:00:31 14.24 18.44 10.96 2.51 0.213 0.9 3.9
11/11/10 15:00:32 14.24 18.44 10.97 2.51 0.211 0.6 7.1
11/11/10 15:00:33 14.24 18.44 10.96 2.51 0.241 0.7 6.9
11/11/10 15:00:34 14.24 18.44 10.96 2.51 0.286 0.5 9.8
11/11/10 15:00:35 14.24 18.44 10.96 2.51 0.326 0.6 9.0
11/11/10 15:00:36 14.24 18.44 10.96 2.51 0.358 0.7 3.3
11/11/10 15:00:37 14.24 18.44 10.96 2.51 0.425 0.7 13.1
11/11/10 15:00:38 14.24 18.43 10.96 2.51 0.419 0.8 5.3
11/11/10 15:00:39 14.24 18.44 10.96 2.51 0.495 1.2 7.4
11/11/10 15:00:40 14.24 18.44 10.96 2.50 0.504 0.7 16.1
11/11/10 15:00:41 14.24 18.44 10.96 2.50 0.558 0.5 11.9
11/11/10 15:00:42 14.24 18.44 10.96 2.50 0.585 0.8 8.8
11/11/10 15:00:43 14.24 18.44 10.96 2.50 0.645 0.8 9.7
11/11/10 15:00:44 14.24 18.44 10.96 2.50 0.654 0.6 5.2
11/11/10 15:00:45 14.24 18.44 10.96 2.50 0.694 0.5 9.5
11/11/10 15:00:46 14.24 18.44 10.96 2.50 0.719 0.7 5.9
11/11/10 15:00:47 14.24 18.44 10.96 2.50 0.762 0.9 7.2
11/11/10 15:00:48 14.24 18.44 10.96 2.50 0.815 1.0 11.1
11/11/10 15:00:49 14.24 18.44 10.96 2.50 0.807 0.6 8.7
11/11/10 15:00:50 14.24 18.44 10.96 2.50 0.884 0.4 0.4
11/11/10 15:00:51 14.24 18.44 10.96 2.50 0.865 0.7 13.3
11/11/10 15:00:52 14.25 18.45 10.97 2.49 0.917 1.2 7.3
11/11/10 15:00:53 14.24 18.45 10.97 2.49 0.964 0.5 4.8
11/11/10 15:00:54 14.25 18.44 10.97 2.49 0.967 0.6 9.7
11/11/10 15:00:55 14.25 18.44 10.97 2.49 1.024 0.5 8.1
11/11/10 15:00:56 14.25 18.45 10.97 2.49 1.042 1.0 14.3
11/11/10 15:00:57 14.25 18.45 10.97 2.49 1.074 0.7 6.0
11/11/10 15:00:58 14.26 18.46 10.97 2.49 1.093 0.9 9.0
11/11/10 15:00:59 14.26 18.46 10.98 2.49 1.145 0.7 9.1
11/11/10 15:01:00 14.26 18.46 10.98 2.49 1.155 1.7 8.6
11/11/10 15:01:01 14.25 18.47 10.98 2.49 1.205 0.7 8.8
11/11/10 15:01:02 14.25 18.48 10.99 2.49 1.237 0.8 12.9
11/11/10 15:01:03 14.26 18.48 10.99 2.49 1.248 0.7 7.2
11/11/10 15:01:04 14.27 18.50 11.00 2.48 1.305 1.2 9.8
11/11/10 15:01:05 14.28 18.50 11.00 2.48 1.328 0.7 10.6
11/11/10 15:01:06 14.29 18.49 11.00 2.48 1.367 0.6 5.4
11/11/10 15:01:07 14.29 18.51 11.01 2.48 1.387 0.8 9.2
11/11/10 15:01:08 14.30 18.51 11.01 2.48 1.425 0.6 14.1
11/11/10 15:01:09 14.31 18.52 11.01 2.48 1.456 4.0 11.3
11/11/10 15:01:10 14.31 18.52 11.01 2.47 1.485 2.5 5.3
11/11/10 15:01:11 14.31 18.51 11.01 2.47 1.490 0.7 5.2
11/11/10 15:01:12 14.32 18.52 11.01 2.47 1.576 0.6 6.6
11/11/10 15:01:13 14.32 18.51 11.01 2.47 1.551 0.7 7.7
11/11/10 15:01:14 14.31 18.49 10.99 2.47 1.627 0.6 7.3
11/11/10 15:01:15 14.29 18.47 10.98 2.47 1.620 0.7 11.5
11/11/10 15:01:16 14.28 18.48 10.99 2.48 1.659 0.8 7.0
11/11/10 15:01:17 14.27 18.49 10.99 2.48 1.682 1.4 14.4
11/11/10 15:01:18 14.26 18.49 11.00 2.48 1.724 1.0 2.9
11/11/10 15:01:19 14.27 18.52 11.01 2.48 1.756 0.8 13.5
11/11/10 15:01:20 14.28 18.52 11.01 2.47 1.752 5.3 11.7
11/11/10 15:01:21 14.29 18.52 11.02 2.47 1.841 0.8 5.8
11/11/10 15:01:22 14.30 18.52 11.01 2.47 1.789 1.0 5.5
11/11/10 15:01:23 14.31 18.52 11.01 2.47 1.868 0.7 6.8
11/11/10 15:01:24 14.31 18.52 11.02 2.47 1.848 0.8 7.8
11/11/10 15:01:25 14.32 18.52 11.01 2.47 1.896 0.3 8.3
11/11/10 15:01:26 14.32 18.52 11.01 2.47 1.923 0.9 4.8
11/11/10 15:01:27 14.32 18.51 11.01 2.47 1.936 0.5 6.4
11/11/10 15:01:28 14.32 18.52 11.01 2.46 1.960 0.9 10.0
11/11/10 15:01:29 14.31 18.52 11.01 2.46 1.996 0.6 10.7
11/11/10 15:01:30 14.31 18.52 11.01 2.47 2.024 1.7 11.8
11/11/10 15:01:31 14.31 18.52 11.01 2.47 2.031 1.0 11.7
11/11/10 15:01:32 14.31 18.53 11.02 2.46 2.110 1.3 5.4
11/11/10 15:01:33 14.32 18.52 11.01 2.46 2.067 0.6 12.2
11/11/10 15:01:34 14.32 18.52 11.01 2.46 2.144 0.4 6.4
11/11/10 15:01:35 14.32 18.51 11.01 2.46 2.148 1.0 4.6
11/11/10 15:01:36 14.33 18.51 11.01 2.46 2.172 0.9 9.6
11/11/10 15:01:37 14.33 18.52 11.01 2.46 2.221 1.0 6.5
11/11/10 15:01:38 14.33 18.51 11.01 2.46 2.219 0.3 7.6
11/11/10 15:01:39 14.33 18.51 11.01 2.46 2.278 1.2 8.1
11/11/10 15:01:40 14.32 18.51 11.01 2.46 2.258 0.5 0.6
11/11/10 15:01:41 14.32 18.52 11.01 2.46 2.329 1.2 8.2
11/11/10 15:01:42 14.31 18.51 11.01 2.46 2.321 1.1 9.6
11/11/10 15:01:43 14.31 18.51 11.01 2.46 2.382 1.0 5.3
11/11/10 15:01:44 14.31 18.51 11.01 2.46 2.357 0.7 8.5
11/11/10 15:01:45 14.31 18.52 11.01 2.46 2.449 0.4 10.3
11/11/10 15:01:46 14.31 18.52 11.01 2.46 2.430 0.6 10.0
11/11/10 15:01:47 14.31 18.52 11.01 2.46 2.472 0.6 11.3
11/11/10 15:01:48 14.31 18.52 11.01 2.45 2.510 1.2 8.5
11/11/10 15:01:49 14.31 18.51 11.01 2.45 2.516 0.7 9.5
11/11/10 15:01:50 14.31 18.52 11.01 2.45 2.529 0.5 9.6
11/11/10 15:01:51 14.31 18.52 11.01 2.45 2.575 0.7 8.2
11/11/10 15:01:52 14.31 18.51 11.01 2.46 2.578 0.5 9.4
11/11/10 15:01:53 14.31 18.51 11.01 2.46 2.592 0.8 5.5
11/11/10 15:01:54 14.30 18.51 11.01 2.46 2.666 0.6 7.1
11/11/10 15:01:55 14.30 18.51 11.01 2.46 2.603 0.7 11.5
11/11/10 15:01:56 14.29 18.52 11.01 2.45 2.707 0.9 7.2
11/11/10 15:01:57 14.29 18.52 11.01 2.45 2.673 0.7 9.2
11/11/10 15:01:58 14.28 18.52 11.01 2.45 2.705 0.7 6.4
11/11/10 15:01:59 14.28 18.52 11.01 2.45 2.720 1.3 6.8
11/11/10 15:02:00 14.28 18.52 11.02 2.45 2.778 0.7 7.5
11/11/10 15:02:01 14.27 18.52 11.02 2.45 2.724 0.5 8.0
11/11/10 15:02:02 14.27 18.51 11.01 2.45 2.840 0.9 10.0
11/11/10 15:02:03 14.26 18.52 11.02 2.45 2.758 0.8 6.4
11/11/10 15:02:04 14.26 18.52 11.01 2.46 2.874 0.4 9.7
11/11/10 15:02:05 14.24 18.53 11.02 2.46 2.824 1.1 10.8
11/11/10 15:02:06 14.24 18.53 11.02 2.46 2.896 1.0 8.8
11/11/10 15:02:07 14.22 18.53 11.02 2.47 2.903 0.6 16.3
11/11/10 15:02:08 14.22 18.54 11.03 2.45 2.912 0.9 9.6
11/11/10 15:02:09 14.21 18.54 11.02 2.45 2.949 0.8 6.6
11/11/10 15:02:10 14.20 18.54 11.03 2.45 2.964 1.4 8.4
11/11/10 15:02:11 14.19 18.55 11.03 2.46 2.966 3.0 12.9
11/11/10 15:02:12 14.17 18.55 11.03 2.45 3.020 1.0 7.5
11/11/10 15:02:13 14.15 18.56 11.04 2.45 3.000 1.1 9.5
11/11/10 15:02:14 14.14 18.56 11.04 2.45 3.064 0.9 6.5
11/11/10 15:02:15 14.13 18.56 11.04 2.45 3.037 1.3 8.2
11/11/10 15:02:16 14.13 18.57 11.04 2.45 3.097 1.3 7.7
11/11/10 15:02:17 14.12 18.57 11.05 2.45 3.128 1.5 8.4
11/11/10 15:02:18 14.11 18.58 11.05 2.45 3.104 1.7 7.0
11/11/10 15:02:19 14.10 18.58 11.05 2.45 3.190 1.2 10.2
11/11/10 15:02:20 14.10 18.58 11.05 2.44 3.141 5.8 9.9
11/11/10 15:02:21 14.09 18.60 11.06 2.44 3.199 1.4 4.7
11/11/10 15:02:22 14.07 18.60 11.07 2.44 3.208 1.6 9.4
11/11/10 15:02:23 14.06 18.60 11.07 2.44 3.199 2.1 6.2
11/11/10 15:02:24 14.06 18.62 11.08 2.43 3.259 3.0 9.3
11/11/10 15:02:25 14.05 18.63 11.08 2.43 3.228 1.6 8.9
11/11/10 15:02:26 14.06 18.63 11.08 2.43 3.289 1.6 3.5
11/11/10 15:02:27 14.05 18.64 11.09 2.43 3.278 1.8 2.2
11/11/10 15:02:28 14.05 18.64 11.09 2.43 3.307 2.2 9.7
11/11/10 15:02:29 14.04 18.64 11.09 2.43 3.315 2.3 5.5
11/11/10 15:02:30 14.04 18.65 11.10 2.43 3.367 2.1 5.1
11/11/10 15:02:31 14.03 18.65 11.10 2.43 3.297 2.5 8.5
11/11/10 15:02:32 14.03 18.65 11.10 2.41 3.419 1.9 6.8
11/11/10 15:02:33 14.03 18.65 11.10 2.41 3.347 2.1 4.0
11/11/10 15:02:34 14.03 18.66 11.10 2.41 3.405 2.0 11.8
11/11/10 15:02:35 14.03 18.67 11.11 2.41 3.420 2.4 10.6
11/11/10 15:02:36 14.03 18.67 11.11 2.39 3.369 2.7 10.5
11/11/10 15:02:37 14.02 18.67 11.11 2.39 3.402 1.6 9.1
11/11/10 15:02:38 14.02 18.66 11.11 2.39 3.408 1.9 8.5
11/11/10 15:02:39 14.02 18.67 11.11 2.39 3.362 4.2 7.0
11/11/10 15:02:40 14.02 18.67 11.11 2.38 3.421 2.3 12.1
11/11/10 15:02:41 14.02 18.67 11.11 2.38 3.371 2.6 14.7
11/11/10 15:02:42 14.02 18.67 11.11 2.38 3.409 3.0 6.5
11/11/10 15:02:43 14.02 18.67 11.11 2.38 3.368 2.3 2.5
11/11/10 15:02:44 14.02 18.67 11.11 2.37 3.434 2.5 10.2
11/11/10 15:02:45 14.02 18.67 11.11 2.37 3.346 1.6 4.5
It was not a very interesting day from a data perspective either.
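For offline experiments, rows like the ones above can be parsed into elapsed seconds and depth. This is a minimal sketch: the function name `parse_rows` is hypothetical, and it assumes whitespace-separated fields in the order date, time, Temp, Cond, Sal, DO, DEPTH, Turb, Chlor. The month/day/year date order is also an assumption, since the sample cannot confirm it.

```python
from datetime import datetime

def parse_rows(lines):
    """Return (elapsed_seconds, depths) from whitespace-separated log rows.

    Assumed column order: date, time, Temp, Cond, Sal, DO, DEPTH, Turb, Chlor.
    The date format (month/day/year) is an assumption.
    """
    times, depths = [], []
    for line in lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or malformed rows
        try:
            stamp = datetime.strptime(fields[0] + " " + fields[1],
                                      "%m/%d/%y %H:%M:%S")
            depth = float(fields[6])
        except ValueError:
            continue  # not a data row (for example, a header line)
        times.append(stamp)
        depths.append(depth)
    # Express time as seconds since the first parsed sample
    elapsed = [(t - times[0]).total_seconds() for t in times]
    return elapsed, depths

sample = [
    "11/11/10 15:00:19 14.24 18.44 10.97 2.53 0.092 0.5 13.5",
    "11/11/10 15:00:20 14.24 18.44 10.97 2.53 0.126 0.7 9.5",
    "11/11/10 15:00:21 14.24 18.45 10.97 2.53 0.132 0.6 13.0",
]
elapsed, depths = parse_rows(sample)
```

The two returned lists can be fed straight into whatever slope or filtering approach is being evaluated.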
Your approach (comparing current derivative to mean derivative) is good, but could be improved. Most importantly, you really need to see your data before deciding how to analyze it:
These plots show:
A) Your original data. Note: the rate of descent is not constant due to the spool diameter changing, and thus a linear regression is probably not optimal. Also, there is extra data at the beginning while the spool is stopped that could throw off your slope measurements.
B) The derivative of your data. This is the data you are using to do your detection. Tell me: can you easily see the region where the average slope goes to zero?
C) The FFT of your data, showing a lot of power in the upper half of the frequency range; this is where your noise lies. Since the noise only occupies the upper half of the frequency range, it should be fairly easy to filter out.
D) Your data after going through a Gaussian low-pass filter with sigma=1.0 (scipy.ndimage.gaussian_filter(data, 1.0))
E) The derivative of D (much easier to see the bottom in this data)
F) The power spectrum of E, showing noise mostly removed.
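The quantities in panels B through F can be computed along the following lines. This is a sketch, not the code that produced the plots. To stay NumPy-only it uses a hand-built, truncated Gaussian kernel with `np.convolve` as a stand-in for `scipy.ndimage.gaussian_filter`, and it pads the ends by repeating the edge values, which may differ from SciPy's boundary handling. The helper names `gaussian_lowpass` and `power_spectrum` are hypothetical, and taking the spectrum in panel C from the raw derivative is an assumption about which series was plotted.

```python
import numpy as np

def gaussian_lowpass(data, sigma, truncate=4.0):
    """Smooth a 1-D series with a normalized, truncated Gaussian kernel."""
    data = np.asarray(data, dtype=float)
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    # Repeat the end values so the output has the same length as the input
    padded = np.pad(data, radius, mode="edge")
    return np.convolve(padded, kernel, mode="valid")

def power_spectrum(series):
    """Power at each non-negative frequency, with the mean removed."""
    series = np.asarray(series, dtype=float)
    return np.abs(np.fft.rfft(series - series.mean())) ** 2

# Depth readings copied from the first 24 rows of the sample data (1 Hz)
depth = np.array([0.092, 0.126, 0.132, 0.152, 0.139, 0.120, 0.128, 0.128,
                  0.141, 0.135, 0.145, 0.163, 0.213, 0.211, 0.241, 0.286,
                  0.326, 0.358, 0.425, 0.419, 0.495, 0.504, 0.558, 0.585])

raw_rate = np.diff(depth)                   # B: derivative of the raw data
raw_power = power_spectrum(raw_rate)        # C: spectrum (of B, by assumption)
smooth = gaussian_lowpass(depth, 1.0)       # D: low-passed data, sigma = 1.0
smooth_rate = np.diff(smooth)               # E: derivative of D
smooth_power = power_spectrum(smooth_rate)  # F: power spectrum of E
```

Because the kernel is symmetric and normalized, it leaves a straight-line descent unchanged away from the ends, so any slowdown in `smooth_rate` reflects the data rather than the filter.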
So by filtering a little, it becomes fairly easy to visually detect the bottom. The questions, then, are 1) how to translate ‘visual detection’ into a reliable algorithm and 2) how to determine the optimal value of sigma. If it is too small, then the noise gets in your way. If it is too large, then the filter responds slowly and the spool may run too long after the strike. The only way to answer either question is empirically: pull out as many of these data sets as you can and try new ideas until you get one that works for most, if not all, of your data sets.
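Before testing against recorded casts, it can help to put rough numbers on the sigma trade-off. The sketch below is not part of the original analysis. For a few values of sigma it reports how many samples of lookahead a centred Gaussian kernel needs (its delay in real-time use) and how much it attenuates a component at 0.25 cycles per sample, the bottom of the upper half of the frequency range. The helper names `gaussian_kernel` and `gain` are hypothetical, and truncating the kernel at 4 sigma is an assumption.

```python
import numpy as np

def gaussian_kernel(sigma, truncate=4.0):
    """Normalized Gaussian weights out to truncate * sigma samples."""
    radius = int(truncate * sigma + 0.5)
    offsets = np.arange(-radius, radius + 1)
    weights = np.exp(-0.5 * (offsets / sigma) ** 2)
    return offsets, weights / weights.sum()

def gain(sigma, freq):
    """Amplitude gain of the kernel at freq, in cycles per sample."""
    offsets, weights = gaussian_kernel(sigma)
    # The kernel is symmetric, so its frequency response is a cosine sum
    return abs(np.sum(weights * np.cos(2 * np.pi * freq * offsets)))

for sigma in (0.5, 1.0, 2.0):
    offsets, _ = gaussian_kernel(sigma)
    delay = int(offsets[-1])  # samples of lookahead a centred kernel needs
    print(f"sigma={sigma}: delay={delay} samples, "
          f"gain at 0.25 cycles/sample={gain(sigma, 0.25):.3f}")
```

At 1 Hz each sample of delay is one more second of cable paid out after a strike, so a table like this narrows the range of sigma worth trying. It does not replace checking against real data.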
My first approach would be something like:
– Low-pass the data as it arrives, using empirically pre-determined parameters. If these parameters are selected correctly, it should only be necessary to consider the very last data point that has arrived.
– Trigger when the derivative of the filtered data is close to zero, within some (empirically determined) threshold.
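The two steps above could be sketched as a small streaming detector. This is an illustration under stated assumptions, not a tested controller. The class name `BottomDetector` is hypothetical, and the default `sigma` and `threshold` are placeholders to be tuned on recorded casts. The arming step, which ignores the stationary period before the descent starts, is an addition not described above. Smoothing is applied to the sample-to-sample rate, which gives the same result as differentiating smoothed depth because both operations are linear.

```python
import math
from collections import deque

class BottomDetector:
    """Flags a bottom strike when the smoothed descent rate falls near zero.

    sigma and threshold are placeholders to be tuned on recorded casts.
    """

    def __init__(self, sigma=1.0, threshold=0.01, truncate=4.0):
        radius = int(truncate * sigma + 0.5)
        raw = [math.exp(-0.5 * (k / sigma) ** 2)
               for k in range(-radius, radius + 1)]
        total = sum(raw)
        self.weights = [w / total for w in raw]
        self.rates = deque(maxlen=len(self.weights))
        self.threshold = threshold
        self.last = None     # previous (time, depth) sample
        self.armed = False   # becomes True once a real descent has been seen

    def update(self, t, depth):
        """Feed one sample; return True if the winch should stop."""
        if self.last is not None:
            t0, d0 = self.last
            if t > t0:
                self.rates.append((depth - d0) / (t - t0))
        self.last = (t, depth)
        if len(self.rates) < len(self.weights):
            return False  # not enough history yet
        smoothed = sum(w * r for w, r in zip(self.weights, self.rates))
        if not self.armed:
            # Assumption: ignore the stationary period before the descent
            self.armed = smoothed > self.threshold
            return False
        return smoothed < self.threshold

# Synthetic cast (not real data): stationary for 5 s, descending at
# 0.04 depth units per second for 30 s, then resting on the bottom.
def synthetic_depth(t):
    return 0.04 * min(max(t - 5, 0), 30)

detector = BottomDetector(sigma=1.0, threshold=0.01)
stop_time = None
for t in range(60):
    if detector.update(float(t), synthetic_depth(t)):
        stop_time = t
        break
```

With a symmetric window the smoothed value is centred `truncate * sigma` samples in the past, so the trigger lags the strike by a few samples. That lag is the cost of a larger sigma.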
There’s a lot you could do to make this more ‘clever’, like automatically selecting a threshold based on prior noise. However, such tricks can be quite difficult to implement properly because they can be fooled by unexpected input. You are almost always better off applying what you already know about the data rather than asking the computer to guess for you.