Given a time series of sensor state intervals, how do I implement a classifier which learns from supervised training data to detect an incident based on a sequence of state intervals? To simplify the problem, sensor states are reduced to either true or false.
Update: I’ve found this paper (PDF) on Mining Sequences of Temporal Intervals which addresses a similar problem. Another paper (Google Docs) on Mining Hierarchical Temporal Patterns in Multivariate Time Series takes a novel approach, but deals with hierarchical data.
Example Training Data
The following data is a training example for an incident, represented as a graph over time, where /¯¯¯\ represents a true state interval and \___/ a false state interval for a sensor.
Sensor | Sensor State over time
| 0....5....10...15...20...25... // timestamp
---------|--------------------------------
A | ¯¯¯¯¯¯¯¯¯¯¯¯\________/¯¯¯¯¯¯¯¯
B | ¯¯¯¯¯\___________________/¯¯¯¯
C | ______________________________ // no state change
D | /¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯\_/¯
E | _________________/¯¯¯¯¯¯¯¯\___
Incident Detection vs Sequence Labeling vs Classification
I initially generalised my problem as a two-category sequence labeling problem, but my categories really represented “normal operation” and a rare “alarm event” so I have rephrased my question as incident detection. Training data is available for “normal operation” and “alarm incident”.
To reduce problem complexity, I have discretized sensor events to boolean values, but this need not be the case.
Possible Algorithms
A hidden Markov model seems to be a possible solution, but would it be able to use the state intervals? If a sequence labeler is not the best approach for this problem, alternative suggestions would be appreciated.
Bayesian Probabilistic Approach
Sensor activity will vary significantly by time of day (busy in mornings, quiet at night). My initial approach would have been to measure normal sensor state over a few days and calculate state probability by time of day (hour). The combined probability of sensor states at an unlikely hour surpassing an “unlikelihood threshold” would indicate an incident. But this seemed like it would raise a false alarm if the sensors were noisy. I have not yet implemented this, but I believe that approach has merit.
Feature Extraction
Vector states could be represented as state interval changes occurring at a specific time and lasting a specific duration.
struct StateInterval
{
int sensorID;
bool state;
DateTime timeStamp;
TimeSpan duration;
}
eg. Some State Intervals from the process table:
[ {D, true, 0, 3} ]; [ {D, false, 4, 1} ]; ...
[ {A, true, 0, 12} ]; [ {B, true, 0, 6} ]; [ {D, true, 0, 3} ]; etc.
A good classifier would take into account state-value intervals and recent state changes to determine if a combination of state changes closely matches training data for a category.
Edit: Some ideas after sleeping on how to extract features from multiple sensors’ alarm data and how to compare it to previous data…
Start by calculating the following data for each sensor for each hour of the day:
- Average state interval length (for
trueandfalsestates) - Average time between state changes
- Number of state changes over time
Each sensor could then be compared to every other sensor in a matrix with data like the following:
- Average time taken for sensor B to change to a true state after sensor A did. If an average value is 60 seconds, then a 1-second wait would be more interesting than a 120-second wait.
- Average number of state changes sensor B underwent while sensor A was in one state
Given two sets of training data, the classifier should be able to determine from these feature sets which is the most likely category for classification.
Is this a sensible approach and what would be a good algorithm to compare these features?
Edit: the direction of a state change (false->true vs true-false) is significant, so any features should take that into account.
A simple solution would be collapse the time aspect of your data and take each timestamp as one instance. In this case, the values of the sensors are considered your feature vector, where each time step is labeled with a class value of category A or B (at least for the labeled training data):
This input data is fed to the usual classification algorithms (ANN, SVM, …), and the goal is to predict the class of unlabeled time series:
An intermediary step of dimensionality reduction / feature extraction could improve the results.
Obviously this may not be as good as modeling the time dynamics of the sequences, especially since techniques such as Hidden Markov Models (HMM) take into account the transitions between the various states.
EDIT
Based on your comment below, it seems that the best way to get less transitory predictions of the target class is to a apply a post-processing rule at the end of the prediction phase, and treating the classification output as a sequence of consecutive predictions.
The way this works is that you would compute the class posterior probabilities (ie: probability distribution that an instance belong to each class label, which in the case of binary SVM are easily derived from the decision function), then given a specified threshold, you check if the probability of the predicted class is above that threshold: if it is we use that class to predict the current timestamp, if not then we keep the previous prediction, and the same goes for future instances. This has the effect of adding a certain inertia to the current prediction.