I am doing some research into methods of comparing time series data. One of the algorithms that I have found being used for matching this type of data is the DTW (Dynamic Time Warping) algorithm.
The data I have, resemble the following structure (this can be one path):
Path Event Time Location (x,y)
1 1 2:30:02 1,5
1 2 2:30:04 2,7
1 3 2:30:06 4,4
...
...
Now, I was wondering whether there are other algorithms that would be suitable to find the closest match for the given path.
If two paths are the same length, say n, then they are really points in an 2n-dimensional space. The first location determines the first two dimensions, the second location determines the next two dimensions, and so on. For example, if we just take the three points in your example, the path can be represented as the single 6-dimensional point (1, 5, 2, 7, 4, 4). If we want to compare this to another three-point path, we can compute either the Euclidean distance (square root of the sum of squares of per-dimension distances between the two points) or the Manhattan distance (sum of the per-dimension differences).
For example, the boring path that stays at (0, 0) for all three times becomes the 6-dimensional point (0, 0, 0, 0, 0, 0). Then the Euclidean distance between this point and your example path is
sqrt((1-0)^2 + (5-0)^2 + (2-0)^2 + (7-0)^2 + (4-0)^2 + (4-0)^2) = sqrt(111) = 10.54. The Manhattan distance isabs(1-0) + abs(5-0) + abs(2-0) + abs(7-0) + abs(4-0) + abs(4-0) = 23. This kind of a difference between the metrics is not unusual, since the Manhattan distance is provably at least as great as the Euclidean distance.Of course one problem with this approach is that not all paths will be of the same length. However, you can easily cut off the longer path to the same length as the shorter path, or consider the shorter of the two paths to stay at the same location or moving in the same direction after measurements end, until both paths are the same length. Either approach will introduce some inaccuracies, but no matter what you do you have to deal with the fact that you are missing data on the short path and have to make up for it somehow.
EDIT:
Assuming that
path1andpath2are bothList<Tuple<int, int>>objects containing the points, we can cut off the longer list to match the shorter list as:Then, you can use the following code to find the Manhattan distance:
With the same assumptions as for the Manhattan distance, we can generate the Euclidean distance as: