Example:
I have 100 samples over a certain time period, but I can use only 10 values to draw the line chart. What algorithm can I use to calculate those 10 representative values so that the chart looks similar to what I'd get if I used all 100 exact samples?
The naive algorithm, which calculates the average of every consecutive group of 10 samples, does not reflect the peaks in the chart very well.
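To see why averaging loses peaks, here is a small Python sketch of the naive approach. The helper name `bucket_average` is hypothetical, and it assumes the number of samples divides evenly into the number of buckets. A single sharp spike of height 10 is diluted to 1.0 once it is averaged with its nine flat neighbours.

```python
from typing import List, Sequence


def bucket_average(samples: Sequence[float], buckets: int) -> List[float]:
    """Hypothetical helper: average each consecutive block of samples.

    Assumes len(samples) is an exact multiple of `buckets`.
    """
    if buckets <= 0 or len(samples) % buckets != 0:
        raise ValueError("len(samples) must be a positive multiple of buckets")
    size = len(samples) // buckets
    return [sum(samples[k * size:(k + 1) * size]) / size for k in range(buckets)]


samples = [0.0] * 100
samples[42] = 10.0  # a single sharp peak
print(max(bucket_average(samples, 10)))  # → 1.0 (the peak of 10.0 is flattened)
```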
You could use the Douglas-Peucker algorithm to obtain a good under-sampled representation.
The algorithm builds an under-sampled set starting with just the two end points of the original data set. At each step, the point in the original data set that is "furthest" from the current under-sampled representation (the one with the maximum error) is added to the under-sampled set. In this way the algorithm picks up the important peaks in the original data set and keeps the error of the under-sampled representation small, although, being a greedy method, it does not guarantee the smallest possible error.
Since you're only allowed 10 points in your under-sampled set, you could set up the algorithm to stop growing the under-sampled set once it reaches size 10, instead of stopping at an error tolerance as in the classic formulation.
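A minimal Python sketch of this point-count-limited variant, assuming the samples are given as (x, y) tuples and using the perpendicular distance to the chord between neighbouring kept points as the error measure (for time series whose axes have very different scales, plain vertical distance may work better). The function name `simplify_to_n` and its `max_points` parameter are hypothetical, not part of any library. A heap is used so that each step adds the point with the largest error across all current segments.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _perp_distance(p: Point, a: Point, b: Point) -> float:
    """Perpendicular distance from p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def _furthest(points: Sequence[Point], lo: int, hi: int) -> Tuple[float, int]:
    """(distance, index) of the interior point of [lo, hi] furthest from the chord."""
    best_d, best_i = -1.0, -1
    for i in range(lo + 1, hi):
        d = _perp_distance(points[i], points[lo], points[hi])
        if d > best_d:
            best_d, best_i = d, i
    return best_d, best_i


def simplify_to_n(points: Sequence[Point], max_points: int) -> List[int]:
    """Hypothetical helper: indices of at most max_points points to keep.

    Starts from the two end points and repeatedly adds the point with the
    largest error, as in Douglas-Peucker, but stops at a fixed point count.
    """
    n = len(points)
    if max_points >= n:
        return list(range(n))
    if max_points < 2:
        raise ValueError("max_points must be at least 2")
    kept = {0, n - 1}
    heap: List[Tuple[float, int, int, int]] = []  # (-distance, split, lo, hi)

    def push(lo: int, hi: int) -> None:
        if hi - lo > 1:  # the segment has interior points to consider
            d, i = _furthest(points, lo, hi)
            heapq.heappush(heap, (-d, i, lo, hi))

    push(0, n - 1)
    while len(kept) < max_points and heap:
        _, i, lo, hi = heapq.heappop(heap)  # largest error overall
        kept.add(i)
        push(lo, i)  # split the segment at the newly kept point
        push(i, hi)
    return sorted(kept)


# Usage: 100 samples of a sine wave with one sharp spike at t = 42.
samples = [math.sin(t / 5.0) + (3.0 if t == 42 else 0.0) for t in range(100)]
points = [(float(t), y) for t, y in enumerate(samples)]
idx = simplify_to_n(points, 10)
print(len(idx), 42 in idx)  # → 10 True
```

Unlike the bucket average, the spike at t = 42 survives because it is the point with the largest error against the initial end-to-end chord, so it is the first one added.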
If your original data set has more significant peaks than you have points to spend, there's no way you'll be able to capture them all and still satisfy the size constraint.
Hope this helps.