I am analyzing data from a wind turbine. Normally this is the sort of thing I would do in Excel, but the quantity of data requires something more heavy-duty. I have never used R before, so I am just looking for some pointers.
The data consists of two columns, WindSpeed and Power. So far I have imported the data from a CSV file and scatter-plotted the two against each other.
What I would like to do next is sort the data into ranges (for example, all data where WindSpeed is between x and y), then find the average power generated for each range and graph the resulting curve.
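A minimal sketch of the binning step in base R, using cut() to assign each row to a range and aggregate() to average within each range. The data frame name `turbine`, the simulated values, and the bin width of 1 (in whatever units WindSpeed uses) are assumptions for illustration; with real data you would replace the simulation with the result of read.csv().

```r
# Simulated stand-in for the real data (assumed columns: WindSpeed, Power)
set.seed(42)
turbine <- data.frame(WindSpeed = runif(5000, 0, 25))
turbine$Power <- 1500 / (1 + exp(-(turbine$WindSpeed - 10) / 2)) +
  rnorm(5000, sd = 100)

# Assign each observation to a wind-speed range: [0,1), [1,2), ..., [24,25]
breaks <- seq(0, 25, by = 1)
turbine$Bin <- cut(turbine$WindSpeed, breaks = breaks,
                   include.lowest = TRUE, right = FALSE)

# Average power within each range
bin_means <- aggregate(Power ~ Bin, data = turbine, FUN = mean)

# Midpoint of each range, looked up by bin index, for plotting on the x axis
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
bin_means$Mid <- mids[as.integer(bin_means$Bin)]

# Raw scatter with the binned-average curve on top
plot(turbine$WindSpeed, turbine$Power, pch = ".", col = "grey60",
     xlab = "WindSpeed", ylab = "Power")
lines(bin_means$Mid, bin_means$Power, col = "red", lwd = 2)
```

Narrower bins follow the curve more closely but leave fewer points per bin, so each average gets noisier; the width is a judgment call for your data.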
From this average I want to recalculate the average using only the data that falls within one or two standard deviations of it (basically ignoring outliers).
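One way to sketch the outlier step: compute each range's mean and standard deviation, keep only the rows within `n_sd` standard deviations of their own range's mean, and average again. As before, `turbine`, the simulated values, and the bin width are illustrative assumptions, not part of the real data set.

```r
# Simulated stand-in for the real data (assumed columns: WindSpeed, Power)
set.seed(42)
turbine <- data.frame(WindSpeed = runif(5000, 0, 25))
turbine$Power <- 1500 / (1 + exp(-(turbine$WindSpeed - 10) / 2)) +
  rnorm(5000, sd = 100)
turbine$Bin <- cut(turbine$WindSpeed, breaks = seq(0, 25, by = 1),
                   include.lowest = TRUE, right = FALSE)

n_sd <- 2  # set to 1 for the stricter one-standard-deviation rule

# ave() returns the per-bin statistic repeated for every row of that bin,
# so the results line up with the rows of `turbine`
bin_mean <- ave(turbine$Power, turbine$Bin, FUN = mean)
bin_sd   <- ave(turbine$Power, turbine$Bin, FUN = sd)

# Keep rows within n_sd standard deviations of their own bin's mean
keep <- abs(turbine$Power - bin_mean) <= n_sd * bin_sd

# Recalculated average per bin, ignoring the flagged outliers
trimmed_means <- aggregate(Power ~ Bin, data = turbine[keep, ], FUN = mean)
```

Note that a bin containing a single observation has an undefined standard deviation (sd() returns NA), so very sparse bins would need separate handling.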
Any pointers are appreciated.
For those who are interested, I am trying to create a graph similar to this. It's a pretty standard type of graph, but like I said, the sheer quantity of data requires something heavier than Excel.
Throwing this version, similar in motivation to @hadley's, into the mix: an additive model with an adaptive smoother, fitted using package mgcv.
Dummy data first, as used by @hadley:
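A stand-in for that dummy data, assuming a logistic-shaped power curve with Gaussian noise. The names `df`, `w_sp`, and `power` and all of the constants are illustrative assumptions, not necessarily @hadley's exact values.

```r
# Simulated wind speed / power data (all constants are assumptions)
set.seed(1)
n <- 1000
w_sp <- runif(n, 0, 100)                       # wind speed
power <- 1 / (1 + exp(-(w_sp - 40) / 5)) +     # S-shaped power curve
  rnorm(n, sd = 0.1)                           # measurement noise
df <- data.frame(power, w_sp)
```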
Fit the additive model using gam(), with an adaptive smoother and smoothness selection via REML.
Predict from the model and get standard errors of the fit; use the latter to generate an approximate 95% confidence interval.
Plot everything and the Loess fit for comparison
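The steps above might look roughly like the following. This is a sketch under assumptions rather than the exact original code: the simulated data, the default basis size for the adaptive smoother, the 200-point prediction grid, and the use of plus or minus 2 standard errors for the approximate interval are all choices made here for illustration.

```r
library(mgcv)

# Simulated stand-in data (assumed names and constants)
set.seed(1)
df <- data.frame(w_sp = runif(1000, 0, 100))
df$power <- 1 / (1 + exp(-(df$w_sp - 40) / 5)) + rnorm(1000, sd = 0.1)

# Additive model: adaptive smoother (bs = "ad"), smoothness selected by REML
mod <- gam(power ~ s(w_sp, bs = "ad"), data = df, method = "REML")

# Predict over a regular grid and ask for standard errors of the fit
x_grid <- data.frame(w_sp = seq(min(df$w_sp), max(df$w_sp), length.out = 200))
pred <- predict(mod, newdata = x_grid, se.fit = TRUE)
x_grid$fit   <- as.numeric(pred$fit)
x_grid$upper <- x_grid$fit + 2 * as.numeric(pred$se.fit)  # approx. 95% interval
x_grid$lower <- x_grid$fit - 2 * as.numeric(pred$se.fit)

# Loess fit on the same grid, for comparison
lo <- loess(power ~ w_sp, data = df)
x_grid$lo_fit <- predict(lo, newdata = data.frame(w_sp = x_grid$w_sp))

# Plot the data, the GAM fit with its interval, and the Loess fit
plot(df$w_sp, df$power, col = "grey", pch = 16, cex = 0.5,
     xlab = "w_sp", ylab = "power")
lines(x_grid$w_sp, x_grid$fit, col = "red", lwd = 2)
lines(x_grid$w_sp, x_grid$upper, col = "red", lty = "dashed")
lines(x_grid$w_sp, x_grid$lower, col = "red", lty = "dashed")
lines(x_grid$w_sp, x_grid$lo_fit, col = "blue", lwd = 2)
legend("topleft", bty = "n",
       legend = c("GAM (adaptive smoother)", "approx. 95% interval", "Loess"),
       col = c("red", "red", "blue"),
       lty = c("solid", "dashed", "solid"))
```

The adaptive smoother lets the amount of smoothing vary along the wind-speed axis, which suits a curve that is flat at both ends and steep in the middle; Loess with its default span uses a single fixed degree of smoothing.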