I have developed a system in R for graphing large datasets obtained from wind turbines. I am now porting the process into Java. The results I get between the two systems are inconsistent.
As shown below:
- The dataset is plotted first using R, and second using JFreeChart.
- The red line in each graph corresponds to my calculation in the respective language (detailed below).
- The brown dashed line in #1 corresponds to the blue line in #2. There are no discrepancies between them; they are provided for reference.
- The shaded area represents the data points: grey in #1 and red in #2.


I can explain the discrepancies between the (red) calculated lines: I am using different calculation methods.
In R the data is processed as follows. I wrote this code with a little help and have no idea what is going on here (but hey, it works).
df <- data.frame(pwr = pwr, spd = spd)
require(mgcv)
mod <- gam(pwr ~ s(spd, bs = "ad", k = 20), data = df, method = "REML")
summary(mod)
x_grid <- with(df, data.frame(spd = seq(min(spd) + 0.0001, maxi, length=100)))
pred <- predict(mod, x_grid, se.fit = TRUE)
x_grid <- within(x_grid, fit <- pred$fit)
lines(fit ~ spd, data = x_grid, col = "red", lwd = thickLineWidth)
In Java (SQL in fact) I am using the method of bins to calculate the average at every 0.5 on the x-axis. The resulting data is plotted using an org.jfree.chart.renderer.xy.XYSplineRenderer; I do not know much about how the line is rendered.
SELECT
ROUND( ROUND( x_data * 2 ) / 2, 1) AS x_axis, # See https://stackoverflow.com/questions/5230647/mysql-rounding-functions
AVG( y_data ) AS y_axis
FROM
table
GROUP BY
x_axis
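For reference, the same binning could be sketched in plain Java roughly as follows. This is only an illustration of what the query does, not code from my project: the class and method names (BinAverager, binCentre, binAverages) are made up for the example, and the handling of exact ties (for example x = 5.25) may differ from what the database's ROUND() does.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, for illustration only.
class BinAverager {

    /** Rounds x to the nearest multiple of 0.5, mirroring ROUND(ROUND(x * 2) / 2, 1). */
    static double binCentre(double x) {
        // Math.round rounds halves upwards; the database may treat exact ties differently.
        return Math.round(x * 2.0) / 2.0;
    }

    /** Maps each bin centre to the mean of y over the points whose x falls in that bin. */
    static Map<Double, Double> binAverages(double[] x, double[] y) {
        // Accumulate {sum, count} per bin; TreeMap keeps the bins sorted along the x-axis.
        Map<Double, double[]> acc = new TreeMap<>();
        for (int i = 0; i < x.length; i++) {
            double[] sumCount = acc.computeIfAbsent(binCentre(x[i]), k -> new double[2]);
            sumCount[0] += y[i];
            sumCount[1] += 1.0;
        }
        Map<Double, Double> means = new TreeMap<>();
        for (Map.Entry<Double, double[]> e : acc.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }
}
```

Note that a bin containing a single point simply takes that point's value as its average, so one outlier in a sparsely populated bin is passed straight through to the plotted line.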
My take on the variance between the two graphs:
- The presence of a single outlier at 18 on the x-axis (most visible on the R graph) seems to have an enormous impact on the shape of the curve.
- Even between 5 and 15 on the x-axis, the line in the R graph seems more continuous; it doesn’t change trajectory as readily as the one produced by Java.
- The ‘crater’ evident at 18 on the Java x-axis has two ‘mounds’, one on each side of it; I believe this is due to polynomial effects in the rendering system.
These are things that I would like to eliminate.
So in an effort to understand the difference between the two graphs I have a few questions:
- Exactly what is going on in my R script?
- How can I port the same process to my Java code, and do I even want to?
- Can anyone explain the spline system used by JFreeChart? Is there another?
In the R code, you are (well, I was when I showed the example) fitting an additive model to the power and speed data, where the relationship between the variables is determined from the data themselves. These models involve the use of splines to estimate the response function. In particular, here I used an adaptive smoother, with k = 20 setting the complexity of the smoother being fitted. The more complex the smoother, the more wiggly the fitted function can be. An adaptive smoother is one where the degree of smoothness varies across the fitted function.
Why is this important? Well, from your data, there are periods where the response does not vary with the speed variable, and periods where the response changes rapidly with a change in speed. We have an “allowance” of wiggliness to use up over the curve. With ordinary splines the wiggliness (or smoothness) is the same across the entire function. With an adaptive smooth we can use more of our wiggliness allowance in the parts of the function where the response is changing/varying most, and not spend any of the allowance in the parts where the response isn’t changing.
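To make the “allowance” idea a little more concrete on the Java side, here is a minimal sketch. It is not what gam() does and it is not adaptive: it swaps in a much simpler discrete penalised smoother (a Whittaker-style smoother) with one global penalty, lambda, which corresponds to the “ordinary”, same-smoothness-everywhere case. It assumes the y values are evenly spaced along the x-axis (as bin averages at every 0.5 would be, provided no bins are missing), and the class and method names are made up for illustration.

```java
// Hypothetical illustration: a non-adaptive penalised smoother with a single global penalty.
class PenalizedSmoother {

    /**
     * Minimises sum((y[i] - z[i])^2) + lambda * sum((z[i-1] - 2*z[i] + z[i+1])^2)
     * by solving the linear system (I + lambda * D'D) z = y, where D takes second differences.
     */
    static double[] smooth(double[] y, double lambda) {
        int n = y.length;
        double[][] a = new double[n][n];
        for (int i = 0; i < n; i++) {
            a[i][i] = 1.0;
        }
        // Each row of D is (1, -2, 1) at columns r, r+1, r+2; add lambda * d d' into A.
        double[] d = {1.0, -2.0, 1.0};
        for (int r = 0; r + 2 < n; r++) {
            for (int p = 0; p < 3; p++) {
                for (int q = 0; q < 3; q++) {
                    a[r + p][r + q] += lambda * d[p] * d[q];
                }
            }
        }
        double[] z = y.clone();
        // Forward elimination; no pivoting because A is symmetric positive definite.
        for (int c = 0; c < n; c++) {
            for (int r = c + 1; r < n; r++) {
                double f = a[r][c] / a[c][c];
                if (f == 0.0) {
                    continue;
                }
                for (int k = c; k < n; k++) {
                    a[r][k] -= f * a[c][k];
                }
                z[r] -= f * z[c];
            }
        }
        // Back substitution on the resulting upper-triangular system.
        for (int r = n - 1; r >= 0; r--) {
            double s = z[r];
            for (int k = r + 1; k < n; k++) {
                s -= a[r][k] * z[k];
            }
            z[r] = s / a[r][r];
        }
        return z;
    }

    /** Sum of squared second differences: a crude measure of how wiggly a curve is. */
    static double roughness(double[] z) {
        double s = 0.0;
        for (int i = 1; i + 1 < z.length; i++) {
            double d2 = z[i - 1] - 2.0 * z[i] + z[i + 1];
            s += d2 * d2;
        }
        return s;
    }
}
```

With lambda = 0 the data come back unchanged, and increasing lambda lowers roughness() of the result, so a lone spike is flattened more and more. Because lambda is the same everywhere, the steep part of the curve is flattened along with the spike; that trade-off is what the adaptive smooth is meant to avoid.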
Below I annotate the code to explain what is being done at each step:
If you are not familiar with additive models and smoothers/splines then I recommend Ruppert, Wand and Carroll (2003) Semiparametric Regression. Cambridge University Press.